[U-Boot] [PATCH 2/3] string: Provide a slimmed-down memset()

Mon Mar 27 21:16:45 UTC 2017

On 27/03/2017 17:17, Heiko Stuebner wrote:
> On Monday, 27 March 2017, 09:14:47 CEST, Alexander Graf wrote:
>>
>> On 27/03/2017 01:38, Simon Glass wrote:
>>> Most of the time the optimised memset() is what we want. For extreme
>>> situations such as TPL it may be too large. For example on the 'rock'
>>> board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and
>>> the rodata bug, this patch is enough to reduce the TPL image below the
>>> limit.
>>>
>>> Signed-off-by: Simon Glass <sjg at chromium.org>
>>> ---
>>>
>>>  lib/Kconfig  | 9 +++++++++
>>>  lib/string.c | 6 ++++--
>>>  2 files changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/Kconfig b/lib/Kconfig
>>> index 65c01573e1..5bf512d8c0 100644
>>> --- a/lib/Kconfig
>>> +++ b/lib/Kconfig
>>> @@ -52,6 +52,15 @@ config LIB_RAND
>>>  	help
>>>  	  This library provides pseudo-random number generator functions.
>>>
>>> +config FAST_MEMSET
>>> +	bool "Use an optimised memset()"
>>> +	default y
>>> +	help
>>> +	  The faster memset() is the arch-specific one (if available) enabled
>>> +	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
>>> +	  better performance by writing a word at a time. Disable this option
>>> +	  to reduce code size slightly at the cost of some speed.
>>
>> The comment sounds slightly confused - I had to read it a few times
>> before I grasped what it was trying to tell me :).
>>
>>> +
>>>  source lib/dhry/Kconfig
>>>
>>>  source lib/rsa/Kconfig
>>> diff --git a/lib/string.c b/lib/string.c
>>> index 67d5f6a421..159493ed17 100644
>>> --- a/lib/string.c
>>> +++ b/lib/string.c
>>> @@ -437,8 +437,10 @@ char *strswab(const char *s)
>>>  void * memset(void * s,int c,size_t count)
>>>  {
>>>  	unsigned long *sl = (unsigned long *) s;
>>> -	unsigned long cl = 0;
>>>  	char *s8;
>>> +
>>> +#ifdef CONFIG_FAST_MEMSET
>>> +	unsigned long cl = 0;
>>>  	int i;
>>>
>>>  	/* do it one word at a time (32 bits or 64 bits) while possible */
>>> @@ -452,7 +454,7 @@ void * memset(void * s,int c,size_t count)
>>>  			count -= sizeof(*sl);
>>>  		}
>>>  	}
>>> -	/* fill 8 bits at a time */
>>> +#endif	/* fill 8 bits at a time */
>>
>> So while this is all neat, a few ideas:
>>
>> 1) Would having memset in a header improve things even more? After all,
>> each external function call clobbers registers that you need to
>> save/restore...
>
> I'd guess it really depends on the size constraints. The regular
> libgeneric memset compiles on my rk3188 TPL to a total of
> 64 bytes on both gcc-4.9 and gcc-6.3, while the byte-wise loop from
> Simon's patch comes down to 14 bytes.
>
> On the rk3188 the only memset user is board_init_f, so memset is
> called only once there, without needing to save registers. I'd guess
> that if an implementation is so size-constrained that 50 bytes matter,
> this one caller will probably always be the only one?

I'm not sure I follow. If you put it into a header, the compiler has a
better chance of eliminating untaken code paths and optimizing register
usage than with separately linked objects (unless you use GOLD). I was
mostly wondering whether that would already give you the savings without
introducing a complicated #ifdef that is going to bitrot over time :).
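
Roughly what I have in mind, as an untested sketch rather than a
proposal for the actual patch. The name tiny_memset() is made up for
illustration and does not exist in U-Boot; whether this really beats the
out-of-line version in size would need to be measured per board.

```c
#include <stddef.h>

/*
 * Hypothetical header-only variant; tiny_memset() is an assumed name
 * for this sketch, not an existing U-Boot function.
 */
static inline void *tiny_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;

	/* Fill one byte at a time: small code, no alignment handling. */
	while (count--)
		*p++ = (unsigned char)c;

	return s;
}
```

With a single caller such as board_init_f, the compiler can see both
the loop and the call site at once, so constant arguments and register
allocation are resolved in one translation unit.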

I'm just slightly worried about the massive number of preprocessor 
excludes that happen in U-Boot in general. It seems like something 
that's really hard to ever have full testing coverage on.
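
One way to keep both paths compiled and type-checked would be a plain
C `if` on a constant instead of an #ifdef, letting the optimizer drop
the dead branch. The sketch below is only an illustration under that
assumption: SKETCH_FAST_MEMSET and sketch_memset() are made-up names
standing in for a Kconfig-derived constant and the real function, and I
have not checked what this does to code size.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a Kconfig-derived constant; hypothetical for this sketch. */
#ifndef SKETCH_FAST_MEMSET
#define SKETCH_FAST_MEMSET 1
#endif

void *sketch_memset(void *s, int c, size_t count)
{
	unsigned char *s8 = s;

	if (SKETCH_FAST_MEMSET) {
		/* Only take the word path when the pointer is word-aligned. */
		if (((uintptr_t)s8 & (sizeof(unsigned long) - 1)) == 0) {
			unsigned long *sl = (unsigned long *)s8;
			unsigned long cl = 0;
			size_t i;

			/* Replicate the fill byte into every byte of a word. */
			for (i = 0; i < sizeof(cl); i++)
				cl = (cl << 8) | (unsigned char)c;

			/* Store whole words while enough bytes remain. */
			while (count >= sizeof(*sl)) {
				*sl++ = cl;
				count -= sizeof(*sl);
			}
			s8 = (unsigned char *)sl;
		}
	}

	/* Fill the remainder (or everything) one byte at a time. */
	while (count--)
		*s8++ = (unsigned char)c;

	return s;
}
```

Building with -DSKETCH_FAST_MEMSET=0 leaves only the byte loop after
optimization, while the word path still has to compile in every
configuration.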

>> 2) How much would GOLD save you? Have you tried? U-Boot is small enough
>> of a code base that global optimizations should be able to give
>> significant size savings.
>
> I think the issue that this is trying to solve is to allow more
> toolchains to be used and thus make rebuilds on changes work on a lot
> of boards at the same time with random toolchains.
>
> gcc-6.3 already produces way smaller results (well within the size
> constraints the rk3188 has) than for example the gcc-4.9 used by
> buildman as baseline toolchain.

Ah, I see. So 4.9 does not have -flto? There's a good chance my gut
feeling that GOLD actually saves anything is wrong - I don't know. Has
anyone done the numbers? Then we would have actual data to go on
instead of gut feeling.

Size is always a serious constraint in U-Boot, especially in SPL 
environments. If we can include one more tool in our portfolio to 
optimize size across the board, I'm all for it. This patch just feels 
slightly short-term - but I'm definitely not nack'ing it :).

Alex