[U-Boot] [PATCH 2/3] arm: relocation: clear .bss section with arch memset if defined

Pantelis Antoniou panto at antoniou-consulting.com
Mon Feb 2 18:28:14 CET 2015


Hi Tom,

> On Feb 2, 2015, at 19:25, Tom Rini <trini at ti.com> wrote:
> 
> On Sun, Feb 01, 2015 at 03:38:42AM +0100, Albert ARIBAUD wrote:
>> Hello Przemyslaw,
>> 
>> On Wed, 28 Jan 2015 13:55:42 +0100, Przemyslaw Marczak
>> <p.marczak at samsung.com> wrote:
>>> For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY
>>> greatly increases memset/memcpy performance. This is possible
>>> thanks to the ARM load/store-multiple (LDM/STM) instructions.
>>> 
>>> Unfortunately, the relocation is done without the cache enabled,
>>> so it takes some time, but zeroing the BSS memory takes much
>>> longer, especially for configs with big static buffers.
>>> 
>>> A quick test confirms that using the arch memcpy for relocation
>>> gives no significant boot time improvement. The same test confirms
>>> that enabling the arch memset for zeroing the BSS does reduce the
>>> boot time.
>>>
>>> So this patch enables the arch memset for zeroing the BSS after
>>> the relocation process. For ARM boards, this can be enabled
>>> in board configs by defining CONFIG_USE_ARCH_MEMSET.
>> 
>> Since the issue is that zeroing is done one word at a time, could we
>> not simply clear r3 as well as r2 (possibly even r4 and r5 too) and do
>> a double (possibly quadruple) write loop? That would avoid calling a
>> libc routine from nearly the only file in U-Boot where a C environment
>> is not necessarily guaranteed.
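Albert's double-write idea can be modeled in C as below. This is only a sketch of the loop shape: in U-Boot the clear loop would stay in ARM assembly precisely so that no C environment is needed, and each iteration there would be a single stmia r0!, {r2, r3} storing two zeroed registers. The function name clear_bss_2w is hypothetical, and the bounds are assumed to be 4-byte aligned with start <= end.

```c
#include <stdint.h>

/*
 * Hypothetical model of a "double write" BSS clear loop.
 * start/end stand in for the BSS bounds; both are assumed
 * 4-byte aligned with start <= end.
 */
void clear_bss_2w(uint32_t *start, uint32_t *end)
{
	const uint32_t z0 = 0, z1 = 0;	/* stand-ins for r2 and r3 */

	/* Main loop: two words per iteration, like one stmia r0!, {r2, r3}. */
	while (end - start >= 2) {
		start[0] = z0;
		start[1] = z1;
		start += 2;
	}

	/* Tail: at most one word remains if the size is an odd word count. */
	if (start < end)
		*start = z0;
}
```

A quadruple variant would add two more zero registers and a tail of up to three single-word stores; the trade is a few more instructions for fewer loop iterations.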
> 
> So this brings up something I've wondered about for a long while.  We
> have arch/arm/lib/mem{set,cpy}.S which are old copies from the Linux
> kernel.  The kernel uses them for all ARM platforms.  Why do we not
> always use these functions?  I have a very vague notion it was a size
> thing…
> 

That is a good question. Are we being hobbled because of MLO? If so, we
can use the short (and slow) methods there and the fast methods in the
normal case. That seems warranted here.
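A minimal C sketch of choosing the implementation at compile time, assuming the BSS bounds arrive as two word pointers. In U-Boot they come from the linker symbols __bss_start and __bss_end and the real clear loop is ARM assembly, not this function; the name clear_bss is hypothetical. CONFIG_USE_ARCH_MEMSET is the option from the patch, which a size-constrained build such as MLO could simply leave undefined.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical C model of the post-relocation BSS clear.
 * start/end stand in for __bss_start/__bss_end and are assumed
 * 4-byte aligned with start <= end.
 */
void clear_bss(uint32_t *start, uint32_t *end)
{
#ifdef CONFIG_USE_ARCH_MEMSET
	/* Fast path: hand the whole range to the optimized memset. */
	memset(start, 0, (size_t)((char *)end - (char *)start));
#else
	/* Small default path: store one zero word per iteration. */
	while (start < end)
		*start++ = 0;
#endif
}
```

Both paths produce the same zeroed range; the switch only trades code size for speed.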

However, in the particular case of DFU, I think it’s best to avoid large
static buffers. Or, if we do use large buffers, let’s put them in a
linker section that does not get zeroed at startup.
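One way to do the second option with GCC is the section variable attribute, sketched below. The section name .noinit, the buffer size and the accessor dfu_buf_get() are all hypothetical. For the BSS clear to actually skip the buffer, the board's linker script would also have to place .noinit outside the __bss_start..__bss_end range, which is assumed and not shown. The buffer then holds garbage at boot, so its users must not rely on it being zeroed.

```c
#include <stddef.h>

/* Hypothetical size; a real DFU buffer size is board-specific. */
#define DFU_BUF_SIZE (64u * 1024u)

/*
 * Placed in a dedicated ".noinit" section instead of .bss. Only if the
 * linker script keeps this section outside __bss_start..__bss_end will
 * the startup clear loop skip it. Contents are undefined at boot.
 */
static unsigned char dfu_buf[DFU_BUF_SIZE] __attribute__((section(".noinit")));

/* Hypothetical accessor; callers must initialize whatever they read. */
unsigned char *dfu_buf_get(size_t *len)
{
	if (len)
		*len = sizeof(dfu_buf);
	return dfu_buf;
}
```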

> -- 
> Tom

Regards

— Pantelis
