[U-Boot] [PATCH 2/3] arm: relocation: clear .bss section with arch memset if defined

Mon Feb 2 18:36:38 CET 2015

On Mon, Feb 02, 2015 at 07:28:14PM +0200, Pantelis Antoniou wrote:
> Hi Tom,
> 
> > On Feb 2, 2015, at 19:25 , Tom Rini <trini at ti.com> wrote:
> > 
> > On Sun, Feb 01, 2015 at 03:38:42AM +0100, Albert ARIBAUD wrote:
> >> Hello Przemyslaw,
> >> 
> >> On Wed, 28 Jan 2015 13:55:42 +0100, Przemyslaw Marczak
> >> <p.marczak at samsung.com> wrote:
> > >>> For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY
> > >>> greatly increases memset/memcpy performance. This is possible
> > >>> thanks to the ARM multiple-register instructions.
> >>> 
> > >>> Unfortunately, the relocation is done without the cache enabled,
> > >>> so it takes some time, but zeroing the BSS memory takes much
> > >>> longer, especially for configs with big static buffers.
> >>> 
> > >>> A quick test confirms that using the arch memcpy for relocation
> > >>> gives no significant boot time improvement. The same test confirms
> > >>> that enabling the arch memset for zeroing the BSS does reduce the
> > >>> boot time.
> >>> 
> > >>> So this patch enables the arch memset for zeroing the BSS after
> > >>> the relocation process. For ARM boards, this can be enabled
> > >>> in board configs by defining CONFIG_USE_ARCH_MEMSET.
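As a rough illustration of what the patch switches between, here is a C sketch, not the actual patch: the real BSS clear lives in the ARM start-up assembly. It assumes the BSS bounds are word-aligned and that, when CONFIG_USE_ARCH_MEMSET is defined, memset() resolves to the optimized arch/arm/lib/memset.S; clear_bss() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch: zero the region [start, end), as the start-up code does for
 * .bss after relocation. Both bounds are assumed to be word-aligned.
 * clear_bss() is a hypothetical name used only for illustration.
 */
void clear_bss(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
#ifdef CONFIG_USE_ARCH_MEMSET
	/* Opted in: hand the whole range to memset(), which in U-Boot
	 * would be the optimized arch/arm/lib/memset.S implementation. */
	memset(start, 0, (size_t)(end - start) * sizeof(*start));
#else
	/* Default: one 32-bit store per iteration, like the existing loop. */
	while (start < end)
		*start++ = 0;
#endif
}
```

Without the define, the function degrades to the word-at-a-time loop, so boards that do not opt in keep today's behaviour.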
> >> 
> >> Since the issue is that zeroing is done one word at a time, could we
> >> not simply clear r3 as well as r2 (possibly even r4 and r5 too) and do
> >> a double (possibly quadruple) write loop? That would avoid calling a
> > >> libc routine from almost the only file in U-Boot where a C environment
> > >> is not necessarily guaranteed.
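Albert's alternative keeps everything in assembly: zero several registers once and store them together with a multiple-register store (STMIA), so each loop iteration clears 16 bytes instead of 4. The C model below only mirrors that loop structure, assuming word-aligned bounds; clear_words_x4() is a hypothetical name, and a C compiler is not guaranteed to emit an STM for it, so the real loop would stay in assembly and need no libc call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a "quadruple write" clear loop: each iteration stores four
 * zero words, the C analogue of zeroing r2-r5 once and issuing
 * STMIA r0!, {r2-r5} per iteration. Bounds are assumed word-aligned;
 * the zero to three words left over are cleared one at a time.
 * clear_words_x4() is a hypothetical name.
 */
void clear_words_x4(uint32_t *start, uint32_t *end)
{
	assert(((uintptr_t)start & 3u) == 0 && ((uintptr_t)end & 3u) == 0);
	assert(start <= end);

	/* Main loop: four words (16 bytes) per iteration. */
	while (end - start >= 4) {
		start[0] = 0;
		start[1] = 0;
		start[2] = 0;
		start[3] = 0;
		start += 4;
	}

	/* Tail: whatever did not fit in a group of four. */
	while (start < end)
		*start++ = 0;
}
```

The tail loop matters only if the BSS size is not a multiple of 16 bytes; if the linker script guaranteed that alignment, the assembly version could drop it.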
> > 
> > So this brings up something I've wondered about for a long while.  We
> > have arch/arm/lib/mem{set,cpy}.S which are old copies from the linux
> > kernel.  The kernel uses them for all ARM platforms.  Why do we not
> > always use these functions?  I have a very vague notion it was a size
> > thing…
> 
> That is a good question. Are we being hobbled because of MLO? If so we can
> use the short (and slow) methods in that case and use the fast methods
> in the normal case. It seems that this is warranted in this case.

I'm not sure, but I can test easily enough.  But even then we may want
to opt a few targets in to the current (slow) path and make the default
the optimized path.
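One way to express "fast by default, slow only where size matters" is a compile-time switch. The sketch below is hypothetical: board_memset() and CONFIG_SYS_SMALL_MEMSET are made-up names, CONFIG_SPL_BUILD is the symbol U-Boot defines when building SPL (which produces MLO on TI boards), and libc memset() stands in for arch/arm/lib/memset.S.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical policy: the optimized routine is the default; the small,
 * slow byte loop is used only for SPL builds (e.g. MLO) or for boards
 * that opt in via the made-up symbol CONFIG_SYS_SMALL_MEMSET.
 */
#if defined(CONFIG_SPL_BUILD) || defined(CONFIG_SYS_SMALL_MEMSET)
#define USE_SMALL_MEMSET 1
#else
#define USE_SMALL_MEMSET 0
#endif

void *board_memset(void *dst, int c, size_t n)
{
#if USE_SMALL_MEMSET
	/* Size-optimized: one byte per iteration, minimal code. */
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
#else
	/* Speed-optimized: stands in for arch/arm/lib/memset.S. */
	return memset(dst, c, n);
#endif
}
```

Boards that want the small version outside SPL would define the hypothetical symbol; everything else gets the optimized path, matching the suggested default.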

> However, in the particular case of DFU I think it’s best to avoid the large
> static buffers. Or if we do use the large buffers let’s put them in a
> linker segment that does not get zeroed on start.
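Keeping a large buffer out of the zeroed region could look roughly like this with the GCC section attribute. The buffer name, its size and the .noinit section are assumptions for illustration. It only helps if the linker script places .noinit outside the BSS range cleared at start-up, and code using the buffer must not rely on it being zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size; a real DFU buffer size would be board-configured. */
#define DFU_BUF_SIZE (64 * 1024)

/*
 * Put the big buffer in its own section instead of .bss, so the
 * start-up BSS clear never touches it. This assumes the linker script
 * maps the section outside the cleared range, e.g. (assumed, not
 * existing U-Boot code):
 *
 *   .noinit (NOLOAD) : { *(.noinit) }
 *
 * Contents are undefined at boot; callers must initialize what they use.
 */
static unsigned char dfu_buf[DFU_BUF_SIZE] __attribute__((section(".noinit")));

/* Hypothetical accessor returning the buffer and its size. */
unsigned char *get_dfu_buf(size_t *size)
{
	*size = sizeof(dfu_buf);
	return dfu_buf;
}
```

The trade-off is that any code which silently relied on the buffer starting out zeroed has to clear the part it needs itself.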

Yes, I owe the rest of the series my attention too :)

-- 
Tom