[U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

Przemyslaw Marczak p.marczak at samsung.com
Thu Jan 29 17:48:02 CET 2015


Hello,

On 01/28/2015 01:55 PM, Przemyslaw Marczak wrote:
> This patchset reduces the boot time for ARM architecture,
> Exynos boards, and boards with DFU enabled(ARM).
>
> For the tested Trats2 device, this was done in three steps.
>
> The first was to enable the arch memcpy and memset.
> The second step was to use memset for the .bss clear.
> The third step to reduce this operation is to keep the .bss section
> as small as possible.
>
> The .bss section grows if we have a lot of static variables.
> This section is cleared before the jump to the relocated U-Boot,
> and it's done word by word. To reduce the time for this step,
> we can enable arch memset, which uses multiple ARM registers.
>
> For configs with DFU enabled, we can find the dfu buffer in this section,
> which has at least 8MB (32MB for trats2). This is a lot of useless data,
> which is not required for a standard boot. So this buffer should be
> dynamically allocated.
>
> Przemyslaw Marczak (3):
>    exynos: config: enable arch memcpy and arch memset
>    arm: relocation: clear .bss section with arch memset if defined
>    dfu: mmc: file buffer: remove static allocation
>
>   arch/arm/lib/crt0.S             | 10 +++++++++-
>   drivers/dfu/dfu_mmc.c           | 25 ++++++++++++++++++++++---
>   include/configs/exynos-common.h |  3 +++
>   3 files changed, 34 insertions(+), 4 deletions(-)
>

I made some additional measurements with an oscilloscope.

A quick note about the measurement:
The board is an Odroid X2 with Exynos4412 (this one has a GPIO header).

The time is measured between two state changes of one GPIO pin.

GPIO HI - the GPIO register is set at the "reset" label in:
arch/arm/cpu/armv7/start.S
GPIO LO - the GPIO register is set from "bootcmd", by writing it with
"mw.l ..."

${bootdelay}=0

odroid_defconfig  = .bss ~32.3MB

I tested a few changes:
- 850ms - no changes
- 840ms - + CONFIG_USE_ARCH_MEMCPY/MEMSET
- 540ms - .bss memset (patch 2)
- 210ms - dynamic allocation dfu file buf (patch 3)
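
The step from 840ms to 540ms comes from replacing the word-by-word .bss
clear with a memset call. The real code lives in assembly in
arch/arm/lib/crt0.S and is not reproduced here; the following is only a
host-side C sketch of the two approaches, and the function names
clear_words() and clear_memset() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the word-by-word loop:
 * one 32-bit store per iteration.
 */
void clear_words(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	while (start < end)
		*start++ = 0;
}

/*
 * The same region cleared through memset(). With
 * CONFIG_USE_ARCH_MEMSET the optimized arch memset would be the one
 * called; on a host this is just the libc memset.
 */
void clear_memset(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	memset(start, 0, (size_t)(end - start) * sizeof(*start));
}
```

Both functions leave the region in the same state; only the number of
store instructions needed to get there differs.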

And the next finding is interesting.
odroid_defconfig has more than 80MB for malloc (we need about 64MB for
DFU now, to be able to write a 32MB file).

This is CONFIG_SYS_MALLOC_LEN. The memory area for malloc is set to 0
in the function mem_malloc_init(), so for this config that function
sets more than 80MB to zero.

This is not good, because we shouldn't expect zeroed memory behind the
pointer returned by malloc. That is a job for calloc.

In particular, if some command expects zeroed memory after malloc, it
may work at first, but after a few more calls, when freed memory is
reused, it can crash...
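
A minimal sketch of the difference, using the standard C library on a
host. The struct xfer_state and both constructor names are hypothetical
and only serve as an illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical state structure, for illustration only. */
struct xfer_state {
	int started;
	size_t len;
};

/* calloc() guarantees that the returned memory is zeroed. */
struct xfer_state *xfer_state_new_calloc(void)
{
	return calloc(1, sizeof(struct xfer_state));
}

/*
 * malloc() gives no such guarantee, so the caller has to clear the
 * memory explicitly before relying on its contents.
 */
struct xfer_state *xfer_state_new_malloc(void)
{
	struct xfer_state *s = malloc(sizeof(*s));

	if (s)
		memset(s, 0, sizeof(*s));
	return s;
}
```

Code that skips the memset() in the second variant only works as long
as the allocator happens to hand out memory that was never used before.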

For testing purposes I changed the size of the area cleared by memset in
mem_malloc_init(). CONFIG_SYS_MALLOC_LEN is unchanged, so DFU can still
allocate 2x32MB...

The results:
- 158ms - malloc memset len: 40MB
- 109ms - malloc memset len:  1MB
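
The test change itself is not shown in this mail. A host-side sketch of
the idea, assuming a pool-init function that takes the clear length as
an extra argument (struct pool, pool_init() and clear_len are all
hypothetical names, not the U-Boot API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pool bookkeeping, loosely modelled on a malloc arena. */
struct pool {
	unsigned char *start;
	unsigned char *end;
	unsigned char *brk;
};

/*
 * Set up the pool over [start, start + size), but zero only the first
 * clear_len bytes instead of the whole area.
 */
void pool_init(struct pool *p, void *start, size_t size, size_t clear_len)
{
	assert(p != NULL && start != NULL);

	p->start = start;
	p->end = p->start + size;
	p->brk = p->start;

	if (clear_len > size)
		clear_len = size;
	memset(p->start, 0, clear_len);
}
```

The pool size stays the same, so large allocations still fit; only the
amount of memory touched at init time changes.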

And a quick test for Trats2 with trace clock cycle counter:
- 333ms - malloc memset len:  1MB (for the standard config it was more 
than 1520ms)

The malloc memset can't be removed now, because that requires checking
a lot of calls and changing them to calloc, but the board can boot if I
set this to 256K.

So the final improvement that can be achieved for the odroid config
is 850ms -> 109ms. This is about 8 times faster.
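
The arithmetic behind these numbers can be checked with a short C
sketch. It assumes that the Odroid X2 measurements listed in this mail
are cumulative, that is, each one includes all the previous changes;
the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Odroid X2 boot times in ms, copied from the measurements above. */
static const int boot_ms[] = { 850, 840, 540, 210, 158, 109 };
#define N_MEAS (sizeof(boot_ms) / sizeof(boot_ms[0]))

/* Time saved by measurement i (i >= 1) compared to the previous one. */
int step_saving_ms(size_t i)
{
	assert(i >= 1 && i < N_MEAS);
	return boot_ms[i - 1] - boot_ms[i];
}

/* Overall speedup: first measurement divided by the last one. */
double total_speedup(void)
{
	return (double)boot_ms[0] / (double)boot_ms[N_MEAS - 1];
}
```

Under that assumption the two largest single savings are the .bss
memset (300ms) and the dynamic DFU buffer (330ms), and the overall
ratio is 850 / 109, roughly 7.8.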

And the differences between the tested boards:
- Trats2 - 800MHz
- Odroid X2 - 1000MHz
- different BL1/BL2

Now I'm not so sure about the reliability of the measurements made with
the trace.

The Trats2 has no GPIO header, and I don't have time now to test all the
combinations.

So enabling DFU in the board config will increase the boot time.
But the real reason is that the malloc memory area is set to zero on boot.

I think that we should follow the malloc/calloc/realloc differences
as described here: http://man7.org/linux/man-pages/man3/malloc.3.html

Now I'm going on holiday, and I will probably be unreachable until
the 9th of February. Sorry for the trouble.

Best regards,
-- 
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com

