[U-Boot] boot-up time optimization. Where to start?

Wolfgang Denk wd at denx.de
Thu May 5 09:27:49 CEST 2011


Dear Alexander Stein,

In message <201105050906.35834.alexander.stein at systec-electronic.com> you wrote:
> 
> > Are you also still using the old environment code in your port, or is
> > the new, hash table based one?  When using the old code, there are
> > additional penalties for using a needlessly big environment as each
> > call to setenv() will recalculate the checksum.
> 
> I was digging into this problem for a short time. And yes, the CRC 
> checksum calculation takes about 25 ms each run. setenv is called once each
> for stdin, stdout and stderr, which sums up to ~75 ms.
> So you're right, this is the old environment code. Here a dcache will speed up 
> the execution of course.

Even more so would reducing the environment size to some reasonable
value. Currently you are using some 128 KiB, so say you set the
environment size to 8 KiB. This would be 1/16 of your current size,
which means the ~75 ms would shrink to less than 5 ms.  You are wasting
70 ms (only here - there are other places which will add to this
figure) just because of this inappropriate configuration.
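
The arithmetic behind this estimate can be sketched as follows. This is
only a back-of-the-envelope model in Python, not U-Boot code: the function
name is made up for illustration, and it assumes that the checksum time
grows linearly with the environment size. The current size is taken as 16
times the proposed 8 KiB, i.e. 128 KiB.

```python
def scaled_crc_time_ms(per_call_ms, calls, old_size_kib, new_size_kib):
    """Estimate the total checksum time after resizing the environment.

    Hypothetical helper; assumes CRC cost is proportional to the size
    of the environment that is checksummed.
    """
    return per_call_ms * calls * new_size_kib / old_size_kib


# Figures from this thread: ~25 ms per setenv(), three calls
# (stdin, stdout, stderr), 128 KiB now versus 8 KiB proposed.
before_ms = scaled_crc_time_ms(25, 3, 128, 128)
after_ms = scaled_crc_time_ms(25, 3, 128, 8)
print(f"before: {before_ms:.1f} ms, after: {after_ms:.1f} ms, "
      f"saved: {before_ms - after_ms:.1f} ms")
```

Under these assumptions the three calls drop from 75 ms to a little under
5 ms. The real saving has to be confirmed on the target.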

> But our standard startup just starts U-Boot and copies the Linux kernel into 
> RAM and starts it. There is not much use of dcache during copy here.

You are wrong. There is a huge difference between performing a copy
operation in single write cycles to uncached RAM versus writing to a
cached area where the cache flushes will operate in burst mode. Also,
the U-Boot code will run faster, too, so copying and decompression are
much faster.


You repeat the same mistake again: you make assumptions about what
may or may not be fast or slow on your system without actually
measuring it.  Donald Knuth is right again: "Premature optimization is
the root of all evil."
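
As an illustration of what "measuring" means here, the sketch below times
a CRC32 pass over two buffer sizes. It is a host-side Python example under
stated assumptions: zlib.crc32 merely stands in for the checksum routine,
time_crc32 is a hypothetical helper, and the absolute numbers from a
desktop machine say nothing about your board. On the target you would
instead read a hardware timer before and after the code of interest.

```python
import time
import zlib


def time_crc32(size_bytes, loops=100, runs=5):
    """Return the best-of-`runs` time in seconds for one CRC32 pass."""
    data = bytes(size_bytes)  # zero-filled buffer of the given size
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(loops):
            zlib.crc32(data)
        elapsed = time.perf_counter() - start
        # Keep the fastest run to reduce noise from other host activity.
        best = min(best, elapsed / loops)
    return best


for size_kib in (8, 128):
    seconds = time_crc32(size_kib * 1024)
    print(f"{size_kib:4d} KiB: {seconds * 1e6:.2f} us per checksum")
```

Taking the minimum over several runs is a deliberate choice: it filters
out interference from unrelated activity and gets closer to the cost of
the operation itself.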


> > > It is using a 32-Bit RAM-Bus. So, no.
> > 
> > And your NOR flash?
> 
> It is connected 16-bit like most devices only support, but it is setup to use 
> page read mode.

Well, many systems use two 16 bit chips in parallel to give a 32 bit
bus.

> > DC off makes things awfully slow.  See comments of commits c3330e9,
> > 95c6f6d and 7e4a9e6 - for plain RAM bound operations like
> > copying/uncompressing an image from RAM to RAM switching on the DC can
> > accelerate the system by a factor of up to >15.
> 
> Yes, from RAM to RAM, dcache will help a lot. But we neither copy from RAM to 
> RAM nor do we uncompress.

There is still a huge difference in memory bandwidth between using plain
single write cycles versus burst mode accesses.

Don't speculate.  Measure yourself!


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
"Send lawyers, guns and money..."  - Lyrics from a Warren Zevon song

