[U-Boot] [PATCH V2 1/3] memcpy: copy one word at a time if possible
Peter Tyser
ptyser at xes-inc.com
Thu Oct 8 21:09:48 CEST 2009
On Thu, 2009-10-08 at 20:23 +0200, Alessandro Rubini wrote:
> >> That's true, but I think the most important case is lcd scrolling,
> >> where it's usually a big power of two -- that's where we had the #ifdef,
> >> so the problem was known, I suppose.
> >
> > I think the most important case for *you* is lcd scrolling, but for 99%
> > of everyone else, it isn't at all:)
>
> Well, it's a big memcpy, and it has a direct effect on the user. Every
> other copy is smaller, or has no interactive value.
>
> > memcpy() and memset() are used 100 times more often in non-lcd
> > related code and most boards don't even have LCDs.
>
> That's true. But it's only a boot loader (I just looked at what Nicolas
> Pitre did in the kernel for ARM strcpy and, well....).
>
> So I took some measurements (it's one of Pike's rules of programming:
>
> * Rule 2. Measure. Don't tune for speed until you've measured, and even
> then don't unless one part of the code overwhelms the rest.
>
> )
>
> I booted into u-boot, typed "setenv stdout serial" then "boot", which goes
> over the ethernet. I stopped the system after u-boot handed control over to
> the kernel. Result: 10412 memcpy calls, broken down as (count, length):
>
>  3941      4
>  1583      6
>   772     20
>     1     46
>     1     47
>     3     60
>  1024     64
>     1    815
>     1    888
>   770   1148
>  1543   1480
>     1   2283
>     1   3836
>   770   4096
>
> So I dare say lengths that aren't a multiple of 4 are a minority anyway:
> 1587 calls, 12689 bytes, i.e. 15.2% of the calls and 0.2% of the data.

The statistics are going to be very different for different scenarios.
For example, network operations seem to account for the majority of your
large memcpys; this isn't the case for everyone. If/when U-Boot runs on
64-bit hardware, the stats will change too.
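
For anyone who wants to run the same tally on their own board's numbers,
here is a rough sketch in C. The struct and function names (bucket, totals,
tally) are made up for illustration, and the table is just the histogram
quoted above; none of this comes from the patch itself.

```c
#include <stddef.h>

/* One histogram row: how many memcpy() calls had this length. */
struct bucket {
	unsigned long calls;
	unsigned long len;
};

/* The (count, length) pairs from the measurement quoted above. */
const struct bucket hist[] = {
	{3941, 4}, {1583, 6}, {772, 20}, {1, 46}, {1, 47}, {3, 60},
	{1024, 64}, {1, 815}, {1, 888}, {770, 1148}, {1543, 1480},
	{1, 2283}, {1, 3836}, {770, 4096},
};

struct totals {
	unsigned long calls;      /* all calls */
	unsigned long bytes;      /* all bytes copied */
	unsigned long odd_calls;  /* calls whose length is not a multiple of 4 */
	unsigned long odd_bytes;  /* bytes copied by those calls */
};

/* Sum the histogram, tracking lengths that are not a multiple of 4. */
struct totals tally(const struct bucket *h, size_t n)
{
	struct totals t = {0, 0, 0, 0};
	size_t i;

	for (i = 0; i < n; i++) {
		t.calls += h[i].calls;
		t.bytes += h[i].calls * h[i].len;
		if (h[i].len % 4) {
			t.odd_calls += h[i].calls;
			t.odd_bytes += h[i].calls * h[i].len;
		}
	}
	return t;
}
```

Fed the table above, this gives back the quoted figures: 10412 calls in
total, of which 1587 calls (12689 bytes) have a length that is not a
multiple of 4.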

In any case, my only suggestion would be that if we're improving
memcpy()/memset(), do the extra 10% of effort required to make them a
little better. That 10% of effort will improve 15.2% of all memcpy()
calls for the foreseeable future:)
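
To make the suggestion concrete, this is roughly the shape I have in mind.
It is only a sketch, not the code from the patch: memcpy_words is a made-up
name, and I'm assuming a flat address space and a build that tolerates
accessing the buffers through unsigned long pointers (e.g. one compiled
with -fno-strict-aliasing).

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_MASK (sizeof(unsigned long) - 1)

/* Hypothetical name, for illustration only. */
void *memcpy_words(void *dest, const void *src, size_t count)
{
	unsigned char *d8 = dest;
	const unsigned char *s8 = src;
	unsigned long *dl;
	const unsigned long *sl;

	/* Word copies only work if both pointers are misaligned by the
	 * same amount; otherwise fall through to the byte loop. */
	if ((((uintptr_t)d8 ^ (uintptr_t)s8) & WORD_MASK) == 0) {
		/* Copy leading bytes until the pointers are word aligned. */
		while (count && ((uintptr_t)d8 & WORD_MASK)) {
			*d8++ = *s8++;
			count--;
		}

		/* Copy the bulk one word at a time. */
		dl = (unsigned long *)d8;
		sl = (const unsigned long *)s8;
		while (count >= sizeof(unsigned long)) {
			*dl++ = *sl++;
			count -= sizeof(unsigned long);
		}
		d8 = (unsigned char *)dl;
		s8 = (const unsigned char *)sl;
	}

	/* Copy whatever is left (the tail, or everything if the
	 * alignments did not match) one byte at a time. */
	while (count--)
		*d8++ = *s8++;

	return dest;
}
```

With something like this, a 47-byte copy between equally aligned buffers
still moves most of its data as words and only the leftover bytes go
through the byte loop, so odd lengths don't lose the fast path entirely.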

I honestly don't care much what implementation you choose, as I currently
only use PPC, which has its own memcpy()/memset(). I'm only trying to be
helpful; feel free to proceed however you see fit. I promise I won't
comment on future patches:)

Best,
Peter