[U-Boot] [PATCH] uboot optimize memmove
Andy Green
andy.green at linaro.org
Mon Jul 29 01:39:08 CEST 2013
Dear Wolfgang Denk,
On 27 July 2013 03:06, Wolfgang Denk <wd at denx.de> wrote:
> Dear Andy Green,
>
> In message <CAAfg0W7p6rvvaMqsKsC09yQnfPod08YDTbeh_MQrz6n+B4iyYA at mail.gmail.com> you wrote:
>>
>> > Instead of making assumptions on the performance of memcpy() and
>>
>> As I wrote, I measured the performance and got a very big gain, it's
>> 3x faster on my setup to use memcpy() then default memmove().
>
> Yes, in your single test case of copying a Linux kernel image,
> i. e. a multi-megabyte file.
That "single test case" of copying a "multi-megabyte file" is the
reason d'etre of a bootloader...if it doesn't perform well doing that
then it's a problem.
However I agree the alternative arch-specific implementation for ARM
is pretty good, so this is kind of moot.
Since I traced my problem to here I "fixed" it there, but that actual
problem was were using the default implementations at all (the config
was inherited)..
>> By calling that an "assumption" you're saying that there exist
>> platforms where 32-bit linear memmove() is slower than doing it with
>> 8-bit actions?
>
> No. I said you should not assume that memcpy() is always faster than
> memmove(); a system may use optiomized versions of either.
I did not assume that, I looked at your code for both and saw, and
proved, that using 32-bit operations for mass move actions is going to
perform better than using 8-bit operations.
That's not something you can write off as an "assumption", it's a fact.
>> > adding the overhead of an additional function call (which can be
>> > expensive especially for short copy operations) it would make more
>>
>> I am not sure U-Boot is really in the business of doing small
>> memmoves, but okay...
>
> It's easy to avoid this overhead, and also get rid of the
> restrictions you built into it (otimizong only the non-overlapping
> case), so if we touch that code, we should do it right.
Given that code should perferably never be used, maybe it should print
a warning like "Using default memory ops" and leave it like it is.
The problem is not correctness just inefficiency.
>> > sense to pull the "copy a word at a time" code from memcpy() into
>> > memmove(), too.
>> >
>> > On the other hand - if you really care about performance, then why do
>>
>> I spent several hours figuring out why our NOR boot performance was
>> terrible.... it's because this default memmove code is gloriously
>> inefficient for all cases.
>>
>> If you like it like that, no worries.
>
> Don't twist my words. I asked for a different, better implementation,
> that's all.
Unfortunately I'm only looking at it because it made trouble, and we
now no longer use that code.
For the use-case I'm studying (very fast overall boot) it still may
make sense to implement the NEON stuff in which case I'll offer a
patch for that.
-Andy
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
> The only time the world beats a path to your door is when you are in
> the bathroom.
More information about the U-Boot
mailing list