[U-Boot] [PATCH] uboot optimize memmove

Andy Green andy.green at linaro.org
Mon Jul 29 01:39:08 CEST 2013


Dear Wolfgang Denk,

On 27 July 2013 03:06, Wolfgang Denk <wd at denx.de> wrote:
> Dear Andy Green,
>
> In message <CAAfg0W7p6rvvaMqsKsC09yQnfPod08YDTbeh_MQrz6n+B4iyYA at mail.gmail.com> you wrote:
>>
>> > Instead of making assumptions on the performance of memcpy() and
>>
>> As I wrote, I measured the performance and got a very big gain, it's
>> 3x faster on my setup to use memcpy() then default memmove().
>
> Yes, in your single test case of copying a Linux kernel image,
> i. e. a multi-megabyte file.

That "single test case" of copying a "multi-megabyte file" is the
reason d'etre of a bootloader...if it doesn't perform well doing that
then it's a problem.

However I agree the alternative arch-specific implementation for ARM
is pretty good, so this is kind of moot.

Since I traced my problem to here I "fixed" it there, but that actual
problem was were using the default implementations at all (the config
was inherited)..

>> By calling that an "assumption" you're saying that there exist
>> platforms where 32-bit linear memmove() is slower than doing it with
>> 8-bit actions?
>
> No.  I said you should not assume that memcpy() is always faster than
> memmove(); a system may use optiomized versions of either.

I did not assume that, I looked at your code for both and saw, and
proved, that using 32-bit operations for mass move actions is going to
perform better than using 8-bit operations.

That's not something you can write off as an "assumption", it's a fact.

>> > adding the overhead of an additional function call (which can be
>> > expensive especially for short copy operations) it would make more
>>
>> I am not sure U-Boot is really in the business of doing small
>> memmoves, but okay...
>
> It's easy to avoid this overhead, and also get rid of the
> restrictions you built into it (otimizong only the non-overlapping
> case), so if we touch that code, we should do it right.

Given that code should perferably never be used, maybe it should print
a warning like "Using default memory ops" and leave it like it is.
The problem is not correctness just inefficiency.

>> > sense to pull the "copy a word at a time" code from memcpy() into
>> > memmove(), too.
>> >
>> > On the other hand - if you really care about performance, then why do
>>
>> I spent several hours figuring out why our NOR boot performance was
>> terrible.... it's because this default memmove code is gloriously
>> inefficient for all cases.
>>
>> If you like it like that, no worries.
>
> Don't twist my words.  I asked for a different, better implementation,
> that's all.

Unfortunately I'm only looking at it because it made trouble, and we
now no longer use that code.

For the use-case I'm studying (very fast overall boot) it still may
make sense to implement the NEON stuff in which case I'll offer a
patch for that.

-Andy

>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
> The only time the world beats a path to your door is when you are  in
> the bathroom.


More information about the U-Boot mailing list