[PATCH] string: Use memcpy() within memmove() when we can

Patrick DELAUNAY patrick.delaunay at foss.st.com
Thu Dec 10 13:04:55 CET 2020


Hi Tom,

On 12/10/20 12:57 PM, Patrick DELAUNAY wrote:
> From: Tom Rini <trini at konsulko.com>
> Sent: mardi 8 décembre 2020 17:11
>
> On Wed, Nov 25, 2020 at 01:07:43PM +0100, Rasmus Villemoes wrote:
>> On 25/11/2020 12.26, Patrick Delaunay wrote:
>>> A common use of memmove() can be handled by memcpy(). Also memcpy()
>>> includes an optimization for large sizes: it copies a word at a
>>> time. So we can get a speed-up by calling memcpy() to handle our 
>>> move in this case.
>>>
>>> Update memmove() to call also memcpy() if the source don't overlap
>>> the destination (src + count <= dest).
>>>
>>> Signed-off-by: Patrick Delaunay <patrick.delaunay at st.com>
>>> ---
>>> This patch allows to save 38ms for Kernel Image extraction (7327624
>>> Bytes) from FIT loaded at 0xC2000000 for ARMV7 board
>>> STM32MP157C-EV1, and with kernel destination = Load Address:
>>> 0xc4000000, located after the FIT without overlap, compared with
>>> destination = Load Address: 0xc0008000.
>>>
>>> -> 14,332 us vs 52,239 in bootstage report
>>>
>>> In this case the memmove funtion is called in
>>> common/image.c::memmove_wd() to handle overlap.
>>>
>>>
>>> lib/string.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/lib/string.c b/lib/string.c index
>>> ae7835f600..ef8ead976c 100644
>>> --- a/lib/string.c
>>> +++ b/lib/string.c
>>> @@ -567,7 +567,7 @@ void * memmove(void * dest,const void
>>> *src,size_t count) {
>>> char *tmp, *s;
>>> - if (dest <= src) {
>>> + if (dest <= src || (src + count) <= dest) {
>>> memcpy(dest, src, count);
>> Hm. So the condition you add (src + count <= dest) implies no overlap
>> at all, so that's ok. So this doesn't really have anything to do with
>> your patch per se.
>>
>> The existing condition relies on memcpy doing forward-copying. That's
>> the case if the implementation from lib/string.c is in use, i.e. if
>> __HAVE_ARCH_MEMCPY is not defined. And if an arch defines
>> __HAVE_ARCH_MEMMOVE, this memmove() is not used.
>>
>> But AFAICT, there's a potential problem for the case where
>> __HAVE_ARCH_MEMCPY is defined but __HAVE_ARCH_MEMMOVE is not, and e.g.
>> arch/arm/include/asm/string.h does
>>
>> #if CONFIG_IS_ENABLED(USE_ARCH_MEMCPY)
>> #define __HAVE_ARCH_MEMCPY
>> #endif
>> #undef __HAVE_ARCH_MEMMOVE
>>
>> Of course, the arch-specific implementation _may_ also do
>> forward-copying (I haven't tried to decipher it), but that seems to be
>> a rather fragile assumption. At the very least, some comments would be
>> in order.
> Looking at this deeper, today, ARM (non-64bit) can and usually but not 
> always does use the asm optimized memcpy / memset. No other optimized 
> functions were copied from Linux, and no other arches seem to use them 
> today either. I think in sum then, Patrick can you please do a v2 that 
> adds a comment here, in case we get more optimizations in the future?
> And thanks for the review here Rasmus!

I am preparing a V2 with added coment.

Patrick.



More information about the U-Boot mailing list