[U-Boot] ARM - cache and alignment

Marek Vasut marex at denx.de
Tue Jan 17 10:54:44 CET 2017


On 01/17/2017 10:51 AM, Jean-Jacques Hiblot wrote:
> 
> 
> On 17/01/2017 10:38, Marek Vasut wrote:
>> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>>
>>> On 17/01/2017 10:15, Marek Vasut wrote:
>>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>>> Tom, Marek
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>>> operations (invalidate_dcache_range or flush_dcache_range), the
>>>>>>>>> whole request is discarded for arm926ejs; for ARMv7 or ARMv8 only
>>>>>>>>> the aligned part is maintained. This is probably what is causing
>>>>>>>>> the bug addressed in 8133f43d1cd. There are a lot of unaligned
>>>>>>>>> buffers used in DMA operations and for all of them we're possibly
>>>>>>>>> handling the cache partially or not at all. I've seen this when
>>>>>>>>> using the environment from a file stored in a FAT partition.
>>>>>>>>> Commit 8133f43d1cd addresses this by using a bounce buffer at the
>>>>>>>>> FAT level but it's only one of many cases.
>>>>>>>>>
>>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>>
>>>>>>>>> * flush (writeback + invalidate): Suppose we use an address p
>>>>>>>>> which is unaligned; flush_dcache_range() can do the
>>>>>>>>> writeback+invalidate on the whole range
>>>>>>>>> [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There should be
>>>>>>>>> no problem with that since writeback can happen at any point in
>>>>>>>>> time.
>>>>>>>>>
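For illustration, a minimal sketch of that rounding, as a hypothetical
wrapper around the existing flush_dcache_range() (CONFIG_SYS_CACHELINE_SIZE
assumed as the line size; untested):

#include <common.h>     /* flush_dcache_range() prototype in this era */

/*
 * Hypothetical helper: widen an arbitrary [p, p + length) range to
 * cache-line boundaries and flush (writeback + invalidate) the result.
 * Safe, because a writeback of the extra bytes may happen at any time.
 */
static void flush_dcache_range_unaligned(unsigned long p, unsigned long length)
{
        unsigned long mask = CONFIG_SYS_CACHELINE_SIZE - 1;
        unsigned long start = p & ~mask;
        unsigned long stop = (p + length + mask) & ~mask;

        flush_dcache_range(start, stop);
}
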
>>>>>>>>> * invalidation
>>>>>>>>>
>>>>>>>>> It is a bit trickier. Here is some pseudo-code:
>>>>>>>>> invalidate_dcache_range(p, length)
>>>>>>>>> {
>>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>>              invalidate(all other lines)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>>> called BEFORE the DMA operation (again, the writeback can happen
>>>>>>>>> at any time so it's valid to do it here). Calling it only AFTER
>>>>>>>>> the operation may corrupt the data written by the DMA with old
>>>>>>>>> data from the CPU. This is how I used to handle unaligned buffers
>>>>>>>>> in some other projects.
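A rough sketch of that sequence, again as a hypothetical helper built on the
existing aligned-range operations (untested, for illustration only):

#include <common.h>     /* flush_dcache_range(), invalidate_dcache_range() */

/*
 * Hypothetical helper: invalidate an arbitrary [p, p + length) range.
 * The partially covered first and last lines are written back and
 * invalidated so neighbouring data is not lost; the rest is simply
 * invalidated. Must run BEFORE the DMA transfer starts.
 */
static void invalidate_dcache_range_unaligned(unsigned long p,
                                              unsigned long length)
{
        unsigned long mask = CONFIG_SYS_CACHELINE_SIZE - 1;
        unsigned long start = p & ~mask;
        unsigned long stop = (p + length + mask) & ~mask;

        if (p & mask)                   /* unaligned head */
                flush_dcache_range(start, start + CONFIG_SYS_CACHELINE_SIZE);
        if ((p + length) & mask)        /* unaligned tail */
                flush_dcache_range(stop - CONFIG_SYS_CACHELINE_SIZE, stop);

        invalidate_dcache_range(start, stop);   /* now fully aligned */
}
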
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is however one loophole: if data sitting in the first or
>>>>>>>>> the last line is accessed before the memory is updated by the
>>>>>>>>> DMA, then that first/last line will be corrupted. But it's not
>>>>>>>>> highly probable, as this data would have to be used in parallel
>>>>>>>>> with the DMA (interrupt handling, SMP?, DMA-management-related
>>>>>>>>> variables). So it's not perfect, but it would still be better
>>>>>>>>> than what we have today.
>>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>>> done.
>>>>>>>> That's the way to go without all the complications above.
>>>>>>> It's not that complex, but it's not perfect. We would need to
>>>>>>> keep the
>>>>>>> same warning as we have now, but it would make it work in more
>>>>>>> cases.
>>>>>> The warning is there for that exact reason -- to inform you
>>>>>> something's
>>>>>> wrong.
>>>>>>
>>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>>> not a trivial job. Most of the time the buffer is allocated in an
>>>>>>> upper layer and passed down to a driver via layers like the network
>>>>>>> stack, the block layer, etc. And in many cases, the warning will
>>>>>>> come and go depending on how the variable is aligned on the stack
>>>>>>> or the heap.
>>>>>> I didn't observe this much in fact. I usually see the buffers coming
>>>>>> in aligned or being allocated in drivers. Also, I think that's why
>>>>>> the RC cycle is there, so we can test the next release and fix these
>>>>>> issues.
>>>>> It's not commonly seen, but I came across it a few times.
>>>>>
>>>>> Here are two examples:
>>>>>
>>>>> Network:
>>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>>> CPU  : DRA752-HS ES2.0
>>>>> Model: TI DRA742
>>>>> [...]
>>>>> Booting from network ...
>>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>>> link up on port 0, speed 1000, full duplex
>>>>> BOOTP broadcast 1
>>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>>> BOOTP broadcast 2
>>>>> [...]
>>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>>> [...]
>>>>> Load address: 0x82000000
>>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
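For context, that warning comes from the alignment check in the ARM cache
code, which looks roughly like this (paraphrased sketch, not a verbatim copy
of the U-Boot source):

/* Paraphrased sketch of the check that emits the warning above. */
static int check_cache_range(unsigned long start, unsigned long stop)
{
        int ok = 1;

        if (start & (CONFIG_SYS_CACHELINE_SIZE - 1))
                ok = 0;

        if (stop & (CONFIG_SYS_CACHELINE_SIZE - 1))
                ok = 0;

        if (!ok)
                printf("CACHE: Misaligned operation at range [%08lx, %08lx]\n",
                       start, stop);

        return ok;
}
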
>>>>>
>>>>>
>>>>> FAT: it has been fixed recently by using a bounce buffer in FAT
>>>>> code by
>>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>>> I've also seen it a few years back on PowerPC platforms when
>>>>> accessing a
>>>>> USB storage.
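The bounce-buffer fix mentioned above boils down to transferring through a
cache-line-aligned intermediate buffer and copying afterwards; a minimal
sketch of the pattern (read_blocks_dma() is a hypothetical stand-in for the
real driver call):

#include <common.h>
#include <malloc.h>     /* memalign(), free() */

/* Sketch of the bounce-buffer pattern for an unaligned destination. */
static int read_into_unaligned(void *dst, unsigned long blk,
                               unsigned long cnt, unsigned long blksz)
{
        void *bb = memalign(CONFIG_SYS_CACHELINE_SIZE, cnt * blksz);
        int ret;

        if (!bb)
                return -1;      /* out of memory */

        ret = read_blocks_dma(bb, blk, cnt);    /* hypothetical DMA read */
        if (ret >= 0)
                memcpy(dst, bb, cnt * blksz);   /* copy into unaligned dst */

        free(bb);
        return ret;
}
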
>>>>>
>>>>> I'm sure that we could find more examples. I don't mean that they
>>>>> shouldn't be fixed, simply that we could still make things work in
>>>>> most cases at a low cost.
>>>>> The modifications I proposed do not change the behaviour of the code
>>>>> for aligned buffers; they just make it much more likely that things
>>>>> will work with unaligned buffers. I fail to see the reason why this
>>>>> wouldn't be a good thing.
>>>> It seems to me like "it kinda works, but it really doesn't
>>>> sometimes...", so it's encouraging bad practice, as people will get
>>>> used to ignoring this warning because things "kinda work, in most
>>>> cases".
>>> I see your point but the current situation is exactly like that: it
>>> works most of the time.
>> But in this case, it's a property of the hardware. With your patch, it's
>> also a property of the code which works "most of the time". You
>> even state that in the patch description.
> I'm not sure I understand your point.

I'm referring to the last paragraph of the commit message, "There is
however one loophole..."

> But as I said, I'm not pushing.
> You're not keen on the idea and you're more knowledgeable than I am in
> U-Boot, so be it. I'll drop the subject.
> Thanks for taking the time to discuss it.

Let's wait for what the others have to say, maybe?

>>> I'm not pushing for the changes, I just wanted to let you know that we
>>> could do better if we wish.
>> Replacing one problem with another which looks like it isn't a problem
>> anymore (but still is) is not better IMO.
>>
>>>>> BTW the UniPhier L2 cache implements this for flush and invalidation,
>>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>>> this way too.
>>>>>
>>>>> Jean-Jacques
>>>>>
>>
> 


-- 
Best regards,
Marek Vasut

