[U-Boot] ARM - cache and alignment

Tue Jan 17 10:51:55 CET 2017


On 17/01/2017 10:38, Marek Vasut wrote:
> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>
>> On 17/01/2017 10:15, Marek Vasut wrote:
>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>> Tom, Marek
>>>>>>> Hi,
>>>>>>>
>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>> operations
>>>>>>>> (invalidate_dcache_range, or flush_dcache_range), the whole
>>>>>>>> request is
>>>>>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>>> operations
>>>>>>>> and for all of them, we're possibly handling the cached partially
>>>>>>>> or not
>>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>>> stored in
>>>>>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>>>>>> buffer at the FAT level but it's only one of many cases.
>>>>>>>>
>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>
>>>>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>>>>> unaligned, flush_dcache_range() can do the writeback+invalidate
>>>>>>>> on the
>>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>>>>> should no problem with that since writeback can happen at any
>>>>>>>> point in
>>>>>>>> time.
>>>>>>>>
>>>>>>>> * invalidation
>>>>>>>>
>>>>>>>> It is a bit trickier. here is a pseudo-code:
>>>>>>>> invalidate_dcache_range(p,length)
>>>>>>>> {
>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>              invalidate(all other lines)
>>>>>>>> }
>>>>>>>>
>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>> called
>>>>>>>> BEFORE the DMA operation (again the writeback can happen at time so
>>>>>>>> it's
>>>>>>>> valid do it here). Calling it only AFTER the operation, may corrupt
>>>>>>>> the
>>>>>>>> data written by the DMA with old data from CPU. This how I used to
>>>>>>>> handle unaligned buffers in some other projects.
>>>>>>>>
>>>>>>>>
>>>>>>>> There is however one loophole: a data sitting in the first or the
>>>>>>>> last
>>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>>> first/line will be corrupted. But it's not highly probable as this
>>>>>>>> data
>>>>>>>> would have to be used in parallel of the DMA (interrupt handling,
>>>>>>>> SMP?,
>>>>>>>> dma mgt related variable). So it's not perfect but it would still be
>>>>>>>> better than we have today.
>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>> done.
>>>>>>> That's the way to go without all the complications above.
>>>>>> It's not that complex, but it's not perfect. We would need to keep the
>>>>>> same warning as we have now, but it would make it work in more cases.
>>>>> The warning is there for that exact reason -- to inform you something's
>>>>> wrong.
>>>>>
>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>> not a
>>>>>> trivial job. Most of the time the buffer is allocated in a upper layer
>>>>>> and passed down to a driver via layers like network stack, block layer
>>>>>> etc.And in many cases, the warning will come and go depending on
>>>>>> how the
>>>>>> variable aligned on the stack or the heap.
>>>>> I didn't observe this much in fact. I usually see the buffers coming it
>>>>> aligned or being allocated in drivers. Also, I think that's why the RC
>>>>> cycle is there, so we can test the next release and fix these issues.
>>>> It's not commonly seen but I came across it some times.
>>>>
>>>> Here are two examples:
>>>>
>>>> Network:
>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>> CPU  : DRA752-HS ES2.0
>>>> Model: TI DRA742
>>>> [...]
>>>> Booting from network ...
>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>> link up on port 0, speed 1000, full duplex
>>>> BOOTP broadcast 1
>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>> BOOTP broadcast 2
>>>> [...]
>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>> [...]
>>>> Load address: 0x82000000
>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>
>>>>
>>>> FAT: it has been fixed recently by using a bounce buffer in FAT code by
>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>> I've also seen it a few years back on PowerPC platforms when accessing a
>>>> USB storage.
>>>>
>>>> I'm sure that we could find more examples. I don't mean that they
>>>> shouldn't be fixed, simply that we could still make things work in most
>>>> cases at a low cost.
>>>> The modifications I proposed do not change the behaviour of the code for
>>>> aligned buffers, it just make it much more likely that it would work
>>>> with unaligned buffers. I fail to see the reason why this wouldn't be a
>>>> good thing.
>>> It seems to me like "it kinda works, but it really doesn't sometimes
>>> ...." , so it's encouraging bad practice as people will get used to
>>> ignoring this warning because things "kinda work, in most cases".
>> I see your point but the current situation is exactly like that: it
>> works most of the time.
> But in this case, it's a property of the hardware. With your patch, it's
> also a property of the code which works "most of the time". You
> even state that in the patch description.
I'm not sure I understand your point. But as I said I'm not pushing. 
You're not keen to the idea and you're more knowledgeable than I am in 
u-boot, so be it. I'll drop the subject
Thanks for taking time to discuss it.
>> I'm not pushing for the changes, I just wanted to let know that we could
>> do better if we wish.
> Replacing one problem with another which looks like it isn't problem
> anymore (but still is) is not better IMO.
>
>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>> this way too.
>>>>
>>>> Jean-Jacques
>>>>
>