[U-Boot] ARM - cache and alignment

Benoît Thébaudeau benoit.thebaudeau.dev at gmail.com
Tue Jan 17 21:54:57 CET 2017


Hi,

On Tue, Jan 17, 2017 at 10:54 AM, Marek Vasut <marex at denx.de> wrote:
> On 01/17/2017 10:51 AM, Jean-Jacques Hiblot wrote:
>>
>>
>> On 17/01/2017 10:38, Marek Vasut wrote:
>>> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>>>
>>>> On 17/01/2017 10:15, Marek Vasut wrote:
>>>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>>>> Tom, Marek
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>>>> operations (invalidate_dcache_range or flush_dcache_range), the
>>>>>>>>>> whole request is discarded for ARM926EJ-S; for ARMv7 or ARMv8,
>>>>>>>>>> only the aligned part is maintained. This is probably what caused
>>>>>>>>>> the bug addressed in 8133f43d1cd. There are a lot of unaligned
>>>>>>>>>> buffers used in DMA operations, and for all of them we're
>>>>>>>>>> possibly handling the cache only partially, or not at all. I've
>>>>>>>>>> seen this when using the environment from a file stored in a FAT
>>>>>>>>>> partition. Commit 8133f43d1cd addresses this by using a bounce
>>>>>>>>>> buffer at the FAT level, but it's only one of many cases.
>>>>>>>>>>
>>>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>>>
>>>>>>>>>> * flush (writeback + invalidate): suppose we use an unaligned
>>>>>>>>>> address p; flush_dcache_range() can do the writeback + invalidate
>>>>>>>>>> on the whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)].
>>>>>>>>>> There should be no problem with that, since writeback can happen
>>>>>>>>>> at any point in time.
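For reference, the rounding described above could be sketched like this (a minimal sketch; LINE_SZ and the helper names are illustrative assumptions, not U-Boot API):

```c
#include <stdint.h>

/* Flushing an unaligned [p, p + length) range rounds the start down and
 * the end up to cache-line boundaries, which is always safe for a
 * writeback + invalidate. LINE_SZ is an assumed cache-line size. */
#define LINE_SZ 64u

static inline uintptr_t line_align_down(uintptr_t addr)
{
	return addr & ~(uintptr_t)(LINE_SZ - 1);
}

static inline uintptr_t line_align_up(uintptr_t addr)
{
	return (addr + LINE_SZ - 1) & ~(uintptr_t)(LINE_SZ - 1);
}
```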
>>>>>>>>>>
>>>>>>>>>> * invalidation
>>>>>>>>>>
>>>>>>>>>> It is a bit trickier; here is pseudo-code:
>>>>>>>>>>
>>>>>>>>>> invalidate_dcache_range(p, length)
>>>>>>>>>> {
>>>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>>>              invalidate(all other lines)
>>>>>>>>>> }
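Fleshed out in C, the pseudo-code above could look roughly like this (a sketch only; the primitives and LINE_SZ are stand-in assumptions, not U-Boot's arch code, and are stubbed here so the logic can be checked without hardware):

```c
#include <stdint.h>
#include <stddef.h>

#define LINE_SZ 64u  /* assumed cache-line size */

/* Stubs standing in for the arch-specific cache primitives; they only
 * record what was requested. */
static int wb_inv_calls;   /* lines given writeback + invalidate */
static size_t inv_bytes;   /* bytes passed to plain invalidate   */

static void writeback_invalidate_line(uintptr_t line)
{
	(void)line;
	wb_inv_calls++;
}

static void invalidate_lines(uintptr_t start, uintptr_t end)
{
	inv_bytes += end - start;
}

/* Proposed unaligned-tolerant invalidate: the partial head and tail
 * lines get writeback + invalidate, then the whole rounded range gets a
 * plain invalidate (re-invalidating the head/tail lines is harmless). */
void invalidate_dcache_range(uintptr_t p, size_t length)
{
	uintptr_t first = p & ~(uintptr_t)(LINE_SZ - 1);
	uintptr_t last = (p + length - 1) & ~(uintptr_t)(LINE_SZ - 1);

	if (p & (LINE_SZ - 1))
		writeback_invalidate_line(first);
	if (((p + length) & (LINE_SZ - 1)) && last != first)
		writeback_invalidate_line(last);
	invalidate_lines(first, last + LINE_SZ);
}
```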
>>>>>>>>>>
>>>>>>>>>> Here again, this should work fine IF invalidate_dcache_range() is
>>>>>>>>>> called BEFORE the DMA operation (again, the writeback can happen
>>>>>>>>>> at any time, so it's valid to do it here). Calling it only AFTER
>>>>>>>>>> the operation may corrupt the data written by the DMA with old
>>>>>>>>>> data from the CPU. This is how I used to handle unaligned buffers
>>>>>>>>>> in some other projects.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There is however one loophole: if data sitting in the first or
>>>>>>>>>> the last line is accessed before the memory is updated by the
>>>>>>>>>> DMA, then that first/last line will be corrupted. But it's not
>>>>>>>>>> highly probable, as this data would have to be used in parallel
>>>>>>>>>> with the DMA (interrupt handling, SMP?, DMA-management-related
>>>>>>>>>> variables). So it's not perfect, but it would still be better
>>>>>>>>>> than what we have today.
>>>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>>>> done.
>>>>>>>>> That's the way to go without all the complications above.
>>>>>>>> It's not that complex, but it's not perfect. We would need to
>>>>>>>> keep the
>>>>>>>> same warning as we have now, but it would make it work in more
>>>>>>>> cases.
>>>>>>> The warning is there for that exact reason -- to inform you
>>>>>>> something's
>>>>>>> wrong.
>>>>>>>
>>>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>>>> not a trivial job. Most of the time the buffer is allocated in an
>>>>>>>> upper layer and passed down to a driver via layers like the network
>>>>>>>> stack, block layer, etc. And in many cases, the warning will come
>>>>>>>> and go depending on how the variable is aligned on the stack or
>>>>>>>> the heap.
>>>>>>> I didn't observe this much, in fact. I usually see the buffers
>>>>>>> coming in aligned or being allocated in drivers. Also, I think
>>>>>>> that's why the RC cycle is there, so we can test the next release
>>>>>>> and fix these issues.
>>>>>> It's not commonly seen, but I've come across it a few times.
>>>>>>
>>>>>> Here are two examples:
>>>>>>
>>>>>> Network:
>>>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>>>> CPU  : DRA752-HS ES2.0
>>>>>> Model: TI DRA742
>>>>>> [...]
>>>>>> Booting from network ...
>>>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>>>> link up on port 0, speed 1000, full duplex
>>>>>> BOOTP broadcast 1
>>>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>>>> BOOTP broadcast 2
>>>>>> [...]
>>>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>>>> [...]
>>>>>> Load address: 0x82000000
>>>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>>>
>>>>>>
>>>>>> FAT: it has been fixed recently by using a bounce buffer in the FAT
>>>>>> code, by "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>>>> I've also seen it a few years back on PowerPC platforms when
>>>>>> accessing USB storage.
>>>>>>
>>>>>> I'm sure that we could find more examples. I don't mean that they
>>>>>> shouldn't be fixed, simply that we could still make things work in
>>>>>> most cases at a low cost.
>>>>>> The modifications I proposed do not change the behaviour of the code
>>>>>> for aligned buffers; they just make it much more likely that things
>>>>>> would work with unaligned buffers. I fail to see why this wouldn't
>>>>>> be a good thing.
>>>>> It seems to me like "it kinda works, but it really doesn't
>>>>> sometimes...", so it's encouraging bad practice, as people will get
>>>>> used to ignoring this warning because things "kinda work, in most
>>>>> cases".
>>>> I see your point but the current situation is exactly like that: it
>>>> works most of the time.
>>> But in this case, it's a property of the hardware. With your patch, it's
>>> also a property of the code which works "most of the time". You
>>> even state that in the patch description.
>> I'm not sure I understand your point.
>
> I'm referring to the last paragraph of the commit message, "There is
> however one loophole..."
>
>> But as I said, I'm not pushing. You're not keen on the idea and you're
>> more knowledgeable than I am in U-Boot, so be it. I'll drop the
>> subject.
>> Thanks for taking the time to discuss it.
>
> Let's wait and see what the others have to say, maybe?

IMHO, a 100% reliable U-Boot is preferable, even if this means less
performance. There does not seem to be a way of working around this
loophole, so I would avoid this cache hack, even if the probability of
hitting an issue is small.

Maintaining only the aligned part of DMA buffers for cache operations,
as Jean-Jacques says is done for ARMv7/v8, is also not perfect. This
means that probably at least all the storage-related device drivers on
ARM can be affected in some way by unaligned buffers. That would mean
a lot of changes to move the buffer bounce mechanism from the FAT
layer to the device driver layer. That's why I'd rather move it to the
block layer. The architecture/driver combinations requiring it could
add a flag or a configuration setting somewhere, so that performance
is not lowered when not needed. It would also only be applied if a
misalignment is detected, in which case a debug() message could also
be printed.
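Such a block-layer bounce could be sketched as follows. This is only an illustration of the idea: blk_read_bounced, blk_read_fn, toy_read, and ARCH_DMA_MINALIGN here are hypothetical names, not the actual U-Boot block API, and the alignment is done by hand with an over-sized malloc so the sketch stays portable.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARCH_DMA_MINALIGN 64  /* assumed minimum DMA alignment */

/* Hypothetical driver read callback type. */
typedef int (*blk_read_fn)(void *buf, size_t len);

/* Block-layer read wrapper: if the caller's buffer is misaligned, read
 * into an aligned bounce buffer and copy back, so the driver's cache
 * maintenance always sees aligned addresses. */
int blk_read_bounced(blk_read_fn read, void *buf, size_t len)
{
	if (!((uintptr_t)buf & (ARCH_DMA_MINALIGN - 1)))
		return read(buf, len);	/* already aligned: no overhead */

	/* Misaligned: allocate an over-sized buffer and align it by hand.
	 * A debug() message could be printed here. */
	unsigned char *raw = malloc(len + ARCH_DMA_MINALIGN);
	if (!raw)
		return -1;
	void *bounce = (void *)(((uintptr_t)raw + ARCH_DMA_MINALIGN - 1) &
				~(uintptr_t)(ARCH_DMA_MINALIGN - 1));
	int ret = read(bounce, len);
	if (!ret)
		memcpy(buf, bounce, len);
	free(raw);
	return ret;
}

/* Toy reader for demonstration: fills the buffer and records whether
 * the pointer it received was DMA-aligned. */
static int last_read_aligned;
static int toy_read(void *buf, size_t len)
{
	last_read_aligned = !((uintptr_t)buf & (ARCH_DMA_MINALIGN - 1));
	memset(buf, 0xAB, len);
	return 0;
}

/* Aligned backing store, so demo_buf + 1 is reliably misaligned. */
static _Alignas(ARCH_DMA_MINALIGN) unsigned char demo_buf[256];
```

With this shape, aligned callers pay nothing, and misaligned callers get a copy instead of silent cache corruption.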

>>>> I'm not pushing for the changes, I just wanted to let know that we could
>>>> do better if we wish.
>>> Replacing one problem with another which looks like it isn't a problem
>>> anymore (but still is) is not better IMO.
>>>
>>>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>>>> this way too.

Best regards,
Benoît

