[U-Boot] i.MX51: FEC: Cache coherency problem?

J. William Campbell jwilliamcampbell at comcast.net
Tue Jul 19 22:11:24 CEST 2011


On 7/19/2011 11:14 AM, Anton Staaf wrote:
> On Tue, Jul 19, 2011 at 7:36 AM, J. William Campbell
> <jwilliamcampbell at comcast.net>  wrote:
>> On 7/19/2011 2:05 AM, Albert ARIBAUD wrote:
>>> On 19/07/2011 10:43, Aneesh V wrote:
>>>
>>>>>> You would have to flush (before sending packets / starting external
>>>>>> memory-to-device DMA) and invalidate (before reading received packets /
>>>>>> after external device-to-memory DMA is done); using MMU and mapping
>>>>>> cached/non-cached areas is IMO overkill, and will hurt CPU accesses to
>>>>>> the xmit/receive buffers and descriptors.
>>>>> So you are saying that what I did while exploring the problem would
>>>>> have been a correct way of solving it?
>>>>>
>>>>> Like this:
>>>>>
>>>>> 587 flush_cache(&fec->tbd_base[fec->tbd_index], 4);
>>>> This is what is needed, assuming the code below it initiates a
>>>> memory-to-peripheral DMA. Is your buffer only 4 bytes long?
>>> Generally:
>>>
>>> - for sending data through a device that has its own, external, DMA
>>> engine, you'll obviously need to flush the data buffer(s) but also any
>>> DMA descriptors used by the engine, before you start the engine;
>>>
>>> - for receiving, if you have to set up receive descriptors, you must
>>> flush those before telling the device to enter receive mode (so that the
>>> device reads the descriptors as you wrote them), and you should
>>> invalidate the receive buffers at the latest when the device signals
>>> that data has been received,
>> Hi All,
>>
>>>    or preferably long before (at the same time
>>> you flushed the read descriptor, so that cache-related actions are
>>> grouped in the same place in the code).
>> I think this last statement is incorrect. You should invalidate the
>> cache for the receive buffers just before you intend to reference them.
>> If you do it right after receive mode is entered, subsequent access to
>> items NEAR the receive buffer may reload the cache with receive buffer
>> data before the DMA is done, re-creating the problem you are trying to
>> avoid. Also, I don't know if the ARM cache is write-back or write-thru, but
>> if it is write-back, the only way to avoid problems is to allocate the
>> receive buffers on cache line boundaries, so no "nearby" writes can
>> cause something in the DMA buffer to be corrupted. If all receive
>> buffers are allocated on cache-line boundaries (both start and end of
>> each buffer), you can invalidate the cache "early" under the assumption
>> that there will be no read accesses to the read buffers themselves until
>> after DMA is complete. IMHO it is better, even in this case, to
>> invalidate the cache after DMA is done but before referencing the read data.
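
To make the ordering concrete, here is a rough sketch of the pattern being
discussed, using U-Boot's usual flush_cache()/invalidate_dcache_range()
calls. The descriptor layout, the 64-byte line size, and the helpers
hw_start_rx(), hw_rx_done() and consume() are made up for illustration:

    #include <common.h>     /* flush_cache(), invalidate_dcache_range() */

    #define LINE 64         /* assumed L1 data cache line size */

    struct dma_desc {
        u32 addr;
        u32 len;
        u32 stat;
        u32 pad;
    };

    /* both ends of each DMA area fall on cache line boundaries */
    static u8 rxbuf[512] __attribute__((aligned(LINE)));
    static struct dma_desc rxdesc __attribute__((aligned(LINE)));

    extern void hw_start_rx(struct dma_desc *d);   /* hypothetical */
    extern int hw_rx_done(void);                   /* hypothetical */
    extern void consume(u8 *buf, u32 len);         /* hypothetical */

    static void rx_setup(void)
    {
        rxdesc.addr = (u32)rxbuf;
        rxdesc.len  = sizeof(rxbuf);
        /* flush the descriptor so the device reads what we wrote */
        flush_cache((ulong)&rxdesc, sizeof(rxdesc));
        hw_start_rx(&rxdesc);
    }

    static void rx_poll(void)
    {
        while (!hw_rx_done())
            ;
        /* invalidate just before the CPU reads, not earlier */
        invalidate_dcache_range((ulong)&rxdesc,
                                (ulong)&rxdesc + sizeof(rxdesc));
        invalidate_dcache_range((ulong)rxbuf,
                                (ulong)rxbuf + sizeof(rxbuf));
        consume(rxbuf, rxdesc.len);
    }

The transmit side is the mirror image: flush_cache() on both the data
buffer and the descriptor before starting the engine.
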
> This is a critical observation, and one I was going to make if I had
> made it to the end of the thread and no one had already pointed it
> out.  In fact, there is no way with the current implementation (that I
> can see) of the v7_dcache_inval_range function to correctly implement
> a cache-enabled, DMA-driven driver without aligning buffers to cache
> line sizes.  Below is the commit message from a recent patch I made
> locally to fix the Tegra MMC driver.  I wanted to start a discussion
> on the list about forcing all buffers to be aligned to cache line
> sizes.  The problem is that many buffers are stack allocated or part
> of structs that are of unknown alignment.
>
>      mmc: Tegra2: Enable dcache support by bouncing unaligned requests.
>
>      Dcache support was disabled due to dcache/dma interactions with
>      unaligned buffers.  When an unaligned buffer is used for DMA the
>      first and last few bytes of the buffer will be clobbered by the
>      dcache invalidate call that is required to make the contents of
>      the buffer visible to the CPU post DMA.  The reason that these
>      bytes are clobbered is that the dcache invalidate code (which is
>      the same as the Linux kernel's) checks for unaligned invalidates
>      and first flushes the cache lines that are being partially
>      invalidated.  This is required because the cache lines may be
>      shared with other variables, and may be dirty.
Hi Anton,
>    This flush however
>      writes over the first and last few bytes of the unaligned buffer
>      with whatever happened to be in the cache.
If this is true, then it means that the cache is of the write-back type (as 
opposed to write-thru). From a (very brief) look at the ARMv7 manuals, it 
appears that both types of cache may be present in the CPU. Do you know 
how this one operates? If all we have is write-back, you are correct: 
there is no safe way to do DMA into non-cache-line-aligned buffers. This 
means both ends of the buffer must be cache-line aligned, so you may have 
to allocate it larger than the transfer requires. For unfamiliar readers: 
a write-back cache has only a single dirty bit per cache line. When a 
dirty line is flushed, the CPU doesn't know which bytes changed, so it 
writes the entire line back to memory, hence "write-back". A write-thru 
cache updates the cache on a write and also immediately schedules a write 
of the new data to memory. That way, no writes are needed to flush the 
cache. The downside is that many updates of the same memory location 
produce a lot of main-memory writes that may not actually be required.
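
As a concrete illustration of why both ends matter with a write-back
cache (the 64-byte line size and the struct layout are just assumptions
for the example):

    /* 'status' shares a cache line with the start of 'buf' */
    struct rx_area {
        u32 status;        /* CPU writes this while DMA is in flight */
        u8  buf[100];      /* DMA target, NOT line aligned */
    };

    /*
     * 1. The CPU writes status; the shared line becomes dirty.
     * 2. The device DMAs fresh data into buf in main memory.
     * 3. The invalidate routine sees a partial line at the start of buf
     *    and flushes it first (the line may hold the dirty status),
     *    which writes the line's stale copy of buf[0..59] over the
     *    freshly DMAed bytes in memory: the data is clobbered.
     *
     * The only safe layout pads and aligns the buffer to whole lines:
     */
    #define LINE 64
    static u8 dma_buf[(100 + LINE - 1) & ~(LINE - 1)]
            __attribute__((aligned(LINE)));

A sketch of the bounce-buffer alternative from Anton's option 3 follows
after the quoted message below.
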

Best Regards,
Bill Campbell
>
>      There are a number of possible solutions:
>
>      1) Modify the invalidate code to first read the partial cache line,
>      then invalidate it, and then write back just the valid part of the
>      line.  This suffers from a race condition with concurrent code in
>      interrupt handlers or other CPUs.
>
>      2) Modify all of U-Boot to allocate buffers from a block allocation
>      mechanism that ensures they are aligned on cache line boundaries.
>      While this is possible, there are some cases where the public API
>      passes down a buffer pointer all the way to the block read interface.
>      So all stand alone U-Boot programs would have to be fixed as well.
>
>      3) Use a bounce buffer that is known to be aligned.  Allocate the
>      buffer in the Tegra MMC code and ensure it is large enough for the
>      read/write as well as aligned correctly.  This solution requires an
>      extra memcpy of the data read or written and has a high water mark
>      on memory consumption.  It can be conditionally used only when the
>      buffer passed in is unaligned.
>
>      This patch implements the third solution.
>
> -Anton
>
>
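
For readers following along, here is a minimal sketch of the bounce-buffer
approach from option 3 above. All names (mmc_read_aligned(), the 64 KiB
buffer size, the 512-byte block size) are assumptions for illustration,
not the actual Tegra patch:

    #include <common.h>

    #define LINE        64              /* assumed cache line size */
    #define BOUNCE_SIZE (64 * 1024)     /* high-water mark on memory */

    static u8 bounce[BOUNCE_SIZE] __attribute__((aligned(LINE)));

    /* hypothetical low-level read that DMAs into a line-aligned buffer */
    extern int mmc_read_aligned(void *buf, u32 blk, u32 cnt);

    int mmc_read(void *dst, u32 blk, u32 cnt)
    {
        size_t len = (size_t)cnt * 512;
        int ret;

        /* aligned callers go straight through, no extra copy */
        if (!((ulong)dst & (LINE - 1)) && !(len & (LINE - 1)))
            return mmc_read_aligned(dst, blk, cnt);

        if (len > sizeof(bounce))
            return -1;                  /* or loop in chunks */

        /* unaligned: DMA into the aligned bounce buffer, then copy out */
        ret = mmc_read_aligned(bounce, blk, cnt);
        if (ret == 0)
            memcpy(dst, bounce, len);
        return ret;
    }

The memcpy() is the extra cost Anton mentions; the alignment check keeps
it off the path for callers that already pass aligned buffers.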


