[U-Boot] i.MX51: FEC: Cache coherency problem?

J. William Campbell jwilliamcampbell at comcast.net
Wed Jul 20 17:36:12 CEST 2011


On 7/20/2011 7:35 AM, Albert ARIBAUD wrote:
> Le 20/07/2011 16:01, J. William Campbell a écrit :
>> On 7/20/2011 6:02 AM, Albert ARIBAUD wrote:
>>> Le 19/07/2011 22:11, J. William Campbell a écrit :
>>>
>>>> If this is true, then it means that the cache is of type write-back
>>>> (as opposed to write-through). From a (very brief) look at the ARMv7
>>>> manuals, it appears that both types of cache may be present in the
>>>> CPU. Do you know how this operates?
>>> Usually, copy-back (rather than write-back) and write-through are
>>> modes of operation, not cache types.
>> Hi Albert,
>> On some CPUs, both cache modes are available. On many other CPUs (I
>> would guess most), you have one fixed mode available, not both. I
>> have always seen the two modes described as write-back and
>> write-through, but I am sure we are talking about the same things.
>
> We are. Copy-back is another name for write-back, not used by ARM but 
> by some others.
>
>> The
>> examples I am familiar with that support both modes treat the mode as a
>> "global" setting; it is not controlled by bits in the TLB or anything
>> like that. How does it work on ARM? Is it fixed, globally controlled,
>> or controlled by memory management?
>
> Well, it's a bit complicated, because it depends on the architecture 
> version *and* implementation -- ARM themselves do not mandate things, 
> and it is up to the SoC designer to specify what cache they want and 
> what mode it supports, both at L1 and L2, in their specific instance 
> of ARM cores. And yes, you can have memory areas that are write-back 
> and others that are write-through in the same system.
>
>> If it is controlled by memory management, it looks to me like lots of
>> problems could be avoided by operating with input type buffers set as
>> write-through. One probably isn't going to be writing to input buffers
>> much under program control anyway, so the performance loss should be
>> minimal. This gets rid of the alignment restrictions on these buffers
>> but not the invalidate/flush requirements.
>
> There's not much you can do about alignment issues except align to 
> cache line boundaries.
>
>> However, if memory management
>> is required to set the cache mode, it might be best to operate with the
>> buffers and descriptors un-cached. That gets rid of the flush/invalidate
>> requirement at the expense of slowing down copying from read buffers.
>
> That makes 'best' a subjective choice, doesn't it? :)
Hi All,
         Yes, it probably depends on the usage.
>
>> Probably a reasonable price to pay for the associated simplicity.
>
> Others would say that spending some time setting up alignments and 
> flushes and invalidates is a reasonable price to pay for increased 
> performance... That's an open debate where no solution is The Right 
> One(tm).
>
> For instance, consider the TFTP image reading. People would like the 
> image to end up in cached memory because we'll do some checksumming on 
> it before we give it control, and having it cached makes this step 
> much faster; but we'll lose that if we put it in non-cached memory, 
> because it comes through the Ethernet controller's DMA; and it would 
> be worse to receive packets in non-cached memory only to move their 
> contents into cached memory later on.
>
> I think properly aligning descriptors and buffers is enough to avoid 
> the mixed flush/invalidate line issue, and wisely putting instruction 
> barriers should be enough to get the added performance of cache 
> without too much of the hassle of memory management.
I am pretty sure that all the drivers read the input data into 
intermediate buffers in all cases. There is no practical way to be sure 
the next packet received is the "right one" for the tftp. Plus there are 
headers involved, and there is no way to ensure that a tftp destination 
is located on a cache-line boundary. In short, you are going to copy 
from an input buffer to a destination.
However, it is still correct that copying from a non-cached area is 
slower than from a cached area, because of burst reads vs. individual 
reads. I doubt that the u-boot user could tell the difference, though, 
as the network latency will far exceed the difference in copy time. The 
question is which is easier to do, and that is probably a matter of 
opinion. It is safe to say that so far a cached solution has eluded us. 
That may be changing, but it would still be nice to know how to allocate 
a section of un-cached RAM on the ARM processor, in so far as the 
question has a single answer! That would allow easy portability of 
drivers that do not know about caches, of which there seem to be many.

Best Regards,
Bill Campbell
>
>> Best Regards,
>> Bill Campbell
>
> Amicalement,


