[U-Boot] i.MX51: FEC: Cache coherency problem?

David Jander david.jander at protonic.nl
Thu Jul 21 08:48:03 CEST 2011


On Wed, 20 Jul 2011 08:36:12 -0700
"J. William Campbell" <jwilliamcampbell at comcast.net> wrote:

> On 7/20/2011 7:35 AM, Albert ARIBAUD wrote:
> > Le 20/07/2011 16:01, J. William Campbell a écrit :
> >> On 7/20/2011 6:02 AM, Albert ARIBAUD wrote:
> >>> Le 19/07/2011 22:11, J. William Campbell a écrit :
> >>>
> >>>> If this is true, then it means that the cache is of type write-back 
> >>>> (as
> >>>> opposed to write-thru). From a (very brief) look at the arm7 
> >>>> manuals, it
> >>>> appears that both types of cache may be present in the cpu. Do you 
> >>>> know
> >>>> how this operates?
> >>> Usually, copyback (rather than writeback) and write-through are modes of
> >>> operation, not cache types.
> >> Hi Albert,
> >> On some CPUs both cache modes are available. On many other CPUs (I
> >> would guess most), you have one fixed mode available, not both. I
> >> have always seen the two modes described as write-back and
> >> write-through, but I am sure we are talking about the same things.
> >
> > We are. Copy-back is another name for write-back, not used by ARM but 
> > by some others.
> >
> >> The
> >> examples that have both modes that I am familiar with have the mode as a
> >> "global" setting. It is not controlled by bits in the TLB or anything
> >> like that. How does it work on ARM? Is it fixed globally, globally
> >> controlled, or controlled by memory management?
> >
> > Well, it's a bit complicated, because it depends on the architecture 
> > version *and* implementation -- ARM themselves do not mandate things, 
> > and it is up to the SoC designer to specify what cache they want and 
> > what mode it supports, both at L1 and L2, in their specific instance 
> > of ARM cores. And yes, you can have memory areas that are write-back 
> > and others that are write-through in the same system.
> >
> >> If it is controlled by memory management, it looks to me like lots of
> >> problems could be avoided by operating with input type buffers set as
> >> write-through. One probably isn't going to be writing to input buffers
> >> much under program control anyway, so the performance loss should be
> >> minimal. This gets rid of the alignment restrictions on these buffers
> >> but not the invalidate/flush requirements.
> >
> > There's not much you can do about alignment issues except align to 
> > cache line boundaries.
> >
> >> However, if memory management
> >> is required to set the cache mode, it might be best to operate with the
> >> buffers and descriptors un-cached. That gets rid of the flush/invalidate
> >> requirement at the expense of slowing down copying from read buffers.
> >
> > That makes 'best' a subjective choice, doesn't it? :)
> Hi All,
>          Yes, it probably depends on the usage.
> >
> >> Probably a reasonable price to pay for the associated simplicity.
> >
> > Others would say that spending some time setting up alignments and 
> > flushes and invalidates is a reasonable price to pay for increased 
> > performance... That's an open debate where no solution is The Right 
> > One(tm).
> >
> > For instance, consider the TFTP image reading. People would like the 
> > image to end up in cached memory because we'll do some checksumming on 
> > it before we give it control, and having it cached makes this step 
> > quite a bit faster; but we'll lose that if we put it in non-cached memory 
> > because it comes through the Ethernet controller's DMA; and it would 
> > be worse to receive packets in non-cached memory only to move their 
> > contents into cached memory later on.
> >
> > I think properly aligning descriptors and buffers is enough to avoid 
> > the mixed flush/invalidate line issue, and wisely putting instruction 
> > barriers should be enough to get the added performance of cache 
> > without too much of the hassle of memory management.
> I am pretty sure that all the drivers read the input data into 
> intermediate buffers in all cases. There is no practical way to be sure 
> the next packet received is the "right one" for the tftp. Plus there are 
> headers involved, and furthermore there is no way to ensure that a tftp 
> destination is located on a sector boundary. In short, you are going to 
> copy from an input buffer to a destination.
> However, it is still correct that copying from a non-cached area is 
> slower than from cached areas, because of burst reads vs. individual 
> reads. That said, I doubt that the u-boot user can tell the difference, as 
> the network latency will far exceed the difference in copy time. The 
> question is which is easier to do, and that is probably a matter of 
> opinion. It is safe to say, though, that so far a cached solution has 
> eluded us. That may be changing, but it would still be nice to know how 
> to allocate a section of un-cached RAM on the ARM processor, in so far 
> as the question has a single answer! That would allow easy portability 
> of drivers that do not know about caches, of which there seem to be many.

I agree. Unfortunately, my time is up for now, and I can't go on trying
to fix this driver. Maybe I'll pick it up after my vacation.
For now I have settled for the ugly solution of keeping the dcache disabled
while ethernet is being used :-(
IMHO, doing cache maintenance all over the driver is not an easy or nice
solution. Implementing a non-cached memory pool in the MMU and a corresponding
dma_malloc() sounds much more universally applicable to any driver.

Best regards,

-- 
David Jander
Protonic Holland.
