[U-Boot] i.MX51: FEC: Cache coherency problem?

David Jander david.jander at protonic.nl
Tue Jul 19 10:58:57 CEST 2011


Dear Aneesh,

Thanks a lot for your replies.

On Tue, 19 Jul 2011 14:13:34 +0530
Aneesh V <aneesh at ti.com> wrote:
> On Tuesday 19 July 2011 02:07 PM, David Jander wrote:
> > On Tue, 19 Jul 2011 10:21:12 +0200
> > Albert ARIBAUD<albert.u.boot at aribaud.net>  wrote:
> >
> >> Hi David,
> >>
> >> Le 19/07/2011 09:44, David Jander a écrit :
> >>>
> >>> Hi Stefano,
> >>>
> >>> On Mon, 18 Jul 2011 18:55:05 +0200
> >>> Stefano Babic<sbabic at denx.de>   wrote:
> >>>
> >>>> On 07/18/2011 05:18 PM, David Jander wrote:
> >>>>>
> >>>>> Hi all,
> >>>>
> >>>> Hi David,
> >>>>
> >>>>> What is going on here? Why did this work with caches enabled before??
> >>>>
> >>>> I think cache was always disabled..
> >>>
> >>> I had even L2-caches enabled in u-boot (copied/adapted some code from
> >>> OMAP cache.S), and called i/dcache_enable() from board code like this:
> >>>
> >>> int board_late_init(void)
> >>> {
> >>>           power_init();
> >>>           probe_board_type();
> >>>           icache_enable();
> >>>           dcache_enable();
> >>>
> >>>           return 0;
> >>> }
> >>>
> >>> Is there a reason this wouldn't have worked before?
> >>>
> >>> Suppose it didn't. Does that mean we need to use the MMU to properly mark
> >>> regions of register space and especially the FEC BDs as not-cached? Or do we
> >>> need to flush caches manually each time such a memory region is accessed?
> >>> I am kind of a CPU-speed-junkie, so I am not sure I want to live without
> >>> caches enabled in u-boot ;-)
> 
> Are you talking about some memory-mapped IO region (register space)? If
> that is the case, that region won't be cached. The ARM MMU implementation
> makes only the SDRAM region cached. The rest is non-cached, non-buffered.

Ah, thanks for pointing that out. I already suspected something like that...

> >> You would have to flush (before sending packets / starting external
> >> memory-to-device DMA) and invalidate (before reading received packets /
> >> after external device-to-memory DMA is done); using MMU and mapping
> >> cached/non-cached areas is IMO overkill, and will hurt CPU accesses to
> >> the xmit/receive buffers and descriptors.
> >
> > So, you say actually what I did while exploring the problem would have
> > been a correct way of solving this problem?
> >
> > Like this:
> >
> > 587         flush_cache(&fec->tbd_base[fec->tbd_index], 4);
> 
> This is what is needed assuming the below is initiating a memory to
> peripheral DMA. Is your buffer only 4 bytes long?

No, it isn't. I know I should flush the whole buffer area, but this was just
enough to get the status field flushed, so that the FEC started transmitting
and the while loop eventually ended. The result was still not correct, but at
least it didn't hang.
What would be more expensive, flushing just the buffer area, or
flush_dcache_all()?
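
To put rough numbers on that question, here is a minimal sketch of how an
address range maps onto cache lines. It assumes 64-byte data cache lines
(believed to match the Cortex-A8 in the i.MX51, but worth checking against the
CPU manual). CACHE_LINE_SIZE, cache_align_range() and cache_lines_in_range()
are hypothetical names used for illustration only, not U-Boot API.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte data cache lines; adjust for the actual CPU. */
#define CACHE_LINE_SIZE 64u

/* A cache-line-aligned address range: start inclusive, end exclusive. */
struct cache_range {
	uintptr_t start;
	uintptr_t end;
};

/* Widen [addr, addr + len) so that it covers whole cache lines. */
static struct cache_range cache_align_range(uintptr_t addr, size_t len)
{
	const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1);
	struct cache_range r;

	r.start = addr & mask;
	r.end = (addr + len + CACHE_LINE_SIZE - 1) & mask;
	return r;
}

/* Number of cache lines a by-range clean/invalidate would have to touch. */
static size_t cache_lines_in_range(uintptr_t addr, size_t len)
{
	struct cache_range r = cache_align_range(addr, len);

	return (r.end - r.start) / CACHE_LINE_SIZE;
}
```

Under these assumptions a 4-byte status field costs one line, a field that
straddles a line boundary costs two, and a 1518-byte frame costs 24 lines. A
whole-cache operation instead has to walk every line in the cache, however
little data is actually involved.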

> Also, please check if flush_cache() is correctly supported for your
> CPU. The default implementation in arch/arm/lib/cache.c has support
> for only a handful of cpus. AFAIK, only armv7 is over-riding this
> default implementation at the moment.

There is cache_v7.c which implements these... they are supposed to work
correctly I guess?

> The fact that it's helping you indicates that it may be working for
> you. But still worth a check.
> 
> > 588         fec_tx_task_enable(fec);
> > 589         flush_dcache_all();
> 
> This should not be needed.

I agree, but without it the while loop below still gets stuck.

> > 590
> > 591         /*
> > 592          * wait until frame is sent .
> > 593          */
> > 594         while (readw(&fec->tbd_base[fec->tbd_index].status) & FEC_TBD_READY) {
> > 595                 udelay(1);
> > 596         }
> >
> > I am still not sure why I need line 587 above.
> 
> Did you try keeping 587 and removing 589?

Yes, I did. These two lines really are the minimum necessary to exit the while
loop. It won't work if I leave out either of them.
Now that I know a little more, I guess this is because the status field in the
buffer descriptor, which the while loop checks, is still in the cache. So the
only thing line 589 does is invalidate the caches, so that the next readw()
returns the value stored by the FEC, which is apparently faster than this
piece of code :-)
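
The behaviour described above can be reproduced with a toy model that runs on
any host. This is only a sketch: it models a single status word with one
cached copy, and every name in it (struct toy_cache, cpu_write(), cpu_read(),
cache_clean(), cache_invalidate(), device_poll(), TBD_READY) is hypothetical
and unrelated to the real FEC driver or the U-Boot cache API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for FEC_TBD_READY; the value is illustrative only. */
#define TBD_READY 0x8000u

/* One status word in "RAM" plus one write-back cached copy of it. */
struct toy_cache {
	uint16_t ram;   /* what the DMA engine sees */
	uint16_t line;  /* what the CPU sees while the line is valid */
	bool valid;
	bool dirty;
};

/* CPU store: lands in the cache only (write-back, write-allocate). */
static void cpu_write(struct toy_cache *c, uint16_t v)
{
	c->line = v;
	c->valid = true;
	c->dirty = true;
}

/* CPU load: served from the cache if the line is valid, else from RAM. */
static uint16_t cpu_read(struct toy_cache *c)
{
	if (!c->valid) {
		c->line = c->ram;
		c->valid = true;
		c->dirty = false;
	}
	return c->line;
}

/* Clean ("flush"): push dirty data out to RAM so the device can see it. */
static void cache_clean(struct toy_cache *c)
{
	if (c->valid && c->dirty) {
		c->ram = c->line;
		c->dirty = false;
	}
}

/* Invalidate: drop the cached copy so the next load re-reads RAM. */
static void cache_invalidate(struct toy_cache *c)
{
	c->valid = false;
	c->dirty = false;
}

/* Device: "transmits" if READY is set in RAM, then clears it in RAM. */
static bool device_poll(struct toy_cache *c)
{
	if (c->ram & TBD_READY) {
		c->ram &= (uint16_t)~TBD_READY;
		return true;
	}
	return false;
}
```

In this model the device never sees READY until cache_clean() runs, which
matches the transmit not starting without line 587. After the device clears
READY in RAM, cpu_read() keeps returning the stale cached value until
cache_invalidate() runs, which matches the while loop hanging without line 589.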

Best regards,

-- 
David Jander
Protonic Holland.

