[U-Boot] [PATCH V2] net: fec_mxc: allow use with cache enabled

Marek Vasut marex at denx.de
Tue Mar 6 22:22:54 CET 2012


Dear Eric Nelson,

> On 03/06/2012 12:45 PM, Marek Vasut wrote:
> > Dear Eric Nelson,
> > 
> >> On 03/05/2012 01:06 PM, Marek Vasut wrote:
> >>> Dear Eric Nelson,
> >>> 
> >>>> 	ensure that transmit and receive buffers are cache-line aligned
> >>>> 	
> >>>>           invalidate cache after each packet received
> >>>>           flush cache before transmitting
> >>>> 	
> >>>> 	Original patch by Marek:
> >>>> 		http://lists.denx.de/pipermail/u-boot/2012-February/117695.html
> >>> 
> >>> Would be cool to Cc me :-p
> >> 
> >> Sorry about that.
> > 
> > It's ok, don't worry about it ;-)
> > 
> > [...]
> > 
> >>> I think this "writel()" call is bogus and should be removed in
> >>> subsequent patch and replaced with simple assignment. It was here
> >>> probably due to cache issues on PPC?
> >> 
> >> The RBD has me puzzled. Do we treat them like registers and use
> >> readx/writex or like in-RAM data structures...
> > 
> > I'd go for the in-RAM data structures, but such conversion should happen
> > in a separate patch only AFTER the cache support is in.
> > 
> > [...]
> 
> Sounds good.
> 
> >>>> +		if (!fec->rbd_base) {
> >>>> +			ret = -ENOMEM;
> >>>> +			goto err2;
> >>>> +		}
> >>>> +		memset(fec->rbd_base, 0, size);
> >>>> +	}
> >>> 
> >>> We might want to flush the descriptors to memory after they have been
> >>> inited?
> >> 
> >> Again, good catch.
> >> 
> >> On this topic (initialization of RBD), I had a bit of a concern
> >> regarding the initialization of things.
> >> 
> >> In fec_open, the receive buffer descriptors are initialized and the
> >> last one set is to 'wrap'. If this code were to execute when the
> >> controller is live, bad things would surely happen.
> >> 
> >> I traced through all of the paths I can see, and I believe that
> >> we're safe. It appears that fec_halt() will be called prior to
> >> any call to fec_init() and fec_open().
> > 
> > Yes, this will only happen if something went wrong.
> > 
> >> In fec_open() a number of calls to fec_rbd_clean() are made and
> >> a single flush_dcache() is made afterwards.
> >> 
> >> While this works and causes less thrashing than per-RBD flushes,
> >> I think the code would be simpler if fec_rbd_clean just did the
> >> flush itself. This would also remove the need for a separate
> >> flush in fec_recv.
> > 
> > Not really, rbd might be (and likely is) smaller than cache line,
> > therefore you won't be able to flush single rbd, unless it's cacheline
> > aligned, which wastes space.
> > 
> > [...]
> 
> Yeah. Please disregard my comments. I wrote that before I fully
> appreciated what was being done in fec_recv().
> 
> >>>> +	invalidate_dcache_range(addr, addr + size);
> >>>> +
> >>> 
> >>> The idea here is the following (demo uses 32byte long cachelines):
> >>> 
> >>> [DESC1][DESC2][DESC3][DESC4][DESC5][DESC6]
> >>> `------- cacheline --------'
> >>> 
> >>> We want to start retrieving contents of DESC3, therefore "addr" points
> >>> to DESC1, "size" is size of cacheline (I hope there's no hardware with
> >>> 8 byte cachelines, but this should be ok here).
> >>> 
> >>> NOTE[5]: Here we can invalidate the whole cacheline, because the
> >>> descriptors so far were modified only be hardware, not by us. We are
> >>> not writing anything there so we won't loose any information.
> >>> 
> >>> NOTE[6]: This invalidation ensures that we always have a fresh copy of
> >>> the cacheline containing all the descriptors, therefore we always have
> >>> a fresh status of the descriptors we are about to pick. Since this is
> >>> a sequential execution, the cache eviction should not kick in here (or
> >>> might it?).
> >> 
> >> Another way to look at this is this:
> >> 	After fec_open(), the hardware owns the rbd, and all we should do
> >> 	is read it. In order to make sure we don't have a stale copy, we
> >> 	need to call invalidate() before looking at the values.
> >> 
> >> Tracing the code to find out whether this is true, the only write I see
> >> is within fec_recv() when the last descriptor is full, when the driver
> >> takes ownership of **all** of the descriptors, calling fec_rbd_clean()
> >> on each.
> >> 
> >> The only thing that looks funky is this:
> >> 		size = (CONFIG_FEC_ALIGN / sizeof(struct fec_bd)) - 1;
> >> 		if ((fec->rbd_index&  size) == size) {
> >> 
> >> Wouldn't a test of rbd_index against FEC_RBD_NUM be more appropriate?
> >> i.e.
> >> 
> >> 		if (fec->rbd_index == FEC_RBD_NUM-1) {
> > 
> > I believe the FEC doesn't always start from rbd_index == 0, and if you
> > were to receive more than 64 rbds between open() and close(), this
> > implementation works, your would fail.
> 
> Yep. Disregard that too.
> 
> <snip>
> 
> >>> The solution is the following:
> >>> 
> >>> 1) Compute how many descriptors are per-cache line
> >>> 2) Make sure FEC_RBD_NUM * sizeof(struct fec_bd) is at least 2 *
> >>> CONFIG_FEC_DATA_ALIGNMENT in size, see NOTE[11].
> >>> 3) Once the last RX buffer in the cacheline is processed, mark them all
> >>> clean and flush them all, see NOTE[10].
> >>> 
> >>> NOTE[10]: This is legal, because the hardware won't use RX descriptors
> >>> that it marked dirty (which means not picked up by software yet). We
> >>> clean the desciptors in an order the hardware would pick them up again
> >>> so there's no problem with race condition either. The only possible
> >>> issue here is if there was hardware with cacheline size smaller than
> >>> descriptor size (we should add a check for this at the begining of the
> >>> file).
> >>> 
> >>> NOTE[11]: This is because we want the FEC to overwrite descriptors
> >>> below the other cacheline while we're marking the one containing
> >>> retrieved descriptors clean.
> >> 
> >> Ahah! Now I see what the size calculation is doing.
> >> 
> >> A well-named constant, maybe "RXDESC_PER_CACHELINE" would be useful
> >> here.
> > 
> > Yes
> > 
> >> 	#define RXDESC_PER_CACHELINE	(CONFIG_FEC_ALIGN/sizeof(struct fec_bd))
> >> 	
> >>>> -		fec_rbd_clean(fec->rbd_index == (FEC_RBD_NUM - 1) ? 1 : 
0, rbd);
> >>>> +		size = (CONFIG_FEC_DATA_ALIGNMENT / sizeof(struct 
fec_bd)) - 1;
> >>>> 
> >> 		size = RXDESC_PER_CACHELINE-1;
> >> 		
> >>>> +		if ((fec->rbd_index&   size) == size) {
> >> 
> >> The line above only works if RXDESC_PER_CACHELINE is a multiple of 2,
> >> which is likely to work because sizeof(struct fec_bd) == 8.
> > 
> > Adding such a comment (and maybe CPP check) won't hurt.
> 
> I'm struggling getting the CPP to do the work at the moment...
> 
> >>>> +			i = fec->rbd_index - size;
> >>>> +			addr = (uint32_t)&fec->rbd_base[i];
> >>>> +			for (; i<= fec->rbd_index + size; i++) {
> >> 
> >> This flushes too many descriptors! This should be:
> >> 			for (; i<= fec->rbd_index; i++) {
> > 
> > Agreed
> 
> V3 patch forthcoming.
> 
> >>> Uh, this was tough.
> >> 
> >> How bad do we want cache?
> > 
> > We're almost there, why do you ask? :-)
> 
> I was just bein' snarky...
> 
> I'm making a couple of other small changes in V3:
> 
> - 	change fec_rbd_clean to only have a single
> 	call to write() and make it clearer that there's
> 	only one additional flag iff 'last'
> -	use comparison to supply 'last' parameter in
> 	the call to fec_rbd_clean
> 
> I think each of these makes the intent clearer.
> 
> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
> index d5d0d5e..f8691d4 100644
> --- a/drivers/net/fec_mxc.c
> +++ b/drivers/net/fec_mxc.c
> @@ -355,16 +355,10 @@ static void fec_tbd_init(struct fec_priv *fec)
>    */
>   static void fec_rbd_clean(int last, struct fec_bd *pRbd)
>   {
> -       /*
> -        * Reset buffer descriptor as empty
> -        */
> +       unsigned short flags = FEC_RBD_EMPTY;
>          if (last)
> -               writew(FEC_RBD_WRAP | FEC_RBD_EMPTY, &pRbd->status);
> -       else
> -               writew(FEC_RBD_EMPTY, &pRbd->status);
> -       /*
> -        * no data in it
> -        */
> +               flags |= FEC_RBD_WRAP;
> +       writew(flags, &pRbd->status);
>          writew(0, &pRbd->data_length);
>   }
> 
> @@ -880,10 +874,8 @@ static int fec_recv(struct eth_device *dev)
>                          i = fec->rbd_index - size;
>                          addr = (uint32_t)&fec->rbd_base[i];
>                          for (; i <= fec->rbd_index ; i++) {
> -                               if (i == (FEC_RBD_NUM - 1))
> -                                       fec_rbd_clean(1,
> &fec->rbd_base[i]); -                               else
> -                                       fec_rbd_clean(0,
> &fec->rbd_base[i]); +                               fec_rbd_clean(i ==
> (FEC_RBD_NUM - 1), +                                            
> &fec->rbd_base[i]);
>                          }
>                          flush_dcache_range(addr,
>                                  addr + CONFIG_FEC_ALIGN);

Looking forward to V3!


More information about the U-Boot mailing list