[U-Boot] [PATCH V2] net: fec_mxc: allow use with cache enabled
Marek Vasut
marex at denx.de
Tue Mar 6 20:45:58 CET 2012
Dear Eric Nelson,
> On 03/05/2012 01:06 PM, Marek Vasut wrote:
> > Dear Eric Nelson,
> >
> >> ensure that transmit and receive buffers are cache-line aligned
> >>
> >> invalidate cache after each packet received
> >> flush cache before transmitting
> >>
> >> Original patch by Marek:
> >> http://lists.denx.de/pipermail/u-boot/2012-February/117695.html
> >
> > Would be cool to Cc me :-p
>
> Sorry about that.
It's ok, don't worry about it ;-)
[...]
> >
> > I think this "writel()" call is bogus and should be removed in subsequent
> > patch and replaced with simple assignment. It was here probably due to
> > cache issues on PPC?
>
> The RBD has me puzzled. Do we treat them like registers and use
> readx/writex or like in-RAM data structures...
I'd go for the in-RAM data structures, but such conversion should happen in a
separate patch only AFTER the cache support is in.
[...]
> >> + if (!fec->rbd_base) {
> >> + ret = -ENOMEM;
> >> + goto err2;
> >> + }
> >> + memset(fec->rbd_base, 0, size);
> >> + }
> >
> > We might want to flush the descriptors to memory after they have been
> > inited?
>
> Again, good catch.
>
> On this topic (initialization of RBD), I had a bit of a concern
> regarding the initialization of things.
>
> In fec_open, the receive buffer descriptors are initialized and the
> last one is set to 'wrap'. If this code were to execute when the
> controller is live, bad things would surely happen.
>
> I traced through all of the paths I can see, and I believe that
> we're safe. It appears that fec_halt() will be called prior to
> any call to fec_init() and fec_open().
Yes, this will only happen if something went wrong.
>
> In fec_open() a number of calls to fec_rbd_clean() are made and
> a single flush_dcache() is made afterwards.
>
> While this works and causes less thrashing than per-RBD flushes,
> I think the code would be simpler if fec_rbd_clean just did the
> flush itself. This would also remove the need for a separate
> flush in fec_recv.
Not really. An rbd might be (and likely is) smaller than a cache line, so you
won't be able to flush a single rbd unless each one is cacheline-aligned, which
wastes space.
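To make the geometry concrete, here is a minimal sketch (the struct layout mirrors the 8-byte i.MX buffer descriptor, but the names and helpers are hypothetical; the real `struct fec_bd` lives in fec_mxc.h):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: an i.MX FEC buffer descriptor is 8 bytes
 * (16-bit length, 16-bit status, 32-bit data pointer), while ARM
 * cache lines are typically 32 or 64 bytes, so several descriptors
 * share one line and cannot be flushed individually. */
struct fec_bd_sketch {
	uint16_t data_length;
	uint16_t status;
	uint32_t data_pointer;
};

/* Number of descriptors sharing one cache line. */
static inline size_t rxdesc_per_cacheline(size_t cacheline)
{
	return cacheline / sizeof(struct fec_bd_sketch);
}

/* Start address of the cache line containing descriptor i, i.e. the
 * smallest span a flush or invalidate of that descriptor can cover. */
static inline uintptr_t bd_line_start(uintptr_t base, size_t i,
				      size_t cacheline)
{
	return (base + i * sizeof(struct fec_bd_sketch)) & ~(cacheline - 1);
}
```

With 32-byte lines, four descriptors share each line, so flushing "one" rbd always flushes its three neighbours too.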
[...]
> >> + invalidate_dcache_range(addr, addr + size);
> >> +
> >
> > The idea here is the following (demo uses 32byte long cachelines):
> >
> > [DESC1][DESC2][DESC3][DESC4][DESC5][DESC6]
> > `------- cacheline --------'
> >
> > We want to start retrieving contents of DESC3, therefore "addr" points to
> > DESC1, "size" is size of cacheline (I hope there's no hardware with 8
> > byte cachelines, but this should be ok here).
> >
> > NOTE[5]: Here we can invalidate the whole cacheline, because the
> > descriptors so far were modified only by hardware, not by us. We are not
> > writing anything there so we won't lose any information.
> >
> > NOTE[6]: This invalidation ensures that we always have a fresh copy of
> > the cacheline containing all the descriptors, therefore we always have a
> > fresh status of the descriptors we are about to pick. Since this is a
> > sequential execution, the cache eviction should not kick in here (or
> > might it?).
>
> Another way to look at this is this:
>
> After fec_open(), the hardware owns the rbd, and all we should do
> is read it. In order to make sure we don't have a stale copy, we
> need to call invalidate() before looking at the values.
>
> Tracing the code to find out whether this is true, the only write I see
> is within fec_recv() when the last descriptor is full, when the driver
> takes ownership of **all** of the descriptors, calling fec_rbd_clean()
> on each.
>
> The only thing that looks funky is this:
>
> size = (CONFIG_FEC_ALIGN / sizeof(struct fec_bd)) - 1;
> if ((fec->rbd_index & size) == size) {
>
> Wouldn't a test of rbd_index against FEC_RBD_NUM be more appropriate?
> i.e.
> if (fec->rbd_index == FEC_RBD_NUM-1) {
I believe the FEC doesn't always start from rbd_index == 0, and if you were to
receive more than 64 rbds between open() and close(), this implementation works,
while yours would fail.
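A quick sketch of the difference between the two conditions (helper names made up): the mask test fires at the end of every cacheline-sized group of descriptors, while the equality test fires only once per trip around the ring.

```c
/* Sketch: fires when the last descriptor of a cacheline-sized group
 * has been consumed. per_line must be a power of two. */
static inline int mask_test(int idx, int per_line)
{
	return (idx & (per_line - 1)) == per_line - 1;
}

/* Sketch of the alternative: fires only at the end of the whole ring,
 * so earlier cacheline groups would never be handed back to the FEC. */
static inline int equality_test(int idx, int num_rbd)
{
	return idx == num_rbd - 1;
}
```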
>
> >> bd_status = readw(&rbd->status);
> >> debug("fec_recv: status 0x%x\n", bd_status);
> >>
> >> @@ -751,6 +851,13 @@ static int fec_recv(struct eth_device *dev)
> >>
> >> frame = (struct nbuf *)readl(&rbd->data_pointer);
> >> frame_length = readw(&rbd->data_length) - 4;
> >
> > NOTE[7]: We received a frame, we know how long it is. The frame is loaded
> > into the buffer at rbd->data_pointer, which IS CACHE ALIGNED.
> >
> > I detect a problem here that frame_length might overflow, but that's not
> > related to caches and might just be a false alarm.
> >
> >> /*
> >>
> >> + * Invalidate data cache over the buffer
> >> + */
> >> + addr = (uint32_t)frame;
> >> + size = roundup(frame_length, CONFIG_FEC_DATA_ALIGNMENT);
> >> + invalidate_dcache_range(addr, addr + size);
> >
> > NOTE[8]: The roundup here is a valid operation, we can flush up to the
> > size of rbd->data_pointer, which is cache aligned and considering
> > frame_length is less or equal to the memory allocated for
> > rbd->data_pointer, the invalidation of cache here IS SAFE.
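The roundup amounts to the usual power-of-two alignment trick (sketch; safe here only because, as noted above, the data buffer is cache-aligned and at least this large):

```c
#include <stdint.h>

/* Sketch: round the received frame length up to a multiple of the
 * cache-line size before invalidating. align must be a power of two. */
static inline uint32_t inval_size(uint32_t frame_length, uint32_t align)
{
	return (frame_length + align - 1) & ~(align - 1);
}
```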
> >
> >> +
> >> + /*
> >>
> >> * Fill the buffer and pass it to upper layers
> >> */
> >>
> >> #ifdef CONFIG_FEC_MXC_SWAP_PACKET
> >>
> >> @@ -765,11 +872,27 @@ static int fec_recv(struct eth_device *dev)
> >>
> >> (ulong)rbd->data_pointer,
> >> bd_status);
> >>
> >> }
> >>
> >> +
> >>
> >> /*
> >>
> >> - * free the current buffer, restart the engine
> >> - * and move forward to the next buffer
> >> + * Free the current buffer, restart the engine and move forward
> >> + * to the next buffer. Here we check if the whole cacheline of
> >> + * descriptors was already processed and if so, we mark it free
> >> + * as whole.
> >>
> >> */
> >
> > NOTE[9]: This is the most problematic part, handling the marking of RX
> > descriptors with a flag that they are ready again. Obviously, directly
> > writing to the descriptor won't work, because:
> >
> > 1) There are multiple descriptors on a cacheline, therefore by writing
> > one, we'd need to flush the whole cacheline back into DRAM immediately
> > so the hardware knows about it. But that'd mean we can lose some
> > information from the hardware, refer to demo before NOTE[5]:
> >
> > We modify DESC2 and hardware is concurrently changing DESC3. We flush
> > DESC2 and the whole cacheline, we lose part of DESC3.
> >
> > 2) Cache eviction algorithm might flush data for us, therefore causing
> > loss of information as well.
> >
> > The solution is the following:
> >
> > 1) Compute how many descriptors are per-cache line
> > 2) Make sure FEC_RBD_NUM * sizeof(struct fec_bd) is at least 2 *
> > CONFIG_FEC_DATA_ALIGNMENT in size, see NOTE[11].
> > 3) Once the last RX buffer in the cacheline is processed, mark them all
> > clean and flush them all, see NOTE[10].
> >
> > NOTE[10]: This is legal, because the hardware won't use RX descriptors
> > that it marked dirty (which means not picked up by software yet). We
> > clean the descriptors in the order the hardware would pick them up again
> > so there's no problem with race condition either. The only possible
> > issue here is if there was hardware with a cacheline size smaller than
> > the descriptor size (we should add a check for this at the beginning of
> > the file).
> >
> > NOTE[11]: This is because we want the FEC to overwrite descriptors below
> > the other cacheline while we're marking the one containing retrieved
> > descriptors clean.
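The three steps above could be modelled roughly like this (a hardware-free sketch; names are hypothetical, and the 8-byte descriptor size is assumed from fec_mxc.h):

```c
/* Sketch of steps 1-3: decide when a full cache line of RX
 * descriptors can be marked clean, and which range to flush. */
#define BD_SIZE	8	/* assumed sizeof(struct fec_bd) on i.MX */

struct clean_range {
	int first;	/* first descriptor index to clean */
	int count;	/* descriptors in the group; 0 = nothing to do */
};

static struct clean_range cleanable_group(int rbd_index, int cacheline)
{
	struct clean_range r = { 0, 0 };
	int per_line = cacheline / BD_SIZE;	/* step 1 */

	/* Step 3: only act once the last descriptor of a cache line has
	 * been consumed; then the whole line is cleaned and flushed in
	 * one go, so the hardware never sees a partially-dirty line. */
	if ((rbd_index & (per_line - 1)) == per_line - 1) {
		r.first = rbd_index - (per_line - 1);
		r.count = per_line;
	}
	return r;
}
```

Step 2 (the ring being at least two cache lines long) is a static sizing constraint, so it belongs in a compile-time check rather than in this runtime path.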
>
> Ahah! Now I see what the size calculation is doing.
>
> A well-named constant, maybe "RXDESC_PER_CACHELINE" would be useful here.
Yes
>
> #define RXDESC_PER_CACHELINE (CONFIG_FEC_ALIGN/sizeof(struct fec_bd))
>
> >> - fec_rbd_clean(fec->rbd_index == (FEC_RBD_NUM - 1) ? 1 : 0, rbd);
> >> + size = (CONFIG_FEC_DATA_ALIGNMENT / sizeof(struct fec_bd)) - 1;
>
> size = RXDESC_PER_CACHELINE-1;
>
> >> + if ((fec->rbd_index & size) == size) {
>
> The line above only works if RXDESC_PER_CACHELINE is a power of 2, which
> is likely to hold because sizeof(struct fec_bd) == 8.
Adding such a comment (and maybe CPP check) won't hurt.
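Such a check could be based on the standard power-of-two test (sketch; in U-Boot this would typically sit behind a compile-time assertion rather than a runtime call):

```c
/* Sketch: nonzero iff x is a nonzero power of two. A power of two has
 * a single bit set, so x & (x - 1) clears it and yields zero. */
static inline int is_pow2(unsigned int x)
{
	return x && !(x & (x - 1));
}
```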
>
> >> + i = fec->rbd_index - size;
> >> + addr = (uint32_t)&fec->rbd_base[i];
> >> + for (; i <= fec->rbd_index + size; i++) {
>
> This flushes too many descriptors! This should be:
> for (; i <= fec->rbd_index; i++) {
Agreed
>
> >> + if (i == (FEC_RBD_NUM - 1))
> >> + fec_rbd_clean(1, &fec->rbd_base[i]);
> >> + else
> >> + fec_rbd_clean(0, &fec->rbd_base[i]);
> >> + }
> >> + flush_dcache_range(addr,
> >> + addr + CONFIG_FEC_DATA_ALIGNMENT);
> >> + }
> >> +
> >>
> >> fec_rx_task_enable(fec);
> >> fec->rbd_index = (fec->rbd_index + 1) % FEC_RBD_NUM;
> >>
> >> }
> >>
> >> diff --git a/drivers/net/fec_mxc.h b/drivers/net/fec_mxc.h
> >> index 2eb7803..852b2e0 100644
> >> --- a/drivers/net/fec_mxc.h
> >> +++ b/drivers/net/fec_mxc.h
> >> @@ -234,22 +234,6 @@ struct ethernet_regs {
> >>
> >> #endif
> >>
> >> /**
> >>
> >> - * @brief Descriptor buffer alignment
> >> - *
> >> - * i.MX27 requires a 16 byte alignment (but for the first element only)
> >> - */
> >> -#define DB_ALIGNMENT 16
> >> -
> >> -/**
> >> - * @brief Data buffer alignment
> >> - *
> >> - * i.MX27 requires a four byte alignment for transmit and 16 bits
> >> - * alignment for receive so take 16
> >> - * Note: Valid for member data_pointer in struct buffer_descriptor
> >> - */
> >> -#define DB_DATA_ALIGNMENT 16
> >> -
> >> -/**
> >>
> >> * @brief Receive & Transmit Buffer Descriptor definitions
> >> *
> >> * Note: The first BD must be aligned (see DB_ALIGNMENT)
> >>
> >> @@ -282,8 +266,7 @@ struct fec_priv {
> >>
> >> struct fec_bd *tbd_base; /* TBD ring */
> >> int tbd_index; /* next transmit BD to write */
> >> bd_t *bd;
> >>
> >> - void *rdb_ptr;
> >> - void *base_ptr;
> >> + uint8_t *tdb_ptr;
> >>
> >> int dev_id;
> >> int phy_id;
> >> struct mii_dev *bus;
> >
> > Uh, this was tough.
>
> How bad do we want cache?
We're almost there, why do you ask? :-)