[U-Boot] [PATCH 2/4] net: fec_mxc: allow use with cache enabled

Marek Vasut marex at denx.de
Mon Mar 5 02:49:33 CET 2012


> On 03/02/2012 04:39 PM, Marek Vasut wrote:
> >> 	ensure that transmit and receive buffers are cache-line aligned
> >> 	
> >>          invalidate cache after each packet received
> >>          flush cache before transmitting
> >> 
> >> Signed-off-by: Eric Nelson <eric.nelson at boundarydevices.com>
> >> 
> >> ---
> >>  drivers/net/fec_mxc.c |  248 ++++++++++++++++++++++++++++++++++++-------------
> >>  drivers/net/fec_mxc.h |   19 +----
> >>  2 files changed, 184 insertions(+), 83 deletions(-)
> >> 
> >> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
> >> index 1fdd071..f72304b 100644
> >> --- a/drivers/net/fec_mxc.c
> >> +++ b/drivers/net/fec_mxc.c
> >> 
> >> <snip>
> >> 
> >>  	/*
> >>  	 * This routine transmits one frame.  This routine only accepts
> >> 
> >> @@ -650,15 +708,21 @@ static int fec_send(struct eth_device *dev, volatile void* packet, int length)
> >>  	}
> >> 
> >>  	/*
> >> -	 * Setup the transmit buffer
> >> -	 * Note: We are always using the first buffer for transmission,
> >> -	 * the second will be empty and only used to stop the DMA engine
> >> +	 * Setup the transmit buffer. We are always using the first buffer for
> >> +	 * transmission, the second will be empty and only used to stop the DMA
> >> +	 * engine. We also flush the packet to RAM here to avoid cache trouble.
> >>  	 */
> >>  #ifdef	CONFIG_FEC_MXC_SWAP_PACKET
> >>  	swap_packet((uint32_t *)packet, length);
> >>  #endif
> >> -	writew(length, &fec->tbd_base[fec->tbd_index].data_length);
> >> -	writel((uint32_t)packet, &fec->tbd_base[fec->tbd_index].data_pointer);
> >> +
> >> +	addr = (uint32_t)packet;
> >> +	size = roundup(length, CONFIG_FEC_DATA_ALIGNMENT);
> >> +	flush_dcache_range(addr, addr + size);
> > 
> > What if the size of the packet isn't aligned to a cacheline boundary? Won't it
> > choke here?
> 
> Here's a quandary...
> 
> flush_dcache_range() can potentially force writes into unintended areas,
> which could conceivably include areas owned by the ethernet receiver if
> the source object weren't aligned to a cache-line boundary and size.
> 
> In practice, it appears that net/net.c only transmits from one of two
> places:

You can also invalidate memory and lose some information, though I'm not quite
sure that can happen in this driver.
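
Just to spell out the failure mode (a sketch only, not code from this patch;
the helper name and the debug message are mine): flush and invalidate always
operate on whole cache lines, so once the range gets rounded, the neighbouring
bytes in the first and last line are written back (or discarded) along with it:

	/*
	 * Sketch: show what flush_dcache_range() really covers when the
	 * range is not cache-line aligned. The rounded range may overlap
	 * data owned by someone else, e.g. the RX side.
	 */
	static void fec_flush_range(uint32_t addr, uint32_t len)
	{
		uint32_t start = addr & ~(CONFIG_FEC_DATA_ALIGNMENT - 1);
		uint32_t end = roundup(addr + len, CONFIG_FEC_DATA_ALIGNMENT);

		if (start != addr || end != addr + len)
			debug("fec: unaligned flush %#x+%#x -> %#x..%#x\n",
			      addr, len, start, end);

		flush_dcache_range(start, end);
	}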

> 	- a dedicated transmit buffer (NetTxPacket), which is guaranteed
> 	to be aligned to PKTALIGN (32).
> 	- a receive buffer (ICMP and ARP replies)
> 
> The networking API certainly allows for transmits from arbitrary
> areas in memory, but I'm not seeing where or how this can be invoked.

OK
> 
> Because there's no way to query how the memory for a packet is
> allocated, the only way around this I can see is to memcpy to
> a dedicated transmit buffer within fec_mxc.c, which would defeat
> any benefit of cache.

Indeed ... a bounce buffer. But such a bounce buffer should be implemented close
to the place where the misalignment originates. If the net core aligns its
packets properly everywhere, we're safe and happy.
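
If it turns out the net core really can hand us an unaligned packet, the
driver-side fallback would look roughly like this in fec_send(), right before
the flush (a sketch only; the bounce buffer and its name are made up):

	/* Cache-line aligned bounce buffer, used only for unaligned packets. */
	static uint8_t tx_bounce[PKTSIZE_ALIGN]
		__attribute__((aligned(CONFIG_FEC_DATA_ALIGNMENT)));

	if ((uint32_t)packet & (CONFIG_FEC_DATA_ALIGNMENT - 1)) {
		memcpy(tx_bounce, (void *)packet, length);
		packet = tx_bounce;
	}

But as Eric says, doing that memcpy() on every packet throws away most of what
enabling the cache buys us, so keeping the net core's buffers aligned is the
better answer.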
> 
> AFAICT, all of the other calls to 'flush_dcache_range()' are okay because
> they're dealing with buffers allocated by the driver.
> 

I'd love to see someone else review this. It is very important that we get this
right.

> >> +
> >> +	fec->tbd_base[fec->tbd_index].data_length = length;
> >> +	fec->tbd_base[fec->tbd_index].data_pointer = addr;
> >> +
> >> 
> >>  	/*
> >>  	 * update BD's status now
> >>  	 * This block:
> >> @@ -667,9 +731,18 @@ static int fec_send(struct eth_device *dev, volatile void* packet, int length)
> >>  	 * - might be the last BD in the list, so the address counter should
> >>  	 *   wrap (-> keep the WRAP flag)
> >>  	 */
> >> -	status = readw(&fec->tbd_base[fec->tbd_index].status) & FEC_TBD_WRAP;
> >> +	status = fec->tbd_base[fec->tbd_index].status & FEC_TBD_WRAP;
> >>  	status |= FEC_TBD_LAST | FEC_TBD_TC | FEC_TBD_READY;
> >> -	writew(status, &fec->tbd_base[fec->tbd_index].status);
> >> +	fec->tbd_base[fec->tbd_index].status = status;
> >> +
> >> +	/*
> >> +	 * Flush data cache. This code flushes both TX descriptors to RAM.
> >> +	 * After this code, the descriptors will be safely in RAM
> >> +	 * and we can start DMA.
> >> +	 */
> >> +	size = roundup(2 * sizeof(struct fec_bd), CONFIG_FEC_DATA_ALIGNMENT);
> >> +	addr = (uint32_t)fec->tbd_base;
> >> +	flush_dcache_range(addr, addr + size);
> > 
> > Same concern here and everywhere else ... I believe aligning it like this
> > is quite unsafe :-(
> 
> This one looks safe because you've controlled the allocation of tbd_base.
> 
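Right, and to make that explicit: the safety here rests on the descriptor ring
coming from a cache-line aligned allocation, roughly (a sketch, not the exact
allocation code from the driver):

	/* Both TX BDs must fully own the cache line(s) they live in. */
	size = roundup(2 * sizeof(struct fec_bd), CONFIG_FEC_DATA_ALIGNMENT);
	fec->tbd_base = memalign(CONFIG_FEC_DATA_ALIGNMENT, size);

That way the flush of the descriptors can never drag unrelated data into RAM
with them.
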
> >> <snip>
> >> 
> >> @@ -751,6 +846,13 @@ static int fec_recv(struct eth_device *dev)
> >>  			frame = (struct nbuf *)readl(&rbd->data_pointer);
> >>  			frame_length = readw(&rbd->data_length) - 4;
> >>  			/*
> >> +			 * Invalidate data cache over the buffer
> >> +			 */
> >> +			addr = (uint32_t)frame;
> >> +			size = roundup(frame_length, CONFIG_FEC_DATA_ALIGNMENT);
> >> +			invalidate_dcache_range(addr, addr + size);
> > 
> > Ditto here, the frame length might not be aligned properly, or will it be?
> > The network stack must be properly analyzed here.
> 
> The hardware won't return an unaligned value here, so this should be good.

Are you sure? Can't you receive a frame whose length is only aligned to an
8-byte boundary?
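
Note that rounding frame_length up is only harmless if every RX buffer itself
occupies a whole number of cache lines, i.e. the buffers are allocated
something like this (a sketch; the max-frame macro is whatever the driver
actually uses):

	/*
	 * Each RX buffer owns all of its cache lines, so a rounded-up
	 * invalidate of a short frame can never reach the next buffer.
	 */
	size = roundup(FEC_MAX_PKT_SIZE, CONFIG_FEC_DATA_ALIGNMENT);
	uint8_t *buf = memalign(CONFIG_FEC_DATA_ALIGNMENT, size);

Otherwise the invalidate can throw away dirty data sitting right behind a
short frame.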
> 
> >> <snip>
> >> 
> >> @@ -765,11 +867,27 @@ static int fec_recv(struct eth_device *dev)
> >>  						(ulong)rbd->data_pointer,
> >>  						bd_status);
> >>  		}
> >> +
> >>  		/*
> >> -		 * free the current buffer, restart the engine
> >> -		 * and move forward to the next buffer
> >> +		 * Free the current buffer, restart the engine and move forward
> >> +		 * to the next buffer. Here we check if the whole cacheline of
> >> +		 * descriptors was already processed and if so, we mark it free
> >> +		 * as a whole.
> >>  		 */
> >> -		fec_rbd_clean(fec->rbd_index == (FEC_RBD_NUM - 1) ? 1 : 0, rbd);
> >> +		size = (CONFIG_FEC_DATA_ALIGNMENT / sizeof(struct fec_bd)) - 1;
> >> +		if ((fec->rbd_index & size) == size) {
> >> +			i = fec->rbd_index - size;
> >> +			addr = (uint32_t)&fec->rbd_base[i];
> >> +			for (; i <= fec->rbd_index + size; i++) {
> >> +				if (i == (FEC_RBD_NUM - 1))
> >> +					fec_rbd_clean(1, &fec->rbd_base[i]);
> >> +				else
> >> +					fec_rbd_clean(0, &fec->rbd_base[i]);
> >> +			}
> >> +			flush_dcache_range(addr,
> >> +				addr + CONFIG_FEC_DATA_ALIGNMENT);
> >> +		}
> >> +
> > 
> > This was the worst part. I don't quite remember how and why I made those
> > alignment decisions (well, it's obvious here: you flush the rxdesc once the
> > whole cacheline is done), but some of the pieces are far less obvious now
> > that I read the code.
> 
> I'm not grokking this one either. Definitely needs fresher eyes than I have
> at the moment.

Indeed, I'm thinking the same thing. Let's sleep on this!
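
For whoever picks this up, here is my reading of that block, rewritten a bit
(an untested sketch, the bds_per_line name is mine, and it is not what the
patch does verbatim):

	/* Number of RX descriptors sharing one cache line. */
	uint32_t bds_per_line = CONFIG_FEC_DATA_ALIGNMENT / sizeof(struct fec_bd);

	/*
	 * Only once the *last* BD of a cache line has been consumed do we
	 * mark all BDs of that line empty again and flush the line, so we
	 * never flush a line that still contains a BD the controller owns.
	 */
	if ((fec->rbd_index % bds_per_line) == bds_per_line - 1) {
		int first = fec->rbd_index - (bds_per_line - 1);
		int i;

		for (i = first; i < first + bds_per_line; i++)
			fec_rbd_clean(i == (FEC_RBD_NUM - 1), &fec->rbd_base[i]);

		flush_dcache_range((uint32_t)&fec->rbd_base[first],
				   (uint32_t)&fec->rbd_base[first] +
				   CONFIG_FEC_DATA_ALIGNMENT);
	}

One thing that still bothers me is that the loop in the patch runs up to
fec->rbd_index + size, i.e. into the next cache line of descriptors; that is
one of the pieces I no longer follow.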

Thanks for your good work on putting this into shape!

M

