[RFC PATCH 2/2] net: Add Support for GENET Ethernet controller

Andre Przywara andre.przywara at arm.com
Mon Dec 16 11:21:33 CET 2019


On Fri, 13 Dec 2019 15:53:24 -0800
Florian Fainelli <f.fainelli at gmail.com> wrote:

Hi Florian,

many thanks for your reply!

> On 12/13/2019 8:50 AM, Andre Przywara wrote:
> >> +/* total number of Buffer Descriptors, same for Rx/Tx */
> >> +#define TOTAL_DESC			 256
> >> +
> >> +#define DEFAULT_Q                        0x10  
> > 
> > I think that deserves a comment.
> > Maybe Florian can comment on what this odd "queue 16" is really about? It seems weird to support *17* queues and make the last one the default.  
> 
> In the mode we operate (descriptor mode) this is the background/default
> queue, I have no other explanation to provide than that, sorry if this
> is a bit "compact".

No worries, I just found that odd. I can live with queue 16 ;-)

> You could bring up the other priority queues, but
> that seems pointless in u-boot.

Exactly. Actually sending in U-Boot is completely synchronous: send() is only supposed to return when the packet has been sent. So for the TX side we would just need one descriptor, and priority support would be totally pointless. For RX not even Linux seems to use the priority queues, I guess that would require to program filters in the MAC for assigning packets to queues?

[ ... ]

> >> +static void reset_umac(struct bcmgenet_eth_priv *priv)  
> > 
> > What is the difference to the function above? The name is confusingly similar.  
> 
> And so is the version in the kernel arguably. Note that Doug recently
> made a fair amount of changes to how GENET is reset, you have some of
> those changes with the loopback enabling, but please check you have the
> latest programming sequence since this is a source of hard to debug
> problems with the UniMAC receiver stuck/replicating packets incorrectly
> if not properly reset.

Ah, thanks for the heads up, we will have a look.

> >> +static int _bcmgenet_gmac_eth_send(struct bcmgenet_eth_priv *priv, void *packet,
> >> +				   int len)
> >> +{
> >> +	u32 size, len_stat, prod_index, cons;
> >> +	u32 tries = 100;
> >> +
> >> +	void *desc_base = (priv->tx_desc_base +	(priv->tx_index * DMA_DESC_SIZE));
> >> +
> >> +	len_stat = (len << DMA_BUFLENGTH_SHIFT) | (0x3F << DMA_TX_QTAG_SHIFT);
> >> +	len_stat |= (DMA_TX_APPEND_CRC | DMA_SOP | DMA_EOP);
> >> +
> >> +	prod_index = readl(priv->mac_reg + TDMA_RING_REG_BASE(DEFAULT_Q) + TDMA_PROD_INDEX);
> >> +
> >> +	size = roundup(len, ARCH_DMA_MINALIGN);  
> > 
> > I think more important than aligning the size is aligning the base.
> > Not that it really matters architecturally ...  
> 
> Agreed, and if your flush_dcache_range() is not able to figure out
> whether it needs to span adjacent cachelines, it is mildy broken at that
> point.

It actually does - at least for arm64. But this whole cache maintenance handling in U-Boot is a bit of a mystery to me still, I think that stems from different implementations by various architectures and revisions.

> >> +	flush_dcache_range((ulong)packet, (ulong)packet + size);
> >> +
> >> +	/* Set-up packet for transmission */
> >> +	writel(lower_32_bits((ulong)packet), (desc_base + DMA_DESC_ADDRESS_LO));
> >> +	writel(upper_32_bits((ulong)packet), (desc_base + DMA_DESC_ADDRESS_HI));
> >> +	writel(len_stat, (desc_base + DMA_DESC_LENGTH_STATUS));
> >> +
> >> +	/* Increment index and wrap-up */
> >> +	priv->tx_index++;
> >> +	if (!(priv->tx_index % TOTAL_DESC)) {
> >> +		priv->tx_index = 0;  
> > 
> > This looks weird to read. I guess you just want to wrap around?
> > Maybe:
> > 	if (++priv->tx_index >= TOTAL_DESC)
> > 		....
> >   
> >> +	}
> >> +
> >> +	prod_index++;
> >> +
> >> +	/* Start Transmisson */
> >> +	writel(prod_index, (priv->mac_reg + TDMA_RING_REG_BASE(DEFAULT_Q) + TDMA_PROD_INDEX));
> >> +
> >> +	do {
> >> +		cons = readl(priv->mac_reg + TDMA_RING_REG_BASE(DEFAULT_Q) + TDMA_CONS_INDEX);
> >> +	} while ((cons & 0xffff) < prod_index && --tries);  
> 
> Could add an equivalent of cpu_relax() from Linux and/or a timeout to
> make sure you don't wait forever on a misconfigured platform.

Well, U-Boot is single-threaded and synchronous anyway, so the send() function is supposed to wait until the packet has been sent out, then return. So any yielding would be pointless. And for the timeout there is "tries" variable, which should prevent an endless loop?

> 
> >> +	if (!tries)
> >> +		return -ETIMEDOUT;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int _bcmgenet_gmac_eth_recv(struct bcmgenet_eth_priv *priv, uchar **packetp)
> >> +{
> >> +	u32 len_stat;
> >> +	u32 len;
> >> +	u32 addr;
> >> +	u32 length = 0;
> >> +	void *desc_base = (priv->rx_desc_base + (priv->rx_index * DMA_DESC_SIZE));
> >> +
> >> +	len_stat = readl(desc_base + DMA_DESC_LENGTH_STATUS);
> >> +
> >> +	if (!(len_stat & DMA_OWN)) {  
> > 
> > I think it would be cleaner to negate this and just bail out early.
> > But also: Is this bit safe to use? Does the MAC update this?  
> 
> This is not really safe, you should be comparing the producer/consumer
> indices from the last run to figure out if there is something for you to
> receive.

OK, we changed this now. Unfortunately this didn't fix the problem.

> >> +		len  = readl(desc_base + DMA_DESC_LENGTH_STATUS);
> >> +		addr = readl(desc_base + DMA_DESC_ADDRESS_LO);
> >> +
> >> +		length = (len >> DMA_BUFLENGTH_SHIFT) & DMA_BUFLENGTH_MASK;
> >> +
> >> +		invalidate_dcache_range((uintptr_t)addr,
> >> +					(addr + RX_TOTAL_BUFSIZE));  
> > 
> > There is no need to invalidate *all* buffers, so just use RX_BUF_LENGTH.
> > Actually it looks like we might need lose data (our indexes!), since you invalidate cache lines beyond the buffers.  
> 
> Yes indeed.
> 
> >   
> >> +
> >> +		/*
> >> +		 * two dummy bytes are added for IP alignment, this can be
> >> +		 * avoided by not programming RBUF_ALIGN_2B bit in RBUF_CTRL
> >> +		 */
> >> +		*packetp = (uchar *)(ulong)addr + RX_BUF_OFFSET;
> >> +
> >> +		return (length - RX_BUF_OFFSET);
> >> +	}
> >> +
> >> +	return -EAGAIN;
> >> +}
> >> +
> >> +static int _bcmgenet_free_pkt(struct bcmgenet_eth_priv *priv, int len)
> >> +{
> >> +	priv->c_index = (priv->c_index + 1) & 0xFFFF;
> >> +	writel(priv->c_index,
> >> +	       priv->mac_reg + RDMA_RING_REG_BASE(DEFAULT_Q) + RDMA_CONS_INDEX);
> >> +	udelay(1000);  
> > 
> > What is this udelay about?
> >   
> >> +
> >> +	priv->rx_index++;
> >> +	if (!(priv->rx_index % TOTAL_DESC)) {
> >> +		priv->rx_index = 0;
> >> +	}  
> > 
> > Same comment for the wrap-around as above.
> >   
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static void rx_descs_init(struct bcmgenet_eth_priv *priv)
> >> +{
> >> +	char *rxbuffs = &priv->rxbuffer[0];
> >> +	u32 len_stat, i;
> >> +	void *desc_base = priv->rx_desc_base;
> >> +
> >> +	priv->c_index = 0;
> >> +
> >> +	len_stat = ((RX_BUF_LENGTH << DMA_BUFLENGTH_SHIFT) | DMA_OWN);  
> > 
> > Is setting DMA_OWN necessary? I had the impression this bit was read-only?  
> 
> It is not necessary, you should not set it, the hardware in that version
> of the block would not process it.

Ah, thanks for the confirmation.

> >> +
> >> +static void bcmgenet_adjust_link(struct bcmgenet_eth_priv *priv)
> >> +{
> >> +	struct phy_device *phy_dev = priv->phydev;
> >> +	u32 reg = 0, reg_rgmii;
> >> +
> >> +	switch (phy_dev->speed) {
> >> +	case SPEED_1000:
> >> +		reg = UMAC_SPEED_1000;
> >> +		break;
> >> +	case SPEED_100:
> >> +		reg = UMAC_SPEED_100;
> >> +		break;
> >> +	case SPEED_10:
> >> +		reg = UMAC_SPEED_10;
> >> +		break;
> >> +	}
> >> +
> >> +	reg <<= CMD_SPEED_SHIFT;
> >> +
> >> +	reg_rgmii = readl(priv->mac_reg + EXT_RGMII_OOB_CTRL);
> >> +	reg_rgmii &= ~OOB_DISABLE;
> >> +	reg_rgmii |= (RGMII_LINK | RGMII_MODE_EN | ID_MODE_DIS);  
> 
> You need to look at the PHY interface/mode here to sett the ID_MODE_DIS,
> also even if there may not be any plans to use this driver beyond v5 and
> the Pi4, we might as well make sure we are following what Linux does and
> be a bit more specific to RGMII versus other modes in the programming.
> In fact, it might just be safer for now to error out if the PHY mode is
> different than any of the 4 RGMII variants so we don't attempt to bring
> up something we have not added the programming for.

Yes, good point. I think for now just returning a proper error would be enough. For all the years nobody apparently needed GENET support in U-Boot, and without test hardware and a use case I would not bother supporting anything before v5.

> >> +	writel(reg_rgmii, priv->mac_reg + EXT_RGMII_OOB_CTRL);
> >> +
> >> +	writel(reg, (priv->mac_reg + UMAC_CMD));  
> > 
> > Those parentheses around the second argument are redundant. Please remove them, also in a lot of other places.
> >   
> >> +}
> >> +
> >> +static int _bcmgenet_eth_init(struct bcmgenet_eth_priv *priv, u8 *enetaddr)
> >> +{
> >> +	u32 reg, dma_ctrl;
> >> +
> >> +	priv->tx_desc_base = priv->mac_reg + 0x4000;
> >> +	priv->rx_desc_base = priv->mac_reg + 0x2000;  
> 
> Can you use the definitions instead of open coding those?
> 
> [snip]
> 
> >> +static const struct udevice_id bcmgenet_eth_ids[] = {
> >> +	{.compatible = "brcm,genet-v5"},  
> > 
> > I think you need the new RPi4 specific compatible string here as well, to cope with mainline DTs.  
> 
> Agreed.
> 
> > 
> > That's it for my first pass. I will keep debugging this.  
> 
> Thanks, let me know if you need help, you can also ping me on IRC
> ([florian] on freenode).

Many thanks!
Amit found a bug with programming the queue parameters in [rt]x_ring_init, which takes the pointers in number of (4-byte) words, not in bytes. And apparently this works for him now, but I couldn't verify this yet.

Will keep you posted!

Cheers,
Andre.


More information about the U-Boot mailing list