[RFC PATCH 2/2] net: Add Support for GENET Ethernet controller

Florian Fainelli f.fainelli at gmail.com
Sat Dec 14 00:53:24 CET 2019



On 12/13/2019 8:50 AM, Andre Przywara wrote:
>> +/* total number of Buffer Descriptors, same for Rx/Tx */
>> +#define TOTAL_DESC			 256
>> +
>> +#define DEFAULT_Q                        0x10
> 
> I think that deserves a comment.
> Maybe Florian can comment on what this odd "queue 16" is really about? It seems weird to support *17* queues and make the last one the default.

In the mode we operate (descriptor mode) this is the background/default
queue, I have no other explanation to provide than that, sorry if this
is a bit "compact". You could bring up the other priority queues, but
that seems pointless in u-boot.

> 
>> +
>> +/* Body(1500) + EH_SIZE(14) + VLANTAG(4) + BRCMTAG(6) + FCS(4) = 1528.
>> + * 1536 is multiple of 256 bytes
>> + */
>> +#define ENET_BRCM_TAG_LEN		 6
>> +#define ENET_PAD			 8
>> +#define ENET_MAX_MTU_SIZE		 (ETH_DATA_LEN + ETH_HLEN + VLAN_HLEN + \
>> +					  ENET_BRCM_TAG_LEN + ETH_FCS_LEN + ENET_PAD)
>> +
>> +/* Tx/Rx Dma Descriptor common bits*/
>> +#define DMA_EN                           BIT(0)
>> +#define DMA_RING_BUF_EN_SHIFT            0x01
>> +#define DMA_RING_BUF_EN_MASK             0xFFFF
>> +#define DMA_BUFLENGTH_MASK		 0x0fff
>> +#define DMA_BUFLENGTH_SHIFT		 16
>> +#define DMA_RING_SIZE_SHIFT		 16
>> +#define DMA_OWN				 0x8000
>> +#define DMA_EOP				 0x4000
>> +#define DMA_SOP				 0x2000
>> +#define DMA_WRAP			 0x1000
>> +#define DMA_MAX_BURST_LENGTH             0x8
> 
> I think it deserves mentioning that this is the RPi4 specific value, all the other GENET hardware can use 0x10 here. It looks to me like 0x8 is probably safe for all to use, it would just be less efficient. But I guess we don't care in U-Boot.

Correct.

> 
>> +/* Tx specific Dma descriptor bits */
>> +#define DMA_TX_UNDERRUN			 0x0200
>> +#define DMA_TX_APPEND_CRC		 0x0040
>> +#define DMA_TX_OW_CRC			 0x0020
>> +#define DMA_TX_DO_CSUM			 0x0010
>> +#define DMA_TX_QTAG_SHIFT		 7
>> +#define GENET_TDMA_REG_OFF		 (0x4000 + \
>> +					  TOTAL_DESC * DMA_DESC_SIZE)
>> +#define GENET_RDMA_REG_OFF		 (0x2000 + \
>> +					  TOTAL_DESC * DMA_DESC_SIZE)
>> +
>> +/* DMA rings size */
>> +#define DMA_RING_SIZE			 (0x40)
>> +#define DMA_RINGS_SIZE			 (DMA_RING_SIZE * (DEFAULT_Q + 1))
>> +
>> +/* DMA Descriptor */
>> +#define DMA_DESC_LENGTH_STATUS		 0x00
>> +#define DMA_DESC_ADDRESS_LO		 0x04
>> +#define DMA_DESC_ADDRESS_HI		 0x08
>> +#define DMA_DESC_SIZE                    0xc
>> +
>> +#define DMA_FC_THRESH_HI		 (TOTAL_DESC >> 4)
>> +#define DMA_FC_THRESH_LO		 5
>> +#define DMA_XOFF_THRESHOLD_SHIFT	 16
>> +
>> +#define TDMA_RING_REG_BASE(QUEUE_NUMBER) (GENET_TDMA_REG_OFF \
>> +			   + (DMA_RING_SIZE * (QUEUE_NUMBER)))
> 
> So it seems like we are *always* using the default queue number. For improved readability we could drop this parameter and hardcode this here, so that users just need to use TDMA_RING_REG_BASE.

Agreed. If you were to use the other TX or RX queues, you would have to
incur additional configuration that is not worth the trouble here.

[snip]

>> +static void reset_umac(struct bcmgenet_eth_priv *priv)
> 
> What is the difference to the function above? The name is confusingly similar.

And so is the version in the kernel arguably. Note that Doug recently
made a fair amount of changes to how GENET is reset, you have some of
those changes with the loopback enabling, but please check you have the
latest programming sequence since this is a source of hard to debug
problems with the UniMAC receiver stuck/replicating packets incorrectly
if not properly reset.

[snip]

>> +static int _bcmgenet_gmac_eth_send(struct bcmgenet_eth_priv *priv, void *packet,
>> +				   int len)
>> +{
>> +	u32 size, len_stat, prod_index, cons;
>> +	u32 tries = 100;
>> +
>> +	void *desc_base = (priv->tx_desc_base +	(priv->tx_index * DMA_DESC_SIZE));
>> +
>> +	len_stat = (len << DMA_BUFLENGTH_SHIFT) | (0x3F << DMA_TX_QTAG_SHIFT);
>> +	len_stat |= (DMA_TX_APPEND_CRC | DMA_SOP | DMA_EOP);
>> +
>> +	prod_index = readl(priv->mac_reg + TDMA_RING_REG_BASE(DEFAULT_Q) + TDMA_PROD_INDEX);
>> +
>> +	size = roundup(len, ARCH_DMA_MINALIGN);
> 
> I think more important than aligning the size is aligning the base.
> Not that it really matters architecturally ...

Agreed, and if your flush_dcache_range() is not able to figure out
whether it needs to span adjacent cachelines, it is mildy broken at that
point.

> 
>> +	flush_dcache_range((ulong)packet, (ulong)packet + size);
>> +
>> +	/* Set-up packet for transmission */
>> +	writel(lower_32_bits((ulong)packet), (desc_base + DMA_DESC_ADDRESS_LO));
>> +	writel(upper_32_bits((ulong)packet), (desc_base + DMA_DESC_ADDRESS_HI));
>> +	writel(len_stat, (desc_base + DMA_DESC_LENGTH_STATUS));
>> +
>> +	/* Increment index and wrap-up */
>> +	priv->tx_index++;
>> +	if (!(priv->tx_index % TOTAL_DESC)) {
>> +		priv->tx_index = 0;
> 
> This looks weird to read. I guess you just want to wrap around?
> Maybe:
> 	if (++priv->tx_index >= TOTAL_DESC)
> 		....
> 
>> +	}
>> +
>> +	prod_index++;
>> +
>> +	/* Start Transmisson */
>> +	writel(prod_index, (priv->mac_reg + TDMA_RING_REG_BASE(DEFAULT_Q) + TDMA_PROD_INDEX));
>> +
>> +	do {
>> +		cons = readl(priv->mac_reg + TDMA_RING_REG_BASE(DEFAULT_Q) + TDMA_CONS_INDEX);
>> +	} while ((cons & 0xffff) < prod_index && --tries);

Could add an equivalent of cpu_relax() from Linux and/or a timeout to
make sure you don't wait forever on a misconfigured platform.

>> +	if (!tries)
>> +		return -ETIMEDOUT;
>> +
>> +	return 0;
>> +}
>> +
>> +static int _bcmgenet_gmac_eth_recv(struct bcmgenet_eth_priv *priv, uchar **packetp)
>> +{
>> +	u32 len_stat;
>> +	u32 len;
>> +	u32 addr;
>> +	u32 length = 0;
>> +	void *desc_base = (priv->rx_desc_base + (priv->rx_index * DMA_DESC_SIZE));
>> +
>> +	len_stat = readl(desc_base + DMA_DESC_LENGTH_STATUS);
>> +
>> +	if (!(len_stat & DMA_OWN)) {
> 
> I think it would be cleaner to negate this and just bail out early.
> But also: Is this bit safe to use? Does the MAC update this?

This is not really safe, you should be comparing the producer/consumer
indices from the last run to figure out if there is something for you to
receive.

> 
>> +		len  = readl(desc_base + DMA_DESC_LENGTH_STATUS);
>> +		addr = readl(desc_base + DMA_DESC_ADDRESS_LO);
>> +
>> +		length = (len >> DMA_BUFLENGTH_SHIFT) & DMA_BUFLENGTH_MASK;
>> +
>> +		invalidate_dcache_range((uintptr_t)addr,
>> +					(addr + RX_TOTAL_BUFSIZE));
> 
> There is no need to invalidate *all* buffers, so just use RX_BUF_LENGTH.
> Actually it looks like we might need lose data (our indexes!), since you invalidate cache lines beyond the buffers.

Yes indeed.

> 
>> +
>> +		/*
>> +		 * two dummy bytes are added for IP alignment, this can be
>> +		 * avoided by not programming RBUF_ALIGN_2B bit in RBUF_CTRL
>> +		 */
>> +		*packetp = (uchar *)(ulong)addr + RX_BUF_OFFSET;
>> +
>> +		return (length - RX_BUF_OFFSET);
>> +	}
>> +
>> +	return -EAGAIN;
>> +}
>> +
>> +static int _bcmgenet_free_pkt(struct bcmgenet_eth_priv *priv, int len)
>> +{
>> +	priv->c_index = (priv->c_index + 1) & 0xFFFF;
>> +	writel(priv->c_index,
>> +	       priv->mac_reg + RDMA_RING_REG_BASE(DEFAULT_Q) + RDMA_CONS_INDEX);
>> +	udelay(1000);
> 
> What is this udelay about?
> 
>> +
>> +	priv->rx_index++;
>> +	if (!(priv->rx_index % TOTAL_DESC)) {
>> +		priv->rx_index = 0;
>> +	}
> 
> Same comment for the wrap-around as above.
> 
>> +
>> +	return 0;
>> +}
>> +
>> +static void rx_descs_init(struct bcmgenet_eth_priv *priv)
>> +{
>> +	char *rxbuffs = &priv->rxbuffer[0];
>> +	u32 len_stat, i;
>> +	void *desc_base = priv->rx_desc_base;
>> +
>> +	priv->c_index = 0;
>> +
>> +	len_stat = ((RX_BUF_LENGTH << DMA_BUFLENGTH_SHIFT) | DMA_OWN);
> 
> Is setting DMA_OWN necessary? I had the impression this bit was read-only?

It is not necessary, you should not set it, the hardware in that version
of the block would not process it.

[snip]

>> +
>> +static void bcmgenet_adjust_link(struct bcmgenet_eth_priv *priv)
>> +{
>> +	struct phy_device *phy_dev = priv->phydev;
>> +	u32 reg = 0, reg_rgmii;
>> +
>> +	switch (phy_dev->speed) {
>> +	case SPEED_1000:
>> +		reg = UMAC_SPEED_1000;
>> +		break;
>> +	case SPEED_100:
>> +		reg = UMAC_SPEED_100;
>> +		break;
>> +	case SPEED_10:
>> +		reg = UMAC_SPEED_10;
>> +		break;
>> +	}
>> +
>> +	reg <<= CMD_SPEED_SHIFT;
>> +
>> +	reg_rgmii = readl(priv->mac_reg + EXT_RGMII_OOB_CTRL);
>> +	reg_rgmii &= ~OOB_DISABLE;
>> +	reg_rgmii |= (RGMII_LINK | RGMII_MODE_EN | ID_MODE_DIS);

You need to look at the PHY interface/mode here to sett the ID_MODE_DIS,
also even if there may not be any plans to use this driver beyond v5 and
the Pi4, we might as well make sure we are following what Linux does and
be a bit more specific to RGMII versus other modes in the programming.
In fact, it might just be safer for now to error out if the PHY mode is
different than any of the 4 RGMII variants so we don't attempt to bring
up something we have not added the programming for.

>> +	writel(reg_rgmii, priv->mac_reg + EXT_RGMII_OOB_CTRL);
>> +
>> +	writel(reg, (priv->mac_reg + UMAC_CMD));
> 
> Those parentheses around the second argument are redundant. Please remove them, also in a lot of other places.
> 
>> +}
>> +
>> +static int _bcmgenet_eth_init(struct bcmgenet_eth_priv *priv, u8 *enetaddr)
>> +{
>> +	u32 reg, dma_ctrl;
>> +
>> +	priv->tx_desc_base = priv->mac_reg + 0x4000;
>> +	priv->rx_desc_base = priv->mac_reg + 0x2000;

Can you use the definitions instead of open coding those?

[snip]

>> +static const struct udevice_id bcmgenet_eth_ids[] = {
>> +	{.compatible = "brcm,genet-v5"},
> 
> I think you need the new RPi4 specific compatible string here as well, to cope with mainline DTs.

Agreed.

> 
> That's it for my first pass. I will keep debugging this.

Thanks, let me know if you need help, you can also ping me on IRC
([florian] on freenode).
-- 
Florian


More information about the U-Boot mailing list