[PATCH v2] pci: Work around PCIe link training failures

Stefan Roese sr at denx.de
Tue Nov 16 13:44:30 CET 2021


Hi Maciej,

On 11/16/21 12:35, Maciej W. Rozycki wrote:
> Attempt to handle cases with a downstream port of a PCIe switch where
> link training never completes and the link continues switching between
> speeds indefinitely with the data link layer never reaching the active
> state.
> 
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
> P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the
> switches are supposed to negotiate the link speed of preferably 5.0GT/s,
> falling back to 2.5GT/s.
> 
> However the link continues oscillating between the two speeds, at the
> rate of 34-35 times per second, with link training reported active ~84%
> of the time repeatedly, e.g.:

Interesting topic and approach. Please find a few questions / comments
below.

> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
> 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> 		LnkSta:	Speed 5GT/s (downgraded), Width x1 (ok)
> 			TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> Forcibly limiting the target link speed to 2.5GT/s with the upstream
> ASM2824 device makes the two switches communicate correctly however:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
> 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (ok)
> 			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> and then:
> 
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
> 	Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
> 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (downgraded)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> Make use of this observation then and attempt to detect the inability to
> negotiate the link speed automatically, and then handle it by hand.  Use
> the Data Link Layer Link Active status flag as the primary indicator of
> successful link speed negotiation, but given that the flag is optional
> by hardware to implement (the ASM2824 does have it though), resort to
> checking for the mandatory Link Bandwidth Management Status flag showing
> that the link speed or width has been changed in an attempt to correct
> unreliable link operation (the ASM2824 does set it too).
> 
> If these checks indicate that link may not operate correctly, then poll
> the Data Link Layer Link Active status flag along with the Link Training
> flag for the duration of 200ms to see if the link has stabilised, that
> is either that the Data Link Layer Link Active status flag has been set
> or that Link Training has been inactive during at least the second half
> of the inteval.
> 
> If that has indicated failure, reduce the target speed, request a link
> retrain and check again if the link has stabilised.  Repeat until either
> successful or the link speeds supported by the downstream port have been
> exhausted.

So in such cases, the link speed will be downgraded? I would expect at
least a big warning in such cases.

Did you try to change some other configuration options for the link
establishment? I remember that on some hardware we were able to get
better "link-up results" by setting the de-emphasis level to -3.5dB
instead of -6dB (in the link control status register 2), before trying
to re-estblish the link. Did you also test with "tuning" such
parameters. There might be other, which I'm missing right now.

Thanks,
Stefan

> Signed-off-by: Maciej W. Rozycki <macro at orcam.me.uk>
> ---
> Hi,
> 
>   Thank you, Pali, for your valuable feedback.  Here's v2 of the change
> with your suggestions taken into account.  I believe I have addressed all
> your concerns and I haven't heard from anyone else, so I consider this
> version final unless someone speaks out.
> 
>   Please apply then.
> 
>    Maciej
> 
> Changes from v1:
> 
> - also handle root complex and PCI/PCI-X to PCIe bridge devices,
> 
> - update comments throughout accordingly,
> 
> - likewise the change description; mention the rate of oscillations
>    observed with the ASM2824 device.
> ---
>   drivers/pci/pci_auto.c |  157 +++++++++++++++++++++++++++++++++++++++++++++++++
>   include/pci.h          |   18 +++++
>   2 files changed, 174 insertions(+), 1 deletion(-)
> 
> u-boot-pcie-manual-retrain.diff
> Index: u-boot/drivers/pci/pci_auto.c
> ===================================================================
> --- u-boot.orig/drivers/pci/pci_auto.c
> +++ u-boot/drivers/pci/pci_auto.c
> @@ -5,6 +5,7 @@
>    * Author: Matt Porter <mporter at mvista.com>
>    *
>    * Copyright 2000 MontaVista Software Inc.
> + * Copyright (c) 2021  Maciej W. Rozycki <macro at orcam.me.uk>
>    */
>   
>   #include <common.h>
> @@ -12,6 +13,7 @@
>   #include <errno.h>
>   #include <log.h>
>   #include <pci.h>
> +#include <time.h>
>   #include "pci_internal.h"
>   
>   /* the user can define CONFIG_SYS_PCI_CACHE_LINE_SIZE to avoid problems */
> @@ -180,6 +182,155 @@ static void dm_pciauto_setup_device(stru
>   	dm_pci_write_config8(dev, PCI_LATENCY_TIMER, 0x80);
>   }
>   
> +/*
> + * Check if the link of a downstream PCIe port operates correctly.
> + *
> + * For that check if the optional Data Link Layer Link Active status gets
> + * on within a 200ms period or failing that wait until the completion of
> + * that period and check if link training has shown the completed status
> + * continuously throughout the second half of that period.
> + *
> + * Observation with the ASMedia ASM2824 Gen 3 switch indicates it takes
> + * 11-44ms to indicate the Data Link Layer Link Active status at 2.5GT/s,
> + * though it may take a couple of link training iterations.
> + */
> +static bool dm_pciauto_exp_link_stable(struct udevice *dev, int pcie_off)
> +{
> +	u64 loops = 0, trcount = 0, ntrcount = 0, flips = 0;
> +	bool dllla, lnktr, plnktr;
> +	u16 exp_lnksta;
> +	pci_dev_t bdf;
> +	u64 end;
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
> +	plnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
> +
> +	end = get_ticks() + usec_to_tick(200000);
> +	do {
> +		dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA,
> +				     &exp_lnksta);
> +		dllla = !!(exp_lnksta & PCI_EXP_LNKSTA_DLLLA);
> +		lnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
> +
> +		flips += plnktr ^ lnktr;
> +		if (lnktr) {
> +			ntrcount = 0;
> +			trcount++;
> +		} else {
> +			ntrcount++;
> +		}
> +		loops++;
> +
> +		plnktr = lnktr;
> +	} while (!dllla && get_ticks() < end);
> +
> +	bdf = dm_pci_get_bdf(dev);
> +	debug("PCI Autoconfig: %02x.%02x.%02x: Fixup link: DL active: %u; "
> +	      "%3llu flips, %6llu loops of which %6llu while training, "
> +	      "final %6llu stable\n",
> +	      PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf),
> +	      (unsigned int)dllla,
> +	      (unsigned long long)flips, (unsigned long long)loops,
> +	      (unsigned long long)trcount, (unsigned long long)ntrcount);
> +
> +	return dllla || ntrcount >= loops / 2;
> +}
> +
> +/*
> + * Retrain the link of a downstream PCIe port by hand if necessary.
> + *
> + * This is needed at least where a downstream port of the ASMedia ASM2824
> + * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
> + * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
> + * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
> + * board.
> + *
> + * In such a configuration the switches are supposed to negotiate the link
> + * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
> + * continues switching between the two speeds indefinitely and the data
> + * link layer never reaches the active state, with link training reported
> + * active ~84% of the time repeatedly.  Forcing the target link speed to
> + * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
> + * each other correctly however.
> + *
> + * As this can potentially happen with any device and is cheap in the case
> + * of correctly operating hardware, let's do it for all downstream ports,
> + * for root complexes, PCIe switches and PCI/PCI-X to PCIe bridges, and if
> + * automatic link training may have failed to complete, as indicated by
> + * the optional Data Link Layer Link Active status being off and the Link
> + * Bandwidth Management Status showing that hardware has changed the link
> + * speed or width in an attempt to correct unreliable link operation, then
> + * try lower and lower speeds one by one until one has succeeded or we
> + * have run out of speeds.
> + *
> + * This requires the presence of the Link Control 2 register, so make sure
> + * the PCI Express Capability Version is at least 2.  Also don't try, for
> + * obvious reasons, to limit the speed if 2.5GT/s is the only link speed
> + * supported.
> + */
> +static void dm_pciauto_exp_fixup_link(struct udevice *dev, int pcie_off)
> +{
> +	u16 exp_lnksta, exp_lnkctl, exp_lnkctl2;
> +	u16 exp_flags, exp_type, exp_version;
> +	u32 exp_lnkcap, exp_lnkcap2;
> +	unsigned int speed;
> +	pci_dev_t bdf;
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_FLAGS, &exp_flags);
> +	exp_version = exp_flags & PCI_EXP_FLAGS_VERSION;
> +	if (exp_version < 2)
> +		return;
> +	exp_type = (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4;
> +	switch (exp_type) {
> +	case PCI_EXP_TYPE_ROOT_PORT:
> +	case PCI_EXP_TYPE_DOWN_PORT:
> +	case PCI_EXP_TYPE_PCIE_BRDG:
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
> +	if ((exp_lnkcap & PCI_EXP_LNKCAP_SLS) <= PCI_EXP_LNKCAP_SLS_2_5GB)
> +		return;
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
> +	if ((exp_lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) !=
> +	    PCI_EXP_LNKSTA_LBMS)
> +		return;
> +
> +	if (dm_pciauto_exp_link_stable(dev, pcie_off))
> +		return;
> +
> +	bdf = dm_pci_get_bdf(dev);
> +	debug("PCI Autoconfig: %02x.%02x.%02x: Downgrading speed by hand\n",
> +	      PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL, &exp_lnkctl);
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL2, &exp_lnkctl2);
> +	exp_lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +
> +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
> +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP2, &exp_lnkcap2);
> +	for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
> +	     speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
> +	     speed--) {
> +		if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
> +			continue;
> +
> +		debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
> +		      PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
> +
> +		dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
> +				      exp_lnkctl2 | speed);
> +		dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
> +				      exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
> +
> +		if (dm_pciauto_exp_link_stable(dev, pcie_off))
> +			return;
> +	}
> +}
> +
>   void dm_pciauto_prescan_setup_bridge(struct udevice *dev, int sub_bus)
>   {
>   	struct pci_region *pci_mem;
> @@ -189,6 +340,7 @@ void dm_pciauto_prescan_setup_bridge(str
>   	u8 io_32;
>   	struct udevice *ctlr = pci_get_controller(dev);
>   	struct pci_controller *ctlr_hose = dev_get_uclass_priv(ctlr);
> +	int pcie_off;
>   
>   	pci_mem = ctlr_hose->pci_mem;
>   	pci_prefetch = ctlr_hose->pci_prefetch;
> @@ -267,6 +419,11 @@ void dm_pciauto_prescan_setup_bridge(str
>   		cmdstat |= PCI_COMMAND_IO;
>   	}
>   
> +	/* For PCIe devices see if we need to retrain the link by hand */
> +	pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP);
> +	if (pcie_off)
> +		dm_pciauto_exp_fixup_link(dev, pcie_off);
> +
>   	/* Enable memory and I/O accesses, enable bus master */
>   	dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER);
>   }
> Index: u-boot/include/pci.h
> ===================================================================
> --- u-boot.orig/include/pci.h
> +++ u-boot/include/pci.h
> @@ -5,6 +5,7 @@
>    *
>    * (C) Copyright 2002
>    * Wolfgang Denk, DENX Software Engineering, wd at denx.de.
> + * Copyright (c) 2021  Maciej W. Rozycki <macro at orcam.me.uk>
>    */
>   
>   #ifndef _PCI_H
> @@ -475,16 +476,24 @@
>   
>   /* PCI Express capabilities */
>   #define PCI_EXP_FLAGS		2	/* Capabilities register */
> +#define  PCI_EXP_FLAGS_VERSION	0x000f	/* Capability Version */
>   #define  PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type */
>   #define  PCI_EXP_TYPE_ROOT_PORT 0x4	/* Root Port */
> +#define  PCI_EXP_TYPE_DOWN_PORT 0x6	/* Downstream Port */
> +#define  PCI_EXP_TYPE_PCIE_BRDG 0x8	/* PCI/PCI-X to PCI Express Bridge */
>   #define PCI_EXP_DEVCAP		4	/* Device capabilities */
>   #define  PCI_EXP_DEVCAP_FLR	0x10000000 /* Function Level Reset */
>   #define PCI_EXP_DEVCTL		8	/* Device Control */
>   #define  PCI_EXP_DEVCTL_BCR_FLR	0x8000  /* Bridge Configuration Retry / FLR */
>   #define PCI_EXP_LNKCAP		12	/* Link Capabilities */
>   #define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
> +#define  PCI_EXP_LNKCAP_SLS_2_5GB 0x0001 /* Supported Link Speed 2.5GT/s */
> +#define  PCI_EXP_LNKCAP_SLS_5_0GB 0x0002 /* Supported Link Speed 5.0GT/s */
> +#define  PCI_EXP_LNKCAP_SLS_8_0GB 0x0003 /* Supported Link Speed 8.0GT/s */
>   #define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
>   #define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
> +#define PCI_EXP_LNKCTL		16	/* Link Control */
> +#define  PCI_EXP_LNKCTL_RETRLNK	0x0020	/* Retrain Link */
>   #define PCI_EXP_LNKSTA		18	/* Link Status */
>   #define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
>   #define  PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */
> @@ -492,7 +501,9 @@
>   #define  PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */
>   #define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Negotiated Link Width */
>   #define  PCI_EXP_LNKSTA_NLW_SHIFT 4	/* start of NLW mask in link status */
> +#define  PCI_EXP_LNKSTA_LNKTR	0x0800	/* Link Training */
>   #define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
> +#define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
>   #define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
>   #define  PCI_EXP_SLTCAP_PSN	0xfff80000 /* Physical Slot Number */
>   #define PCI_EXP_RTCTL		28	/* Root Control */
> @@ -503,8 +514,13 @@
>   #define  PCI_EXP_DEVCAP2_ARI	0x00000020 /* ARI Forwarding Supported */
>   #define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
>   #define  PCI_EXP_DEVCTL2_ARI	0x0020 /* Alternative Routing-ID */
> -
> +#define PCI_EXP_LNKCAP2		44	/* Link Capability 2 */
> +#define  PCI_EXP_LNKCAP2_SLSV	0x000000fe /* Supported Link Speeds Vector */
>   #define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
> +#define  PCI_EXP_LNKCTL2_TLS	0x000f	/* Target Link Speed */
> +#define  PCI_EXP_LNKSTA_TLS_2_5GB 0x0001 /* Target Link Speed 2.5GT/s */
> +#define  PCI_EXP_LNKSTA_TLS_5_0GB 0x0002 /* Target Link Speed 5.0GT/s */
> +#define  PCI_EXP_LNKSTA_TLS_8_0GB 0x0003 /* Target Link Speed 8.0GT/s */
>   /* Single Root I/O Virtualization Registers */
>   #define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
>   #define PCI_SRIOV_CTRL		0x08	/* SR-IOV Control */
> 

Viele Grüße,
Stefan Roese

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr at denx.de


More information about the U-Boot mailing list