[PATCH v3] pci: Work around PCIe link training failures

Stefan Roese sr at denx.de
Thu Jan 13 13:57:42 CET 2022


On 11/21/21 00:03, Maciej W. Rozycki wrote:
> Attempt to handle cases with a downstream port of a PCIe switch where
> link training never completes and the link continues switching between
> speeds indefinitely with the data link layer never reaching the active
> state.
> 
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
> P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the
> switches are supposed to negotiate the link speed of preferably 5.0GT/s,
> falling back to 2.5GT/s.
> 
> However the link continues oscillating between the two speeds, at the
> rate of 34-35 times per second, with link training reported repeatedly
> active ~84% of the time, e.g.:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
> 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> 		LnkSta:	Speed 5GT/s (downgraded), Width x1 (ok)
> 			TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> Forcibly limiting the target link speed to 2.5GT/s with the upstream
> ASM2824 device makes the two switches communicate correctly however:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
> 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (ok)
> 			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> and then:
> 
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
> 	Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
> 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (downgraded)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> Make use of this observation then and attempt to detect the inability to
> negotiate the link speed automatically, and then handle it by hand.  Use
> the Data Link Layer Link Active status flag as the primary indicator of
> successful link speed negotiation, but given that the flag is optional
> by hardware to implement (the ASM2824 does have it though), resort to
> checking for the mandatory Link Bandwidth Management Status flag showing
> that the link speed or width has been changed in an attempt to correct
> unreliable link operation (the ASM2824 does set it too).
> 
> If these checks indicate that link may not operate correctly, then poll
> the Data Link Layer Link Active status flag along with the Link Training
> flag for the duration of 200ms to see if the link has stabilised, that
> is either that the Data Link Layer Link Active status flag has been set
> or that Link Training has been inactive during at least the second half
> of the interval.
> 
> If that has indicated failure, restrict the target speed to 2.5GT/s,
> request a link retrain and check again if the link has stabilised.  If
> that does not work either, then restore the original speed setting and
> claim defeat, otherwise we are done.
> 
> NB interestingly enough with the ASM2824 vs PI7C9X2G304 configuration
> referred above asking the ASM2824 to retrain with a higher target link
> speed once the 2.5GT/s speed has been negotiated makes the two devices
> successfully negotiate 5.0GT/s.  Lifting the 2.5GT/s speed restriction
> would however prevent our workaround from working with an OS that issues
> a reset and that is unaware of the problem.  This is because the devices
> would then try to negotiate a higher link speed from scratch and fail,
> while the sticky property of the Target Link Speed setting will keep the
> 2.5GT/s speed restriction across a reset.
> 
> Keep the 2.5GT/s speed restriction then, conservatively, if functional
> once applied.
> 
> Signed-off-by: Maciej W. Rozycki <macro at orcam.me.uk>
> ---
> Hi,
> 
>   I believe this version has addressed all concerns raised in the review
> thus far.  With the nature of a problem better understood now I'm sending
> a corresponding update for Linux as well.
> 
>   One case that has been nurturing me however is the reverse scenario, that
> is where the Pericom PI7C9X2G304 switch is upstream while the ASMedia
> ASM2824 switch is downstream.  Presumably the same situation will happen,
> so it's clear that this generic approach I chose for U-Boot is going to
> handle both cases.  For Linux, where I chose to match on the ASM2824 ID it
> would be a problem, though our conservative approach will make both cases
> work in the OS.
> 
>   Option hardware with M.2 slots is commercially available with the ASM2824
> onboard, so a test environment can be in principle arranged, though I'm
> not sure if just for the sake of such an experiment I'm willing to spend
> money that will ultimately go to a manufacturer that cannot be bothered to
> take responsibility for their faults and at the very least respond to a
> problem report.
> 
>   Thank you, Stefan, in particular for mentioning the de-emphasis setting;
> while not directly relevant it has prompted me to further experiment with
> the hardware involved and consequently discover that the 5GT/s speed does
> actually work with these devices if arranged in a particular if not
> peculiar way.
> 
>   Further questions, comments or concerns?  Otherwise please apply.
> 
>    Maciej
> 
> Changes from v2:
> 
> - only restrict the link speed to 2.5GT/s and keep the restriction; do
>    not try other speeds; update the comments and change description
>    accordingly,
> 
> - print informational messages where a problematic link has been found,
> 
> - keep the names of macros in include/pci.h consistent with Linux (NB it
>    may make sense to synchronise the file on a routine basis, just as many
>    free software projects do that reuse code maintained in other projects),
> 
> - rephrase comments and the change description here and there for style
>    and clarity.
> 
> Changes from v1:
> 
> - also handle root complex and PCI/PCI-X to PCIe bridge devices,
> 
> - update comments throughout accordingly,
> 
> - likewise the change description; mention the rate of oscillations
>    observed with the ASM2824 device.
> ---
>   drivers/pci/pci_auto.c |  170 +++++++++++++++++++++++++++++++++++++++++++++++++
>   include/pci.h          |   21 +++++-
>   2 files changed, 189 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Roese <sr at denx.de>

Thanks,
Stefan

> 
> u-boot-pcie-manual-retrain.diff
> Index: u-boot/drivers/pci/pci_auto.c
> ===================================================================
> --- u-boot.orig/drivers/pci/pci_auto.c
> +++ u-boot/drivers/pci/pci_auto.c
> @@ -5,6 +5,7 @@
>    * Author: Matt Porter <mporter at mvista.com>
>    *
>    * Copyright 2000 MontaVista Software Inc.
> + * Copyright (c) 2021  Maciej W. Rozycki <macro at orcam.me.uk>
>    */
>   
>   #include <common.h>
> @@ -12,6 +13,7 @@
>   #include <errno.h>
>   #include <log.h>
>   #include <pci.h>
> +#include <time.h>
>   #include "pci_internal.h"
>   
>   /* the user can define CONFIG_SYS_PCI_CACHE_LINE_SIZE to avoid problems */
> @@ -180,6 +182,168 @@ static void dm_pciauto_setup_device(stru
>   	dm_pci_write_config8(dev, PCI_LATENCY_TIMER, 0x80);
>   }
>   
> +/*
> + * Check if the link of a downstream PCIe port operates correctly.
> + *
> + * For that check if the optional Data Link Layer Link Active status gets
> + * on within a 200ms period or failing that wait until the completion of
> + * that period and check if link training has shown the completed status
> + * continuously throughout the second half of that period.
> + *
> + * Observation with the ASMedia ASM2824 Gen 3 switch indicates it takes
> + * 11-44ms to indicate the Data Link Layer Link Active status at 2.5GT/s,
> + * though it may take a couple of link training iterations.
> + */
> +static bool dm_pciauto_exp_link_stable(struct udevice *dev, int pcie_off)
> +{
> +	u64 loops = 0, trcount = 0, ntrcount = 0, flips = 0;
> +	bool dllla, lnktr, plnktr;
> +	u16 exp_lnksta;
> +	pci_dev_t bdf;
> +	u64 end;
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
> +	plnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LT);
> +
> +	end = get_ticks() + usec_to_tick(200000);
> +	do {
> +		dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA,
> +				     &exp_lnksta);
> +		dllla = !!(exp_lnksta & PCI_EXP_LNKSTA_DLLLA);
> +		lnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LT);
> +
> +		flips += plnktr ^ lnktr;
> +		if (lnktr) {
> +			ntrcount = 0;
> +			trcount++;
> +		} else {
> +			ntrcount++;
> +		}
> +		loops++;
> +
> +		plnktr = lnktr;
> +	} while (!dllla && get_ticks() < end);
> +
> +	bdf = dm_pci_get_bdf(dev);
> +	debug("PCI Autoconfig: %02x.%02x.%02x: Fixup link: DL active: %u; "
> +	      "%3llu flips, %6llu loops of which %6llu while training, "
> +	      "final %6llu stable\n",
> +	      PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf),
> +	      (unsigned int)dllla,
> +	      (unsigned long long)flips, (unsigned long long)loops,
> +	      (unsigned long long)trcount, (unsigned long long)ntrcount);
> +
> +	return dllla || ntrcount >= loops / 2;
> +}
> +
> +/*
> + * Retrain the link of a downstream PCIe port by hand if necessary.
> + *
> + * This is needed at least where a downstream port of the ASMedia ASM2824
> + * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
> + * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
> + * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
> + * board.
> + *
> + * In such a configuration the switches are supposed to negotiate the link
> + * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
> + * continues switching between the two speeds indefinitely and the data
> + * link layer never reaches the active state, with link training reported
> + * repeatedly active ~84% of the time.  Forcing the target link speed to
> + * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
> + * each other correctly however.  And more interestingly retraining with a
> + * higher target link speed afterwards lets the two successfully negotiate
> + * 5.0GT/s.
> + *
> + * As this can potentially happen with any device and is cheap in the case
> + * of correctly operating hardware, let's do it for all downstream ports,
> + * for root complexes, PCIe switches and PCI/PCI-X to PCIe bridges.
> + *
> + * First check if automatic link training may have failed to complete, as
> + * indicated by the optional Data Link Layer Link Active status being off
> + * and the Link Bandwidth Management Status indicating that hardware has
> + * changed the link speed or width in an attempt to correct unreliable
> + * link operation.  If this is the case, then check if the link operates
> + * correctly by seeing whether it is being trained excessively.  If it is,
> + * then conclude the link is broken.
> + *
> + * In that case restrict the speed to 2.5GT/s, observing that the Target
> + * Link Speed field is sticky and therefore the link will stay restricted
> + * even after a device reset is later made by an OS that is unaware of the
> + * problem.  With the speed restricted request that the link be retrained
> + * and check again if the link operates correctly.  If not, then set the
> + * Target Link Speed back to the original value.
> + *
> + * This requires the presence of the Link Control 2 register, so make sure
> + * the PCI Express Capability Version is at least 2.  Also don't try, for
> + * obvious reasons, to limit the speed if 2.5GT/s is the only link speed
> + * supported.
> + */
> +static void dm_pciauto_exp_fixup_link(struct udevice *dev, int pcie_off)
> +{
> +	u16 exp_lnksta, exp_lnkctl, exp_lnkctl2;
> +	u16 exp_flags, exp_type, exp_version;
> +	u32 exp_lnkcap;
> +	pci_dev_t bdf;
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_FLAGS, &exp_flags);
> +	exp_version = exp_flags & PCI_EXP_FLAGS_VERS;
> +	if (exp_version < 2)
> +		return;
> +
> +	exp_type = (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4;
> +	switch (exp_type) {
> +	case PCI_EXP_TYPE_ROOT_PORT:
> +	case PCI_EXP_TYPE_DOWNSTREAM:
> +	case PCI_EXP_TYPE_PCIE_BRIDGE:
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
> +	if ((exp_lnkcap & PCI_EXP_LNKCAP_SLS) <= PCI_EXP_LNKCAP_SLS_2_5GB)
> +		return;
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
> +	if ((exp_lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) !=
> +	    PCI_EXP_LNKSTA_LBMS)
> +		return;
> +
> +	if (dm_pciauto_exp_link_stable(dev, pcie_off))
> +		return;
> +
> +	bdf = dm_pci_get_bdf(dev);
> +	printf("PCI Autoconfig: %02x.%02x.%02x: "
> +	       "Downstream link non-functional\n",
> +	       PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
> +	printf("PCI Autoconfig: %02x.%02x.%02x: "
> +	       "Retrying with speed restricted to 2.5GT/s...\n",
> +	       PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
> +
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL, &exp_lnkctl);
> +	dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL2, &exp_lnkctl2);
> +
> +	dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
> +			      (exp_lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) |
> +			      PCI_EXP_LNKCTL2_TLS_2_5GT);
> +	dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
> +			      exp_lnkctl | PCI_EXP_LNKCTL_RL);
> +
> +	if (dm_pciauto_exp_link_stable(dev, pcie_off)) {
> +		printf("PCI Autoconfig: %02x.%02x.%02x: Succeeded!\n",
> +		       PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
> +	} else {
> +		printf("PCI Autoconfig: %02x.%02x.%02x: Failed!\n",
> +		       PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
> +
> +		dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
> +				      exp_lnkctl2);
> +		dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
> +				      exp_lnkctl | PCI_EXP_LNKCTL_RL);
> +	}
> +}
> +
>   void dm_pciauto_prescan_setup_bridge(struct udevice *dev, int sub_bus)
>   {
>   	struct pci_region *pci_mem;
> @@ -189,6 +353,7 @@ void dm_pciauto_prescan_setup_bridge(str
>   	u8 io_32;
>   	struct udevice *ctlr = pci_get_controller(dev);
>   	struct pci_controller *ctlr_hose = dev_get_uclass_priv(ctlr);
> +	int pcie_off;
>   
>   	pci_mem = ctlr_hose->pci_mem;
>   	pci_prefetch = ctlr_hose->pci_prefetch;
> @@ -267,6 +432,11 @@ void dm_pciauto_prescan_setup_bridge(str
>   		cmdstat |= PCI_COMMAND_IO;
>   	}
>   
> +	/* For PCIe devices see if we need to retrain the link by hand */
> +	pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP);
> +	if (pcie_off)
> +		dm_pciauto_exp_fixup_link(dev, pcie_off);
> +
>   	/* Enable memory and I/O accesses, enable bus master */
>   	dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER);
>   }
> Index: u-boot/include/pci.h
> ===================================================================
> --- u-boot.orig/include/pci.h
> +++ u-boot/include/pci.h
> @@ -5,6 +5,7 @@
>    *
>    * (C) Copyright 2002
>    * Wolfgang Denk, DENX Software Engineering, wd at denx.de.
> + * Copyright (c) 2021  Maciej W. Rozycki <macro at orcam.me.uk>
>    */
>   
>   #ifndef _PCI_H
> @@ -475,16 +476,24 @@
>   
>   /* PCI Express capabilities */
>   #define PCI_EXP_FLAGS		2	/* Capabilities register */
> +#define  PCI_EXP_FLAGS_VERS	0x000f	/* Capability Version */
>   #define  PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type */
> -#define  PCI_EXP_TYPE_ROOT_PORT 0x4	/* Root Port */
> +#define   PCI_EXP_TYPE_ROOT_PORT   0x4	/* Root Port */
> +#define   PCI_EXP_TYPE_DOWNSTREAM  0x6	/* Downstream Port */
> +#define   PCI_EXP_TYPE_PCIE_BRIDGE 0x8	/* PCI/PCI-X to PCIe Bridge */
>   #define PCI_EXP_DEVCAP		4	/* Device capabilities */
>   #define  PCI_EXP_DEVCAP_FLR	0x10000000 /* Function Level Reset */
>   #define PCI_EXP_DEVCTL		8	/* Device Control */
>   #define  PCI_EXP_DEVCTL_BCR_FLR	0x8000  /* Bridge Configuration Retry / FLR */
>   #define PCI_EXP_LNKCAP		12	/* Link Capabilities */
>   #define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
> +#define  PCI_EXP_LNKCAP_SLS_2_5GB 0x00000001 /* LNKCAP2 SLS Vector bit 0 */
> +#define  PCI_EXP_LNKCAP_SLS_5_0GB 0x00000002 /* LNKCAP2 SLS Vector bit 1 */
> +#define  PCI_EXP_LNKCAP_SLS_8_0GB 0x00000003 /* LNKCAP2 SLS Vector bit 2 */
>   #define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
>   #define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
> +#define PCI_EXP_LNKCTL		16	/* Link Control */
> +#define  PCI_EXP_LNKCTL_RL	0x0020	/* Retrain Link */
>   #define PCI_EXP_LNKSTA		18	/* Link Status */
>   #define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
>   #define  PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */
> @@ -492,7 +501,9 @@
>   #define  PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */
>   #define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Negotiated Link Width */
>   #define  PCI_EXP_LNKSTA_NLW_SHIFT 4	/* start of NLW mask in link status */
> +#define  PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
>   #define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
> +#define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
>   #define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
>   #define  PCI_EXP_SLTCAP_PSN	0xfff80000 /* Physical Slot Number */
>   #define PCI_EXP_RTCTL		28	/* Root Control */
> @@ -503,8 +514,14 @@
>   #define  PCI_EXP_DEVCAP2_ARI	0x00000020 /* ARI Forwarding Supported */
>   #define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
>   #define  PCI_EXP_DEVCTL2_ARI	0x0020 /* Alternative Routing-ID */
> -
> +#define PCI_EXP_LNKCAP2		44	/* Link Capability 2 */
> +#define  PCI_EXP_LNKCAP2_SLS	0x000000fe /* Supported Link Speeds Vector */
>   #define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
> +#define  PCI_EXP_LNKCTL2_TLS	0x000f	/* Target Link Speed */
> +#define  PCI_EXP_LNKCTL2_TLS_2_5GT 0x0001 /* Target Link Speed 2.5GT/s */
> +#define  PCI_EXP_LNKCTL2_TLS_5_0GT 0x0002 /* Target Link Speed 5.0GT/s */
> +#define  PCI_EXP_LNKCTL2_TLS_8_0GT 0x0003 /* Target Link Speed 8.0GT/s */
> +
>   /* Single Root I/O Virtualization Registers */
>   #define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
>   #define PCI_SRIOV_CTRL		0x08	/* SR-IOV Control */
> 

Viele Grüße,
Stefan Roese

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr at denx.de


More information about the U-Boot mailing list