[PATCH] pci: Work around PCIe link training failures
Maciej W. Rozycki
macro at orcam.me.uk
Sun Nov 14 16:16:47 CET 2021
Attempt to handle cases with a downstream port of a PCIe switch where
link training never completes and the link continues switching between
speeds indefinitely with the data link layer never reaching the active
state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the
switches are supposed to negotiate the link speed of preferably 5.0GT/s,
falling back to 2.5GT/s.
However the link continues oscillating between the two speeds, with link
training reported active ~84% of the time repeatedly, e.g.:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
[...]
Forcibly limiting the target link speed to 2.5GT/s with the upstream
ASM2824 device makes the two switches communicate correctly however:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
[...]
and then:
05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
[...]
Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
[...]
Capabilities: [c0] Express (v2) Upstream Port, MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
[...]
Make use of this observation then and attempt to detect the inability to
negotiate the link speed automatically, and then handle it by hand. Use
the Data Link Layer Link Active status flag as the primary indicator of
successful link speed negotiation, but given that the flag is optional
by hardware to implement (the ASM2824 does have it though), resort to
checking for the mandatory Link Bandwidth Management Status flag showing
that the link speed or width has been changed in an attempt to correct
unreliable link operation (the ASM2824 does set it too).
If these checks indicate that link may not operate correctly, then poll
the Data Link Layer Link Active status flag along with the Link Training
flag for the duration of 200ms to see if the link has stabilised, that
is either that the Data Link Layer Link Active status flag has been set
or that Link Training has been inactive during at least the second half
of the inteval.
If that has indicated failure, reduce the target speed, request a link
retrain and check again if the link has stabilised. Repeat until either
successful or the link speeds supported by the downstream port have been
exhausted.
Signed-off-by: Maciej W. Rozycki <macro at orcam.me.uk>
---
Hi,
I came across this issue when building the system based on my recently
acquired SiFive HiFive Unmatched board. One of the project objectives was
to have a pair of legacy PCI options wired and in the course of the design
I came up with a solution including an intermediate Delock Riser Card PCI
Express x1 > 2 x PCIe x1 device, which provides an extra PCIe slot beyond
one taken by the downstream PCIe-PCI bridge device, a worthwhile addition
(<https://www.delock.com/produkt/41433/merkmale.html>, now discontinued).
It has quickly turned out that the Delock device does not want to talk to
the Unmatched board, even though both devices operate correctly in other
topologies. So I have asked all the involved parties for assistance, but
that did not help. ASMedia declined to answer, someone from Diodes (for
the Pericom part) did reply, but quickly concluded they could only give
support for silicon design, and Delock said that they do not provide such
a level of support and if a device does not work, then I am on my own.
The guys from SiFive were at least sympathetic (thank you!), but this
level of hardware debug was beyond their skills and/or resources. So I
was left on my own really.
I chose not to give up though and tried to poke at the upstream ASMedia
switch based on general PCIe technical information (sadly the details of
the switch's programming interface remain undocumented, unlike with the
downstream Pericom part). That has revealed that the ASMedia part can be
persuaded to talk to the downstream Pericom part by forcing the link speed
to 2.5GT/s, e.g. by entering these commands at the U-Boot prompt:
=> pci write.w 02.03.0 0xb0 0x0061
=> pci write.w 02.03.0 0x90 0x0020
This could do as a workaround, but one complication with the Unmatched is
that it does not have a writable persistent environment storage, so it
would have to be hardcoded with a patched U-Boot rebuild, quite a hack.
Then I thought perhaps we could do better and handle this automatically
in U-Boot. Firmware is after all the place to fix up any hardware errata
if possible. Not only this would relieve the user from having to know
which PCIe port to apply the workaround to (the Unmatched has 3 external
PCIe slots available total in a different form factor each), but it would
be completely transparent regardless of the devices involved, possibly
different from ones I have observed this issue with.
Looking into it I soon realised that the problematic pair of switches
actually continues link negotiation indefinitely, as indicated by the Link
Training flag flipping and the Current Link Speed indicator switching
between 5GT/s and 2.5GT/s repeatedly. I have then used this information
to make the workaround offered here. It works reliably in my case, though
the polling period of 200ms is a bit of heuristics, especially as I cannot
claim PCIe expertise really, so any input will be appreciated here.
Otherwise I think the solution is ready to include with U-Boot.
Questions, comments, concerns? Otherwise please apply.
NB I have bcc-ed this submission to ASMedia and Diodes in a half-hearted
hope someone actually involved with the design of these devices chooses to
chime in after all. I find it particularly regrettable that the switches
have no failure counter or suchlike that would prevent link negotiation
from oscillating between 5GT/s and 2.5GT/s indefinitely and after a number
of iterations would instead make them settle on the lower speed, which
appears fully reliable in this setup. The lack of documentation for the
ASM2824 does not help either.
Maciej
---
drivers/pci/pci_auto.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++++
include/pci.h | 17 +++++
2 files changed, 164 insertions(+), 1 deletion(-)
u-boot-pcie-manual-retrain.diff
Index: u-boot/drivers/pci/pci_auto.c
===================================================================
--- u-boot.orig/drivers/pci/pci_auto.c
+++ u-boot/drivers/pci/pci_auto.c
@@ -5,6 +5,7 @@
* Author: Matt Porter <mporter at mvista.com>
*
* Copyright 2000 MontaVista Software Inc.
+ * Copyright (c) 2021 Maciej W. Rozycki <macro at orcam.me.uk>
*/
#include <common.h>
@@ -12,6 +13,7 @@
#include <errno.h>
#include <log.h>
#include <pci.h>
+#include <time.h>
#include "pci_internal.h"
/* the user can define CONFIG_SYS_PCI_CACHE_LINE_SIZE to avoid problems */
@@ -180,6 +182,146 @@ static void dm_pciauto_setup_device(stru
dm_pci_write_config8(dev, PCI_LATENCY_TIMER, 0x80);
}
+/*
+ * Check if the link of a downstream PCIe switch port operates correctly.
+ *
+ * For that check if the optional Data Link Layer Link Active status gets
+ * on within a 200ms period or failing that wait until the completion of
+ * that period and check if link training has shown the completed status
+ * continuously throughout the second half of that period.
+ *
+ * Observation with the ASMedia ASM2824 Gen 3 switch indicates it takes
+ * 11-44ms to indicate the Data Link Layer Link Active status at 2.5GT/s,
+ * though it may take a couple of link training iterations.
+ */
+static bool dm_pciauto_exp_link_stable(struct udevice *dev, int pcie_off)
+{
+ u64 loops = 0, trcount = 0, ntrcount = 0, flips = 0;
+ bool dllla, lnktr, plnktr;
+ u16 exp_lnksta;
+ pci_dev_t bdf;
+ u64 end;
+
+ dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
+ plnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
+
+ end = get_ticks() + usec_to_tick(200000);
+ do {
+ dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA,
+ &exp_lnksta);
+ dllla = !!(exp_lnksta & PCI_EXP_LNKSTA_DLLLA);
+ lnktr = !!(exp_lnksta & PCI_EXP_LNKSTA_LNKTR);
+
+ flips += plnktr ^ lnktr;
+ if (lnktr) {
+ ntrcount = 0;
+ trcount++;
+ } else {
+ ntrcount++;
+ }
+ loops++;
+
+ plnktr = lnktr;
+ } while (!dllla && get_ticks() < end);
+
+ bdf = dm_pci_get_bdf(dev);
+ debug("PCI Autoconfig: %02x.%02x.%02x: Fixup link: DL active: %u; "
+ "%3llu flips, %6llu loops of which %6llu while training, "
+ "final %6llu stable\n",
+ PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf),
+ (unsigned int)dllla,
+ (unsigned long long)flips, (unsigned long long)loops,
+ (unsigned long long)trcount, (unsigned long long)ntrcount);
+
+ return dllla || ntrcount >= loops / 2;
+}
+
+/*
+ * Retrain the link of a downstream PCIe switch port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * active ~84% of the time repeatedly. Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however.
+ *
+ * As this can potentially happen with any devices and is cheap in the
+ * case of correctly functioning hardware, let's do it for all downstream
+ * switch ports and if automatic link training may have failed to complete,
+ * as indicated by the optional Data Link Layer Link Active status being
+ * off and the Link Bandwidth Management Status showing that hardware has
+ * changed the link speed or width in an attempt to correct unreliable
+ * link operation then try lower and lower speeds one by one until one has
+ * succeeded or we have run out of speeds.
+ *
+ * This requires the presence of the Link Control 2 register, so make sure
+ * the PCI Express Capability Version is at least 2. Also don't try, for
+ * obvious reasons, to limit the speed if 2.5GT/s is the only link speed
+ * supported.
+ */
+static void dm_pciauto_exp_fixup_link(struct udevice *dev, int pcie_off)
+{
+ u16 exp_lnksta, exp_lnkctl, exp_lnkctl2;
+ u16 exp_flags, exp_type, exp_version;
+ u32 exp_lnkcap, exp_lnkcap2;
+ unsigned int speed;
+ pci_dev_t bdf;
+
+ dm_pci_read_config16(dev, pcie_off + PCI_EXP_FLAGS, &exp_flags);
+ exp_version = exp_flags & PCI_EXP_FLAGS_VERSION;
+ exp_type = exp_flags & PCI_EXP_FLAGS_TYPE;
+ if (exp_version < 2 || exp_type != PCI_EXP_TYPE_DOWN_PORT << 4)
+ return;
+
+ dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
+ if ((exp_lnkcap & PCI_EXP_LNKCAP_SLS) <= PCI_EXP_LNKCAP_SLS_2_5GB)
+ return;
+
+ dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKSTA, &exp_lnksta);
+ if ((exp_lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) !=
+ PCI_EXP_LNKSTA_LBMS)
+ return;
+
+ if (dm_pciauto_exp_link_stable(dev, pcie_off))
+ return;
+
+ bdf = dm_pci_get_bdf(dev);
+ debug("PCI Autoconfig: %02x.%02x.%02x: Downgrading speed by hand\n",
+ PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
+
+ dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL, &exp_lnkctl);
+ dm_pci_read_config16(dev, pcie_off + PCI_EXP_LNKCTL2, &exp_lnkctl2);
+ exp_lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+
+ dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
+ dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP2, &exp_lnkcap2);
+ for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
+ speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
+ speed--) {
+ if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
+ continue;
+
+ debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
+ PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
+
+ dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
+ exp_lnkctl2 | speed);
+ dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
+ exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
+
+ if (dm_pciauto_exp_link_stable(dev, pcie_off))
+ return;
+ }
+}
+
void dm_pciauto_prescan_setup_bridge(struct udevice *dev, int sub_bus)
{
struct pci_region *pci_mem;
@@ -189,6 +331,7 @@ void dm_pciauto_prescan_setup_bridge(str
u8 io_32;
struct udevice *ctlr = pci_get_controller(dev);
struct pci_controller *ctlr_hose = dev_get_uclass_priv(ctlr);
+ int pcie_off;
pci_mem = ctlr_hose->pci_mem;
pci_prefetch = ctlr_hose->pci_prefetch;
@@ -267,6 +410,11 @@ void dm_pciauto_prescan_setup_bridge(str
cmdstat |= PCI_COMMAND_IO;
}
+ /* For PCIe devices see if we need to retrain the link by hand */
+ pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP);
+ if (pcie_off)
+ dm_pciauto_exp_fixup_link(dev, pcie_off);
+
/* Enable memory and I/O accesses, enable bus master */
dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER);
}
Index: u-boot/include/pci.h
===================================================================
--- u-boot.orig/include/pci.h
+++ u-boot/include/pci.h
@@ -5,6 +5,7 @@
*
* (C) Copyright 2002
* Wolfgang Denk, DENX Software Engineering, wd at denx.de.
+ * Copyright (c) 2021 Maciej W. Rozycki <macro at orcam.me.uk>
*/
#ifndef _PCI_H
@@ -475,16 +476,23 @@
/* PCI Express capabilities */
#define PCI_EXP_FLAGS 2 /* Capabilities register */
+#define PCI_EXP_FLAGS_VERSION 0x000f /* Capability Version */
#define PCI_EXP_FLAGS_TYPE 0x00f0 /* Device/Port type */
#define PCI_EXP_TYPE_ROOT_PORT 0x4 /* Root Port */
+#define PCI_EXP_TYPE_DOWN_PORT 0x6 /* Downstream Port */
#define PCI_EXP_DEVCAP 4 /* Device capabilities */
#define PCI_EXP_DEVCAP_FLR 0x10000000 /* Function Level Reset */
#define PCI_EXP_DEVCTL 8 /* Device Control */
#define PCI_EXP_DEVCTL_BCR_FLR 0x8000 /* Bridge Configuration Retry / FLR */
#define PCI_EXP_LNKCAP 12 /* Link Capabilities */
#define PCI_EXP_LNKCAP_SLS 0x0000000f /* Supported Link Speeds */
+#define PCI_EXP_LNKCAP_SLS_2_5GB 0x0001 /* Supported Link Speed 2.5GT/s */
+#define PCI_EXP_LNKCAP_SLS_5_0GB 0x0002 /* Supported Link Speed 5.0GT/s */
+#define PCI_EXP_LNKCAP_SLS_8_0GB 0x0003 /* Supported Link Speed 8.0GT/s */
#define PCI_EXP_LNKCAP_MLW 0x000003f0 /* Maximum Link Width */
#define PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */
+#define PCI_EXP_LNKCTL 16 /* Link Control */
+#define PCI_EXP_LNKCTL_RETRLNK 0x0020 /* Retrain Link */
#define PCI_EXP_LNKSTA 18 /* Link Status */
#define PCI_EXP_LNKSTA_CLS 0x000f /* Current Link Speed */
#define PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */
@@ -492,7 +500,9 @@
#define PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */
#define PCI_EXP_LNKSTA_NLW 0x03f0 /* Negotiated Link Width */
#define PCI_EXP_LNKSTA_NLW_SHIFT 4 /* start of NLW mask in link status */
+#define PCI_EXP_LNKSTA_LNKTR 0x0800 /* Link Training */
#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
+#define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */
#define PCI_EXP_SLTCAP 20 /* Slot Capabilities */
#define PCI_EXP_SLTCAP_PSN 0xfff80000 /* Physical Slot Number */
#define PCI_EXP_RTCTL 28 /* Root Control */
@@ -503,8 +513,13 @@
#define PCI_EXP_DEVCAP2_ARI 0x00000020 /* ARI Forwarding Supported */
#define PCI_EXP_DEVCTL2 40 /* Device Control 2 */
#define PCI_EXP_DEVCTL2_ARI 0x0020 /* Alternative Routing-ID */
-
+#define PCI_EXP_LNKCAP2 44 /* Link Capability 2 */
+#define PCI_EXP_LNKCAP2_SLSV 0x000000fe /* Supported Link Speeds Vector */
#define PCI_EXP_LNKCTL2 48 /* Link Control 2 */
+#define PCI_EXP_LNKCTL2_TLS 0x000f /* Target Link Speed */
+#define PCI_EXP_LNKSTA_TLS_2_5GB 0x0001 /* Target Link Speed 2.5GT/s */
+#define PCI_EXP_LNKSTA_TLS_5_0GB 0x0002 /* Target Link Speed 5.0GT/s */
+#define PCI_EXP_LNKSTA_TLS_8_0GB 0x0003 /* Target Link Speed 8.0GT/s */
/* Single Root I/O Virtualization Registers */
#define PCI_SRIOV_CAP 0x04 /* SR-IOV Capabilities */
#define PCI_SRIOV_CTRL 0x08 /* SR-IOV Control */
More information about the U-Boot
mailing list