Odd error with cn9130, asix88179 and xhci

Chris Packham judge.packham at gmail.com
Mon Nov 25 01:09:23 CET 2024


On Sat, Nov 23, 2024 at 3:40 PM Tom Rini <trini at konsulko.com> wrote:
>
> On Wed, Nov 20, 2024 at 11:29:43AM +1300, Chris Packham wrote:
> > Hi U-Boot,
> >
> > We've hit a weird problem at $dayjob with a board using the Marvell
> > CN9130 SoC and using the asix88179 USB-Eth adapter.
> >
> > The problem is that after enabling an unrelated feature in U-Boot the
> > asix88179 fails to receive data (I can confirm that the link partner
> > does see packets in the transmit direction).
> >
> > => version
> > U-Boot 2022.01 (Nov 08 2024 - 09:45:44 +0000)
> > => usb start
> > starting USB...
> > Bus usb3 at 500000: Register 2000120 NbrPorts 2
> > Starting the controller
> > USB XHCI 1.00
> > scanning bus usb3 at 500000 for devices... 2 USB Device(s) found
> >        scanning usb for storage devices... 0 Storage Device(s) found
> > => ping ${serverip}
> > Waiting for Ethernet connection... unable to connect.
> > Reset Ethernet Device
> > Waiting for Ethernet connection... done.
> > Using ax88179_eth device
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> > Rx: failed to receive: -5
> >
> > Abort
> > ping failed; host 10.37.233.65 is not alive
> > => <INTERRUPT>
> >
> > Debugging a little we can see that the -EIO is actually because
> > xhci_bulk_tx() hits a timeout from xhci_wait_for_event().
> >
> > We think this is triggered by the u-boot image size crossing some
> > boundary (the problem seems to start when .bss_end crosses
> > 0x00000000000f0000) although I've so far been unable to find
> > specifically why that might be. As far as I can tell u-boot is being
> > built relocatably and nothing is overlapping. I also considered that
> > ATF might be preventing access to something but so far I see no
> > evidence of this.
> >
> > If I turn off some features to reduce the build size the problem goes
> > away. That is actually how we've avoided the immediate issue, although
> > that means the problem will likely come back at an inopportune time.
> >
> > Does anyone have any ideas as to what the true root cause might be?
> > I'm a bit stumped.
>
> Hummmm. Since you note it seems to be when a threshold is crossed in BSS
> size, add something to the BSS of a variable size that you control, and
> after confirming that you can replicate the problem this way, grow it
> just past the limit and compare u-boot.map files in the works/fails
> cases to see just what's being moved around?

So I tried a little experiment:

diff --git a/net/net.c b/net/net.c
index b003b84b3537..a6def9785133 100644
--- a/net/net.c
+++ b/net/net.c
@@ -180,6 +180,10 @@ u32 net_boot_file_size;
 /* Boot file size in blocks as reported by the DHCP server */
 u32 net_boot_file_expected_size_in_blocks;

+#define DUMMY_SIZE (1 << 11)
+
+int dummy[DUMMY_SIZE] = {0};
+
 static uchar net_pkt_buf[(PKTBUFSRX+1) * PKTSIZE_ALIGN + PKTALIGN];
 /* Receive packets */
 uchar *net_rx_packets[PKTBUFSRX];
@@ -211,6 +215,7 @@ int __maybe_unused net_busy_flag;
 static int on_ipaddr(const char *name, const char *value, enum env_op op,
        int flags)
 {
+       dummy[DUMMY_SIZE - 1] = -1;
        if (flags & H_PROGRAMMATIC)
                return 0;


With DUMMY_SIZE set to (1 << 10) I don't see the problem. With
DUMMY_SIZE set to (1 << 11) I do see the problem. With DUMMY_SIZE set
to (1 << 14) the problem goes away again.
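
For reference, here is what those three settings mean in bytes. This is
only a host-side sketch, not part of the patch: dummy_bytes() is a
made-up helper name, and it assumes int is 4 bytes (as it is on
aarch64).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes occupied by "int dummy[1 << shift]".
 * Assumption: int is 32 bits wide, so int32_t stands in for it here.
 */
static size_t dummy_bytes(unsigned int shift)
{
	return ((size_t)1 << shift) * sizeof(int32_t);
}
```

That gives 4 KiB for (1 << 10), 8 KiB for (1 << 11) and 64 KiB for
(1 << 14). Padding of exactly 64 KiB moves everything placed after it
by one full 64 KiB window, so offsets modulo 64 KiB are the same as
with no padding at all (alignment effects aside). Whether that is
relevant to the failure is a guess at this point.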

The obvious things that are moving are net_rx_packet,
net_rx_packet_len and net_rx_packets. I'll see if I can narrow this
down to specifically which of these is causing the problem.
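
One check that could help with the narrowing is whether a given buffer
spans an address boundary. The sketch below is an assumption-laden
illustration, not something from the U-Boot tree: crosses_boundary() is
a hypothetical helper. 64 KiB is a natural boundary to try first, both
because 0xf0000 is a multiple of it and because xHCI does not allow a
TRB data buffer to cross a 64 KiB boundary. The addresses fed to it
would need to be the runtime (post-relocation) ones rather than the
link-time values in u-boot.map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the byte range [addr, addr + len) spans a multiple of
 * "boundary". "boundary" must be a power of two (e.g. 0x10000).
 * Hypothetical helper for illustration only.
 */
static bool crosses_boundary(uint64_t addr, size_t len, uint64_t boundary)
{
	uint64_t mask = ~(boundary - 1);

	if (len == 0)
		return false;

	/* Compare the window holding the first byte with the one holding the last */
	return (addr & mask) != ((addr + len - 1) & mask);
}
```

For example, a 0x200-byte buffer at the made-up address 0xeff00 ends at
0xf00ff and so crosses 0xf0000, while the same buffer at 0xefe00 ends
at 0xeffff and does not.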

>
> --
> Tom

