[SPAM] Re: [PATCH] cmd: usb: Prevent reset in usb tree/info command

Simon Glass sjg at chromium.org
Mon Jun 12 23:17:38 CEST 2023


Hi,

On Sun, 11 Jun 2023 at 13:29, Marek Vasut <marex at denx.de> wrote:
>
> On 6/9/23 20:52, Xavier Drudis Ferran wrote:
> > Sorry, I replied to Marek only but meant to reply to all.
> >
> > El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia:
> >
> >>> No. Well, in some tests yes and some no, but I got the error in all cases.
> >>
> >> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before
> >> you would get any meaningful result out of 'usb info'. Without either, you
> >> would get 'USB is stopped.' message. Could it be there are some extra
> >> scripts in your environment that manipulate the USB ?
> >>
> >
> > I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c
> >
> > But maybe I don't get that called and it's really something silly in
> > my setup as you say later... Maybe it doesn't get called unless it
> > finds nothing else useful to boot.
> >
> >>
> >> Can you test with stock U-Boot ?
> >>
> >
> > I don't know. I'll see if I have time ...
> > I'd rather read the code to understand what's the condition for finding bootdevices...
> >
> >> Can you test with another USB stick, i.e. is the issue specific to this USB
> >> stick ?
> >>
> >
> > I could test this, yes.
> >
> >> Is the issue specific to this partition layout of this USB stick, i.e. if
> >> you clone (dd if=... of=...) the content of the USB stick to another USB
> >> stick, does the error still occur.
> >>
> >
> > I'll try to partition and flash a new USB.
> >
> >> [...]
> >>
> >>> Model: Radxa ROCK Pi 4B
> >>> Net:   eth0: ethernet at fe300000
> >>> Hit any key to stop autoboot:  2     1     0
> >>> rockchip_pcie pcie at f8000000: PCIe link training gen1 timeout!
> >>> Bus usb at fe380000: USB EHCI 1.00
> >>> Bus usb at fe3c0000: USB EHCI 1.00
> >>> Bus usb at fe800000: Register 2000140 NbrPorts 2
> >>> Starting the controller
> >>> USB XHCI 1.10
> >>> Bus usb at fe900000: Register 2000140 NbrPorts 2
> >>> Starting the controller
> >>> USB XHCI 1.10
> >>> scanning bus usb at fe380000 for devices... 1 USB Device(s) found
> >>> scanning bus usb at fe3c0000 for devices... 2 USB Device(s) found
> >>> scanning bus usb at fe800000 for devices... 1 USB Device(s) found
> >>> scanning bus usb at fe900000 for devices... 1 USB Device(s) found
> >>> rockchip_pcie pcie at f8000000: failed to find ep-gpios property
> >>> ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> >>> Could not initialize PHY ethernet at fe300000
> >>> rockchip_pcie pcie at f8000000: failed to find ep-gpios property
> >>> ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> >>> Could not initialize PHY ethernet at fe300000
> >>
> >> Is this some $preboot script which initializes your hardware ?
> >>
> >
> > Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck
> >
> > Anyway, one should be allowed to stop the boot, call usb start and usb tree
> > and don't get a reset, shouldn't one?
> >
> >> => printenv preboot
> >>
> >
> > I'll send this later when I repeat the test. I'd like to find a
> > minimal test case or something...
>
> Thank you
>
> >>> => usb tree
> >>> USB device tree:
> >>>     1  Hub (480 Mb/s, 0mA)
> >>>        u-boot EHCI Host Controller
> >>>     1  Hub (480 Mb/s, 0mA)
> >>>     |  u-boot EHCI Host Controller
> >>>     |
> >>>                          uclass_id=64
> >>>     | +-2  Mass Storage (480 Mb/s, 200mA)
> >>>          TDK LoR TF10 07032B6B1D0ACB96
> >>>                          uclass_id=22
> >>>                          uclass_id=25
> >>>        "Synchronous Abort" handler, esr 0x96000010, far 0x948
> >>> elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
> >>> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
> >>
> >> Take the u-boot (unstripped elf) which matches this binary, and run
> >> aarch64-...objdump -lSD on it, then search for the $lr value, see
> >> doc/develop/crash_dumps.rst for details. That should tell you where exactly
> >> the crash occurred. Where did it occur ?
> >>
> >
> > I didn't do it exactly so, but from u-boot.map I gathered that it was
> > in cmd/usb.c and the fact that my patch fixed it implies the problem
> > is the functions usb_show_tree_graph() or usb_show_info() get called
> > recursively with null as a first parameter.
> >
> > Now I don't have that u-boot.map anymore and would have to repeat the
> > experiment, to find out exactly as you say, so I won't do it right
> > now. But thanks, understood.
> >
> > The reason usb_show_tree_graph() gets called with a null usb_device *
> > is that the code in cmd/usb.c for usb info and usb tree assumes
> > everything a UCLASS_MASS_STORAGE device can have as children are
> > devices with some usb_device in their dev_get_parent_priv().  It
> > carves out exceptions to this general rule for UCLASS_USB_EMUL and
> > UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is
> > UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as
> > usb_device, but the bootdev code did not put any usb_device there,
> > it's null. So the first access causes a null pointer dereference.
> >
> > I would have to wrap my mind around more code to start understanding
> > if it's better to give that UCLASS_BOOTDEV some usb_device as parent
> > priv data, or it is better to give USB devices that can be enumerated
> > for listing (usb tree or usb info) some RECURSIBLE flag that indicates
> > their priv parent data is reliably a usb_device.
> >
> > So checking that the alledged usb_device at least isn't null as in my
> > patch is possibly a partial solution. I'm sure if it's null we
> > shouldn't call, but if it points to something other than a usb_device we
> > shouldn't either, and it doesn't address why it is null (well, because
> > it's not really a USB internal node, not even a proper leaf, so it
> > shouldn't be recursed anyway).
> >
> > In usb_show_info() it is similar, usb_display_desc() gets called with a
> > null udev because that's what came in. Recursion is avoided for
> > UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV.
> >
> > A different solution could be to expand the exception to
> > UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know
> > everything you can find so you can list all the exceptions, instead of
> > checking that you have something expected that you can recurse on. Not
> > completely sure, just smells so to me. At least checking for null is
> > more general.
> >
> > Maybe the solution is to fix common/usb_storage.c when it calls
> > bootdev_setup_sibling_blk(), to ensure there's some useful usb_device
> > there as parent priv. Or something needs fixing in
> > drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really
> > shouldn't call recursively for usb info or tree anyway.
> >
> > For now I was trying to understand when that UCLASS_BOOTDEV is added
> > and why, and whether it should be removed at some time. This should
> > hint me at what is the minimum scenario to reproduce the issue.
> >
> >>
> >> The NAK is really only to prevent this from accidentally going on.
> >>
> >
> > The NAK it's OK. I like people to want to understand stuff. Don't worry.
>
> s at going on at going in@ ... typo.
>
> >> Please see above, maybe it could be narrowed down ?
> >
> > I'll see if I can test better and send more useful reports.
> > Not sure when.
> >
> > I'm not sure I'll have the time to learn all I need. I just hoped someone
> > else had it in mind...
>
> I reproducer would be a good starting point, i.e. something like "I run
> these ... commands and I get this crash" . But please make sure you use
> current codebase and no preboot or similar scripting .

I'm not sure what is going on here. Which version are you testing? Do
you have these two commits?

8c29b73278d6 bootstd: usb: Avoid initing USB twice
9fea3a799dde usb: Tidy up the usb_start flag

Regards,
Simon


More information about the U-Boot mailing list