[SPAM] Re: [PATCH] cmd: usb: Prevent reset in usb tree/info command
Xavier Drudis Ferran
xdrudis at tinet.cat
Fri Jun 9 20:52:15 CEST 2023
Sorry, I replied to Marek only but meant to reply to all.
El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia:
> > No. Well, in some tests yes and some no, but I got the error in all cases.
>
> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before
> you would get any meaningful result out of 'usb info'. Without either, you
> would get 'USB is stopped.' message. Could it be there are some extra
> scripts in your environment that manipulate the USB ?
>
I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c
But maybe I don't get that called and it's really something silly in
my setup as you say later... Maybe it doesn't get called unless it
finds nothing else useful to boot.
>
> Can you test with stock U-Boot ?
>
I don't know. I'll see if I have time ...
I'd rather read the code to understand what's the condition for finding bootdevices...
> Can you test with another USB stick, i.e. is the issue specific to this USB
> stick ?
>
I could test this, yes.
> Is the issue specific to this partition layout of this USB stick, i.e. if
> you clone (dd if=... of=...) the content of the USB stick to another USB
> stick, does the error still occur.
>
I'll try to partition and flash a new USB.
> [...]
>
> > Model: Radxa ROCK Pi 4B
> > Net: eth0: ethernet at fe300000
> > Hit any key to stop autoboot: 2 1 0
> > rockchip_pcie pcie at f8000000: PCIe link training gen1 timeout!
> > Bus usb at fe380000: USB EHCI 1.00
> > Bus usb at fe3c0000: USB EHCI 1.00
> > Bus usb at fe800000: Register 2000140 NbrPorts 2
> > Starting the controller
> > USB XHCI 1.10
> > Bus usb at fe900000: Register 2000140 NbrPorts 2
> > Starting the controller
> > USB XHCI 1.10
> > scanning bus usb at fe380000 for devices... 1 USB Device(s) found
> > scanning bus usb at fe3c0000 for devices... 2 USB Device(s) found
> > scanning bus usb at fe800000 for devices... 1 USB Device(s) found
> > scanning bus usb at fe900000 for devices... 1 USB Device(s) found
> > rockchip_pcie pcie at f8000000: failed to find ep-gpios property
> > ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> > Could not initialize PHY ethernet at fe300000
> > rockchip_pcie pcie at f8000000: failed to find ep-gpios property
> > ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> > Could not initialize PHY ethernet at fe300000
>
> Is this some $preboot script which initializes your hardware ?
>
Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck
Anyway, one should be allowed to stop the boot, call usb start and usb tree
and don't get a reset, shouldn't one?
> => printenv preboot
>
I'll send this later when I repeat the test. I'd like to find a
minimal test case or something...
> > => usb tree
> > USB device tree:
> > 1 Hub (480 Mb/s, 0mA)
> > u-boot EHCI Host Controller
> > 1 Hub (480 Mb/s, 0mA)
> > | u-boot EHCI Host Controller
> > |
> > uclass_id=64
> > |+-2 Mass Storage (480 Mb/s, 200mA)
> > TDK LoR TF10 07032B6B1D0ACB96
> > uclass_id=22
> > uclass_id=25
> > "Synchronous Abort" handler, esr 0x96000010, far 0x948
> > elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
> > elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
>
> Take the u-boot (unstripped elf) which matches this binary, and run
> aarch64-...objdump -lSD on it, then search for the $lr value, see
> doc/develop/crash_dumps.rst for details. That should tell you where exactly
> the crash occurred. Where did it occur ?
>
I didn't do it exactly so, but from u-boot.map I gathered that it was
in cmd/usb.c and the fact that my patch fixed it implies the problem
is the functions usb_show_tree_graph() or usb_show_info() get called
recursively with null as a first parameter.
Now I don't have that u-boot.map anymore and would have to repeat the
experiment, to find out exactly as you say, so I won't do it right
now. But thanks, understood.
The reason usb_show_tree_graph() gets called with a null usb_device *
is that the code in cmd/usb.c for usb info and usb tree assumes
everything a UCLASS_MASS_STORAGE device can have as children are
devices with some usb_device in their dev_get_parent_priv(). It
carves out exceptions to this general rule for UCLASS_USB_EMUL and
UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is
UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as
usb_device, but the bootdev code did not put any usb_device there,
it's null. So the first access causes a null pointer dereference.
I would have to wrap my mind around more code to start understanding
if it's better to give that UCLASS_BOOTDEV some usb_device as parent
priv data, or it is better to give USB devices that can be enumerated
for listing (usb tree or usb info) some RECURSIBLE flag that indicates
their priv parent data is reliably a usb_device.
So checking that the alledged usb_device at least isn't null as in my
patch is possibly a partial solution. I'm sure if it's null we
shouldn't call, but if it points to something other than a usb_device we
shouldn't either, and it doesn't address why it is null (well, because
it's not really a USB internal node, not even a proper leaf, so it
shouldn't be recursed anyway).
In usb_show_info() it is similar, usb_display_desc() gets called with a
null udev because that's what came in. Recursion is avoided for
UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV.
A different solution could be to expand the exception to
UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know
everything you can find so you can list all the exceptions, instead of
checking that you have something expected that you can recurse on. Not
completely sure, just smells so to me. At least checking for null is
more general.
Maybe the solution is to fix common/usb_storage.c when it calls
bootdev_setup_sibling_blk(), to ensure there's some useful usb_device
there as parent priv. Or something needs fixing in
drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really
shouldn't call recursively for usb info or tree anyway.
For now I was trying to understand when that UCLASS_BOOTDEV is added
and why, and whether it should be removed at some time. This should
hint me at what is the minimum scenario to reproduce the issue.
>
> The NAK is really only to prevent this from accidentally going on.
>
The NAK it's OK. I like people to want to understand stuff. Don't worry.
> Please see above, maybe it could be narrowed down ?
I'll see if I can test better and send more useful reports.
Not sure when.
I'm not sure I'll have the time to learn all I need. I just hoped someone
else had it in mind...
More information about the U-Boot
mailing list