[SPAM] Re: [PATCH] cmd: usb: Prevent reset in usb tree/info command
Marek Vasut
marex at denx.de
Sun Jun 11 14:29:29 CEST 2023
On 6/9/23 20:52, Xavier Drudis Ferran wrote:
> Sorry, I replied to Marek only but meant to reply to all.
>
> El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia:
>
>>> No. Well, in some tests yes and some no, but I got the error in all cases.
>>
>> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before
>> you would get any meaningful result out of 'usb info'. Without either, you
>> would get 'USB is stopped.' message. Could it be there are some extra
>> scripts in your environment that manipulate the USB ?
>>
>
> I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c
>
> But maybe I don't get that called and it's really something silly in
> my setup as you say later... Maybe it doesn't get called unless it
> finds nothing else useful to boot.
>
>>
>> Can you test with stock U-Boot ?
>>
>
> I don't know. I'll see if I have time ...
> I'd rather read the code to understand what's the condition for finding bootdevices...
>
>> Can you test with another USB stick, i.e. is the issue specific to this USB
>> stick ?
>>
>
> I could test this, yes.
>
>> Is the issue specific to this partition layout of this USB stick, i.e. if
>> you clone (dd if=... of=...) the content of the USB stick to another USB
>> stick, does the error still occur.
>>
>
> I'll try to partition and flash a new USB.
>
>> [...]
>>
>>> Model: Radxa ROCK Pi 4B
>>> Net: eth0: ethernet at fe300000
>>> Hit any key to stop autoboot: 2 1 0
>>> rockchip_pcie pcie at f8000000: PCIe link training gen1 timeout!
>>> Bus usb at fe380000: USB EHCI 1.00
>>> Bus usb at fe3c0000: USB EHCI 1.00
>>> Bus usb at fe800000: Register 2000140 NbrPorts 2
>>> Starting the controller
>>> USB XHCI 1.10
>>> Bus usb at fe900000: Register 2000140 NbrPorts 2
>>> Starting the controller
>>> USB XHCI 1.10
>>> scanning bus usb at fe380000 for devices... 1 USB Device(s) found
>>> scanning bus usb at fe3c0000 for devices... 2 USB Device(s) found
>>> scanning bus usb at fe800000 for devices... 1 USB Device(s) found
>>> scanning bus usb at fe900000 for devices... 1 USB Device(s) found
>>> rockchip_pcie pcie at f8000000: failed to find ep-gpios property
>>> ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
>>> Could not initialize PHY ethernet at fe300000
>>> rockchip_pcie pcie at f8000000: failed to find ep-gpios property
>>> ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
>>> Could not initialize PHY ethernet at fe300000
>>
>> Is this some $preboot script which initializes your hardware ?
>>
>
> Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck
>
> Anyway, one should be allowed to stop the boot, call usb start and usb tree
> and don't get a reset, shouldn't one?
>
>> => printenv preboot
>>
>
> I'll send this later when I repeat the test. I'd like to find a
> minimal test case or something...
Thank you
>>> => usb tree
>>> USB device tree:
>>> 1 Hub (480 Mb/s, 0mA)
>>> u-boot EHCI Host Controller
>>> 1 Hub (480 Mb/s, 0mA)
>>> | u-boot EHCI Host Controller
>>> |
>>> uclass_id=64
>>> |+-2 Mass Storage (480 Mb/s, 200mA)
>>> TDK LoR TF10 07032B6B1D0ACB96
>>> uclass_id=22
>>> uclass_id=25
>>> "Synchronous Abort" handler, esr 0x96000010, far 0x948
>>> elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
>>> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
>>
>> Take the u-boot (unstripped elf) which matches this binary, and run
>> aarch64-...objdump -lSD on it, then search for the $lr value, see
>> doc/develop/crash_dumps.rst for details. That should tell you where exactly
>> the crash occurred. Where did it occur ?
>>
>
> I didn't do it exactly so, but from u-boot.map I gathered that it was
> in cmd/usb.c and the fact that my patch fixed it implies the problem
> is the functions usb_show_tree_graph() or usb_show_info() get called
> recursively with null as a first parameter.
>
> Now I don't have that u-boot.map anymore and would have to repeat the
> experiment, to find out exactly as you say, so I won't do it right
> now. But thanks, understood.
>
> The reason usb_show_tree_graph() gets called with a null usb_device *
> is that the code in cmd/usb.c for usb info and usb tree assumes
> everything a UCLASS_MASS_STORAGE device can have as children are
> devices with some usb_device in their dev_get_parent_priv(). It
> carves out exceptions to this general rule for UCLASS_USB_EMUL and
> UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is
> UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as
> usb_device, but the bootdev code did not put any usb_device there,
> it's null. So the first access causes a null pointer dereference.
>
> I would have to wrap my mind around more code to start understanding
> if it's better to give that UCLASS_BOOTDEV some usb_device as parent
> priv data, or it is better to give USB devices that can be enumerated
> for listing (usb tree or usb info) some RECURSIBLE flag that indicates
> their priv parent data is reliably a usb_device.
>
> So checking that the alledged usb_device at least isn't null as in my
> patch is possibly a partial solution. I'm sure if it's null we
> shouldn't call, but if it points to something other than a usb_device we
> shouldn't either, and it doesn't address why it is null (well, because
> it's not really a USB internal node, not even a proper leaf, so it
> shouldn't be recursed anyway).
>
> In usb_show_info() it is similar, usb_display_desc() gets called with a
> null udev because that's what came in. Recursion is avoided for
> UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV.
>
> A different solution could be to expand the exception to
> UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know
> everything you can find so you can list all the exceptions, instead of
> checking that you have something expected that you can recurse on. Not
> completely sure, just smells so to me. At least checking for null is
> more general.
>
> Maybe the solution is to fix common/usb_storage.c when it calls
> bootdev_setup_sibling_blk(), to ensure there's some useful usb_device
> there as parent priv. Or something needs fixing in
> drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really
> shouldn't call recursively for usb info or tree anyway.
>
> For now I was trying to understand when that UCLASS_BOOTDEV is added
> and why, and whether it should be removed at some time. This should
> hint me at what is the minimum scenario to reproduce the issue.
>
>>
>> The NAK is really only to prevent this from accidentally going on.
>>
>
> The NAK it's OK. I like people to want to understand stuff. Don't worry.
s at going on at going in@ ... typo.
>> Please see above, maybe it could be narrowed down ?
>
> I'll see if I can test better and send more useful reports.
> Not sure when.
>
> I'm not sure I'll have the time to learn all I need. I just hoped someone
> else had it in mind...
I reproducer would be a good starting point, i.e. something like "I run
these ... commands and I get this crash" . But please make sure you use
current codebase and no preboot or similar scripting .
More information about the U-Boot
mailing list