[SPAM] Re: [PATCH] cmd: usb: Prevent reset in usb tree/info command

Marek Vasut marex at denx.de
Sun Jun 11 14:29:29 CEST 2023


On 6/9/23 20:52, Xavier Drudis Ferran wrote:
> Sorry, I replied to Marek only but meant to reply to all.
> 
> El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia:
> 
>>> No. Well, in some tests yes and some no, but I got the error in all cases.
>>
>> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before
>> you would get any meaningful result out of 'usb info'. Without either, you
>> would get 'USB is stopped.' message. Could it be there are some extra
>> scripts in your environment that manipulate the USB ?
>>
> 
> I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c
> 
> But maybe I don't get that called and it's really something silly in
> my setup as you say later... Maybe it doesn't get called unless it
> finds nothing else useful to boot.
> 
>>
>> Can you test with stock U-Boot ?
>>
> 
> I don't know. I'll see if I have time ...
> I'd rather read the code to understand what's the condition for finding bootdevices...
> 
>> Can you test with another USB stick, i.e. is the issue specific to this USB
>> stick ?
>>
> 
> I could test this, yes.
> 
>> Is the issue specific to this partition layout of this USB stick, i.e. if
>> you clone (dd if=... of=...) the content of the USB stick to another USB
>> stick, does the error still occur.
>>
> 
> I'll try to partition and flash a new USB.
> 
>> [...]
>>
>>> Model: Radxa ROCK Pi 4B
>>> Net:   eth0: ethernet at fe300000
>>> Hit any key to stop autoboot:  2  1  0
>>> rockchip_pcie pcie at f8000000: PCIe link training gen1 timeout!
>>> Bus usb at fe380000: USB EHCI 1.00
>>> Bus usb at fe3c0000: USB EHCI 1.00
>>> Bus usb at fe800000: Register 2000140 NbrPorts 2
>>> Starting the controller
>>> USB XHCI 1.10
>>> Bus usb at fe900000: Register 2000140 NbrPorts 2
>>> Starting the controller
>>> USB XHCI 1.10
>>> scanning bus usb at fe380000 for devices... 1 USB Device(s) found
>>> scanning bus usb at fe3c0000 for devices... 2 USB Device(s) found
>>> scanning bus usb at fe800000 for devices... 1 USB Device(s) found
>>> scanning bus usb at fe900000 for devices... 1 USB Device(s) found
>>> rockchip_pcie pcie at f8000000: failed to find ep-gpios property
>>> ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
>>> Could not initialize PHY ethernet at fe300000
>>> rockchip_pcie pcie at f8000000: failed to find ep-gpios property
>>> ethernet at fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
>>> Could not initialize PHY ethernet at fe300000
>>
>> Is this some $preboot script which initializes your hardware ?
>>
> 
> Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck
> 
> Anyway, one should be allowed to stop the boot, call usb start and usb tree
> and don't get a reset, shouldn't one?
> 
>> => printenv preboot
>>
> 
> I'll send this later when I repeat the test. I'd like to find a
> minimal test case or something...

Thank you

>>> => usb tree
>>> USB device tree:
>>>     1  Hub (480 Mb/s, 0mA)
>>>        u-boot EHCI Host Controller
>>>     1  Hub (480 Mb/s, 0mA)
>>>     |  u-boot EHCI Host Controller
>>>     |
>>>                          uclass_id=64
>>>     |+-2  Mass Storage (480 Mb/s, 200mA)
>>>          TDK LoR TF10 07032B6B1D0ACB96
>>>                          uclass_id=22
>>>                          uclass_id=25
>>>        "Synchronous Abort" handler, esr 0x96000010, far 0x948
>>> elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
>>> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
>>
>> Take the u-boot (unstripped elf) which matches this binary, and run
>> aarch64-...objdump -lSD on it, then search for the $lr value, see
>> doc/develop/crash_dumps.rst for details. That should tell you where exactly
>> the crash occurred. Where did it occur ?
>>
> 
> I didn't do it exactly so, but from u-boot.map I gathered that it was
> in cmd/usb.c and the fact that my patch fixed it implies the problem
> is the functions usb_show_tree_graph() or usb_show_info() get called
> recursively with null as a first parameter.
> 
> Now I don't have that u-boot.map anymore and would have to repeat the
> experiment, to find out exactly as you say, so I won't do it right
> now. But thanks, understood.
> 
> The reason usb_show_tree_graph() gets called with a null usb_device *
> is that the code in cmd/usb.c for usb info and usb tree assumes
> everything a UCLASS_MASS_STORAGE device can have as children are
> devices with some usb_device in their dev_get_parent_priv().  It
> carves out exceptions to this general rule for UCLASS_USB_EMUL and
> UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is
> UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as
> usb_device, but the bootdev code did not put any usb_device there,
> it's null. So the first access causes a null pointer dereference.
> 
> I would have to wrap my mind around more code to start understanding
> if it's better to give that UCLASS_BOOTDEV some usb_device as parent
> priv data, or it is better to give USB devices that can be enumerated
> for listing (usb tree or usb info) some RECURSIBLE flag that indicates
> their priv parent data is reliably a usb_device.
> 
> So checking that the alledged usb_device at least isn't null as in my
> patch is possibly a partial solution. I'm sure if it's null we
> shouldn't call, but if it points to something other than a usb_device we
> shouldn't either, and it doesn't address why it is null (well, because
> it's not really a USB internal node, not even a proper leaf, so it
> shouldn't be recursed anyway).
> 
> In usb_show_info() it is similar, usb_display_desc() gets called with a
> null udev because that's what came in. Recursion is avoided for
> UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV.
> 
> A different solution could be to expand the exception to
> UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know
> everything you can find so you can list all the exceptions, instead of
> checking that you have something expected that you can recurse on. Not
> completely sure, just smells so to me. At least checking for null is
> more general.
> 
> Maybe the solution is to fix common/usb_storage.c when it calls
> bootdev_setup_sibling_blk(), to ensure there's some useful usb_device
> there as parent priv. Or something needs fixing in
> drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really
> shouldn't call recursively for usb info or tree anyway.
> 
> For now I was trying to understand when that UCLASS_BOOTDEV is added
> and why, and whether it should be removed at some time. This should
> hint me at what is the minimum scenario to reproduce the issue.
> 
>>
>> The NAK is really only to prevent this from accidentally going on.
>>
> 
> The NAK it's OK. I like people to want to understand stuff. Don't worry.

s at going on at going in@ ... typo.

>> Please see above, maybe it could be narrowed down ?
> 
> I'll see if I can test better and send more useful reports.
> Not sure when.
> 
> I'm not sure I'll have the time to learn all I need. I just hoped someone
> else had it in mind...

I reproducer would be a good starting point, i.e. something like "I run 
these ... commands and I get this crash" . But please make sure you use 
current codebase and no preboot or similar scripting .


More information about the U-Boot mailing list