[PATCH v2 0/8] USB fixes: xHCI error handling

Neal Gompa neal at gompa.dev
Sun Oct 29 17:21:50 CET 2023


On Sun, Oct 29, 2023 at 2:38 AM Hector Martin <marcan at marcan.st> wrote:
>
> This series is the first of a few bundles of USB fixes we have been
> carrying downstream on the Asahi U-Boot branch for a few months.
>
> Patches #1-#6 fix a series of related robustness issues. Certain
> conditions related to endpoint stalls revealed a chain of bugs
> throughout the stack that caused U-Boot to completely fail when
> encountering some errors (and errors are a fact of life with USB).
>
> Example scenario:
> - The device stalls a bulk endpoint due to an error
> - The upper layer driver tries to use the endpoint again
> - There is no endpoint stall clear wired up in the API, so for starters
>   this is doomed to fail (fix: #4)
> - xHCI knows the endpoint is halted, but tries to queue the TRB anyway,
>   which can't work (fix: #5)
> - Since the endpoint is halted nothing happens, so the transfer times
>   out.
> - The timeout handling tries to abort the transfer
> - abort_td() tries to stop the endpoint, but since it is already halted,
>   this results in a context state error. As the transfer never started,
>   there is no completion event, so xhci_wait_for_event() is waiting for
>   the wrong event type, and logs an error and returns NULL. (fix: #2)
> - abort_td() crashes due to failing to guard against the NULL event
>   (fix: #1)
> - Even after fixing all that, abort_td() would not handle the context
>   state error properly and BUG() (fix: #3). This also fixes a race
>   condition documented in the xHCI spec that could occur even in the
>   absence of all the other bugs.
>
> Patch #6 addresses a related robustness issue where
> xhci_wait_for_event() panics on event timeouts other than for transfers.
> While this is clearly an unexpected condition and indicates a bug
> elsewhere, it's no reason to outright crash. USB is notoriously
> impossible to get 100% right, and we can't afford to be breaking users'
> systems at any sign of trouble. Error conditions should always be dealt
> with as gracefully as possible (even if that results in a completely
> broken USB controller, that is much preferable to aborting the boot
> process entirely, especially on devices with non-USB storage and
> keyboards where USB support is effectively optional for most users).
> Since after patch #1 we now guard all callers to xhci_wait_for_event()
> with at least trivial NULL checks, it's okay to return and continue in
> case of timeouts.
>
> Patch #7 addresses an unrelated bug I ran into while debugging all this,
> and patch #8 adds extra debug logs to make finding future issues less
> painful.
>
> I believe this should fix this Fedora bug too, which is either an
> instance of the above sequence of events, or (more likely, given the
> difficulty reproducing) the race condition documented in xHCI 4.6.9:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2244305
>
> Signed-off-by: Hector Martin <marcan at marcan.st>
> ---
> Changes in v2:
> - Squashed in a trivial fix for patch #1
> - Removed spurious blank line
> - Added a longer description to the cover letter
> - Link to v1: https://lore.kernel.org/r/20231027-usb-fixes-1-v1-0-1c879bbcd928@marcan.st
>
> ---
> Hector Martin (8):
>       usb: xhci: Guard all calls to xhci_wait_for_event
>       usb: xhci: Better error handling in abort_td()
>       usb: xhci: Allow context state errors when halting an endpoint
>       usb: xhci: Recover from halted bulk endpoints
>       usb: xhci: Fail on attempt to queue TRBs to a halted endpoint
>       usb: xhci: Do not panic on event timeouts
>       usb: xhci: Fix DMA address calculation in queue_trb
>       usb: xhci: Add more debugging
>
>  drivers/usb/host/xhci-ring.c | 99 ++++++++++++++++++++++++++++++++++++--------
>  drivers/usb/host/xhci.c      |  9 ++++
>  include/usb/xhci.h           |  2 +
>  3 files changed, 92 insertions(+), 18 deletions(-)
> ---
> base-commit: 7c0d668103abae3ec14cd96d882ba20134bb4529
> change-id: 20231027-usb-fixes-1-83bfc7013012
>

Series LGTM.

Reviewed-by: Neal Gompa <neal at gompa.dev>


-- 
真実はいつも一つ!/ Always, there's only one truth!


More information about the U-Boot mailing list