[PATCH] usb: ehci: Fix "EHCI timed out on TD - token=XXXX" error on ehci-hcd
Marek Vasut
marex at denx.de
Tue Mar 3 13:29:11 CET 2020
On 3/3/20 12:25 AM, Lukasz Majewski wrote:
> Hi Marek,
Hi,
[...]
>>>> Every read operation starts at the maximum block size. When the USB
>>>> pendrive is not able to correctly serve this data read request, the
>>>> dynamic reduction of IO size is performed. Up to six tries (with
>>>> smaller IO block each time) are attempted.
>>>>
>>>> A related problem is that some drives are slow to come up. Linux
>>>> handles this by issuing a spinup command and allowing more time for
>>>> the drive to respond. The same idea is applied in this fix.
>>>>
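The size-reduction retry described in the patch text above can be sketched roughly as below. This is only an illustration, not the code from the patch: `read_with_backoff`, `xfer_fn` and `limit_xfer` are hypothetical names, and halving the block count on each failure is an assumption (the description only says "smaller IO block each time").

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRIES 6	/* "up to six tries" from the patch description */

/* Hypothetical transfer callback: returns 0 on success, nonzero on failure. */
typedef int (*xfer_fn)(void *ctx, size_t blocks);

/*
 * Start at max_blocks and shrink the request after every failed attempt.
 * Returns the block count that finally succeeded, or 0 if all tries failed.
 */
size_t read_with_backoff(xfer_fn xfer, void *ctx, size_t max_blocks)
{
	size_t blocks = max_blocks;
	int attempt;

	assert(xfer != NULL && max_blocks > 0);

	for (attempt = 0; attempt < MAX_TRIES; attempt++) {
		if (xfer(ctx, blocks) == 0)
			return blocks;
		if (blocks > 1)
			blocks /= 2;	/* assumption: halve on each failure */
	}
	return 0;
}

/* Sample callback for experiments: fails above the limit stored in *ctx. */
int limit_xfer(void *ctx, size_t blocks)
{
	return blocks > *(size_t *)ctx ? -1 : 0;
}
```

With a device that only copes with 64 blocks, a request starting at 240 would be tried at 240, 120 and then 60 blocks.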
>>>> On TPC70 (i.MX6Q) once per ~10 times (without this patch):
>>>>
>>>> Bus usb@2184200: USB EHCI 1.00
>>>> scanning bus usb@2184200 for devices... 2 USB Device(s) found
>>>> scanning usb for storage devices... 1 Storage Device(s) found
>>>> EHCI timed out on TD - token=0x1f8c80
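For reference, the token value in the log above can be split into its fields using the qTD token layout from the EHCI specification. The sketch below is not U-Boot code; `struct qtd_token_fields` and `decode_qtd_token` are hypothetical names. For 0x1f8c80 it yields status 0x80 (Active still set, no error bits), CERR 3 and 31 bytes outstanding on an OUT transfer. 31 bytes is also the size of a mass-storage Command Block Wrapper, which would be consistent with the first command never completing, although the log alone does not confirm that.

```c
#include <assert.h>
#include <stdint.h>

/* qTD token fields per the EHCI specification; the names are hypothetical. */
struct qtd_token_fields {
	unsigned status;	/* bits 7:0, 0x80 = Active, 0x40 = Halted */
	unsigned pid;		/* bits 9:8, 0 = OUT, 1 = IN, 2 = SETUP */
	unsigned cerr;		/* bits 11:10, remaining error retries */
	unsigned cpage;		/* bits 14:12, current buffer page */
	unsigned ioc;		/* bit 15, interrupt on complete */
	unsigned total_bytes;	/* bits 30:16, bytes still to transfer */
	unsigned toggle;	/* bit 31, data toggle */
};

struct qtd_token_fields decode_qtd_token(uint32_t token)
{
	struct qtd_token_fields f;

	f.status = token & 0xff;
	f.pid = (token >> 8) & 0x3;
	f.cerr = (token >> 10) & 0x3;
	f.cpage = (token >> 12) & 0x7;
	f.ioc = (token >> 15) & 0x1;
	f.total_bytes = (token >> 16) & 0x7fff;
	f.toggle = (token >> 31) & 0x1;

	/* Sanity check: the fields must reassemble into the original token. */
	assert((f.status | f.pid << 8 | f.cerr << 10 | f.cpage << 12 |
		f.ioc << 15 | f.total_bytes << 16 | f.toggle << 31) == token);
	return f;
}
```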
>>>
>>> This is how the error shows up. The detailed explanation is in
>>> link [1].
>>
>> Is there one specific post in that forum, or do I need to read through
>> the whole multi-page thread ?
>
> There is a thread pointed out by the below link which explains the
> issue thoroughly:
> https://forum.doozan.com/read.php?3,35295,35295#msg-35295
>
> (The first 2 posts from 'rayvt' explain the problem).
So basically a drive spin-up problem and the problem with stick counter
overflow.
btw QH/qTD are not microcode instructions, they are just plain DMA
descriptors. And the USB controller in Marvell Kirkwood/Orion SoCs is
on-SoC, it's not a discrete chip.
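To illustrate the "plain DMA descriptor" point: a 32-bit qTD as laid out in the EHCI specification is just eight dwords in memory that the driver fills in and the controller fetches by DMA. The struct below is a simplified sketch with hypothetical names (`struct qtd_sketch`, `qtd_fill`); it leaves out the 64-bit buffer extension, alignment attributes, and the endianness conversion a real driver needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QTD_NUM_BUFFERS 5

/* Simplified 32-bit qTD layout (EHCI spec); hypothetical field names. */
struct qtd_sketch {
	uint32_t next;				/* next qTD pointer, bit 0 = Terminate */
	uint32_t alt_next;			/* alternate next qTD pointer */
	uint32_t token;				/* status, PID, CERR, total bytes, ... */
	uint32_t buffer[QTD_NUM_BUFFERS];	/* buffer page pointers */
};

static_assert(sizeof(struct qtd_sketch) == 32, "qTD is eight dwords");
static_assert(offsetof(struct qtd_sketch, token) == 8, "token is dword 2");

/* Filling a descriptor is ordinary stores to memory, nothing is "executed". */
void qtd_fill(struct qtd_sketch *td, uint32_t buf_addr, uint32_t token)
{
	int i;

	assert(td != NULL);

	td->next = 1;		/* Terminate bit set: no following qTD */
	td->alt_next = 1;
	td->token = token;
	td->buffer[0] = buf_addr;
	for (i = 1; i < QTD_NUM_BUFFERS; i++)
		td->buffer[i] = 0;
}
```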
>> I would expect that if you run -- I assume 'usb reset' (the command is
>> missing above) -- then the bus gets power-cycled, hence the USB device
>> also gets power-cycled.
>
> Unfortunately, 'usb reset' does not fix the issue. The controller
> is unresponsive and the only way to recover is a power cycle.
My guess would be it might be related to the async schedule, so that's
where I would start debugging.
[...]
>>>> +		qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token);
>>>>  		if (!(QT_TOKEN_GET_STATUS(qhtoken) & QT_TOKEN_STATUS_ACTIVE)) {
>>>>  			debug("TOKEN=%#x\n", qhtoken);
>>>> +			if (qhtoken & QT_TOKEN_STATUS_XACTERR) {
>>>                      ^^^^^^^^^^^^^^^^^^^^^^^^ - this flag before this
>>>                      patch was also not checked.
>>>> +				if (--trynum >= 0) {
>>>> +					/*
>>>> +					 * It is necessary to do this, otherwise the
>>>> +					 * disk is clagged.
>>>> +					 */
>>>> +					debug("reset the TD and redo, because of XACTERR\n");
>>>> +					qhtoken &= ~QT_TOKEN_STATUS_HALTED;
>>>> +					qhtoken |= QT_TOKEN_STATUS_ACTIVE |
>>>> +						   QT_TOKEN_CERR(2);
>>>> +					vtd->qt_token = cpu_to_hc32(qhtoken);
>>>> +					qh->qh_overlay.qt_token = cpu_to_hc32(qhtoken);
>>>> +					goto retry_xacterr;
>>>> +				}
>>>> +				dev->status = USB_ST_XACTERR;
>>>> +				dev->act_len = length -
>>>> +					QT_TOKEN_GET_TOTALBYTES(qhtoken);
>>>>
>>>
>>> This may solve the issue - as it resets the controller in the case
>>> of transfer error.
>>
>> Do you need to reset the controller ? Or is there some graceful way
>> out, i.e. a way to recover from the error ?
>
> The original patch was developed when the async transmission was started
> and stopped for each transfer (before your optimization), hence the code
> for disabling it.
>
> I need to check if it would be enough to just clear the
> QT_TOKEN_STATUS_HALTED and set QT_TOKEN_STATUS_ACTIVE again.
OK