[PATCH] usb: ehci: Fix "EHCI timed out on TD - token=XXXX" error on ehci-hcd

Lukasz Majewski lukma at denx.de
Mon Mar 2 14:01:07 CET 2020


Hi Marek,

> On 3/1/20 6:59 PM, Lukasz Majewski wrote:
> > Hi Marek,  
> 
> Hi,
> 
> [...]
> 
> >>> The description of the fix written by the original author:
> >>> https://forum.doozan.com/read.php?3,35295,35295#msg-35295
> >>>
> >>> states that the reduction of the transfer is done just for "spin
> >>> up"/"detection" of the pen drive. The issue is that not all pen
> >>> drive disks are able to be properly detected in the first place.
> >>>   
> >>
> >> But the commit message claims this patch reduces the transfer rate
> >> by 20% always. So that's a NAK right away, sorry,   
> > 
> > Marek, now we do have from time to time NOT functional USB (once per
> > ten attempts on iMX6Q - tpc70 board). If somebody relies on 'usb
> > start' for recovery/normal operation - it is bad.
> > 
> > In such a use case - the 20% performance drop is acceptable.  
> 
> I'm sure we can do better. 2020.04 is still quite far out.
> 
> Can you take a look ?

There are several issues which are fixed by this patch. 

I will reply to the actual patch set to point them out in the code, as
this may be more readable.

> 
> >> you need to find a
> >> better solution which doesn't have such an enormous impact.
> >>
> >> I'm fine with the spin-up handling, but this auto-detection of
> >> transfer size should be pulled into separate patch and is likely
> >> wrong anyway.  
> > 
> > I cannot agree that is it "likely wrong anyway" - please correct me
> > if I'm wrong, but up till now we were fiddling with transfer size
> > up till having "good enough" results.  
> 
> Correction, read the patches I linked above
> (7d6fd7f0ba71cd93d94079132f958d9630f27a89 

The above sha1 uses the observation (and heuristics) from other
operating systems to deal with usb pendrives, which controllers cannot
handle more than 256 sectors (as they have 8 bit counter).

> and esp.
> 02b0e1a36c5bc20174299312556ec4e266872bd6).

This patch introduces the optimization - instead of setup/disable async
transmissions - it enables it once and then just adds new tokens to be
transmitted.

Is the transmission error (USB_ST_XACTERR 0x80) handled at this point?

> Those solve the "we were
> fiddling around with transfer size" attempts, because I actually
> invested the time to understand the issue with those devices that
> failed and I was right about never letting any of those "fiddling
> around with transfer size" patches into mainline.
> 
> The problem with those devices which needed "fiddling with transfer
> size" was that their internal 16bit counters (or 8bit counters)
> overflew when a very long transfer was started and thus those devices
> failed. The fix there was to transfer less than that amount of data
> which triggered the overflow, which on a wast majority of devices is
> 240 blocks. However , this led to a massive slowdown, which is
> unacceptable. A fix for that slowdown was to avoid turning async
> schedule off-on between each transfer, but rather keep it running,
> that saves A LOT of wasted transfer time. If you build a long
> transfer up front, you would never do that many transfers to notice
> the delays incurred by turning the async schedule off/on, but if you
> do multiple shorter transfers, these delays add up.
> 

Thanks for the explanation.

> Yes, these issues are convoluted and take time to solve properly, yet
> the end result might not have such a performance hit.
> 
> > The above link describes what needs to be done in case of the "EHCI
> > timed out on TD - token=0x1f8c80" error.
> > 
> > Do we handle it now? In my opinion we don't.
> >   
> >>
> >> So try to separate this into two patches, one for handling the spin
> >> up, another one for this auto-detection of the transfer size. And I
> >> think you will end up needed only the first one.  
> > 
> > This is an open question if only the spin up fixes the error. It
> > would be great if you could look for the changes introduced in this
> > patch and compare it with your experience of ehci driver
> > development.  
> 
> Surely you can separate those two patches and perform such a test ?

I would prefer to first go through the actual patch code, before I
spent time on things which may be not necessary.

> 
> >>> Last but not least - the mainline is still broken. The
> >>> "token=0x1f8c80" errors pops up from time to time. However, after
> >>> applying the approach from this fix - the error is gone (and pass
> >>> some acceptance tests).    
> >>
> >> It works for me.  
> > 
> > Also works for Tom.
> >   
> >> Do you have a specific device which fails ?  
> > 
> > Yes. On my desk - tpc70 with i.MX6Q. And hdc with i.MX53. I can
> > confirm that from U-Boot 2018.04 up till now the rate of:
> > 
> > "EHCI timed out on TD - token=0x1f8c80" error has been reduced
> > significantly for their use case (reading ~16MiB of binary from
> > several pen drives).   
> 
> Can you run lsusb (to get IDs) on those pendrives which fail ?
> I would like to add them to a list.

This was working correctly on the beginning, but after some time it
started to show errors in question:

Bus 001 Device 040: ID 0930:6544 Toshiba Corp. TransMemory-Mini /
Kingston DataTraveler 2.0 Stick

This one was causing issues from the very beginning:
Bus 001 Device 041: ID 058f:6387 Alcor Micro Corp. Flash Drive

This one - Verbatim 
Bus 001 Device 042: ID 21c4:8005

Never caused issues - worked reliably even with old U-Boot (on which
errors were observed very often).

> 
> > Unfortunately the errors are still present from time to time.
> >   
> >> Which
> >> device and on which hardware ?
> >>
> >> Also, what is "some acceptance test" ?
> >>  
> > 
> > The set of automated tests to assess if "usb start" don't break.  
> 
> The usb start only reads out the partition table and does not do any
> long transfers (above 240 blocks, which usually triggers the failure
> on cheap sticks). So if long transfers fail for you, then this is a
> separate issue from the 'usb start' spin-up problem. And so, if this
> patch makes both work, then this indeed should be two patches.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20200302/b7a659d3/attachment.sig>


More information about the U-Boot mailing list