[PATCH] usb: ehci: Fix "EHCI timed out on TD - token=XXXX" error on ehci-hcd
Lukasz Majewski
lukma at denx.de
Mon Mar 2 14:21:50 CET 2020
Hi,
> This patch aims to improve robustness of 'usb' command operation on
> the ehci-hcd IP block as it ports to contemporary U-Boot the patch
> described and provided in [1] (originally applicable to U-Boot
> 2016.05).
>
> According to the fix author - "rayvt" (from [1]):
>
> TD stands for "Queue Element Transfer Descriptor (qTD)", which is a
> micro-code instruction for the EHCI USB chip.
> The "token" field is detailed information, control, and status of the
> TD's data transfer. In particular, the rightmost byte is the status
> field. 0x80 bit means the TD is active and the data transfer has not
> yet completed. 0x08 bit means there was some sort of data transfer
> error (XactErr).
^^^^^^^ - this error is NOT handled now in mainline u-boot. Do
you have any experience with it?
>
> If the drive gets a transfer error, it refuses to do any other I/O
> transfer until the error is properly cleared and reset. U-boot did
> not do this, so every subsequent disk read would get a timeout error
> because the drive would not send any data. The fix is to properly
> clear and reset the USB disk when it gets a transfer error.
>
> Every read operation starts at the maximum block size. When the USB
> pendrive is not able to correctly serve this data read request, the
> dynamic reduction of IO size is performed. Up to six tries (with
> smaller IO block each time) are attempted.
>
> A related problem is that some drives are slow to come up. Linux
> handles this by issuing a spinup command and allowing more time for
> the drive to respond. The same idea is applied in this fix.
>
> On TPC70 (i.MX6Q) once per ~10 times (without this patch):
>
> Bus usb at 2184200: USB EHCI 1.00
> scanning bus usb at 2184200 for devices... 2 USB Device(s) found
> scanning usb for storage devices... 1 Storage Device(s) found
> EHCI timed out on TD - token=0x1f8c80
This is how the error gets evident. The detailed explanation is in link
[1].
>
> Performance impact:
>
> With U-Boot -master in mainline:
> 16869440 bytes read in 979 ms (16.4 MiB/s)
>
> With this patch:
> 16869440 bytes read in 1219 ms (13.2 MiB/s)
>
> Links:
> [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295
> [2] -
> https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patch?dl=0
>
> Signed-off-by: Lukasz Majewski <lukma at denx.de>
> [Unfortunately, the original patch [2] did not contain S-o-B from the
> original author - "rayvt"]
> ---
>
> common/usb.c | 10 +++-
> common/usb_storage.c | 106
> +++++++++++++++++++++++++++++++++--- drivers/usb/host/ehci-hcd.c |
> 55 +++++++++++++++++-- include/usb.h | 1 +
> include/usb_defs.h | 1 +
> 5 files changed, 160 insertions(+), 13 deletions(-)
>
> diff --git a/common/usb.c b/common/usb.c
> index 349e838f1d..305482b5bb 100644
> --- a/common/usb.c
> +++ b/common/usb.c
> @@ -925,14 +925,20 @@ static int get_descriptor_len(struct usb_device
> *dev, int len, int expect_len) __maybe_unused struct
> usb_device_descriptor *desc; ALLOC_CACHE_ALIGN_BUFFER(unsigned char,
> tmpbuf, USB_BUFSIZ); int err;
> + int retry = 5;
>
> desc = (struct usb_device_descriptor *)tmpbuf;
>
> +again:
> err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, desc, len);
> if (err < expect_len) {
> if (err < 0) {
> - printf("unable to get device descriptor
> (error=%d)\n",
> - err);
> + printf("unable to get device descriptor
> (error=%d) retry: %d\n",
> + err, retry);
> + mdelay(50);
> + if (--retry >= 0)
> + /* Some drives are just slow to wake
> up. */
> + goto again;
This I think is just to "wait" for the inserted device if it is rather
slow (or probably its memory got degraded over time).
This shouldn't bring any transfer slow down.
> return err;
> } else {
> printf("USB device descriptor short read
> (expected %i, got %i)\n", diff --git a/common/usb_storage.c
> b/common/usb_storage.c index 097b6729c1..48c8c2ae64 100644
> --- a/common/usb_storage.c
> +++ b/common/usb_storage.c
> @@ -111,6 +111,18 @@ int usb_stor_get_info(struct usb_device *dev,
> struct us_data *us, struct blk_desc *dev_desc);
> int usb_storage_probe(struct usb_device *dev, unsigned int ifnum,
> struct us_data *ss);
> +
> +#ifdef CONFIG_USB_EHCI_HCD
> + /*
> + * The U-Boot EHCI driver can handle any transfer length as
> long as
> + * there is enough free heap space left, but the SCSI
> READ(10) and
> + * WRITE(10) commands are limited to 65535 blocks.
> + */
> +int usb_max_xfer_blk = 4096;
Setting it to magic value - 4096 may be indeed problematic. However, I
don't know what is the 'usual' transfer size.
> +#else
> +int usb_max_xfer_blk = 20;
This is also a magic value - probably took from some old U-Boot (when
default transfer was set to this value).
I do guess that this snippet shall be removed from this patch set.
> +#endif
> +
> #if CONFIG_IS_ENABLED(BLK)
> static unsigned long usb_stor_read(struct udevice *dev, lbaint_t
> blknr, lbaint_t blkcnt, void *buffer);
> @@ -729,6 +741,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd
> *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev,
> us->ep_out); /* DATA phase + error handling */
> data_actlen = 0;
> + mdelay(10); /* Like linux does. */
> /* no data, go immediately to the STATUS phase */
> if (srb->datalen == 0)
> goto st;
> @@ -740,6 +753,16 @@ static int usb_stor_BBB_transport(struct
> scsi_cmd *srb, struct us_data *us)
> result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata,
> srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5);
> +
> + /* special handling of XACTERR in DATA phase */
> + if (result < 0 && (us->pusb_dev->status & USB_ST_XACTERR)) {
> + debug("XACTERR in data phase - clr, reset, and
> return fail.\n");
> + usb_stor_BBB_clear_endpt_stall(us, dir_in ?
> + us->ep_in :
> us->ep_out);
> + usb_stor_BBB_reset(us);
> + return USB_STOR_TRANSPORT_FAILED;
> + }
> +
Here the case of XACTERR is handled. This is missing now in -master.
> /* special handling of STALL in DATA phase */
> if ((result < 0) && (us->pusb_dev->status & USB_ST_STALLED))
> { debug("DATA:stall\n");
> @@ -1013,9 +1036,32 @@ static int usb_request_sense(struct scsi_cmd
> *srb, struct us_data *ss) return 0;
> }
>
> +/*
> + * This spins up the disk and also consumes the time that the
> + * disk takes to become active and ready to read data.
> + * Some drives (like Western Digital) can take more than 5 seconds.
> + * The delay occurs on the 1st data read from the disk.
> + * Extending the timeout here works better than handling the timeout
> + * as an error on a "real" read.
> + */
> +static int usb_spinup(struct scsi_cmd *srb, struct us_data *ss)
> +{
> + memset(&srb->cmd[0], 0, 12);
> + srb->cmd[0] = SCSI_START_STP;
> + srb->cmd[1] = srb->lun << 5;
> + srb->cmd[4] = 1; /* Start spinup. */
> + srb->datalen = 0;
> + srb->cmdlen = 6;
> + ss->pusb_dev->extra_timeout = 9876;
> + ss->transport(srb, ss);
> + ss->pusb_dev->extra_timeout = 0;
> + return 0;
> +}
> +
> static int usb_test_unit_ready(struct scsi_cmd *srb, struct us_data
> *ss) {
> int retries = 10;
> + int gave_extra_time = 0;
>
> do {
> memset(&srb->cmd[0], 0, 12);
> @@ -1038,6 +1084,17 @@ static int usb_test_unit_ready(struct scsi_cmd
> *srb, struct us_data *ss) if ((srb->sense_buf[2] == 0x02) &&
> (srb->sense_buf[12] == 0x3a))
> return -1;
> + /*
> + * If the status is "Not Ready - becoming ready",
> give it
> + * more time. Linux issues a spinup command (once)
> and gives
> + * it 100 seconds.
> + */
> + if (srb->sense_buf[2] == 0x02 &&
> + srb->sense_buf[12] == 0x04 &&
> + gave_extra_time == 0) {
> + retries = 100; /* Allow 10 seconds. */
> + gave_extra_time = retries;
> + }
The above code looks like targeting the HDDs, but some degraded pen
drives may need this time as well for proper operation.
> mdelay(100);
> } while (retries--);
>
> @@ -1166,22 +1223,57 @@ static unsigned long usb_stor_read(struct
> blk_desc *block_dev, lbaint_t blknr, block_dev->devnum, start, blks,
> buf_addr);
> do {
> - /* XXX need some comment here */
> + /*
> + * Probably most errors are USB errors, not hard
> disk error.
> + * Many disks use a USB chip that is flaky when
> doing large
> + * transfers. The workaround is to dynamically
> reduce the
> + * transfer size and allow an additional try.
> + * This should pick up flaky disks. Linux uses a
> quirks table.
> + * We'll use observation:
> + * Give it 1 try very large, 1 try large, 2 tries
> medium and 2
> + * tries small(ish).
> + * On a solid fail (actual disk error - which should
> be rare),
> + * this will give us 6 tries max, and only that many
> if the read
> + * is quite large.
> + * A fail on a very short read obviously doesn't
> have a
> + * too-large max_blks. Timeout due to spinup being
> a case in
> + * this point.
> + */
> retry = 2;
> srb->pdata = (unsigned char *)buf_addr;
> - if (blks > ss->max_xfer_blk)
> - smallblks = ss->max_xfer_blk;
> +retry_it:
> + if (blks > usb_max_xfer_blk)
> + smallblks = usb_max_xfer_blk;
> else
> smallblks = (unsigned short) blks;
> -retry_it:
> - if (smallblks == ss->max_xfer_blk)
> +
> + if (smallblks == usb_max_xfer_blk)
> usb_show_progress();
> +
> srb->datalen = block_dev->blksz * smallblks;
> srb->pdata = (unsigned char *)buf_addr;
> if (usb_read_10(srb, ss, start, smallblks)) {
> debug("Read ERROR\n");
> ss->flags &= ~USB_READY;
> usb_request_sense(srb, ss);
> +
> + /* Dynamically reduce the I/O size. */
> + if (smallblks > 2047) {
> + usb_max_xfer_blk = 2047;
> + ++retry;
> + } else if (smallblks > 512) {
> + usb_max_xfer_blk = 512;
> + ++retry;
> + } else if (smallblks > 511) {
> + usb_max_xfer_blk = 511;
> + ++retry;
> + } else if (smallblks > 63) {
> + usb_max_xfer_blk = 63;
> + retry += 2;
> + }
> + debug("step down usb_max_xfer_blk to %d\n",
> + usb_max_xfer_blk);
> +
This shall not be needed, as the ehci controller's driver will scatter
the transfers up till 240 blocks max. Am I right?
The above code was supposed to work before the 240 block limitation was
in place?
> if (retry--)
> goto retry_it;
> blkcnt -= blks;
> @@ -1196,8 +1288,7 @@ retry_it:
> start, smallblks, buf_addr);
>
> usb_disable_asynch(0); /* asynch transfer allowed */
> - if (blkcnt >= ss->max_xfer_blk)
> - debug("\n");
> +
> return blkcnt;
> }
>
> @@ -1492,6 +1583,7 @@ int usb_stor_get_info(struct usb_device *dev,
> struct us_data *ss, dev_desc->log2blksz = LOG2(dev_desc->blksz);
> dev_desc->type = perq;
> debug(" address %d\n", dev_desc->target);
> + usb_spinup(pccb, ss);
As above - this may be required for some "slower" (i.e. degraded) disks.
>
> return 1;
> }
> diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
> index 1cc02052f5..c4041f1e9c 100644
> --- a/drivers/usb/host/ehci-hcd.c
> +++ b/drivers/usb/host/ehci-hcd.c
> @@ -315,7 +315,9 @@ ehci_submit_async(struct usb_device *dev,
> unsigned long pipe, void *buffer, int timeout;
> int ret = 0;
> struct ehci_ctrl *ctrl = ehci_get_ctrl(dev);
> + int trynum;
>
> + debug("\n***** %s ****\n", __func__);
> debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n",
> dev, pipe, buffer, length, req);
> if (req != NULL)
> @@ -557,6 +559,15 @@ ehci_submit_async(struct usb_device *dev,
> unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts,
> (usbsts & 0x3f));
> /* Enable async. schedule. */
> + trynum = 1; /* No more than 2 tries, in case of
> XACTERR. */
> + /* When the 1st try gets xacterr,
> + * 2nd try gets xacterr and often babble and/or halted.
> + * 3rd try times out.
> + * After the 2nd try, the disk has recovered, so we need to
> clear and
> + * reset the USB port, then return fail so the upper layer
> can retry.
> + */
> +retry_xacterr:
> + vtd = &qtd[qtd_counter - 1];
> cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
> if (!(cmd & CMD_ASE)) {
> cmd |= CMD_ASE;
> @@ -572,8 +583,9 @@ ehci_submit_async(struct usb_device *dev,
> unsigned long pipe, void *buffer,
> /* Wait for TDs to be processed. */
> ts = get_timer(0);
> - vtd = &qtd[qtd_counter - 1];
> +
> timeout = USB_TIMEOUT_MS(pipe);
> + timeout += dev->extra_timeout;
> do {
> /* Invalidate dcache */
> invalidate_dcache_range((unsigned
> long)&ctrl->qh_list, @@ -588,9 +600,10 @@ ehci_submit_async(struct
> usb_device *dev, unsigned long pipe, void *buffer, break;
> WATCHDOG_RESET();
> } while (get_timer(ts) < timeout);
> - qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token);
>
> - ctrl->qh_list.qh_link =
> cpu_to_hc32(virt_to_phys(&ctrl->qh_list) | QH_LINK_TYPE_QH);
> + debug("took %4lu ms of %4d\n", get_timer(ts), timeout);
> + ctrl->qh_list.qh_link =
> + cpu_to_hc32(virt_to_phys(&ctrl->qh_list) |
> QH_LINK_TYPE_QH); flush_dcache_range((unsigned long)&ctrl->qh_list,
> ALIGN_END_ADDR(struct QH, &ctrl->qh_list, 1));
>
> @@ -605,14 +618,48 @@ ehci_submit_async(struct usb_device *dev,
> unsigned long pipe, void *buffer, */
> if (buffer != NULL && length > 0)
> invalidate_dcache_range((unsigned long)buffer,
> - ALIGN((unsigned long)buffer + length,
> ARCH_DMA_MINALIGN));
> + ALIGN((unsigned long)buffer + length,
> + ARCH_DMA_MINALIGN));
>
> /* Check that the TD processing happened */
> if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE)
> printf("EHCI timed out on TD - token=%#x\n", token);
>
> + /* Disable async schedule. */
> + cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
> + cmd &= ~CMD_ASE;
> + ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
> +
> + ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS,
> 0,
> + 100 * 1000);
> + if (ret < 0) {
> + printf("EHCI fail timeout STS_ASS reset\n");
> + goto fail;
> + }
> +
> + qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token);
> if (!(QT_TOKEN_GET_STATUS(qhtoken) &
> QT_TOKEN_STATUS_ACTIVE)) { debug("TOKEN=%#x\n", qhtoken);
> + if (qhtoken & QT_TOKEN_STATUS_XACTERR) {
^^^^^^^^^^^^^^^^^^^^^^^^ - this flag
before this patch was also not checked.
> + if (--trynum >= 0) {
> + /*
> + * It is necessary to do this,
> otherwise the
> + * disk is clagged.
> + */
> + debug("reset the TD and redo,
> because of XACTERR\n");
> + qhtoken &= ~QT_TOKEN_STATUS_HALTED;
> + qhtoken |= QT_TOKEN_STATUS_ACTIVE |
> + QT_TOKEN_CERR(2);
> + vtd->qt_token = cpu_to_hc32(qhtoken);
> + qh->qh_overlay.qt_token =
> cpu_to_hc32(qhtoken);
> + goto retry_xacterr;
> + }
> + dev->status = USB_ST_XACTERR;
> + dev->act_len = length -
> + QT_TOKEN_GET_TOTALBYTES(qhtoken);
This may solve the issue - as it resets the controller in the case of
transfer error.
> + goto fail;
> + }
> +
> switch (QT_TOKEN_GET_STATUS(qhtoken) &
> ~(QT_TOKEN_STATUS_SPLITXSTATE |
> QT_TOKEN_STATUS_PERR)) { case 0:
> diff --git a/include/usb.h b/include/usb.h
> index efb67ea33f..41c8c47f89 100644
> --- a/include/usb.h
> +++ b/include/usb.h
> @@ -140,6 +140,7 @@ struct usb_device {
> int act_len; /* transferred bytes */
> int maxchild; /* Number of ports if
> hub */ int portnr; /* Port number, 1=first */
> + int extra_timeout; /* Add to timeout in
> ehci_submit_async */ #if !CONFIG_IS_ENABLED(DM_USB)
> /* parent hub, or NULL if this is the root hub */
> struct usb_device *parent;
> diff --git a/include/usb_defs.h b/include/usb_defs.h
> index 6dd2c997f9..572f6ab296 100644
> --- a/include/usb_defs.h
> +++ b/include/usb_defs.h
> @@ -197,6 +197,7 @@
> #define USB_ST_NAK_REC 0x10 /* NAK Received*/
> #define USB_ST_CRC_ERR 0x20 /* CRC/timeout Error */
> #define USB_ST_BIT_ERR 0x40 /* Bitstuff error */
> +#define USB_ST_XACTERR 0x80 /* XACTERR error */
> #define USB_ST_NOT_PROC 0x80000000L /* Not yet
> processed */
>
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20200302/30fd31d6/attachment.sig>
More information about the U-Boot
mailing list