[PATCH v1] usb: xhci: Check return value of wait for TRB_TRANSFER event

Thu Oct 19 04:46:37 CEST 2023


On 2023/10/18 18:55, Marek Vasut wrote:
> On 10/18/23 12:16, Minda Chen wrote:
>>
>>
>> On 2023/10/18 18:11, Marek Vasut wrote:
>>> On 10/18/23 05:46, Minda Chen wrote:
>>>>
>>>>
>>>> On 2023/10/18 10:35, Marek Vasut wrote:
>>>>> On 10/18/23 03:22, Minda Chen wrote:
>>>>>>
>>>>>>
>>>>>> On 2023/10/17 19:20, Marek Vasut wrote:
>>>>>>> On 10/17/23 08:20, Minda Chen wrote:
>>>>>>>> xhci_wait_for_event() may return NULL while waiting for a
>>>>>>>> TRB_TRANSFER event. Check the return value to avoid a crash.
>>>>>>>>
>>>>>>>> Signed-off-by: Minda Chen <minda.chen at starfivetech.com>
>>>>>>>
>>>>>>> How did you trigger this error? Is there a reproducer? Details please...
>>>>>>
>>>>>> While scanning a Lenovo USB 2.0 flash disk; it does not reproduce 100% of the time.
>>>>>
>>>>> Can you include Linux
>>>>>
>>>>> lsusb -vvv
>>>>>
>>>>> output for this device and include that information in the commit message? (Or the U-Boot info below, that works too; just please add it to the commit message, as it is important for future reference.)
>>>>>
>>>> OK, I will add the Linux lsusb -vvv output for the flash disk and the crash dump info to the commit message.
>>>
>>> Thank you
>>>
>>>>>> This is the log.
>>>>>>
>>>>>> StarFive # usb reset
>>>>>> resetting USB...
>>>>>> Bus xhci_pci: Register 5000420 NbrPorts 5
>>>>>> Starting the controller
>>>>>> USB XHCI 1.00
>>>>>> scanning bus xhci_pci for devices... WARN halted endpoint, queueing URB anyway.
>>>>>> Unexpected XHCI event TRB, skipping... (f77141f0 00000000 13000000 02008401)
>>>>>> Unhandled exception: Load access fault
>>>>>> EPC: 00000000f7f563c6 RA: 00000000f7f563c6 TVAL: 000000000000000c
>>>>>> EPC: 000000004024a3c6 RA: 000000004024a3c6 reloc adjusted
>>>>>
>>>>> Where does the crash point to in the code? Can you disassemble at the PC? (Or maybe you can use scripts/decodecode, I think.)
>>>>>
>>>> OK, I will add the disassembly around the EPC to the commit message.
>>>
>>> This part probably doesn't need to be in the commit message. I'd like to know where the crash occurred in the code.
>>
>>
>> 000000004024a376 <abort_td>:
>> {
>>      4024a376:   7179                    addi    sp,sp,-48
>>      4024a378:   f406                    sd      ra,40(sp)
>>      4024a37a:   f022                    sd      s0,32(sp)
>>      4024a37c:   ec26                    sd      s1,24(sp)
>>      4024a37e:   e84a                    sd      s2,16(sp)
>>      4024a380:   e44e                    sd      s3,8(sp)
>>      4024a382:   e052                    sd      s4,0(sp)
>>      4024a384:   89ae                    mv      s3,a1
>>      4024a386:   84aa                    mv      s1,a0
>>          struct xhci_ctrl *ctrl = xhci_get_ctrl(udev);
>>      4024a388:   8c4fe0ef                jal     ra,4024844c <xhci_get_ctrl>
>>          struct xhci_ring *ring =  ctrl->devs[udev->slot_id]->eps[ep_index].ring;
>>      4024a38c:   6785                    lui     a5,0x1
>>      4024a38e:   94be                    add     s1,s1,a5
>>      4024a390:   9444a603                lw      a2,-1724(s1)
>>      4024a394:   00198713                addi    a4,s3,1
>>      4024a398:   0712                    slli    a4,a4,0x4
>>      4024a39a:   02061793                slli    a5,a2,0x20
>>      4024a39e:   9381                    srli    a5,a5,0x20
>>      4024a3a0:   07c9                    addi    a5,a5,18
>>      4024a3a2:   078e                    slli    a5,a5,0x3
>>      4024a3a4:   97aa                    add     a5,a5,a0
>>      4024a3a6:   679c                    ld      a5,8(a5)
>>          xhci_queue_command(ctrl, NULL, udev->slot_id, ep_index, TRB_STOP_RING);
>>      4024a3a8:   2981                    sext.w  s3,s3
>>      4024a3aa:   86ce                    mv      a3,s3
>>          struct xhci_ring *ring =  ctrl->devs[udev->slot_id]->eps[ep_index].ring;
>>      4024a3ac:   97ba                    add     a5,a5,a4
>>          xhci_queue_command(ctrl, NULL, udev->slot_id, ep_index, TRB_STOP_RING);
>>      4024a3ae:   4581                    li      a1,0
>>      4024a3b0:   473d                    li      a4,15
>>          struct xhci_ring *ring =  ctrl->devs[udev->slot_id]->eps[ep_index].ring;
>>      4024a3b2:   0087ba03                ld      s4,8(a5) # 1008 <_start-0x401feff8>
>>          struct xhci_ctrl *ctrl = xhci_get_ctrl(udev);
>>      4024a3b6:   842a                    mv      s0,a0
>>          xhci_queue_command(ctrl, NULL, udev->slot_id, ep_index, TRB_STOP_RING);
>>      4024a3b8:   d75ff0ef                jal     ra,4024a12c <xhci_queue_command>
>>          event = xhci_wait_for_event(ctrl, TRB_TRANSFER);
>>      4024a3bc:   02000593                li      a1,32
>>      4024a3c0:   8522                    mv      a0,s0
>>      4024a3c2:   ebdff0ef                jal     ra,4024a27e <xhci_wait_for_event>
>>          field = le32_to_cpu(event->trans_event.flags);
>> epc-> 4024a3c6:   455c                    lw      a5,12(a0)
> 
> So the fault occurs when reading the controller register(s), do I understand that right?
> 
I think that is right. Actually, this error occurs in the error path. The control transfer fails with a TRB_TRANSFER error, and the code jumps to the error path (abort_td()). That path queues TRB_STOP_RING and then waits for a TRB_TRANSFER event again.
> Could the problem rather be some clock that is turned off after a fault?
I don't think so. Only this USB flash disk reproduces the issue.