[U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot

Stephen Warren swarren at wwwdotorg.org
Mon Feb 9 18:38:52 CET 2015


On 02/08/2015 02:25 PM, Jörg Krause wrote:
> On Fr, 2015-02-06 at 11:06 -0700, Stephen Warren wrote:
>> On 02/05/2015 06:06 PM, Jörg Krause wrote:
>>> On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
>>>>
>>>> b) In ci_bounce(), the bounce buffer is only allocated if the
>>>> user-buffer is already aligned, and if a large-enough bounce buffer
>>>> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
>>>> be non-zero (thus preventing the expected aligned allocation) yet not
>>>> actually aligned enough.
>>>
>>> I can reproduce this issue now. After some "timeout sending packets to
>>> usb ethernet" messages, the bounce buffer somehow gets corrupted.
>>> ci_bounce() is called with an unaligned input buffer length
>>> 'req->length=66', but the bounce buffer length
>>> 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
>>> bounce buffer length is obviously an address, as the following
>>> misaligned error message shows: "CACHE: Misaligned operation at range
>>> [43f7b010, 43f7b070]".
>>
>> Ah, I hadn't realized that was [start, length] rather than [start, end].
>>
>> The question is: How is ci_req->b_len getting corrupted? Is it simply
>> never initialized, or does something trash that value later?
>>
>> ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I
>> imagine an initialization/allocating error isn't happening.
>>
>> The only issue there might be some code somehow creating its own struct
>> usb_request instead of calling into the controller's ->alloc_request()
>> function. I vaguely recall fixing some of those, but might have missed
>> some in protocols that I didn't test (i.e. anything other than USB Mass
>> Storage or DFU, although I might have very briefly tested netconsole once?).
>>
>> I would suggest adding a whole ton of printfs() to catch where ci_reqs
>> are being allocated, and where ci_req->b_len is getting written in which
>> ci_req objects, and then mapping that back to the ci_req that the cache
>> alignment error message complains about. Sorry, this will be a bit painful.
>>
>> If the ci_req is always at the same address on different boots of the
>> code, that will make it easier, especially if you have a debugger with a
>> data watchpoint, or can write some code to use any data watchpoint
>> self-hosted debug capability in your CPU.
>
> I think I found the answer.
>
> I used a lot of debug messages and tried to understand the involved
> drivers. I will try to sum up my investigations.
>
> NetLoop is entered first for the ping protocol and then for the
> netconsole protocol:
>          --- NetLoop Entry (PING)
>          --- NetLoop Entry (NETCONSOLE)
>
> ci_udc is probed and ci_ep_alloc_request() is called for EP0:
>          ci_ep_alloc_request usb_ep:0x43fd0028 ci_req:0x43b83220
>
> In eth_bind() ci_ep_alloc_request() is called for status EP:
>          ci_ep_alloc_request usb_ep:0x43fd00a0 ci_req:0x43b7b4a8
>
> Everything is fine:
>          using ci_udc, OUT ep- IN ep- STATUS ep-
>          MAC 00:19:b8:00:00:02
>          HOST MAC 00:19:b8:00:00:01
>
> In eth_set_config() ci_ep_alloc_request() is invoked again for tx_req
> and rx_req:
>          ci_ep_alloc_request usb_ep:0x43fd0050 ci_req:0x43b7b568
>          ci_ep_alloc_request usb_ep:0x43fd0078 ci_req:0x43b7b5b0
>
> Everything is fine:
>          high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
>          USB network up!
>
> Now the NetLoop is available for netconsole:
>          --- NetState set to 0 (CONTINUE)
>          --- NetLoop Init (NETCONSOLE)
>          --- NetLoop ARP handler set (43fa3b14)
>          [..]
>          Got ARP REPLY, set eth addr (00:19:b8:00:00:01)
>          --- NetState set to 2 (SUCCESS)
>          --- NetLoop UDP handler set (00000000)
>          --- NetLoop ARP handler set (00000000)
>          --- NetLoop timeout handler cancelled
>          --- NetLoop Success! (NETCONSOLE)
>
> Now it's pings turn in the NetLoop. The if (eth_is_on_demand_init())
> branch is executed. eth_halt() and later eth_disconnect() and
> eth_reset_config() are invoked. There the in and out EPs are freed:
>          ci_ep_free_request ci_req:0x43b7b568
>          ci_ep_free_request ci_req:0x43b7b5b0

That sounds like a problem.

I would assume that once netconsole is enabled, the Ethernet connection 
is no longer on-demand, but should be initialized once, and then never 
taken down.

Take a look at the definition of eth_is_on_demand_init() in 
include/net.h. It has a special case for CONFIG_NETCONSOLE, based on 
variable net_loop_last_protocol. Perhaps that isn't getting set 
correctly, or perhaps the variable is getting set to something other 
than netconsole, thus causing the connection to get taken down?

> However, netconsole tries to send every printf() as an UDP packet to the
> host.
>          sending UDP to 10.0.0.1/00:19:b8:00:00:01
>
> But the usb_request [1] pointer in usb_eth_send is NULL after the free
> request:
>          usb_eth_send usb_request: 00000000 length: 82
>
> usb_eth_send() calls ci_ep_queue() which calls ci_bounce() and because
> of the corrupted ci_req pointer I get a misaligned cache.
>
> So, as far as I understand, the main problem is that netconsole does not
> knows that the connection is disconnected by the NetLoop?
>
> What are your suggestions to this issue? Should we add an addition check
> for an initialized usb_request pointer in usb_eth_send()? Or somewhere
> else?

It might be a useful debugging/validation aid to add a check somewhere 
for valid non-free struct usb_requests. However, it sounds like that's 
just a symptom; the underlying problem is that the Ethernet connection 
is taken down at all, while netconsole is active.


More information about the U-Boot mailing list