[U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
Jörg Krause
jkrause at posteo.de
Sun Feb 8 22:25:24 CET 2015
On Fr, 2015-02-06 at 11:06 -0700, Stephen Warren wrote:
> On 02/05/2015 06:06 PM, Jörg Krause wrote:
> > On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
> >>
> >> b) In ci_bounce(), the bounce buffer is only allocated if the
> >> user-buffer is already aligned, and if a large-enough bounce buffer
> >> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
> >> be non-zero (thus preventing the expected aligned allocation) yet not
> >> actually aligned enough.
> >
> > I can reproduce this issue now. After some "timeout sending packets to
> > usb ethernet" messages, the bounce buffer somehow gets corrupted.
> > ci_bounce() is called with an unaligned input buffer length
> > 'req->length=66', but the bounce buffer length
> > 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
> > bounce buffer length is obviously an address, as the following
> > misaligned error message shows: "CACHE: Misaligned operation at range
> > [43f7b010, 43f7b070]".
>
> Ah, I hadn't realized that was [start, length] rather than [start, end].
>
> The question is: How is ci_req->b_len getting corrupted? Is it simply
> never initialized, or does something trash that value later?
>
> ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I
> imagine an initialization/allocating error isn't happening.
>
> The only issue there might be some code somehow creating its own struct
> usb_request instead of calling into the controller's ->alloc_request()
> function. I vaguely recall fixing some of those, but might have missed
> some in protocols that I didn't test (i.e. anything other than USB Mass
> Storage or DFU, although I might have very briefly tested netconsole once?).
>
> I would suggest adding a whole ton of printfs() to catch where ci_reqs
> are being allocated, and where ci_req->b_len is getting written in which
> ci_req objects, and then mapping that back to the ci_req that the cache
> alignment error message complains about. Sorry, this will be a bit painful.
>
> If the ci_req is always at the same address on different boots of the
> code, that will make it easier, especially if you have a debugger with a
> data watchpoint, or can write some code to use any data watchpoint
> self-hosted debug capability in your CPU.
I think I found the answer.
I used a lot of debug messages and tried to understand the involved
drivers. I will try to sum up my investigations.
NetLoop is entered first for the ping protocol and then for the
netconsole protocol:
--- NetLoop Entry (PING)
--- NetLoop Entry (NETCONSOLE)
ci_udc is probed and ci_ep_alloc_request() is called for EP0:
ci_ep_alloc_request usb_ep:0x43fd0028 ci_req:0x43b83220
In eth_bind() ci_ep_alloc_request() is called for status EP:
ci_ep_alloc_request usb_ep:0x43fd00a0 ci_req:0x43b7b4a8
Everything is fine:
using ci_udc, OUT ep- IN ep- STATUS ep-
MAC 00:19:b8:00:00:02
HOST MAC 00:19:b8:00:00:01
In eth_set_config() ci_ep_alloc_request() is invoked again for tx_req
and rx_req:
ci_ep_alloc_request usb_ep:0x43fd0050 ci_req:0x43b7b568
ci_ep_alloc_request usb_ep:0x43fd0078 ci_req:0x43b7b5b0
Everything is fine:
high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
USB network up!
Now the NetLoop is available for netconsole:
--- NetState set to 0 (CONTINUE)
--- NetLoop Init (NETCONSOLE)
--- NetLoop ARP handler set (43fa3b14)
[..]
Got ARP REPLY, set eth addr (00:19:b8:00:00:01)
--- NetState set to 2 (SUCCESS)
--- NetLoop UDP handler set (00000000)
--- NetLoop ARP handler set (00000000)
--- NetLoop timeout handler cancelled
--- NetLoop Success! (NETCONSOLE)
Now it's pings turn in the NetLoop. The if (eth_is_on_demand_init())
branch is executed. eth_halt() and later eth_disconnect() and
eth_reset_config() are invoked. There the in and out EPs are freed:
ci_ep_free_request ci_req:0x43b7b568
ci_ep_free_request ci_req:0x43b7b5b0
However, netconsole tries to send every printf() as an UDP packet to the
host.
sending UDP to 10.0.0.1/00:19:b8:00:00:01
But the usb_request [1] pointer in usb_eth_send is NULL after the free
request:
usb_eth_send usb_request: 00000000 length: 82
usb_eth_send() calls ci_ep_queue() which calls ci_bounce() and because
of the corrupted ci_req pointer I get a misaligned cache.
So, as far as I understand, the main problem is that netconsole does not
knows that the connection is disconnected by the NetLoop?
What are your suggestions to this issue? Should we add an addition check
for an initialized usb_request pointer in usb_eth_send()? Or somewhere
else?
Best regards
Jörg Krause
[1]
http://git.denx.de/?p=u-boot.git;a=blob;f=drivers/usb/gadget/ether.c;h=ba442d5ed529bb04a88a41e630477b4a46b32d46;hb=HEAD#l2377
More information about the U-Boot
mailing list