[U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot

Jörg Krause jkrause at posteo.de
Wed Feb 11 23:08:38 CET 2015


On Mo, 2015-02-09 at 10:38 -0700, Stephen Warren wrote:
> On 02/08/2015 02:25 PM, Jörg Krause wrote:
> > On Fr, 2015-02-06 at 11:06 -0700, Stephen Warren wrote:
> >> On 02/05/2015 06:06 PM, Jörg Krause wrote:
> >>> On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
> >>>>
> >>>> b) In ci_bounce(), the bounce buffer is only allocated if the
> >>>> user-buffer is already aligned, and if a large-enough bounce buffer
> >>>> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
> >>>> be non-zero (thus preventing the expected aligned allocation) yet not
> >>>> actually aligned enough.
> >>>
> >>> I can reproduce this issue now. After some "timeout sending packets to
> >>> usb ethernet" messages, the bounce buffer somehow gets corrupted.
> >>> ci_bounce() is called with an unaligned input buffer length
> >>> 'req->length=66', but the bounce buffer length
> >>> 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
> >>> bounce buffer length is obviously an address, as the following
> >>> misaligned error message shows: "CACHE: Misaligned operation at range
> >>> [43f7b010, 43f7b070]".
> >>
> >> Ah, I hadn't realized that was [start, length] rather than [start, end].
> >>
> >> The question is: How is ci_req->b_len getting corrupted? Is it simply
> >> never initialized, or does something trash that value later?
> >>
> >> ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I
> >> imagine an initialization/allocating error isn't happening.
> >>
> >> The only issue there might be some code somehow creating its own struct
> >> usb_request instead of calling into the controller's ->alloc_request()
> >> function. I vaguely recall fixing some of those, but might have missed
> >> some in protocols that I didn't test (i.e. anything other than USB Mass
> >> Storage or DFU, although I might have very briefly tested netconsole once?).
> >>
> >> I would suggest adding a whole ton of printfs() to catch where ci_reqs
> >> are being allocated, and where ci_req->b_len is getting written in which
> >> ci_req objects, and then mapping that back to the ci_req that the cache
> >> alignment error message complains about. Sorry, this will be a bit painful.
> >>
> >> If the ci_req is always at the same address on different boots of the
> >> code, that will make it easier, especially if you have a debugger with a
> >> data watchpoint, or can write some code to use any data watchpoint
> >> self-hosted debug capability in your CPU.
> >
> > I think I found the answer.
> >
> > I used a lot of debug messages and tried to understand the involved
> > drivers. I will try to sum up my investigations.
> >
> > NetLoop is entered first for the ping protocol and then for the
> > netconsole protocol:
> >          --- NetLoop Entry (PING)
> >          --- NetLoop Entry (NETCONSOLE)
> >
> > ci_udc is probed and ci_ep_alloc_request() is called for EP0:
> >          ci_ep_alloc_request usb_ep:0x43fd0028 ci_req:0x43b83220
> >
> > In eth_bind() ci_ep_alloc_request() is called for status EP:
> >          ci_ep_alloc_request usb_ep:0x43fd00a0 ci_req:0x43b7b4a8
> >
> > Everything is fine:
> >          using ci_udc, OUT ep- IN ep- STATUS ep-
> >          MAC 00:19:b8:00:00:02
> >          HOST MAC 00:19:b8:00:00:01
> >
> > In eth_set_config() ci_ep_alloc_request() is invoked again for tx_req
> > and rx_req:
> >          ci_ep_alloc_request usb_ep:0x43fd0050 ci_req:0x43b7b568
> >          ci_ep_alloc_request usb_ep:0x43fd0078 ci_req:0x43b7b5b0
> >
> > Everything is fine:
> >          high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
> >          USB network up!
> >
> > Now the NetLoop is available for netconsole:
> >          --- NetState set to 0 (CONTINUE)
> >          --- NetLoop Init (NETCONSOLE)
> >          --- NetLoop ARP handler set (43fa3b14)
> >          [..]
> >          Got ARP REPLY, set eth addr (00:19:b8:00:00:01)
> >          --- NetState set to 2 (SUCCESS)
> >          --- NetLoop UDP handler set (00000000)
> >          --- NetLoop ARP handler set (00000000)
> >          --- NetLoop timeout handler cancelled
> >          --- NetLoop Success! (NETCONSOLE)
> >
> > Now it's pings turn in the NetLoop. The if (eth_is_on_demand_init())
> > branch is executed. eth_halt() and later eth_disconnect() and
> > eth_reset_config() are invoked. There the in and out EPs are freed:
> >          ci_ep_free_request ci_req:0x43b7b568
> >          ci_ep_free_request ci_req:0x43b7b5b0
> 
> That sounds like a problem.
> 
> I would assume that once netconsole is enabled, the Ethernet connection 
> is no longer on-demand, but should be initialized once, and then never 
> taken down.

I took me a while to fully understand the concept behind the net driver.
The initial suppose for the net driver was to halt the underlying
ethernet driver after each protocol communication. This was changed for
speed-up purpose of netconsole. However, I think this broke the
communication for other protocols used together with netconsole.

> Take a look at the definition of eth_is_on_demand_init() in 
> include/net.h. It has a special case for CONFIG_NETCONSOLE, based on 
> variable net_loop_last_protocol. Perhaps that isn't getting set 
> correctly, or perhaps the variable is getting set to something other 
> than netconsole, thus causing the connection to get taken down?

The cause it, that after running a ping or tftp command, the ethernet
connection is halt. I managed to get it working for those two protocols
by including them to eth_is_on_demand_init() and some other parts in the
net, but I'm not quite satisfied with the solution. I will discuss the
with Joe in another mail.

> 
> > However, netconsole tries to send every printf() as an UDP packet to the
> > host.
> >          sending UDP to 10.0.0.1/00:19:b8:00:00:01
> >
> > But the usb_request [1] pointer in usb_eth_send is NULL after the free
> > request:
> >          usb_eth_send usb_request: 00000000 length: 82
> >
> > usb_eth_send() calls ci_ep_queue() which calls ci_bounce() and because
> > of the corrupted ci_req pointer I get a misaligned cache.
> >
> > So, as far as I understand, the main problem is that netconsole does not
> > knows that the connection is disconnected by the NetLoop?
> >
> > What are your suggestions to this issue? Should we add an addition check
> > for an initialized usb_request pointer in usb_eth_send()? Or somewhere
> > else?
> 
> It might be a useful debugging/validation aid to add a check somewhere 
> for valid non-free struct usb_requests. However, it sounds like that's 
> just a symptom; the underlying problem is that the Ethernet connection 
> is taken down at all, while netconsole is active.

You're right, it's just a symptom, but it would help in early detection
of possible bugs in any of the other drivers. Like the cache misaligned
debug message.



More information about the U-Boot mailing list