[U-Boot] [PATCH 4/4] ARM: bcm283x: Switch to generic timer
Stephen Warren
swarren at wwwdotorg.org
Fri May 8 18:06:41 CEST 2015
On 05/06/2015 01:51 PM, Tyler Baker wrote:
> On 6 May 2015 at 11:13, Marek Vasut <marex at denx.de> wrote:
>> On Wednesday, May 06, 2015 at 05:52:37 PM, Stephen Warren wrote:
>> [...]
>>>>>> So, if now is close to 0x7fffffff (which it can), then if endtime is
>>>>>> big-ish, diff will become negative and this udelay() will not perform
>>>>>> the correct delay, right ?
>>>>>
>>>>> I don't believe so, no.
>>>>>
>>>>> endtime and now are both unsigned. My (admittedly intuitive rather than
>>>>> well-researched) understanding of C math promotion rules means that
>>>>> "endtime - now" will be calculated as an unsigned value, then converted
>>>>> into a signed value to be stored in the signed diff. As such, I would
>>>>> expect the value of diff to be a small value in this case. I wrote a
>>>>> test program to validate this; endtime = 0x80000002, now = 0x7ffffffe,
>>>>> yields diff=4 as expected.
>>>>>
>>>>> Perhaps you meant a much larger endtime value than 0x80000002; perhaps
>>>>> 0xffffffff? This doesn't cause issues either. All that's relevant is the
>>>>> difference between endtime and now, not their absolute values, and not
>>>>> whether endtime has wrapped but now has or hasn't. For example, endtime
>>>>> = 0x00000002, now = 0xfffffff0 yields diff=18 as expected.
>>>>
>>>> So what if the difference is bigger than 1 << 31 ?
>>>
>>> As I said, I don't believe that case is relevant; it can only happen if
>>> passing ridiculously large delay values into __udelay() (i.e. greater
>>> than the 1<<31value you mention), and I don't believe there's any need
>>> to support that.
>>
>> So what you say is that it's OK to have a function which is buggy in
>> corner cases ?
>>
>>> The implementation in lib/time.c probably has exactly the same problem,
>>> except that since it uses 64-bit math rather than 32-bit math, so the
>>> issue happens at 1<<63 rather than 1<<31. It's probably equally
>>> problematic for delay values as large as 1<<63:-) In practice, given
>>> 1<<31 us is so large, I don't think there's any practical difference.
>>
>> The implementation in lib/time.c uses 32bit usec argument though, so
>> it's not prone to this overflow. Please correct me if I'm wrong.
>>
>>>>>>> Besides, what's passing a value >~36 minutes to udelay()?
>>>>>>
>>>>>> Nothing, but that doesn't mean we can have a possibly broken
>>>>>> implementation, right ?
>>>>>
>>>>> True. However, I'd expect that any specification for udelay would
>>>>> disallow such large parameter values, and hence its behaviour wouldn't
>>>>> be relevant if such values were passed.
>>>>
>>>> Do you think you can pick this patch and drop the "fixes overflow" part
>>>> or do you need resubmission ?
>>>
>>> Tom Rini (or in the past Albert Aribaud) actually apply the patches.
>>>
>>> Re: the patch description: I'd certainly be happy if it was re-written
>>> to say something more like "replace bcm2835-specific timer logic with
>>> common code to reduce the number of different implementations for the
>>> same thing".
>>
>> Tom, do you want a repost ?
>>
>>> I think you'd mentioned on IRC that this change fixed something
>>> USB-related for you, and I still don't understand how that could be
>>> possible. Perhaps there's some intermittent problem, and it just
>>> happened not to show up when you tested after this patch?
>>
>> I think Tyler can elaborate on that, but in his test case, he still
>> triggers the USB issue.
>
> I'll provide some context on the issue I'm fighting...
>
> I recently bought a RPi B+ Model, flashed the latest raspbian image[0]
> to an sd card, built the master branch (v2015.05+) u-boot and
> overwrote kernel.img with u-boot.bin. U-Boot came right up but I was
> unable to obtain a DHCP lease after using 'usb start; dhcp'. I ran
> tcpdump and saw the DHCP requests being made, but shortly after the
> board was seemly ignoring responses from the DHCP server. Now
> sometimes if the response from the DHCP server was quick enough, it
> could get a lease, but the tftp transfer would stall. I decided it
> would be best to set 'ipaddr, gateway, netmask' to see if this was a
> DHCP issue. It turned out that I had no network connectivity even when
> I configured the ip address statically.
>
> Could not obtain a lease:
>
> starting USB...
> USB0: Core Release: 2.80a
> scanning bus 0 for devices... 3 USB Device(s) found
> scanning usb for storage devices... 0 Storage Device(s) found
> scanning usb for ethernet devices... 1 Ethernet Device(s) found
> Waiting for Ethernet connection... done.
> BOOTP broadcast 1
> BOOTP broadcast 2
> BOOTP broadcast 3
> BOOTP broadcast 4
> BOOTP broadcast 5
> BOOTP broadcast 6
> BOOTP broadcast 7
> BOOTP broadcast 8
> Retry time exceeded; starting again
> Bad Linux ARM zImage magic!
>
> Obtains a lease, but stalls on transfer:
>
> starting USB...
> USB0: Core Release: 2.80a
> scanning bus 0 for devices... 3 USB Device(s) found
> scanning usb for storage devices... 0 Storage Device(s) found
> scanning usb for ethernet devices... 1 Ethernet Device(s) found
> Waiting for Ethernet connection... done.
> BOOTP broadcast 1
> DHCP client bound to address 192.168.2.55 (1328 ms)
> Waiting for Ethernet connection... done.
> Using sms0 device
> TFTP from server 192.168.2.2; our IP address is 192.168.2.55
> Filename 'tmp2rkX_N/.zImage'.
> Load address: 0x1000000
> Loading: * T T T T T T T T T T
> Retry count exceeded; starting again
> Bad Linux ARM zImage magic!
>
> At this point I reached out for help on IRC and that is when Marek and
> I starting chatting about this. Hacking around, I found that using
> v2015.01 I was able to obtain a lease and transfer files 100% of the
> time (although it seems very slow). Thinking that we have a good and
> bad commit, this would be easy to bisect. Wrong, we both tried and got
> different results. It seems that as you move from v2015.01 -> HEAD
> master this issue becomes very intermittent and thus hard to pin down.
>
> So my test case for this issue has become...
>
> * Obtain lease
> * Transfer kernel, dtb, ramdisk without stalling/timing out
> * Do this 10 times in a row with a power cycle in between
>
> Hope this help clarify the situation in some way,
OK, but if you apply Marek's change to replace the timer implementation
with the one in lib/time.c, does that reliably fix the issue when tested
over a large number of runs with/without that change? With the current
explanation, I can't see how it possibly could. Equally, I can't see why
the move from v2015.01 to HEAD would affect the issue if it was caused
by the timer implementation.
My suspicion is that there's something else entirely going on.
More information about the U-Boot
mailing list