[U-Boot] [RFC][Timer API] Revised Specification - Implementation details

Fri May 27 08:54:23 CEST 2011

On Fri, May 27, 2011 at 4:33 PM, J. William Campbell
<jwilliamcampbell at comcast.net> wrote:
> On 5/26/2011 9:33 PM, Graeme Russ wrote:
>>
>> Hi Bill,
>>
> <snip>
>>

[massive snip]

OK, you have my ears pricked - Can you give me code samples for:

 - get_ticks()
 - sync_timbase() (no need to implement the whole lot if that is too
   much effort right now)
 - timer_isr()

that handle the following hardware tick counter scenarios:

a) 64-bit up counter
b) 64-bit down counter
c) 32-bit up counter, wraps at 65000
d) 16-bit microsecond up counter 0-999 which wraps and triggers a 16-bit
   millisecond up counter. Reading milliseconds latched microseconds and
   clears milliseconds (look in arch/x86/cpu/sc520/timer.c for example)
e) 24-bit counter occupying bits 2-25 of a 32-bit word (just to be
   difficult)
f) Any other option anyone cares to throw ;)

All of these options must be covered using:
 - Minimal global data (we would like it to work before relocation, but
   not mandatory - GD footprint would be nice)
 - All use the same sync_timebase function
 - Demonstrate using an ISR NOT synced to the hardware tick counter and
   an ISR that is
 - Parameters to get_ticks() and sync_timer() are permitted, but not for
   timer_isr() (naturally)

> I don't' see any reason to push this down to a lower level. It is just one
> more thing to get messed up across implementations.

Agreed

>>> detection in the non-interrupt case to sync_timebase as well.
>>> Sync_timebase
>>> can also invert the down-counting counters, removing that from get_ticks.
>>> The wrap detection code can be #ifdef out if one is using interrupts and
>>
>> Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of
>> code is usually a sign you are doing something wrong
>
> As I said, this is an optional optimization. I do not agree that an #ifdef
> in the middle of code indicates you have a bad design. Lots and Lots of
> ifdefs certainly indicates a bad design. An ifdef to eliminate code if some
> option is not selected is hardly such a strange thing, especially only a
> single #ifdef. However, feel free to not have it if you so desire.

OK, I'll let this slide for the moment - please include in above example

>>> offended by it's presence. Thanks for pointing this out and compelling me
>>> to
>>> reduce the number of cases! Making get_ticks more lightweight is a good
>>> idea
>>> in my opinion.
>>>>
>>>> Lets say you have a platform with a 32-bit tick counter running at a
>>>> reasonably long rollover time so you decide not to use interrupts. Then
>>>> you create a new platform with the same tick counter, but it runs much
>>>> faster and you realise you need to implement the interrupt routine to
>>>> make get_timer() work for long enough periods - Fine, you add an ISR
>>>> for the new platform that calls sync_timebase - No other changes are
>>>> required.
>>>>
>>>> The last thing we want is for the 64-bit tick counter to be conceptually
>>>> different across platforms
>>>>
>>>> I just realised - the ISR _does not need to call the sync_timebase at
>>>> all_
>>>> The ISR only needs to call get_ticks(), so it will be fast anyway
>>>
>>> The problem is that the way we previously detected wrapping does not work
>>> if
>>> the interrupt rate is == to the counter wrap time, which it essentially
>>> always is. If get_ticks is trying to update the wrap count when an
>>> interrupt
>>
>> Is it, always, on every platform?
>
> Yes, pretty much. You get a terminal count/counter underflow interrupt and
> that is it.

Not on sc520 - The micro/millisecond counter cannot be used to driver an
interrupt - you need to setup a seperate timer. I think the x86 internal
performance counters are the same

>>> comes in, it will do it wrong. If the interrupt rate was faster, it would
>>> work, because get_ticks would always know the correct answer. Consider
>>> the
>>> following. Get ticks is called by the counter overflowing (which resets
>>> it
>>> to 0) and executes with the counter value at 0. Later, at the next
>>> rollover,
>>> with no intervening calls to get ticks, the interrupt routine calls get
>>> ticks. Assuming negligible  interrupt latency, get_ticks just sees the
>>> counter is still 0, so it will not detect a wrap condition. So now you
>>> loose
>>
>> But wait a minute, don't you KNOW that the interrupt gets triggered on a
>> rollover so therefore there MUST have been a rollover, therefore there has
>> been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from
>> the
>> last known tick count (which would be zero if there had been no get_timer
>> calls in between)
>
> Except if this interrupt was delayed because get_ticks turned off
> interrupts, get_ticks may have already compensated the number. When we then
> get the (delayed) interrupt, we will do it twice.

That would mean get_timer() got called EXACTLY on the rollover - A race
condition that should be avoided. But still, a race can still occur through
using disable/enable interrupts which could push the timer isr out

>>> a period. I thought by passing in a flag we could force get_ticks to do
>>> the
>>> right thing in this case, but since we must disable interrupts when we
>>> call
>>> get ticks from the get_timer routine, the outer call to get ticks may
>>> have
>>> already detected the rollover and the flag will force an additional
>>> rollover
>>> to be detected. It is a classic race condition. If we detect rollover
>>> only
>>
>> Aha! - In nearly all situations, a race is better avoided than run!
>
> Yes, that is why the new approach does NOT turn off interrupts when
> get_ticks is called, but rather loops if the overflow count was changed
> while reading the count lsb. That ensures get_ticks always gets both words
> either before the counter wrapped or after the counter wrapped, but not
> halfway in between. Some ISYNCs may be required to make sure there are no
> outstanding changes pending to memory, depending on your architecture.

OK, lets have a closer look

>>> in the interrupt routine, we do not need to protect the rollover variable
>>> from access by the main program, so we can be sure the results of
>>> get_ticks
>>> is coherent. If we try do do it in both places, we will have trouble. If
>>> the
>>> interrupt occurred at a faster rate, like say twice the rollover, we
>>> wouldn't have a problem. But it doesn't. In most cases it happens at the
>>> rollover rate, or just a bit more sometimes due to interrupt latency not
>>
>> Again does it really - for all arches?
>
> Yep, pretty much. Almost all timers I know of only give one interrupt per
> cycle.

A bold assumption - and one that is wrong ;)

>>> being a constant. It may appear to happen at somewhat less than a
>>> rollover
>>> as well, if the interrupt latency was long at time n-1 and short at time
>>> n.
>>> I think this problem can be overcome in the old design by keeping track
>>> of
>>> whether the last update was by a non-interrupt or interrupt call to
>>> get_ticks , but I think the new design is better anyway.
>>
>> I disagree - The whole point of a HAL is to Abstract the Hardware
>
> The HAL should abstract what needs abstracting, and no more. We are really
> talking about literally 4 lines of code here, but I think those four lines
> certainly belong in sync_timer. All the math in one place is a good thing.
> In most cases, get_ticks becomes trivial. Only in cases where the timer is
> wrong endian or embedded in another bit field (both things being possible
> but not all that common) does get_ticks need to do anything other than just
> read the registers and loop until the msb is stable. Simple here is good,
> because this is what new implementers need to add/change. The less here, the
> better! In fact, the looping until the msb is stable can also be moved up to
> sync_timer. That is even simpler.

OK, you got me - "show me the money^H^H^H^H^Hcode"

Regards,

Graeme