[U-Boot] [RFC][Timer API] Revised Specification - Implementation details

J. William Campbell jwilliamcampbell at comcast.net
Fri May 27 05:54:26 CEST 2011


On 5/26/2011 6:51 PM, Graeme Russ wrote:
> Hi Bill,
>
> On Fri, May 27, 2011 at 11:26 AM, J. William Campbell
> <jwilliamcampbell at comcast.net>  wrote:
>> On 5/26/2011 4:28 PM, Graeme Russ wrote:
>>> Why mess around with bit shifting (which you would then have to cludge
>>> into
>>> your platform code) when carting around a 64-bit value is relatively
>>> cheap,
>>> transparent and poratble (all all supported up-to-date tool chains)
>>>
>> I really STRONGLY disagree with this statement. If you actually needed 64
>> bit variables, fine use them. But as I have already shown, you do not need
>> them in general.  We are computing a 32 bit result. There is some entropy
>> argument that says you shouldn't need 64 bits to do so. Another way to look
>> at it is that converting the top 32 bit word and the bottom 32 bit word to
>> ms separately can be easier than doing them both together at once.  However,
>> as we will see below, I do agree we need two 32 bit words to make this
>> process go smoothly. I just don't agree that they should/will constitute a
>> 64 bit binary word. See below.
>>>>>   - void timer_isr()
>>>>>      - Optional (particularly if tick counter rollover period is
>>>>>        exceptionally log which is usually the case for native 64-bit tick
>>>>>        counters)
>>>>>      - Simply calls sync_timebase()
>>>>>      - For platforms without any tick counter, this can implement one
>>>>>        (but accuracy will be harmed due to usage of disable_interrupts()
>>>>> and
>>>>>        enable_interrupts() in U-Boot
>>>>>
>>>>> So to get the new API up and running, only two functions are mandatory:
>>>>>
>>>>> get_ticks() which reads the hardware tick counter and deals with any
>>>>> 'funny
>>>>> stuff' including rollovers, short timers (12-bit for example), composite
>>>>> counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a
>>>>> 'clean' 64-bit tick counter which rolls over from all 1's to all 0's.
>>>>> The
>>>> I think it is the task of get_ticks to return the hardware tick counter
>>>> as
>>>> an  increasing counter, period.  The counter may wrap at some final count
>>>> that is not all ones. That is ok. Sync_timebase deals with the rollovers
>>>> if
>>> The hardware tick counter may, the 64-bit software tick counter maintained
>>> by get_ticks() may not
>>>> necessary. get_ticks is very lightweight. get_ticks should deal with
>>>> decrementing counters by returning the complement of the counter.  The
>>>> sc520
>>>> case is a bit more complex if you intend to use the 0-999 and 16 bit
>>>> millisec registers, in that you do need to add them to the previous value
>>>> to
>>> As I mentioned in another post, this is a problem for the platform
>>> maintainer and is abstracted away throught the platform specific
>>> implementation of get_ticks()
>>>
>>>> make an increasing counter. Sync_timebase "likes" short counters in that
>>>> they are easy to convert to millisec and tick remainders.
>>> The compiler should handle using 64-bit rather than 32-bit transparently
>> True enough.  But you don't need 64 bit variables at this point two 32 bit
>> ones work just fine, in fact better in most cases.
> Remember, we are not dealing with a high performance OS here. The primary
> goal is portability - Performance optimisations (which do not break
> portability) can be performed later
>
>>>>> 64-bit tick counter does not need to be reset to zero ever (even on
>>>>> startup
>>>>> - sync_timebase tacks care of all the details)
>>>> True, but sync_timebase does have to be initialized (as does the timer
>>>> itself in most cases, so this is not a restriction).
>>> This can be done in timer_init() via a call to sync_timebase() after the
>>> timer has been configured. This should bring everything into line
>>>
>>>>> ticks_per_millisecond() simply return the number of ticks in a
>>>>> millisecond
>>>>> - This may as simple as:
>>>>>
>>>>> inline u64 ticks_per_millisecond(void)
>>>>> {
>>>>>         return CONFIG_SYS_TICK_PER_MS;
>>>>> }
>>>>>
>>>>> But it may be trickier if you have a programmable tick frequency
>>>> You will have to call the routine that initializes sync_timebase. This
>>>> routine should have a name, like void init_sync_timebase(void)?
>>>>> The optional timer ISR is required if the tick counter has a short roll
>>>>> over duration (short is up to you - 1 second is short, 1 hour might be,
>>>>> 1
>>>>> century is not)
>>>>>
>>>>> Regards,
>>>>>
>>>>> Graeme
>>>>>
>>>> It is probably true that sync_timebase should have a parameter flag. The
>>>> reason is that if the timer isr is called only when the timer wraps, then
>>>> the calls to sync_timebase may be slightly more than a full timer period
>>>> apart. (due to interrupt latency). Therefore, when the timer difference
>>>> is
>>>> computed, if the current update is due to a wrap AND the previous update
>>>> is
>>>> due to a wrap, the difference should be approximately 1 wrap. If it comes
>>>> up
>>>> real short, you must add a wrap. This isn't necessary if the routine is
>>>> called more often than once per wrap. Also, when sync_timebase is called
>>>> in
>>> timer_isr() MUST be called more often than the rollover period of the
>>> underlying hardware tick counter - This is a requirement
>> The equality case can be made to work.  If the extension of the counter is
>> done in the interrupt routine, not in get_ticks, get_ticks just needs to
>> read the msb of the counter, read the lsb of the counter, then verify that
>> the msb has not changed. If you have interrupts that work, that is the
>> easiest way to go. If the lsb of the counter has represents 1 ms or less,
>> you can just drop it (equivalent to the what the PPC does now). If the
>> interrupt is slower than that, you must use at least part of the LSB. If you
>> don't have interrupts, the point is moot.
> So now we have a complicated ISR and a complicated get_ticks() and you
> have to change get_ticks() when you decide to implement an ISR
>
>>>> get_timer, you must first disable interrupts and then enable interrupts
>>>> after sync_timebase returns
>>> Why? - get_ticks() provides an atomic read of the hardware tick counter.
>>> If get_ticks() needs to disable and enable interrupts to do so, that is a
>>> problem for the platform maintainer
>>>
>>> Admittedly, sync_timebase() will not be re-entrant, but how will it ever
>>> be called concurrently? - Ah, I see - a call to get_timer() interrupted
>>> by the timer ISR :)
>> Yes, that is the problem. I have come to the view that  two 32 bit words are
>> the best approach. Note that the lsb may actually not fill the full 32 bits.
> Urghhh
Hi All,
     Consider the case where we have a 16 bit counter clocked by 1.0 MHz 
(a nice number) interrupting on overflow. We could return (overflow 
counter << 16) | hardware counter from the get_ticks routine, which 
maintains the "binaryness" of the "number".
OTOH, suppose the counter is a down counter clocked at 1.19318 MHz with 
a 1 ms period, like the PC counter is set up at present.  What do we 
return then?   Whatever we do, it depends on the clock  rate, which may 
change, and involves some math, which may not work for all clock rates. 
In effect, you have a dual radix number. Yes, conversion is possible but 
it is messy and would be present in different forms in all the various 
get_tick routines. This is neither simple nor maintainable. Further, it 
is un-necessary, as the sync_timer routine is just going to convert the 
number from whatever radix we converted it to into millisec.  If we 
leave the two numbers as split, all that complexity is removed from 
get_ticks and sent upward to the common routine that converts the answer 
into ms anyway. This makes the system more maintainable by placing the 
minimum requirement on get_ticks. The "tick" should be opaque to anybody 
but sync_timebase anyway.
>> The top 32 bits are the rollover count and the bottom 32 bits are the
>> current counter. If the counter is a full 32 bits, so much the better.
> Ahhhhh - Lets keep it that way
>
>> Again, one could put them together inside the interrupt routine , but it is
>> easier to check for a changed value if you don't do this. Otherwise, you
>> have to check both words. It also makes the isr faster. It is just an
> As I said before - Simple First, Fast Later
I am in favor of simple. That is why I want get_ticks to do as little as 
possible. It should just read the hardware register and the overflow 
counter if it is separate. Make sure the overflow didn't change while we 
were reading. This is redundant if we are not using interrupts but we 
can leave the code in. It just won't do anything.  We can also move the 
rollover detection to sync_timebase. It will be redundant if we are 
using interrupts, because time will never "back up". But we can do it 
this way. This centralizes the overflow detection, which is a good thing.
>> increment of the overflow counter, like the PPC is now. I happen to think it
>> is easier to convert the two 32 bit words to milliseconds one at a time, but
>> if you feel you must use 64 bit words, that is fine too. Just remember, the
>> counter does not always fill the entire bottom word.
> Urghhh
>
>> In cases where there are no interrupts, get_ticks has to detect that the
>> timer has "backed up" and increment the overflow counter itself, unless the
>> counter is 64 bits to begin with and overflow is impossible anyway.
>> get_ticks should NOT try to detect overflows if interrupts are available. If
>> it got both words before an interrupt happened, the answer is correct. If it
>> got an interrupt in between fetching the words, the event will be detected
>> and the value re-fetched. All sync_timebase would do now is convert the
>> returned value to milliseconds.
>>
>> So, if you have a 64 bit hardware counter, get_ticks reads and returns it.
>> Else if you have interrupts, get_ticks reads the overflow counter into the
>> msb. Next, it reads the hardware timer into the lsb. If the counter is a
>> down counter, the lsb is = to the counter max - the lsb. The msb is then
>> checked to make sure it hasn't changed, if it has, repeat the process. All
>> the interrupt routine does is increase the overflow count.
>> If you don't have interrupts get_ticks reads the hardware counter into the
>> lsb. If the counter is a down counter, the lsb is = to the counter max - the
>> lsb. If the lsb is less than it was in the previous call to get ticks, the
>> overflow counter is increased. get_ticks then loads the overflow counter
>> into the msb.
>>
>> sync_timebase converts the msb and lsb into millisec. It may do this by a 64
>> bit divide, or some shifting to align the lsb with then msb and the a 64 bit
>> divide, or a bunch of 32 bit fractional multiplies, or any such approach
>> that works.
>>
>> How does that sound?
> The fact that you have described three different implementations of
> get_ticks() with two of these differentiated between whether you have
> interrupts or not immediately suggests this solution is inherently more
> complex and less maintainable.
Yes, It is less maintainable because there are three cases.  As shown 
above, we can reduce the number of cases to one at the expense of some 
redundant code. There are problems with our original approach when the 
interrupt rate is == to the counter rollover period. This problem is 
detailed below. You placed the constraint that the interrupt rate was 
faster than the rollover period on the previous implementation. However, 
that constraint is not met in essentially all interrupt driven cases. 
Also, note that these get_ticks routines, even while different, are 
pretty simple. OTOH, some of the 64 bit counters can't be read 
atomically either, so having the "redundant"check in the code is not so 
bad. The again, we are talking about a total of probably 5 lines of code 
or so anyway. It is a good idea to move the rollover detection in the 
non-interrupt case to sync_timebase as well. Sync_timebase can also 
invert the down-counting counters, removing that from get_ticks. The 
wrap detection code can be #ifdef out if one is using interrupts and 
offended by it's presence. Thanks for pointing this out and compelling 
me to reduce the number of cases! Making get_ticks more lightweight is a 
good idea in my opinion.
> Lets say you have a platform with a 32-bit tick counter running at a
> reasonably long rollover time so you decide not to use interrupts. Then
> you create a new platform with the same tick counter, but it runs much
> faster and you realise you need to implement the interrupt routine to
> make get_timer() work for long enough periods - Fine, you add an ISR
> for the new platform that calls sync_timebase - No other changes are
> required.
>
> The last thing we want is for the 64-bit tick counter to be conceptually
> different across platforms
>
> I just realised - the ISR _does not need to call the sync_timebase at all_
> The ISR only needs to call get_ticks(), so it will be fast anyway
The problem is that the way we previously detected wrapping does not 
work if the interrupt rate is == to the counter wrap time, which it 
essentially always is. If get_ticks is trying to update the wrap count 
when an interrupt comes in, it will do it wrong. If the interrupt rate 
was faster, it would work, because get_ticks would always know the 
correct answer. Consider the following. Get ticks is called by the 
counter overflowing (which resets it to 0) and executes with the counter 
value at 0. Later, at the next rollover, with no intervening calls to 
get ticks, the interrupt routine calls get ticks. Assuming negligible  
interrupt latency, get_ticks just sees the counter is still 0, so it 
will not detect a wrap condition. So now you loose a period. I thought 
by passing in a flag we could force get_ticks to do the right thing in 
this case, but since we must disable interrupts when we call get ticks 
from the get_timer routine, the outer call to get ticks may have already 
detected the rollover and the flag will force an additional rollover to 
be detected. It is a classic race condition. If we detect rollover only 
in the interrupt routine, we do not need to protect the rollover 
variable from access by the main program, so we can be sure the results 
of get_ticks is coherent. If we try do do it in both places, we will 
have trouble. If the interrupt occurred at a faster rate, like say twice 
the rollover, we wouldn't have a problem. But it doesn't. In most cases 
it happens at the rollover rate, or just a bit more sometimes due to 
interrupt latency not being a constant. It may appear to happen at 
somewhat less than a rollover as well, if the interrupt latency was long 
at time n-1 and short at time n. I think this problem can be overcome in 
the old design by keeping track of whether the last update was by a 
non-interrupt or interrupt call to get_ticks , but I think the new 
design is better anyway.

Best Regards,
Bill Campbell
> Regards,
>
> Graeme
>
>



More information about the U-Boot mailing list