[U-Boot] [RFC][Timer API] Revised Specification - Implementation details

Fri May 27 03:26:20 CEST 2011

On 5/26/2011 4:28 PM, Graeme Russ wrote:
> Hi Bill,
>
> On Fri, May 27, 2011 at 2:56 AM, J. William Campbell
> <jwilliamcampbell at comcast.net>  wrote:
>> On 5/26/2011 6:27 AM, Graeme Russ wrote:
>>> Hello Everyone,
>>>
>>> OK - Starting a new thread to discuss implementation details. This is a
>>> heads-up for arch/platform maintainers - Once this is a bit more stable, I
>>> will put it on the wiki
>>>
>>> Assumed Capabilities of the Platform
>>>   - Has a 'tick counter' that does not rely on software to increment
>> Hi All,
>>        The nios2 with the most basic timer does not meet this requirement. It
>> will not count at all without the 10 ms interrupt. I don't think this
>> requirement matters anyway. We need a 'tick counter' that 'ticks'. If it
>> takes software to make it tick, we don't much care. There may be problems
>> with early use of udelay in that case, but that is a different issue.
> I think we will need to define get_timer() weak - Nios will have to
> override the default implementation to cater for it's (Nios') limitations
Hi All,
     Yes, that will probably be required here.
>>>   - tick interval may by a fixed constant which cannot be controlled
>>>     via software, or it could be programmable (PIT)
>>>
>>> API Functions (/lib/timer.c)
>>>   - u32 get_timer(u32 start)
>>>      - Returns the number of elapsed milliseconds since 'start'
>>>
>>> API Functions (/arch/...)
>>>   - void udelay(u32 delay)
>>>      - Used for 'short' delays (generally up to several seconds)
>>>      - Can use the tick counter if it is fast enough
>>>      - MUST NOT RESET THE TICK COUNTER
>> There is a requirement that udelay be available before relocation and before
>> the BSS is available. One can use the tick counter to provide udelay as long
>> as sync_timebase is not called OR sync timebase does not use BSS. It appears
>> many implementations ignore this requirement at present. We should try to
>> fix this, but is should not be a requirement.
> If you really wanted to, sync_timebase() could use global data (it doesn't
> have many static variables) in which case all timer functions would be
> available before relocation
Yes, my implementation of the sync_timebase routine was written that 
way, using gd-> for the required variables.
>>> 'Helper' Functions (/lib/timer.c)
>> I think this function should be weak, so that it is possible for people to
>> override it with a "custom" function. The fully general sync_timebase has
>> lots of code in it that can be simplified in special cases. We want and need
>> a fully general function to be available, but other users who are real tight
>> on space may want a cut down version. We should make that easily possible.
> Agree
>
>>>   - void sync_timebase(void)
>>>      - Updates the millisecond timer
>>>      - Utilises HAL functions to access the platform's tick counter
>>>      - Must be called more often than the rollover period of the
>>>        platform's tick counter
>>>      - Does not need to be called with a regular frequency (immune
>>>        to interrupt skew)
>>>      - Is always called by get_timer()
>>>      - For platforms with short tick counter rollovers it should
>>>        be called by an ISR
>>>      - Bill Campbell wrote a good example which proved this can be common
>>>        and arbitrary (and optionally free of divides and capable of
>>>        maintaining accuracy even if the tick frequency is not an even
>>>        division of 1ms)
>>>
>>> HAL Functions (/arch/... or /board/...)
>>>   - u64 get_ticks(void)
>> For what it's worth, I would like to propose using a (gasp!) typedef here.
>> It seems to me there are a whole lot of cases where the max number of ticks
>> is a u32 or less. In those cases, the wrap at 32 bits helps things a lot. If
>> the tick counter is really 64 bits, the function of sync_timebase  is simply
>> to convert the tick value  to millisec, and that is it. Otherwise, if it is
>> 32 bits or less then some other actions will be required. I think this is
>> one of those times where a typedef would help, We could define a type called
>> timer_tick_t to describe this quantity. That would allow a pure 32 bit
>> implementation where appropriate.
>>
>> Another suggestion is that perhaps we want a u32 get_ticks_lsb(void) as well
>> as a regular get_ticks. The lsb version would be used for udelay and could
>> possibly come from another timer if that was necessary/desirable. See the
>> requirement for early udelay early availability.
> I think this all adds unnecessary complexity
>
>>>      - Returns a tick count as an unsigned 64-bit integer
>>>      - Abstracts the implementation of the platform tick counter
>>>        (platform may by 32-bit 3MHz counter, might be a 16-bit
>>>        0-999 microsecond plus 16-bit 0-65535 millisecond etc)
>>>   - u64 ticks_per_millisecond()
>>>      - Returns the number of ticks (as returned by get_ticks()) per
>>>        millisecond
>> I think ticks_per_sec would be a better choice. First, such a function
>> already exists in almost all u-boots. Second, if one wants the best accuracy
>> for things like udelay, you need better accuracy than  millisec. Since this
>> function is used only infrequently, when things are initialized, a divide to
>> get ticks_per_millsec (if that is what you really want) is no big deal.
>> Lastly, I think this function can remain u32. Yes, there is a 4 GHz limit on
> Don't underestimate the ability of existing platforms to already exceed
> this limit - Scientific equipment can easily have a 1 nano second tick
> counter (with extreme precision)
True enough. I have already agreed that usec2ticks and ticks2usec are 
fine for this purpose.
>> the clock rate. If this ever becomes an issue, we can change the type to
>> timer_tick_t. When the CPU clock rate gets quite high, it is an advantage to
>> divide it down for performance measurement anyway. The AMD/Intel chips
>> already do this. If the hardware doesn't do it, shift the timer value right
>> two bits. I doubt you will miss the 0.4 nanoseconds resolution loss from
>> your 10 GHz timestamp.
> Why mess around with bit shifting (which you would then have to cludge into
> your platform code) when carting around a 64-bit value is relatively cheap,
> transparent and poratble (all all supported up-to-date tool chains)
>
I really STRONGLY disagree with this statement. If you actually needed 
64 bit variables, fine use them. But as I have already shown, you do not 
need them in general.  We are computing a 32 bit result. There is some 
entropy argument that says you shouldn't need 64 bits to do so. Another 
way to look at it is that converting the top 32 bit word and the bottom 
32 bit word to ms separately can be easier than doing them both together 
at once.  However, as we will see below, I do agree we need two 32 bit 
words to make this process go smoothly. I just don't agree that they 
should/will constitute a 64 bit binary word. See below.
>>>   - void timer_isr()
>>>      - Optional (particularly if tick counter rollover period is
>>>        exceptionally log which is usually the case for native 64-bit tick
>>>        counters)
>>>      - Simply calls sync_timebase()
>>>      - For platforms without any tick counter, this can implement one
>>>        (but accuracy will be harmed due to usage of disable_interrupts()
>>> and
>>>        enable_interrupts() in U-Boot
>>>
>>> So to get the new API up and running, only two functions are mandatory:
>>>
>>> get_ticks() which reads the hardware tick counter and deals with any
>>> 'funny
>>> stuff' including rollovers, short timers (12-bit for example), composite
>>> counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a
>>> 'clean' 64-bit tick counter which rolls over from all 1's to all 0's. The
>> I think it is the task of get_ticks to return the hardware tick counter as
>> an  increasing counter, period.  The counter may wrap at some final count
>> that is not all ones. That is ok. Sync_timebase deals with the rollovers if
> The hardware tick counter may, the 64-bit software tick counter maintained
> by get_ticks() may not

>> necessary. get_ticks is very lightweight. get_ticks should deal with
>> decrementing counters by returning the complement of the counter.  The sc520
>> case is a bit more complex if you intend to use the 0-999 and 16 bit
>> millisec registers, in that you do need to add them to the previous value to
> As I mentioned in another post, this is a problem for the platform
> maintainer and is abstracted away throught the platform specific
> implementation of get_ticks()
>
>> make an increasing counter. Sync_timebase "likes" short counters in that
>> they are easy to convert to millisec and tick remainders.
> The compiler should handle using 64-bit rather than 32-bit transparently
True enough.  But you don't need 64 bit variables at this point two 32 
bit ones work just fine, in fact better in most cases.
>>> 64-bit tick counter does not need to be reset to zero ever (even on
>>> startup
>>> - sync_timebase tacks care of all the details)
>> True, but sync_timebase does have to be initialized (as does the timer
>> itself in most cases, so this is not a restriction).
> This can be done in timer_init() via a call to sync_timebase() after the
> timer has been configured. This should bring everything into line
>
>>> ticks_per_millisecond() simply return the number of ticks in a millisecond
>>> - This may as simple as:
>>>
>>> inline u64 ticks_per_millisecond(void)
>>> {
>>>         return CONFIG_SYS_TICK_PER_MS;
>>> }
>>>
>>> But it may be trickier if you have a programmable tick frequency
>> You will have to call the routine that initializes sync_timebase. This
>> routine should have a name, like void init_sync_timebase(void)?
>>> The optional timer ISR is required if the tick counter has a short roll
>>> over duration (short is up to you - 1 second is short, 1 hour might be, 1
>>> century is not)
>>>
>>> Regards,
>>>
>>> Graeme
>>>
>> It is probably true that sync_timebase should have a parameter flag. The
>> reason is that if the timer isr is called only when the timer wraps, then
>> the calls to sync_timebase may be slightly more than a full timer period
>> apart. (due to interrupt latency). Therefore, when the timer difference is
>> computed, if the current update is due to a wrap AND the previous update is
>> due to a wrap, the difference should be approximately 1 wrap. If it comes up
>> real short, you must add a wrap. This isn't necessary if the routine is
>> called more often than once per wrap. Also, when sync_timebase is called in
> timer_isr() MUST be called more often than the rollover period of the
> underlying hardware tick counter - This is a requirement
The equality case can be made to work.  If the extension of the counter 
is done in the interrupt routine, not in get_ticks, get_ticks just needs 
to read the msb of the counter, read the lsb of the counter, then verify 
that the msb has not changed. If you have interrupts that work, that is 
the easiest way to go. If the lsb of the counter has represents 1 ms or 
less, you can just drop it (equivalent to the what the PPC does now). If 
the interrupt is slower than that, you must use at least part of the 
LSB. If you don't have interrupts, the point is moot.
>> get_timer, you must first disable interrupts and then enable interrupts
>> after sync_timebase returns
> Why? - get_ticks() provides an atomic read of the hardware tick counter.
> If get_ticks() needs to disable and enable interrupts to do so, that is a
> problem for the platform maintainer
>
> Admittedly, sync_timebase() will not be re-entrant, but how will it ever
> be called concurrently? - Ah, I see - a call to get_timer() interrupted
> by the timer ISR :)
Yes, that is the problem. I have come to the view that  two 32 bit words 
are the best approach. Note that the lsb may actually not fill the full 
32 bits. The top 32 bits are the rollover count and the bottom 32 bits 
are the current counter. If the counter is a full 32 bits, so much the 
better. Again, one could put them together inside the interrupt routine 
, but it is easier to check for a changed value if you don't do this. 
Otherwise, you have to check both words. It also makes the isr faster. 
It is just an increment of the overflow counter, like the PPC is now. I 
happen to think it is easier to convert the two 32 bit words to 
milliseconds one at a time, but if you feel you must use 64 bit words, 
that is fine too. Just remember, the counter does not always fill the 
entire bottom word.
In cases where there are no interrupts, get_ticks has to detect that the 
timer has "backed up" and increment the overflow counter itself, unless 
the counter is 64 bits to begin with and overflow is impossible anyway. 
get_ticks should NOT try to detect overflows if interrupts are 
available. If it got both words before an interrupt happened, the answer 
is correct. If it got an interrupt in between fetching the words, the 
event will be detected and the value re-fetched. All sync_timebase would 
do now is convert the returned value to milliseconds.

So, if you have a 64 bit hardware counter, get_ticks reads and returns it.
Else if you have interrupts, get_ticks reads the overflow counter into 
the msb. Next, it reads the hardware timer into the lsb. If the counter 
is a down counter, the lsb is = to the counter max - the lsb. The msb is 
then checked to make sure it hasn't changed, if it has, repeat the 
process. All the interrupt routine does is increase the overflow count.
If you don't have interrupts get_ticks reads the hardware counter into 
the lsb. If the counter is a down counter, the lsb is = to the counter 
max - the lsb. If the lsb is less than it was in the previous call to 
get ticks, the overflow counter is increased. get_ticks then loads the 
overflow counter into the msb.

sync_timebase converts the msb and lsb into millisec. It may do this by 
a 64 bit divide, or some shifting to align the lsb with then msb and the 
a 64 bit divide, or a bunch of 32 bit fractional multiplies, or any such 
approach that works.

How does that sound?
Best Regards.
Bill Campbell

> Regards,
>
> Graeme
>
>