[U-Boot] [RFC][Timer API] Revised Specification - Implementation details

Fri May 27 08:33:51 CEST 2011

On 5/26/2011 9:33 PM, Graeme Russ wrote:
> Hi Bill,
>
<snip>
> get_ticks() does not care about the clock rate - It simply looks at the
> current value of the hardware tick counter and the value of the hardware
> tick counter the last time get_ticks() was called, calculates the difference
> and adds that to the 64-bit software tick-counter
> I don't see how it being a down counter makes that any more difficult
>
>> This is neither simple nor maintainable. Further, it is un-necessary, as the
>> sync_timer routine is just going to convert the number from whatever radix
>> we converted it to into millisec.  If we leave the two numbers as split, all
>> that complexity is removed from get_ticks and sent upward to the common
>> routine that converts the answer into ms anyway. This makes the system more
>> maintainable by placing the minimum requirement on get_ticks. The "tick"
>> should be opaque to anybody but sync_timebase anyway.
> But how is the common routine going to know if it is a split timer, up
> timer, down timer, little endian, big endian, etc, etc.
>
> get_ticks() abstracts the hardware implementation of the hardware timer
> from sync_timer()
Hi All,
     I understand your point. I prefer a higher level of abstraction. 
You are correct that there are some aspects of the tick counter that are 
very hardware quirky, and these attributes are hard to abstract. If the 
timer is embedded into a bit field with other variables, it is 
reasonable to expect get_ticks to extract the bit field and right 
justify the number. If there are endian issues, the get_ticks routine 
must return the number in the "natural" endianness of the platform. 
However, after that point, the values are extremely "regular". The fact 
that a counter is a down counter can be expressed in a data structure as 
a boolean. The high limit of the hardware counter is a number. The 
number of ticks per millsecond is obtainable from usec2ticks(1000), or 
10000 if we want to avoid some roundoff. From these values, sync_timer 
can take the two part ticks value and convert it to millisec. Trust me 
on this. I have the routines to do it. This puts as much of the 
abstraction of the two numbers into ONE COMMON ROUTINE, sync_timer. Now 
it is clearly possible to move some of the abstraction down a level into 
sync_timer. For instance you could move inverting the counter down to 
that level, and then multiply the msb by the maximum value of the lsb 
counter and add in the msb. It is clearly possible to move ALL of 
sync_timer down into get_ticks, if one wanted to. It is clearly possible 
to replace general values in gd-> with platform specific constant 
values. However, if you do that, you end up with a lot of duplicate, or 
almost duplicate, code running around. That has proven to be error 
prone, and it has left new ports of u-boot to sort of fend for 
themselves in figuring out how things should work. I prefer to abstract 
that all up into sync_timer. That way, all the math is in one place, and 
is table driven so it is easy to change.
>>>> The top 32 bits are the rollover count and the bottom 32 bits are the
>>>> current counter. If the counter is a full 32 bits, so much the better.
>>> Ahhhhh - Lets keep it that way
>>>
>>>> Again, one could put them together inside the interrupt routine , but it
>>>> is
>>>> easier to check for a changed value if you don't do this. Otherwise, you
>>>> have to check both words. It also makes the isr faster. It is just an
>>> As I said before - Simple First, Fast Later
>> I am in favor of simple. That is why I want get_ticks to do as little as
>> possible. It should just read the hardware register and the overflow counter
>> if it is separate. Make sure the overflow didn't change while we were
>> reading. This is redundant if we are not using interrupts but we can leave
>> the code in. It just won't do anything.  We can also move the rollover
>> detection to sync_timebase. It will be redundant if we are using interrupts,
>> because time will never "back up". But we can do it this way. This
>> centralizes the overflow detection, which is a good thing.
> That does not sound simple to me. This, however, does:
>
> u64 get_ticks(void)
> {
> 	static u64 last_hw_tick_count;
> 	static u64 last_sw_tick_count;
>
> 	/* Now for the platform specific stuff - Read hardware tick counter */
> 	u64 current_hw_tick_count = /* Read hw registers etc */
>
> 	/* Deal with hardware weirdness - errata, stupid hw engineers etc */
>
> 	u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count;
> 	last_sw_tick_count += elapsed_ticks;
>
> 	return last_sw_tick_count;
> }
>
> The '/* Read hw registers etc */' bit will always be the same, no matter
> what way you do it
Agree.
> The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */'
> bit is where we are truly abstracting the hardware away - This is the
> bit you propose to leave mangled and deal with in sync_time?
Not totally. The get_ticks routine must mask off any extra bits and 
right justify the hardware counter. If the counter is  of "improper" 
endianness (not common, but probably not unknown), it should be fixed in 
get_ticks.
>   You make a
> lot of assumptions about the consistency of this highly variable logic
> across all current (and future) U-Boot platforms
Not really. The assumptions are that the two numbers are both binary numbers
> What if the hardware engineer figures he can save a squeeze some extra
> mileage out of a FPGA by implementing a 24-bit counter + 8-bit status
> in one 32-bit register - get_ticks() strips all that out
Correct, it should strip out the extra bits
> You want to do some in get_ticks(), some in the ISR and some in
> sync_timer() -
Well, not really. All the ISR ever does is increment the count msb. Just 
like it does now on the PPC. Always that, exactly that, and nothing 
more. Get ticks "extracts" the hardware counter into an msb and lsb 
part. It makes sure that the msb hasn't changed while it extracted the 
lsb. This avoids a race with the isr, and is not necessary if interrupts 
are not used. Both your version and my version of the get_ticks routine 
will need to do this.
> Put all the weird stuff on one place - The HAL
I agree. I just don't consider compensating for the timer counting down 
instead of up to be "weird". I consider it common. I also don't consider 
the fact that the hardware counter may not contain 32 bits and may not 
cover a power of 2 to be "weird". I consider it to be common also. The 
total difference between what you would put in get_ticks and what I 
would put in sync_timer is something like this

/* if timer is a down counter, reverse it */
    if (gd->timer_counts_down)
     count.lsb = gd->lsb_max_value - count.lsb;
/* you would add the following to make a number out of the two parts. I 
would do this at the next level up, after I had checked for wrapping. 
That way I can avoid 64 bits longer, possibly forever.  Note that a 0 
lsb_max_value corresponds to a full 32 bits */
      if (gd->lsb_max_value > 0 )
         count.u64 = count.msb * gd->lsb_max_value + count.lsb;
/* if it was a 64 bit number to begin with, we don't have to do any 
multiply/add */

I don't' see any reason to push this down to a lower level. It is just 
one more thing to get messed up across implementations.

>
>> detection in the non-interrupt case to sync_timebase as well. Sync_timebase
>> can also invert the down-counting counters, removing that from get_ticks.
>> The wrap detection code can be #ifdef out if one is using interrupts and
> Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of
> code is usually a sign you are doing something wrong
As I said, this is an optional optimization. I do not agree that an 
#ifdef in the middle of code indicates you have a bad design. Lots and 
Lots of ifdefs certainly indicates a bad design. An ifdef to eliminate 
code if some option is not selected is hardly such a strange thing, 
especially only a single #ifdef. However, feel free to not have it if 
you so desire.

>> offended by it's presence. Thanks for pointing this out and compelling me to
>> reduce the number of cases! Making get_ticks more lightweight is a good idea
>> in my opinion.
>>> Lets say you have a platform with a 32-bit tick counter running at a
>>> reasonably long rollover time so you decide not to use interrupts. Then
>>> you create a new platform with the same tick counter, but it runs much
>>> faster and you realise you need to implement the interrupt routine to
>>> make get_timer() work for long enough periods - Fine, you add an ISR
>>> for the new platform that calls sync_timebase - No other changes are
>>> required.
>>>
>>> The last thing we want is for the 64-bit tick counter to be conceptually
>>> different across platforms
>>>
>>> I just realised - the ISR _does not need to call the sync_timebase at all_
>>> The ISR only needs to call get_ticks(), so it will be fast anyway
>> The problem is that the way we previously detected wrapping does not work if
>> the interrupt rate is == to the counter wrap time, which it essentially
>> always is. If get_ticks is trying to update the wrap count when an interrupt
> Is it, always, on every platform?
Yes, pretty much. You get a terminal count/counter underflow interrupt 
and that is it.
>> comes in, it will do it wrong. If the interrupt rate was faster, it would
>> work, because get_ticks would always know the correct answer. Consider the
>> following. Get ticks is called by the counter overflowing (which resets it
>> to 0) and executes with the counter value at 0. Later, at the next rollover,
>> with no intervening calls to get ticks, the interrupt routine calls get
>> ticks. Assuming negligible  interrupt latency, get_ticks just sees the
>> counter is still 0, so it will not detect a wrap condition. So now you loose
> But wait a minute, don't you KNOW that the interrupt gets triggered on a
> rollover so therefore there MUST have been a rollover, therefore there has
> been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the
> last known tick count (which would be zero if there had been no get_timer
> calls in between)
Except if this interrupt was delayed because get_ticks turned off 
interrupts, get_ticks may have already compensated the number. When we 
then get the (delayed) interrupt, we will do it twice.
>> a period. I thought by passing in a flag we could force get_ticks to do the
>> right thing in this case, but since we must disable interrupts when we call
>> get ticks from the get_timer routine, the outer call to get ticks may have
>> already detected the rollover and the flag will force an additional rollover
>> to be detected. It is a classic race condition. If we detect rollover only
> Aha! - In nearly all situations, a race is better avoided than run!
Yes, that is why the new approach does NOT turn off interrupts when 
get_ticks is called, but rather loops if the overflow count was changed 
while reading the count lsb. That ensures get_ticks always gets both 
words either before the counter wrapped or after the counter wrapped, 
but not halfway in between. Some ISYNCs may be required to make sure 
there are no outstanding changes pending to memory, depending on your 
architecture.
>> in the interrupt routine, we do not need to protect the rollover variable
>> from access by the main program, so we can be sure the results of get_ticks
>> is coherent. If we try do do it in both places, we will have trouble. If the
>> interrupt occurred at a faster rate, like say twice the rollover, we
>> wouldn't have a problem. But it doesn't. In most cases it happens at the
>> rollover rate, or just a bit more sometimes due to interrupt latency not
> Again does it really - for all arches?
Yep, pretty much. Almost all timers I know of only give one interrupt 
per cycle.
>> being a constant. It may appear to happen at somewhat less than a rollover
>> as well, if the interrupt latency was long at time n-1 and short at time n.
>> I think this problem can be overcome in the old design by keeping track of
>> whether the last update was by a non-interrupt or interrupt call to
>> get_ticks , but I think the new design is better anyway.
> I disagree - The whole point of a HAL is to Abstract the Hardware
The HAL should abstract what needs abstracting, and no more. We are 
really talking about literally 4 lines of code here, but I think those 
four lines certainly belong in sync_timer. All the math in one place is 
a good thing. In most cases, get_ticks becomes trivial. Only in cases 
where the timer is wrong endian or embedded in another bit field (both 
things being possible but not all that common) does get_ticks need to do 
anything other than just read the registers and loop until the msb is 
stable. Simple here is good, because this is what new implementers need 
to add/change. The less here, the better! In fact, the looping until the 
msb is stable can also be moved up to sync_timer. That is even simpler.

Best Regards,
Bill Campbell
> Regards,
>
> Graeme
>
>