[U-Boot] [RFC][Timer API] Revised Specification - Implementation details

Fri May 27 06:33:15 CEST 2011

Hi Bill,

On Fri, May 27, 2011 at 1:54 PM, J. William Campbell
<jwilliamcampbell at comcast.net> wrote:
> On 5/26/2011 6:51 PM, Graeme Russ wrote:
>>
>> Hi Bill,
>>
>> On Fri, May 27, 2011 at 11:26 AM, J. William Campbell
>> <jwilliamcampbell at comcast.net>  wrote:
>>>

[snip]

>>>
>>> Yes, that is the problem. I have come to the view that  two 32 bit words
>>> are
>>> the best approach. Note that the lsb may actually not fill the full 32
>>> bits.
>>
>> Urghhh
>
> Hi All,
>    Consider the case where we have a 16 bit counter clocked by 1.0 MHz (a
> nice number) interrupting on overflow. We could return (overflow counter <<
> 16) | hardware counter from the get_ticks routine, which maintains the
> "binaryness" of the "number".
> OTOH, suppose the counter is a down counter clocked at 1.19318 MHz with a 1
> ms period, like the PC counter is set up at present.  What do we return
> then?   Whatever we do, it depends on the clock  rate, which may change, and
> involves some math, which may not work for all clock rates. In effect, you
> have a dual radix number. Yes, conversion is possible but it is messy and
> would be present in different forms in all the various get_tick routines.

get_ticks() does not care about the clock rate - It simply looks at the
current value of the hardware tick counter and the value of the hardware
tick counter the last time get_ticks() was called, calculates the difference
and adds that to the 64-bit software tick-counter

I don't see how it being a down counter makes that any more difficult

> This is neither simple nor maintainable. Further, it is un-necessary, as the
> sync_timer routine is just going to convert the number from whatever radix
> we converted it to into millisec.  If we leave the two numbers as split, all
> that complexity is removed from get_ticks and sent upward to the common
> routine that converts the answer into ms anyway. This makes the system more
> maintainable by placing the minimum requirement on get_ticks. The "tick"
> should be opaque to anybody but sync_timebase anyway.

But how is the common routine going to know if it is a split timer, up
timer, down timer, little endian, big endian, etc, etc.

get_ticks() abstracts the hardware implementation of the hardware timer
from sync_timer()

>>>
>>> The top 32 bits are the rollover count and the bottom 32 bits are the
>>> current counter. If the counter is a full 32 bits, so much the better.
>>
>> Ahhhhh - Lets keep it that way
>>
>>> Again, one could put them together inside the interrupt routine , but it
>>> is
>>> easier to check for a changed value if you don't do this. Otherwise, you
>>> have to check both words. It also makes the isr faster. It is just an
>>
>> As I said before - Simple First, Fast Later
>
> I am in favor of simple. That is why I want get_ticks to do as little as
> possible. It should just read the hardware register and the overflow counter
> if it is separate. Make sure the overflow didn't change while we were
> reading. This is redundant if we are not using interrupts but we can leave
> the code in. It just won't do anything.  We can also move the rollover
> detection to sync_timebase. It will be redundant if we are using interrupts,
> because time will never "back up". But we can do it this way. This
> centralizes the overflow detection, which is a good thing.

That does not sound simple to me. This, however, does:

u64 get_ticks(void)
{
	static u64 last_hw_tick_count;
	static u64 last_sw_tick_count;

	/* Now for the platform specific stuff - Read hardware tick counter */
	u64 current_hw_tick_count = /* Read hw registers etc */

	/* Deal with hardware weirdness - errata, stupid hw engineers etc */

	u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count;
	last_sw_tick_count += elapsed_ticks;

	return last_sw_tick_count;
}

The '/* Read hw registers etc */' bit will always be the same, no matter
what way you do it

The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */'
bit is where we are truly abstracting the hardware away - This is the
bit you propose to leave mangled and deal with in sync_time? You make a
lot of assumptions about the consistency of this highly variable logic
across all current (and future) U-Boot platforms

What if the hardware engineer figures he can save a squeeze some extra
mileage out of a FPGA by implementing a 24-bit counter + 8-bit status
in one 32-bit register - get_ticks() strips all that out

You want to do some in get_ticks(), some in the ISR and some in
sync_timer() - Put all the weird stuff on one place - The HAL

>>>
>>> increment of the overflow counter, like the PPC is now. I happen to think
>>> it
>>> is easier to convert the two 32 bit words to milliseconds one at a time,
>>> but
>>> if you feel you must use 64 bit words, that is fine too. Just remember,
>>> the
>>> counter does not always fill the entire bottom word.
>>
>> Urghhh
>>
>>> In cases where there are no interrupts, get_ticks has to detect that the
>>> timer has "backed up" and increment the overflow counter itself, unless
>>> the
>>> counter is 64 bits to begin with and overflow is impossible anyway.
>>> get_ticks should NOT try to detect overflows if interrupts are available.
>>> If
>>> it got both words before an interrupt happened, the answer is correct. If
>>> it
>>> got an interrupt in between fetching the words, the event will be
>>> detected
>>> and the value re-fetched. All sync_timebase would do now is convert the
>>> returned value to milliseconds.
>>>
>>> So, if you have a 64 bit hardware counter, get_ticks reads and returns
>>> it.
>>> Else if you have interrupts, get_ticks reads the overflow counter into
>>> the
>>> msb. Next, it reads the hardware timer into the lsb. If the counter is a
>>> down counter, the lsb is = to the counter max - the lsb. The msb is then
>>> checked to make sure it hasn't changed, if it has, repeat the process.
>>> All
>>> the interrupt routine does is increase the overflow count.
>>> If you don't have interrupts get_ticks reads the hardware counter into
>>> the
>>> lsb. If the counter is a down counter, the lsb is = to the counter max -
>>> the
>>> lsb. If the lsb is less than it was in the previous call to get ticks,
>>> the
>>> overflow counter is increased. get_ticks then loads the overflow counter
>>> into the msb.
>>>
>>> sync_timebase converts the msb and lsb into millisec. It may do this by a
>>> 64
>>> bit divide, or some shifting to align the lsb with then msb and the a 64
>>> bit
>>> divide, or a bunch of 32 bit fractional multiplies, or any such approach
>>> that works.
>>>
>>> How does that sound?
>>
>> The fact that you have described three different implementations of
>> get_ticks() with two of these differentiated between whether you have
>> interrupts or not immediately suggests this solution is inherently more
>> complex and less maintainable.
>
> Yes, It is less maintainable because there are three cases.  As shown above,
> we can reduce the number of cases to one at the expense of some redundant
> code. There are problems with our original approach when the interrupt rate
> is == to the counter rollover period. This problem is detailed below. You
> placed the constraint that the interrupt rate was faster than the rollover
> period on the previous implementation. However, that constraint is not met
> in essentially all interrupt driven cases. Also, note that these get_ticks
> routines, even while different, are pretty simple. OTOH, some of the 64 bit
> counters can't be read atomically either, so having the "redundant"check in
> the code is not so bad. The again, we are talking about a total of probably
> 5 lines of code or so anyway. It is a good idea to move the rollover
> detection in the non-interrupt case to sync_timebase as well. Sync_timebase
> can also invert the down-counting counters, removing that from get_ticks.
> The wrap detection code can be #ifdef out if one is using interrupts and

Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of
code is usually a sign you are doing something wrong

> offended by it's presence. Thanks for pointing this out and compelling me to
> reduce the number of cases! Making get_ticks more lightweight is a good idea
> in my opinion.
>>
>> Lets say you have a platform with a 32-bit tick counter running at a
>> reasonably long rollover time so you decide not to use interrupts. Then
>> you create a new platform with the same tick counter, but it runs much
>> faster and you realise you need to implement the interrupt routine to
>> make get_timer() work for long enough periods - Fine, you add an ISR
>> for the new platform that calls sync_timebase - No other changes are
>> required.
>>
>> The last thing we want is for the 64-bit tick counter to be conceptually
>> different across platforms
>>
>> I just realised - the ISR _does not need to call the sync_timebase at all_
>> The ISR only needs to call get_ticks(), so it will be fast anyway
>
> The problem is that the way we previously detected wrapping does not work if
> the interrupt rate is == to the counter wrap time, which it essentially
> always is. If get_ticks is trying to update the wrap count when an interrupt

Is it, always, on every platform?

> comes in, it will do it wrong. If the interrupt rate was faster, it would
> work, because get_ticks would always know the correct answer. Consider the
> following. Get ticks is called by the counter overflowing (which resets it
> to 0) and executes with the counter value at 0. Later, at the next rollover,
> with no intervening calls to get ticks, the interrupt routine calls get
> ticks. Assuming negligible  interrupt latency, get_ticks just sees the
> counter is still 0, so it will not detect a wrap condition. So now you loose

But wait a minute, don't you KNOW that the interrupt gets triggered on a
rollover so therefore there MUST have been a rollover, therefore there has
been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the
last known tick count (which would be zero if there had been no get_timer
calls in between)

> a period. I thought by passing in a flag we could force get_ticks to do the
> right thing in this case, but since we must disable interrupts when we call
> get ticks from the get_timer routine, the outer call to get ticks may have
> already detected the rollover and the flag will force an additional rollover
> to be detected. It is a classic race condition. If we detect rollover only

Aha! - In nearly all situations, a race is better avoided than run!

> in the interrupt routine, we do not need to protect the rollover variable
> from access by the main program, so we can be sure the results of get_ticks
> is coherent. If we try do do it in both places, we will have trouble. If the
> interrupt occurred at a faster rate, like say twice the rollover, we
> wouldn't have a problem. But it doesn't. In most cases it happens at the
> rollover rate, or just a bit more sometimes due to interrupt latency not

Again does it really - for all arches?

> being a constant. It may appear to happen at somewhat less than a rollover
> as well, if the interrupt latency was long at time n-1 and short at time n.
> I think this problem can be overcome in the old design by keeping track of
> whether the last update was by a non-interrupt or interrupt call to
> get_ticks , but I think the new design is better anyway.

I disagree - The whole point of a HAL is to Abstract the Hardware

Regards,

Graeme