[U-Boot] [RFC] ARM timing code refactoring

Mon Jan 24 09:25:39 CET 2011

Dear Albert ARIBAUD,

On 24.01.2011 at 08:24, Albert ARIBAUD wrote:

> On 24/01/2011 02:42, J. William Campbell wrote:
> 
>> Hi All,
>>            In order to avoid doing 64 bit math, we can define a "jiffie"
>> or a "bogo_ms" that is the 64 bit timebase shifted right such that the
>> lsb of the bottom 32 bits has a resolution of between 0.5 ms and 1 ms.
>> It is then possible to convert the difference between two jiffie/bogo_ms
>> values to a number of ms using a 32 bit multiply and a right shift of 16
>> bits, with essentially negligible error.  get_bogo_ms() would return a
>> 32 bit number in bogo_ms, thus the timing loop would be written:
>> 
>> u32 start_time = get_bogo_ms();
>> do {
>>      if ("data_ready")
>>          /* transfer a byte */
>>      if (bogo_ms_to_ms(get_bogo_ms() - start_time) > TIMEOUT_IN_MS)
>>          /* fail and exit loop */
>> } while (--"bytestodo" > 0);
>> 
>> u32 get_bogo_ms()
>> {
>>         u64 tick;
>>         read(tick);
>> 
>>         return (tick >> gd->timer_shift);
>> }
>> u32 bogo_ms_to_ms(u32 x)
>> {
>>         /* this code assumes the resulting ms will be between 0 and 65535,
>>            or 65 seconds */
>>         /* cvt_bogo_ms_to_ms is a 16 bit binary fraction */
>>         return ((x * gd->cvt_bogo_ms_to_ms) >> 16);
>> }
>> 
>> All the above code assumes timeouts are 65 seconds or less, which I
>> think is probably fair. Conversion of ms values up to 65 seconds to
>> bogo_ms is also easy, and a 32 bit multiplied result is all that is
>> required.
>> What is not so easy is converting a 32 bit timer value to ms.  It can be
>> done if the CPU can do a 32 by 32 multiply to produce a 64 bit result,
>> use the msb, and possibly correct the result by an add if bit 32 of the
>> timer is set.  You need a 33 bit counter in bogo_ms to get a monotonic,
>> accurate 32 bit counter in ms. The powerpc can use a mulhw operation to
>> do this, and any CPU that will produce a 64 bit product can do this.
>> However, many CPUs do not produce 64 bit products easily. Using division
>> to do these operations is even less appealing, as many CPUs do not
>> provide hardware division at all. Since it is not necessary to do this
>> conversion to easily use timeouts with 1 ms resolution and accuracy,  I
>> think the idea of not using a timer in ms but rather bogo_ms/jiffies is
>> possibly better?
>> 
>> Best Regards,
>> Bill Campbell
> 
> That is assuming a 64-bit timebase, isn't it? For CPUs / SoCs that don't 
> have such a timebase but only a 32-bit timer, the bogo_ms/jiffy would 
> not go through the full 32-bit range, which would cause issues with the 
> timing loops on rollover -- and while a timeout of more than 65 sec may 
> not be too likely, a timeout starting near the wraparound value of 
> bogo_ms still could happen.

I agree that wrap-around near the end of the u32 'bogo_ms' counter is possible. Therefore we do need to define some constraints for such a '64 bit free running tick counter'. An implementation could overflow within some seconds, just as the u32 bogo_ms would.
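
To illustrate the kind of constraint I mean, here is a minimal sketch. It is not taken from any existing U-Boot code, and the names elapsed_full_range() and elapsed_masked() are made up for this example. Unsigned subtraction gives the right elapsed time across one rollover only when the counter covers the full 32-bit range; a counter that wraps earlier needs its range stated explicitly, here as a bit width:

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks for a counter that wraps at 2^32 (hypothetical helper). */
static uint32_t elapsed_full_range(uint32_t start, uint32_t now)
{
	/* modulo-2^32 arithmetic absorbs a single rollover */
	return (uint32_t)(now - start);
}

/*
 * Elapsed ticks for a counter that only spans 'bits' bits, e.g. a
 * 32-bit hardware timer shifted right by 4 (hypothetical helper).
 */
static uint32_t elapsed_masked(uint32_t start, uint32_t now, unsigned bits)
{
	uint32_t mask;

	assert(bits >= 1 && bits <= 32);
	mask = (bits == 32) ? UINT32_MAX : ((UINT32_C(1) << bits) - 1);

	/* reduce the difference to the counter's real range */
	return (uint32_t)(now - start) & mask;
}
```

With a 28-bit counter that goes from 0x0FFFFFF0 to 0x00000010, the plain subtraction reports a huge elapsed value, while the masked version reports the 0x20 ticks that really passed.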

> Besides, the 'tick' unit of time makes physical sense but the bogo_ms 
> would not, while still not being a common timing value -- reminds me of 
> my ms_to_ticks conversion macro that Wolfgang did not like.

I also dislike having another virtual physical dimension defined here.

> In a more general perspective, I'd like to see where exactly we 
> need 64-bit multiply/divide operations in Wolfgang's proposal before we 
> try to get rid of it. In my understanding:
> 
> - get_timer() works in pure ticks, not ms, and thus does not need 
> multiply/divide; it may at most need to implement a carry over from 32 
> bit to 64 bits *if* the HW counter is 32 bits *and if* we want a 64-bit 
> virtual counter.
> 
> - get_time() works in ms, and thus needs scale conversion, so possibly a 
> multiply/divide but possibly some other method, to convert a tick value 
> to an ms value.
> 
> That's where I come back to one point of my proposal: if we can get a 
> general framework for get_timer() to return a 64-bit free-running tick 
> value, then we might not need a ms-based get_time() at all, because we 
> could use get_timer() as well for ms timings, provided we can convert 
> our timeout from ms to ticks, i.e.
> 
> 	/* let's wait 200 milliseconds */
> 	/* Timing loop uses ticks: convert 200 ms to 'timeout' ticks */
> 	timeout = ms_to_ticks(200);
> 	u32 start = get_timer(); /* start time, in ticks */
> 	do {
> 		...
> 	} while ( (get_timer() -start) < timeout);

You may think about the following change to this proposal:

/* let's wait 200 ms */
/* get the end point of our timeout in ticks */
u64 timeout_end = get_timer() + ms_to_ticks(200);
do {
 ...
} while ( get_timer() < timeout_end);

If I understood Reinhard's proposal correctly, this is exactly what he meant. He calls it 'timer_init(timeout_val)' and 'is_timeout()', but I think it describes the same thing.

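First we calculate the end point of the timeout in ticks, then we just compare the 'now()' value with that end point in the timeout loop. I claim this approach is a bit better than yours on systems that cannot do 64 bit instructions natively, since each iteration needs only a 64 bit comparison instead of a 64 bit subtraction plus a comparison.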
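
A minimal, self-contained sketch of that end-point loop follows. The simulated counter (sim_ticks, advancing 10 ticks per read), the rate SIM_TICKS_PER_MS and the helper wait_until_deadline() are assumptions made up for this example so that it can run on a host; a real get_timer() would read the hardware. The sketch also assumes the 64-bit counter never wraps during the wait, which is one of the constraints we would have to define:

```c
#include <assert.h>
#include <stdint.h>

#define SIM_TICKS_PER_MS 1000u	/* assumed tick rate, illustration only */

static uint64_t sim_ticks;	/* stand-in for the hardware counter */

/* Simulated free-running counter: every read advances it by 10 ticks. */
static uint64_t get_timer(void)
{
	sim_ticks += 10;
	return sim_ticks;
}

/* Convert a timeout in ms to ticks, done once outside the loop. */
static uint64_t ms_to_ticks(uint32_t ms)
{
	return (uint64_t)ms * SIM_TICKS_PER_MS;
}

/* Poll until the deadline passes; returns how many polls were made. */
static unsigned wait_until_deadline(uint32_t timeout_ms)
{
	uint64_t start = get_timer();
	uint64_t timeout_end = start + ms_to_ticks(timeout_ms);
	unsigned polls = 0;

	/* the deadline itself must not overflow the 64-bit counter */
	assert(timeout_end >= start);

	do {
		polls++;	/* the real work would go here */
	} while (get_timer() < timeout_end);

	return polls;
}
```

Inside the loop there is only one 64 bit comparison per iteration; the multiplication happens once before the loop starts.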

> This way, a timing loop would not involve anything more complex than a 
> 64-bit subtraction and comparison; the only division/multiplication 
> involved would be in the timeout computation, out of the loop.

You forgot to mention that 'ms_to_ticks()' could be pre-calculated by the preprocessor in most cases, which may be a huge performance gain.
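
As a rough sketch of what I have in mind: if the tick rate is known at build time, the conversion can be a macro that the compiler folds into a constant whenever the timeout is a literal. TIMER_HZ (with an arbitrary 24 MHz value), MS_TO_TICKS() and timeout_200ms are hypothetical names for this example, not existing U-Boot definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick rate of the free-running counter, illustration only. */
#define TIMER_HZ	24000000ULL

/* Multiply first, then divide, to keep precision; 64 bit to avoid overflow. */
#define MS_TO_TICKS(ms)	(((uint64_t)(ms) * TIMER_HZ) / 1000ULL)

/* A constant expression: no multiply or divide is left at run time. */
static const uint64_t timeout_200ms = MS_TO_TICKS(200);

/* C11 compile-time check that the macro really is a constant expression. */
static_assert(MS_TO_TICKS(200) == 4800000ULL, "200 ms at 24 MHz");
```

Strictly speaking it is the compiler's constant folding that does the work after the preprocessor has expanded the macro. When the timeout is only known at run time, the same macro still works but costs one 64 bit multiply and divide outside the loop.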

regards

Andreas Bießmann