[U-Boot] [PATCHv2] fsl_ddr: Don't use full 64-bit divides on32-bit PowerPC
Moffett, Kyle D
Kyle.D.Moffett at boeing.com
Fri Jul 29 23:26:46 CEST 2011
On Jul 29, 2011, at 14:46, York Sun wrote:
> On Fri, 2011-07-29 at 11:35 -0700, York Sun wrote:
>> On Fri, 2011-07-29 at 12:20 -0500, Moffett, Kyle D wrote:
>>> On Jul 28, 2011, at 17:41, York Sun wrote:
>>>> I found a problem with the round up. Please try
>>>>
>>>> picos_to_mclk(mclk_to_picos(3))
>>>>
>>>> You will notice the result is round up. I am sure it is unexpected.
>>>
>>> Hmm... what does fsl_ddr_get_mem_data_rate() return on your system?
>>> I cannot reproduce this here with my memory freq of 800MHz.
>>>
>>> The function has obviously always done rounding, even before I made
>>> any changes. Have you retested what the math does with the patch
>>> reverted to see if it makes any difference?
>>>
>>> Hmm, looking at it, the obvious problem case would be mclk_to_picos()
>>> using a rounded result in a subsequent multiply. Can you retest by
>>> replacing mclk_to_picos() with the following code; this should keep
>>> the error down by doing the rounding-to-10ps *after* the multiply.
>>>
>>> unsigned int mclk_to_picos(unsigned int mclk)
>>> {
>>> unsigned int data_rate = get_ddr_freq(0);
>>> unsigned int result;
>>>
>>> /* Round to nearest 10ps, being careful about 64-bit multiply/divide */
>>> unsigned long long mclk_ps = ULL_2e12 * mclk;
>>>
>>> /* Add 5*data_rate, for rounding */
>>> mclk_ps += 5*(unsigned long long)data_rate;
>>>
>>> /* Now perform the big divide, the result fits in 32-bits */
>>> do_div(mclk_ps, data_rate);
>>> result = mclk_ps;
>>>
>>> /* We still need to round to 10ps */
>>> return 10 * (result/10);
>>> }
>>>
>>> Obviously you could also get rid of the rounding alltogether, but I
>>> don't know why it was put in the code in the first place...
>>
>> I think the problem comes from the round up. But your calculation of
>> remainder is not right. You explained to me with example of 2*10^^12-1.
>> It seems to be right until I tried another number 264*10^^7. I think the
>> following calculation of remainder is
>>
>> clks_rem = do_div(clks, UL_5POW12);
>> clks >>= 13;
>> clks_rem |= (clks & (UL_2POW13-1)) * UL_5POW12;
That code you pasted is definitely not what is in the tree.
The in-tree code is:
57 /* First multiply the time by the data rate (32x32 => 64) */
58 clks = picos * (unsigned long long)get_ddr_freq(0);
59
60 /*
61 * Now divide by 5^12 and track the 32-bit remainder, then divide
62 * by 2*(2^12) using shifts (and updating the remainder).
63 */
64 clks_rem = do_div(clks, UL_5pow12);
65 clks_rem <<= 13;
66 clks_rem |= clks & (UL_2pow13-1);
67 clks >>= 13;
68
69 /* If we had a remainder, then round up */
70 if (clks_rem)
71 clks++;
Can you tell me what your value of get_ddr_freq() is? I may be able to
reproduce the issue here.
Also, can you try your test case with the replacement for mclk_to_picos()
that I pasted above and let me know what happens?
I think the *real* error is that get_memory_clk_period_ps() (which I did
not change) rounds up or down to a multiple of 10ps, and then the *result*
of get_memory_clk_period_ps() is multiplied by the number of clocks. The
effect is that the 5ps of rounding error that can occur is multiplied by
the number of clocks, and may result in a much-too-large result. When you
convert that number of picoseconds *back* to memory clocks you get a
result that is too big, but that's because it was too many picoseconds!
Where did you get the number 264*10^7? That seems too big to be a number
of picoseconds (that's 2ms!), but way too small to be the product of
get_ddr_freq() and some number of picoseconds.
EG: For my system here with a DDR frequency of 800MHz, each clock is 2500ps,
so 3 clocks times 2500ps times 800MHz is 6*10^12.
In fact if you give "picos_to_mclk()" any number of picoseconds larger than
at least 1 clock, the initial value of the "clks" variable is guaranteed to
be at least 2*10^12, right?
Using my number, let's run through the math:
So if you start with picos = 7499 (to put rounding to the test) and
get_ddr_freq() is 800MHz:
clks = 7499 * 800*1000*1000;
Now we have "clks" = 5999200000000, and we divide by 5**12:
clks_rem = do_div(clks, UL_5pow12);
Since 5999200000000/(5**12) is 24572.7232, we get:
clks = 24572
clks_rem = 176562500 (which is equal to 5^12 * 0.7232)
Now left-shift clks_rem by 13:
clks_rem = 1446400000000
Then bitwise-OR in the lower 13 bits of clks:
clks_rem |= (24572 & 8191)
clks_rem = 1446400008188;
Finally, rightshift clks by 13:
clks = 2
Since there is a remainder, we "round up" from 2 to 3.
These numbers empirically make sense, because I gave the system 7499ps
(which is 3 clocks minus 1ps) and it rounded up properly.
If I give it a simple number like 7500ps, then the math is really easy
and obvious:
clks = 7500 * 800*1000*100
Then the do_div():
clks = 24576
clks_rem = 0
Shifting clks_rem does nothing, because it is zero...
Oring with the lower 13 bits of clks has no effect (it has no low bits set)
Finally, when right-shifting clks by 13:
clks = 3
This is obviously the correct result.
Cheers,
Kyle Moffett
More information about the U-Boot
mailing list