[U-Boot] [PATCHv2] fsl_ddr: Don't use full 64-bit divides on32-bit PowerPC

Fri Jul 29 23:46:26 CEST 2011

On Fri, 2011-07-29 at 16:26 -0500, Moffett, Kyle D wrote:
> On Jul 29, 2011, at 14:46, York Sun wrote:
> > On Fri, 2011-07-29 at 11:35 -0700, York Sun wrote:
> >> On Fri, 2011-07-29 at 12:20 -0500, Moffett, Kyle D wrote:
> >>> On Jul 28, 2011, at 17:41, York Sun wrote:
> >>>> I found a problem with the round up. Please try
> >>>> 
> >>>> picos_to_mclk(mclk_to_picos(3))
> >>>> 
> >>>> You will notice the result is round up. I am sure it is unexpected.
> >>> 
> >>> Hmm... what does fsl_ddr_get_mem_data_rate() return on your system?
> >>> I cannot reproduce this here with my memory freq of 800MHz.
> >>> 
> >>> The function has obviously always done rounding, even before I made
> >>> any changes.  Have you retested what the math does with the patch
> >>> reverted to see if it makes any difference?
> >>> 
> >>> Hmm, looking at it, the obvious problem case would be mclk_to_picos()
> >>> using a rounded result in a subsequent multiply.  Can you retest by
> >>> replacing mclk_to_picos() with the following code; this should keep
> >>> the error down by doing the rounding-to-10ps *after* the multiply.
> >>> 
> >>> unsigned int mclk_to_picos(unsigned int mclk)
> >>> {
> >>>        unsigned int data_rate = get_ddr_freq(0);
> >>>        unsigned int result;
> >>> 
> >>>        /* Round to nearest 10ps, being careful about 64-bit multiply/divide */
> >>>        unsigned long long mclk_ps = ULL_2e12 * mclk;
> >>> 
> >>>        /* Add 5*data_rate, for rounding */
> >>>        mclk_ps += 5*(unsigned long long)data_rate;
> >>> 
> >>>        /* Now perform the big divide, the result fits in 32-bits */
> >>>        do_div(mclk_ps, data_rate);
> >>>        result = mclk_ps;
> >>> 
> >>>        /* We still need to round to 10ps */
> >>>        return 10 * (result/10);
> >>> }
> >>> 
> >>> Obviously you could also get rid of the rounding alltogether, but I
> >>> don't know why it was put in the code in the first place...
> >> 
> >> I think the problem comes from the round up. But your calculation of
> >> remainder is not right. You explained to me with example of 2*10^^12-1.
> >> It seems to be right until I tried another number 264*10^^7. I think the
> >> following calculation of remainder is
> >> 
> >> 	clks_rem = do_div(clks, UL_5POW12);
> >> 	clks >>= 13;
> >> 	clks_rem |= (clks & (UL_2POW13-1)) * UL_5POW12;
> 
> That code you pasted is definitely not what is in the tree.
> The in-tree code is:
>   57  /* First multiply the time by the data rate (32x32 => 64) */
>   58  clks = picos * (unsigned long long)get_ddr_freq(0);
>   59
>   60  /*
>   61   * Now divide by 5^12 and track the 32-bit remainder, then divide
>   62   * by 2*(2^12) using shifts (and updating the remainder).
>   63   */
>   64  clks_rem = do_div(clks, UL_5pow12);
>   65  clks_rem <<= 13;
>   66  clks_rem |= clks & (UL_2pow13-1);
>   67  clks >>= 13;
>   68
>   69  /* If we had a remainder, then round up */
>   70  if (clks_rem)
>   71          clks++;
> 
> Can you tell me what your value of get_ddr_freq() is?  I may be able to
> reproduce the issue here.

264000000

> 
> Also, can you try your test case with the replacement for mclk_to_picos()
> that I pasted above and let me know what happens?
> 
> I think the *real* error is that get_memory_clk_period_ps() (which I did
> not change) rounds up or down to a multiple of 10ps, and then the *result*
> of get_memory_clk_period_ps() is multiplied by the number of clocks.  The
> effect is that the 5ps of rounding error that can occur is multiplied by
> the number of clocks, and may result in a much-too-large result.  When you
> convert that number of picoseconds *back* to memory clocks you get a
> result that is too big, but that's because it was too many picoseconds!

Correct. get_memory_clk_period_ps() is causing this problem.

> 
> Where did you get the number 264*10^7?  That seems too big to be a number
> of picoseconds (that's 2ms!), but way too small to be the product of
> get_ddr_freq() and some number of picoseconds.

Forget that. I was just trying.

> 
> EG: For my system here with a DDR frequency of 800MHz, each clock is 2500ps,
> so 3 clocks times 2500ps times 800MHz is 6*10^12.
> 
> In fact if you give "picos_to_mclk()" any number of picoseconds larger than
> at least 1 clock, the initial value of the "clks" variable is guaranteed to
> be at least 2*10^12, right?
> 
> Using my number, let's run through the math:
> 
> So if you start with picos = 7499 (to put rounding to the test) and
> get_ddr_freq() is 800MHz:
>   clks = 7499 * 800*1000*1000;
> 
> Now we have "clks" = 5999200000000, and we divide by 5**12:
>   clks_rem = do_div(clks, UL_5pow12);
> 
> Since 5999200000000/(5**12) is 24572.7232, we get:
>   clks = 24572
>   clks_rem = 176562500 (which is equal to 5^12 * 0.7232)
> 
> Now left-shift clks_rem by 13:
>   clks_rem = 1446400000000
> 

Here I think it is wrong. I want to use the clks_rem value later.

I change it to this

	clks_rem = do_div(clks, UL_5POW12);
	clks_rem += (clks & (UL_2POW13-1)) * UL_5POW12;
	clks >>= 13;

	/* If we had a remainder greater than the 1ps error, then round up */
	if (clks_rem > data_rate)
		clks++;

After fixing the get_memory_clk_period_ps() to round up to near 1ps, the
calculation is more accurate (well, not as floating point). And ow
clks_rem holds the correct remainder.

York