[linux-sunxi] Re: [PATCH 2/2] sunxi: enable dual rank memory on R40

Icenowy Zheng icenowy at aosc.io
Tue Mar 2 19:08:08 CET 2021


在 2021-03-02星期二的 15:19 +0000,Andre Przywara写道:
> On Tue, 02 Mar 2021 21:50:49 +0800
> Icenowy Zheng <icenowy at aosc.io> wrote:
> 
> Hi Icenowy,
> 
> > 于 2021年3月2日 GMT+08:00 下午9:40:44, Andre Przywara <
> > andre.przywara at arm.com> 写到:
> > > On Fri, 26 Feb 2021 00:13:25 +0800
> > > Icenowy Zheng <icenowy at aosc.io> wrote:
> > >  
> > > > Previously we do not have proper dual rank memory detection on
> > > > R40
> > > > (because we omitted PIR_QSGATE, which does not work on R40 with
> > > > our
> > > > configuration), and dual rank memory is just simply disabled as
> > > > early
> > > > R40 boards available (Banana Pi M2 Ultra and Berry) have single
> > > > rank
> > > > memory.
> > > > 
> > > > As a board with dual rank memory (Forlinx OKA40i-C) is now
> > > > known to  
> > > us,  
> > > > we need to have a way to do memory rank detection to support
> > > > that  
> > > board.  
> > > > 
> > > > Add some routine to detect memory rank by trying to access the
> > > > memory
> > > > in rank 1 and check for error status of the memory controller,
> > > > and  
> > > then  
> > > > enable dual rank memory on R40.
> > > > 
> > > > Similar routine can be used to detect half DQ width (which is
> > > > also
> > > > detected by PIR_QSGATE on other SoCs), but it's left
> > > > unimplemented
> > > > because there's no known R40 board with half DQ width now.  
> > > 
> > > So when looking at the SPL size I realised that this patch breaks
> > > the
> > > socid constant parameter propagation, especially for
> > > mctl_set_cr(). I
> > > don't see immediately why, this whole code here should be
> > > compiled out
> > > on any A64 builds, for instance. Instead GCC turns
> > > "<mctl_set_cr.constprop.0>:" into "<mctl_set_cr>:", and passes
> > > 0x1689
> > > around everytime. I tried GCC 10.2.0 and 9.2.0, also tried adding
> > > const
> > > everywhere, to amplify the constant nature of this value. Patch
> > > 1/2 added to the code size, but kept the const propagation (only
> > > one
> > > instance of 0x1689 in the disassembly). This patch here should be
> > > for
> > > free on A64, but adds 104 bytes.
> > > 
> > > Does anyone have a clue why this is happening? Is this a compiler
> > > issue?  
> > 
> > Maybe the issue is that I assume this codepath is only for R40 and
> > I didn't add socid to it?
> 
> But that's clearly visible by this function only being called inside
> "if
> (socid == SOCID_R40)". And that works very well for the H3 ZQ
> calibration quirk, for instance.
> 
> > Could you try to add a socid parameter to
> > mctl_r40_detect_rank_count()?
> 
> I just tried that, and apart from looking weird it didn't do
> anything.
> 
> To be clear: Your code is absolutely fine, exactly the way it should
> be
> written. It's just that the compiler is playing stupid suddenly. I
> was
> thinking that your dummy readl might have upset it, but even with
> that
> commented out it's still happening.
> 
> > Or maybe it makes mctl_calc_rank_size() not inlined?
> 
> So the assembly looks like everything apart from mctl_set_cr() and
> mctl_auto_detect_dram_size_rank() is inlined. Those are extra because
> they are called multiple times and we are using -Os.
>  
> > (Well I think the code looks noop on non-R40)
> 
> Exactly that was my thinking: when compiling for the A64, it should
> be
> totally compiled out, and the object file should be the same before
> and
> after.
>  
> > BTW, original rank/width detection part won't get called on R40.
> > But
> > R40 is not where we are tight on code size, so I didn't bother to
> > ignore
> > it on R40.
> 
> I see. Yeah, I am not concerned about R40 either, but I want to avoid
> the A64 bloating up. 
> 
> > Or should we switch back to #if's and do not rely on compiler
> > behavior any longer?
> 
> I'd rather not do that. We are doing the right thing, and it works
> marvellously so far.
> 
> To be clear: it's not a show stopper, I was just curious why this
> happens.
> The code size is not really critical on the A64 at the moment, so I'd
> merge it as it, even if we don't find a solution. Maybe just leave a
> hint about it in the code should people need to come back to this.
> 
> I also asked some compiler buffs about it, but it's not exactly the
> simple reproducer case they like to deal with ;-)

Well I tried to make a simpler reproducer:

```
#include <stdint.h>

__attribute__((noinline)) static void mctl_set_cr(uint16_t socid)
{
	if (socid == 0x1701)
		*((volatile uint16_t *)0x12345678) = 40;
	else if (socid == 0x1689)
		*((volatile uint16_t *)0x12345678) = 64;
}

static void test(uint16_t socid)
{
	if (socid == 0x1701)
		mctl_set_cr(socid);

	*((volatile uint16_t *)0x12345670) = 0xefe8;
}

int sunxi_dram_init()
{
	uint16_t socid = 0x1689;

	test(socid);
	mctl_set_cr(socid);

	return 0;
}
```

If compile this with `aarch64-linux-gnu-gcc test.c -c -Os`, then the
mctl_set_cr won't get constprop optimization.

In the same situation with real dram_sunxi_dw, if judge for 0x1701 is
moved from test() to sunxi_dram_init() then mctl_set_cr gets optimized
and codepath for setting 40 is omitted.

I think, the conditions that triggers the lose of the optimization is
below:
- Inside a function inlined to where the constant is defined (not the
function that defines the constant itself)
- Get called inside a if() block that the expression inside the
condition uses the constant

> 
> Cheers,
> Andre
> 
> 
> 
> > 
> > > 
> > > Cheers,
> > > Andre
> > >   
> > > > Signed-off-by: Icenowy Zheng <icenowy at aosc.io>
> > > > ---
> > > >  arch/arm/mach-sunxi/dram_sunxi_dw.c | 55  
> > > +++++++++++++++++++++++++----  
> > > >  1 file changed, 49 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/mach-sunxi/dram_sunxi_dw.c  
> > > b/arch/arm/mach-sunxi/dram_sunxi_dw.c  
> > > > index 2b9d631d49..b86ae7cdf3 100644
> > > > --- a/arch/arm/mach-sunxi/dram_sunxi_dw.c
> > > > +++ b/arch/arm/mach-sunxi/dram_sunxi_dw.c
> > > > @@ -414,11 +414,9 @@ static void mctl_set_cr(uint16_t socid,
> > > > struct  
> > > dram_para *para)  
> > > >         }
> > > >  
> > > >         if (socid == SOCID_R40) {
> > > > -               if (para->dual_rank)
> > > > -                       panic("Dual rank memory not
> > > > supported\n");
> > > > -
> > > >                 /* Mux pin to A15 address line for single rank
> > > > memory. */
> > > > -               setbits_le32(&mctl_com->cr_r1,
> > > > MCTL_CR_R1_MUX_A15);
> > > > +               if (!para->dual_rank)
> > > > +                       setbits_le32(&mctl_com->cr_r1,
> > > > MCTL_CR_R1_MUX_A15);
> > > >         }
> > > >  }
> > > >  
> > > > @@ -702,8 +700,55 @@ static unsigned long
> > > > mctl_calc_rank_size(struct  
> > > rank_para *rank)  
> > > >         return (1UL << (rank->row_bits + rank->bank_bits)) *  
> > > rank->page_size;  
> > > >  }
> > > >  
> > > > +/*
> > > > + * Because we cannot do mctl_phy_init(PIR_QSGATE) on R40 now
> > > > (which  
> > > leads  
> > > > + * to failure), it's needed to detect the rank count of R40
> > > > in  
> > > another way.  
> > > > + *
> > > > + * The code here is modelled after time_out_detect() in BSP,
> > > > which  
> > > tries to  
> > > > + * access the memory and check for error code.
> > > > + *
> > > > + * TODO: auto detect half DQ width here
> > > > + */
> > > > +static void mctl_r40_detect_rank_count(struct dram_para *para)
> > > > +{
> > > > +       ulong rank1_base = (ulong) CONFIG_SYS_SDRAM_BASE +
> > > > +                          mctl_calc_rank_size(&para-
> > > > >ranks[0]);
> > > > +       struct sunxi_mctl_ctl_reg * const mctl_ctl =
> > > > +                       (struct sunxi_mctl_ctl_reg
> > > > *)SUNXI_DRAM_CTL0_BASE;
> > > > +
> > > > +       /* Enable read time out */
> > > > +       setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 25);
> > > > +
> > > > +       (void) readl((void *) rank1_base);
> > > > +       udelay(10);
> > > > +
> > > > +       if (readl(&mctl_ctl->pgsr[0]) & (0x1 << 13)) {
> > > > +               clrsetbits_le32(&mctl_ctl->dtcr, 0xf << 24, 0x1
> > > > << 24);
> > > > +               para->dual_rank = 0;
> > > > +       }
> > > > +
> > > > +       /* Reset PHY FIFO to clear it */
> > > > +       clrbits_le32(&mctl_ctl->pgcr[0], 0x1 << 26);
> > > > +       udelay(100);
> > > > +       setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 26);
> > > > +
> > > > +       /* Clear error status */
> > > > +       setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 24);
> > > > +
> > > > +       /* Clear time out flag */
> > > > +       clrbits_le32(&mctl_ctl->pgsr[0], 0x1 << 13);
> > > > +
> > > > +       /* Disable read time out */
> > > > +       clrbits_le32(&mctl_ctl->pgcr[0], 0x1 << 25);
> > > > +}
> > > > +
> > > >  static void mctl_auto_detect_dram_size(uint16_t socid, struct 
> > > dram_para *para)  
> > > >  {
> > > > +       if (socid == SOCID_R40) {
> > > > +               mctl_r40_detect_rank_count(para);
> > > > +               mctl_set_cr(socid, para);
> > > > +       }
> > > > +
> > > >         mctl_auto_detect_dram_size_rank(socid, para,  
> > > (ulong)CONFIG_SYS_SDRAM_BASE, &para->ranks[0]);  
> > > >  
> > > >         if ((socid == SOCID_A64 || socid == SOCID_R40) && para-
> > > > >dual_rank)  
> > > {  
> > > > @@ -854,8 +899,6 @@ unsigned long sunxi_dram_init(void)
> > > >         uint16_t socid = SOCID_H3;
> > > >  #elif defined(CONFIG_MACH_SUN8I_R40)
> > > >         uint16_t socid = SOCID_R40;
> > > > -       /* Currently we cannot support R40 with dual rank
> > > > memory */
> > > > -       para.dual_rank = 0;
> > > >  #elif defined(CONFIG_MACH_SUN8I_V3S)
> > > >         uint16_t socid = SOCID_V3S;
> > > >  #elif defined(CONFIG_MACH_SUN50I)  
> 



More information about the U-Boot mailing list