[PATCH 1/1] sunxi: dram: Fix incorrect ram size detection for some H6 boards

Andre Przywara andre.przywara at arm.com
Tue Oct 3 11:48:38 CEST 2023


On Mon, 02 Oct 2023 20:50:49 +0200
Jernej Škrabec <jernej.skrabec at gmail.com> wrote:

Hi,

> Dne ponedeljek, 02. oktober 2023 ob 13:26:26 CEST je Andre Przywara napisal(a):
> > On Sun,  1 Oct 2023 21:43:32 +0530
> > Gunjan Gupta <viraniac at gmail.com> wrote:
> > 
> > (fixing Jernej's email) 
> > 
> > Hi Gunjan,
> > 
> > thanks for sending a patch!
> >    
> > > On some H6 boards like Orange Pi 3 LTS, some times U-Boot fails to detect
> > > ram size correctly. Instead of 2GB thats available, it detects 4GB of ram
> > > and then SPL just hangs there making board not to boot further.
> > > 
> > > On debugging, I found that the rows value were being determined correctly,
> > > but columns were sometimes off by one value. I found that adding some
> > > delay after the mctl_core_init call along with making use of dsb in the
> > > start of the mctl_mem_matches solves the issue.
> > > 
> > > Signed-off-by: Gunjan Gupta <viraniac at gmail.com>
> > > ---
> > > 
> > >  arch/arm/mach-sunxi/dram_helpers.c   | 1 +
> > >  arch/arm/mach-sunxi/dram_sun50i_h6.c | 2 ++
> > >  2 files changed, 3 insertions(+)
> > > 
> > > diff --git a/arch/arm/mach-sunxi/dram_helpers.c b/arch/arm/mach-sunxi/dram_helpers.c
> > > index cdf2750f1c..5758c58e07 100644
> > > --- a/arch/arm/mach-sunxi/dram_helpers.c
> > > +++ b/arch/arm/mach-sunxi/dram_helpers.c
> > > @@ -32,6 +32,7 @@ void mctl_await_completion(u32 *reg, u32 mask, u32 val)
> > >  #ifndef CONFIG_MACH_SUNIV
> > >  bool mctl_mem_matches(u32 offset)
> > >  {
> > > +	dsb();  
> > 
> > This looks a bit odd, do you have an explanation for that? And are you
> > sure that is really needed?
> > I understand why we need the DSB after the writel's below, but before that?
> > The only thing I could think of is that we are missing a barrier in
> > mctl_core_init() - which is the function called before mctl_mem_matches().
> > Can you move that dsb(); into mctl_auto_detect_dram_size(), right after
> > the mctl_core_init() call (where you add the udelay() below)? And I wonder
> > if a dmb() would already be sufficient? I noticed recently that the
> > clr/setbit_le32() functions don't have a barrier at all, maybe that should
> > be fixed instead?  
> 
> Looking at original BSP DRAM code, there is no data barriers that I can find.
> Cache shouldn't be a thing before DRAM is initialized, right?

With the *MMU off* every memory access is "device nGnRnE", so
definitely not cached. But a Cortex-A53 still has a store buffer which
is in effect even for device accesses.

> Conversely,
> I suggest adding memory barriers before each udelay(), as it is there for a
> reason.

Talking to a colleague and looking at the ARM ARM and other
documentation, including [1]:
A DMB memory barrier (__iowmb() in U-Boot and Linux), as used in
readl/writel, just ensures ordering, it does not force completion.
For just programming the DRAM controller, this is what we want.
A DSB does everything that a DMB does, plus ensures "completion" of
memory accesses (plus other things like TLBs and CMOs, which we
don't care about in this case). In a Cortex-A53, this seems to include
flushing the store buffer, which is probably the culprit here.

So I don't know if the delays in the BSP DRAM driver are really
pauses that the DRAM controller needs. In this case we would need at
least a DSB before the udelay(), if not a device-read-back, though I
just assume that the DRAM controller registers do not have another
buffer on the hardware side. Check 27:47 onward of [1].

Another possibility is that the delays are just crude measures to paper
over missing barriers, hoping that the buffer is drained after the
time. But since the delays don't really hurt here, we should just assume
they are meaningful and keep them.

Also we pretty surely need a DSB after we setup the DRAM controller,
but before we use the DRAM array: the controller registers and the array
are separate devices, on different AXI buses. So that aspect of this patch
seems actually to be alright, and the fact that Gunjan needed the DSB
supports that. 

I will try to come up with a patch that implements these ideas.

Cheers,
Andre

P.S. Please note that above statements are an application of the
architecture rules to the A53 and the MMU-off/single core situation in the
SPL. The situation is more complex when running Linux, especially on more
modern cores that do speculation and out-of-order execution.

[1] https://www.youtube.com/watch?v=i6DayghhA8Q

> 
> >   
> > >  	/* Try to write different values to RAM at two addresses */
> > >  	writel(0, CFG_SYS_SDRAM_BASE);
> > >  	writel(0xaa55aa55, (ulong)CFG_SYS_SDRAM_BASE + offset);
> > > diff --git a/arch/arm/mach-sunxi/dram_sun50i_h6.c b/arch/arm/mach-sunxi/dram_sun50i_h6.c
> > > index bff2e42513..a031a845f5 100644
> > > --- a/arch/arm/mach-sunxi/dram_sun50i_h6.c
> > > +++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c
> > > @@ -623,6 +623,8 @@ static void mctl_auto_detect_dram_size(struct dram_para *para)
> > >  	para->cols = 11;
> > >  	mctl_core_init(para);
> > >  
> > > +	udelay(50);  
> > 
> > The location of that udelay() looks a bit odd, any chance that belongs at
> > the end of mctl_channel_init() instead? And how did you come up with that
> > and the value of 50? Just pure experimentation? I think the original BSP
> > DRAM code has plenty of delay calls, so we might have missed this one
> > here, but I would love to see some explanation here.  
> 
> I checked original driver and we have *almost* all delays. There are two
> missing. Both in mctl_phy_pir_init(). One before mctl_await_completion() and
> another after it. Both are 1 us long.
> 
> Maybe that solves it (in combination with data barriers)?
> 
> Best regards,
> Jernej
> 
> > 
> > Cheers,
> > Andre
> >   
> > >  	for (para->cols = 8; para->cols < 11; para->cols++) {
> > >  		/* 8 bits per byte and 16/32 bit width */
> > >  		if (mctl_mem_matches(1 << (para->cols + 1 +  
> > 
> >   
> 
> 
> 
> 



More information about the U-Boot mailing list