[U-Boot] [PATCH] Fix unreliable detection of DRAM size on Orange Pi 3

Ondřej Jirman megous at megous.com
Fri Aug 30 20:40:30 UTC 2019


Hi André,

On Fri, Aug 30, 2019 at 05:19:02PM +0100, Andre Przywara wrote:
> > OTOH, let's visualize it:
> > 
> > CPU address                             DRAM address
> > ---------------------------------------------------------------------------
> > [w 00000000] SDRAM_BASE                 0
> > [w aa55aa55] SDRAM_BASE + offset        0
> > 
> > Both writes should end up at the same physical location in DRAM.
> > 
> > The following test checks that values that are read back are the same. They
> > should be the same in this specific case of Opi3 SBC. It doesn't check for any
> > specific value, so even if writes are re-ordered, the value that's read back
> > should still be the same.
> 
> Ah, right. I was actually staring at my modified code, where I try to detect
> half DQ problems with some weird 3GB setup on the Eachlink H6 Mini. So
> I explicitly check for and compare to the second value written, and expect
> this write to have overwritten the first 0 write.
> 
> > Order of operations is:
> > 
> >   writel(0, CONFIG_SYS_SDRAM_BASE);
> >   writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
> >   dsb();
> 
> From all I can see this means at this point the two writes are "on the bus"
> ("completed" from the CPU's point of view).
> 
> >   readl(CONFIG_SYS_SDRAM_BASE)
> 
> This should actually push the first write to the device, as a read following
> a write to the same address requires the write to take effect first. Might
> still end up in a write buffer in the DRAM controller, though.
> 
> >   readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
> > 
> > The CPU can only re-order either the writes or reads. But neither should matter.
> 
> Yes, true, for the existing code (which compares the two reads against each
> other).
> 
> > Memory controller is a black box to me and that's where the issue probably
> > lies.
> 
> Yes, the plot thickens. Maybe the DRAM controller buffers the write
> request(s), as it waits for the page(s) to open in the chips, for instance.
> Now the reads can be satisfied from this buffer, since the addresses match. No
> idea if it would break down the addresses into rows/cols/bank/rank already to
> detect the aliasing.
> 
> > The only time this should fail and return different values when reading is if
> > the hardware does some shortcut/assumption, and reads will not go fully from
> > DRAM after the writes.
> 
> Sounds tempting, but with the caches and the MMU off (so treating everything
> as device memory) I don't see where this would happen. The only case I could
> imagine would be *in* the DRAM controller, as mentioned above.

Yes, that's what I meant.

> So in this case we would need to tell the DRAM controller to complete all
> outstanding requests (the two writes) first. Then issue the reads. As
> mentioned, sounds possible, but I have no clue how to do this. A slight hack
> would indeed be some delay loop, to give the DRAM controller time to fiddle
> the writes into the DRAM matrix.
> Or flood it with writes, then DSB to force this buffer to drain?

That's another interesting idea. :) Though we don't know the size of the buffer.

> > Anyway, I'm failing to reproduce this even without any patches, ATM.
> 
> Mmmh, shame.
> Any changes in your setup you can think of? New power supply?
> Maybe we should indeed experiment with lower clocks? Did you run stability
> tests in Linux? To prove that the DRAM timing is actually solid?

I thought it might be temperature related, because I'm now using a fan to
cool the board. But that turned out not to be it (I pre-heated the board,
and nothing changed).

No stability tests, but I've been compiling mesa repeatedly on the board
without a single obvious issue, and I also push a lot of data through the
board to a USB-connected SSD.

The kernel hasn't panicked unexpectedly yet (several months of uptime).

I suspect that the DRAM timing is good.

I'm trying memtester now, and it looks good, too.

regards,
	o.

> Cheers,
> Andre.
> 
> > regards,
> > 	o.
> > 
> > > So for the sake of this patch, we should take it. It is needed and
> > > apparently solves one issue, and I can't see any harm in it.
> > > 
> > > Regardless of this I was wondering if in this situation (single in-order
> > > core running with MMU and caches off) this is the full story, as the CPU
> > > does not have a good reason to reorder.
> > > 
> > > > Could you please experiment with this problem a little bit more?
> > > > 
> > > >  1. What if you just move this second DSB instruction after the
> > > >     second write and have two of them there? Does this also make
> > > >     the problem disappear?
> > > 
> > > I don't think two dsb()s next to each other will do anything useful. I
> > > would think the second dsb() would just be ignored, as the order is
> > > already preserved and all memory accesses have been completed:
> > > "A DSB is a memory barrier that ensures that memory accesses that occur
> > > before the DSB have completed before the completion of the DSB
> > > instruction." (ARMv8 ARM, section B2.3.5: DSB)
> > > I can't imagine this would introduce any kind of relevant delay, as a
> > > DSB is just a barrier.
> > > 
> > > >  2. What if you just replace DSB with udelay(1) / udelay(10) /
> > > >     udelay(100)? Does this make the problem disappear?>
> > > > 
> > > > I suspect that we may be dealing with some sort of buffering in
> > > > the DRAM controller itself
> > > 
> > > Indeed it seems perfectly possible that the DDR3 controller buffers and
> > > potentially reorders those two writes.
> > > The canonical solution would be to force the write by telling the
> > > memory controller directly; however, this is probably not only
> > > controller-specific, but also undocumented.
> > > 
> > > I was wondering whether waiting for the next refresh would help (or for
> > > the period of one refresh cycle), but I guess the DRAM controller could
> > > buffer requests beyond that point.
> > > 
> > > > and the only reason why DSB helps is
> > > > that it introduces some delay because it's a rather slow
> > > > instruction.
> > > 
> > > How much delay a DSB instruction introduces depends on many things,
> > > but mostly on the number of outstanding memory requests and the time
> > > it takes to complete them. So I am not sure we can call it a "slow
> > > instruction" per se.
> > > 
> > > So to summarise: I think Siarhei has a point in that this is actually
> > > more of a timing issue, but the DSB in there is needed and we should
> > > apply this patch:
> > > 
> > > Reviewed-by: Andre Przywara <andre.przywara at arm.com>
> > > 
> > > Cheers,
> > > Andre.
> > > 
> 

