[U-Boot] [PATCH] sunxi: mctl_mem_matches: Add missing memory barrier

Fri Apr 22 15:12:30 CEST 2016

Hi,

On 22/04/16 13:09, Hans de Goede wrote:
> Hi,
> 
> On 22-04-16 13:46, Andre Przywara wrote:
>> Hi Hans,
>>
>> thanks for the information and the heads up!
>>
>> On 22/04/16 11:48, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 22-04-16 11:32, Ian Campbell wrote:
>>>> On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
>>>>>> I wonder if what you are observing could be possibly explained by
>>>>>> just
>>>>>> a usual data corruption problem? Which may be happening when the DRAM
>>>>>> clock speed is set higher than this particular device is able to
>>>>>> handle
>>>>>> in a reliable way. Inserting just one or more NOP instructions
>>>>>> instead
>>>>>> of the barrier could possibly change some timings too.
>>>>>>
>>>>>> If this patch helps, then it's fine. But I wonder if it is not merely
>>>>>> making the problem latent instead of fixing the root cause?
>>>>> I do believe that this patch addresses a real problem and is not
>>>>> hiding
>>>>> some dram timing issues, I might be wrong about the write-buffer being
>>>>> the cause, it could simply be that the compiler is doing something bad
>>>>> (despite the accesses being marked as volatile)  and that the DSB
>>>>> stops
>>>>> the compiler from optimizing things too much.
>>>>
>>>> I have a _very_ vague memory of seeing something not disimilar to this
>>>> (apparent write buffer interactions with MMU disabled) in the early
>>>> days of Xen development, but that was probably on models and so may not
>>>> have been representative of the intended behaviour of eventual silicon.
>>>>
>>>> It might be interesting to have a look at the generated assembly and
>>>> see if it differs in more or less than the addition of the single
>>>> instruction and perhaps experiment with just a compiler barrier.
>>>>
>>>> Andre, do you have any insights on this?
>>
>> Agree on the compiler barrier, frankly I don't see how this should break
>> with caches on or off unless the actual instruction order is wrong or
>> the compiler optimized something away.
>> Regardless of the write buffer the core should make sure the subsequent
>> reads return the value written before - especially if we are talking UP
>> here.
> 
> "the core should make sure the subsequent reads return the value written
> before"
> that is exactly the problem, we are writing 2 different values
> to so DRAM_BASE and DRAM_BASE + 512MiB, then read them both back
> and compare them, expecting them to be the same (both reads returning
> the last written value) if the ramsize is 512MiB (this is used in
> several places
> in the dram controller code to auto-config number of rows, columns, etc.).
> 
> But the core seems to just return the last written value,
> rather then actually going out to the RAM and reading it from
> there, which results in the function always returning false
> (i.o.w. it claims no DRAM phys address wraparound is happening
>  at 512MiB).

Oh, right, I missed that part, sorry.
So this is about physical aliasing?
The DRAM controller has only n address lines connected, and changing a
line >n shouldn't make a difference, right?
And the write succeeds and does trigger an asynchronous abort?

In this case you would indeed need some kind of "flushing", with caches
on I'd say a DCCIMVAC (Clean and Invalidate data or unified cache line
by MVA to PoC).

So I did a quick poll around the office and people say that "dsb" is the
right thing to do here (with MMU off).
As this is backed by practical experience, I'd just say: good to go!

> The DSB seems to fix this, but it might very well be the
> compiler being to clever (although all accesses are done
> through volatile pointers, so it really should not).

Plus those writel and readl macros already have a compiler barrier,
though on the "wrong" side for our purpose (before the write and after
the read).

Cheers,
Andre.

> 
> I'll try the barrier() fix when I've some time.
> 
> Regards,
> 
> Hans
> 
> 
> 
> 
>>
>>>
>>> Andre here is the original mail/patch for reference:
>>>
>>>      sunxi: mctl_mem_matches: Add missing memory barrier
>>>
>>>      We are running with the caches disabled when mctl_mem_matches gets
>>> called,
>>>      but the cpu's write buffer is still there and can still get in
>>> the way,
>>>      add a memory barrier to fix this.
>>>
>>>      This avoids mctl_mem_matches always returning false in some cases,
>>> which
>>>      was resulting in:
>>>
>>> <snip>
>>>
>>> @@ -31,6 +32,7 @@ bool mctl_mem_matches(u32 offset)
>>>       /* Try to write different values to RAM at two addresses */
>>>       writel(0, CONFIG_SYS_SDRAM_BASE);
>>>       writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
>>> +    DSB;
>>>       /* Check if the same value is actually observed when reading
>>> back */
>>>       return readl(CONFIG_SYS_SDRAM_BASE) ==
>>>              readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
>>>
>>>
>>> What this code is trying to do is determine RAM (chip) size by seeing
>>> when
>>> writing to RAM wrapsaround.
>>>
>>> This works with the DSB but not without (without it always returns
>>> false)
>>> this is on a Cortex A7 with the mmu (and data caches) disabled.
>>>
>>> Ian, I can try using just a compiler barrier, but I've never done so
>>> before, how do I insert one ?
>>
>> barrier();
>>
>> I am busy at the moment, but will take a look later.
>>
>> Cheers,
>> Andre.
>>
>