[U-Boot] [PATCH] ARM: socfpga: Clear PL310 early in SPL

Tue Feb 19 20:07:12 UTC 2019

On 2/19/19 8:29 PM, Simon Goldschmidt wrote:
> Am 19.02.2019 um 17:39 schrieb Marek Vasut:
>> On 2/19/19 5:31 PM, Simon Goldschmidt wrote:
>>> Am 19.02.2019 um 17:00 schrieb Marek Vasut:
>>>> On 2/19/19 4:55 PM, Simon Goldschmidt wrote:
>>>>> Am 19.02.2019 um 15:00 schrieb Marek Vasut:
>>>>>> On 2/19/19 2:58 PM, Simon Goldschmidt wrote:
>>>>>>>
>>>>>>>
>>>>>>> Am Di., 19. Feb. 2019, 14:45 hat Marek Vasut <marex at denx.de
>>>>>>> <mailto:marex at denx.de>> geschrieben:
>>>>>>>
>>>>>>>        On 2/19/19 2:28 PM, Simon Goldschmidt wrote:
>>>>>>>        > On Tue, Feb 19, 2019 at 1:53 PM Marek Vasut <marex at denx.de
>>>>>>>        <mailto:marex at denx.de>> wrote:
>>>>>>>        >>
>>>>>>>        >> On SoCFPGA Gen5 systems, it can rarely happen that a
>>>>>>> reboot from
>>>>>>>        Linux
>>>>>>>        >> will result in stale data in PL310 L2 cache controller.
>>>>>>> Even if
>>>>>>>        the L2
>>>>>>>        >> cache controller is disabled via the CTRL register CTRL_EN
>>>>>>> bit, those
>>>>>>>        >> data can interfere with operation of devices using DMA,
>>>>>>> like
>>>>>>> e.g. the
>>>>>>>        >> DWMMC controller. This can in turn cause e.g. SPL to fail
>>>>>>> reading
>>>>>>>        data
>>>>>>>        >> from SD/MMC.
>>>>>>>        >>
>>>>>>>        >> The obvious solution here would be to fully reset the L2
>>>>>>> cache
>>>>>>>        controller
>>>>>>>        >> via the reset manager MPUMODRST L2 bit, however this
>>>>>>> causes bus
>>>>>>>        hang even
>>>>>>>        >> if executed entirely from L1 I-cache to avoid generating
>>>>>>> any bus
>>>>>>>        traffic
>>>>>>>        >> through the L2 cache controller.
>>>>>>>        >>
>>>>>>>        >> This patch thus configures and enables the L2 cache
>>>>>>> controller
>>>>>>>        very early
>>>>>>>        >> in the SPL boot process, clears the L2 cache and disables
>>>>>>> the L2
>>>>>>>        cache
>>>>>>>        >> controller again.
>>>>>>>        >>
>>>>>>>        >> The reason for doing it in SPL is because we need to avoid
>>>>>>>        accessing any
>>>>>>>        >> of the potentially stale data in the L2 cache, and we are
>>>>>>> certain
>>>>>>>        any of
>>>>>>>        >> the stale data will be below the OCRAM address range. To
>>>>>>> further
>>>>>>>        reduce
>>>>>>>        >> bus traffic during the L2 cache invalidation, we enable L1
>>>>>>>        I-cache and
>>>>>>>        >> run the invalidation code entirely out of the L1 I-cache.
>>>>>>>        >>
>>>>>>>        >> Signed-off-by: Marek Vasut <marex at denx.de
>>>>>>> <mailto:marex at denx.de>>
>>>>>>>        >> Cc: Dalon Westergreen <dwesterg at gmail.com
>>>>>>>        <mailto:dwesterg at gmail.com>>
>>>>>>>        >> Cc: Dinh Nguyen <dinguyen at kernel.org
>>>>>>> <mailto:dinguyen at kernel.org>>
>>>>>>>        >
>>>>>>>        > We've tested this and it seems to fix the issue, so:
>>>>>>>        >
>>>>>>>        > Tested-by: Simon Goldschmidt
>>>>>>> <simon.k.r.goldschmidt at gmail.com
>>>>>>>        <mailto:simon.k.r.goldschmidt at gmail.com>>
>>>>>
>>>>> Seems like I was a little fast there: the patch does not apply cleanly
>>>>> to master. It did apply to our tree, obviously.
>>>>>
>>>>> It's missing an include to <asm/pl310.h> and 'pl310' is undefined.
>>>>> Probably a result of rebasing it...
>>>>>
>>>>>>>        >
>>>>>>>        > However, I don't really get why clearing the L2 cache
>>>>>>> later in
>>>>>>> U-Boot
>>>>>>>        > isn't good enough.
>>>>>>>
>>>>>>>        Because when U-Boot is running, it is already running from
>>>>>>> RAM,
>>>>>>> which is
>>>>>>>        on different PL310 master than OCRAM, and you're likely to
>>>>>>> hit the
>>>>>>>        corrupted cache lines on the DRAM master which is primed by
>>>>>>> prior Linux
>>>>>>>        operation. Such cacheline can be hit between the code
>>>>>>> enabling
>>>>>>> the PL310
>>>>>>>        and the code invalidating it, which is a C code, using
>>>>>>> stack and
>>>>>>> calling
>>>>>>>        functions, thus accessing memory which would likely reside in
>>>>>>> different
>>>>>>>        PL310 cachelines. If one of those cachelines contains
>>>>>>> invalid/corrupted
>>>>>>>        data, they can be provided to the CPU before the cacheline is
>>>>>>>        invalidated.
>>>>>>>
>>>>>>>        To further reduce the likelihood of hitting any such cache
>>>>>>> line,
>>>>>>> the
>>>>>>>        code which enables the PL310 and invalidates it runs from
>>>>>>> primed L1
>>>>>>>        icache lines, thus generating no bus traffic at all.
>>>>>>>
>>>>>>>
>>>>>>> Ah, right. I somehow didn't realize that invalidating is done after
>>>>>>> enabling:-)
>>>>>>
>>>>>> Right. If it could be done before (like e.g. by whacking the L2CC
>>>>>> reset), that'd be fantastic.
>>>>>
>>>>> I haven't yet found a way to cleanly reproduce this on my socrates, so
>>>>> unfortunately, I currently cannot test any suggestions, right now :-(
>>>>
>>>> Did you ever have this hang happen on SoCrates ?
>>>
>>> I think so, yes.
>>>
>>> What I did was running some lmb tests loading from mmc, removing the
>>> sdcard to write an updated version of U-Boot, replacing it and then
>>> typing 'reset'. It would then sometimes hang in cache initialization - I
>>> added debug prints throughout cache initialization and was debug prints
>>> from one function and then nothing more after a simple function call...
>>>
>>> That would very much fit to your explanation. However, I cannot
>>> reproduce this currently :-(
>>
>> So basically a board which uses DMA (like the DWMMC DMA) is likely to be
>> affected. Surely, the DMA does trigger PL310 operations.
>>
>>>>> Wouldn't it be better to place this function in some central position
>>>>> near the cache initialization code instead of running it from SPL?
>>>>> Or is
>>>>> it required to run it in SPL?
>>>>
>>>> It's required in SPL as soon as possible, to avoid accessing the DRAM,
>>>> see above. The current location is close to the rest of PL310 init in
>>>> SPL.
>>>
>>> OK, I just did not understand why this is necessary before accessing
>>> DRAM.
>>
>> :)
> 
> Sorry, I phrased that wrong. I *do* (still) not understand why this is
> necessary before accessing DRAM. Sorry if I'm annoying you with that,
> but I'd like to understand.

Because we know that once the L2 cache is enabled and until it's content
is invalidated, it may return stale entries (*). The stale entries in
the L2 cache can only come from a cacheable area. Linux treats the DRAM
as cacheable area, the registers and OCRAM are not cacheable. Therefore,
if we do not fetch anything from DRAM until the L2 cache is invalidated,
we will not get any stale entries in return, and thus we are safe.

The best way to assure that we don't access the DRAM is to execute from
OCRAM, so to do the invalidation in SPL. The best way to assure we won't
generate any traffic on the bus is to execute the code from primed L1
I-cache lines.

* U-Boot is complex code and spans multiple L2 cache lines. The code
which enables L2 cache can very well reside in a different L2 cache line
than the code which invalidates the L2 cache. Furthermore, the code also
spans multiple L1 I-cache lines.

The code which enables the L2 cache is likely in active L1 I-cache line,
however the code which invalidates the L2 cache is not in L1 I-cache
yet, so it needs to be fetched. But (!) now the L2 cache is enabled, so
the system first checks whether the code might be in the L2 cache ; if
there is a stale entry in just the right L2 cache line, the L2 cache
will refill the L1 I-cache with undefined stale content, which the CPU
will happily execute and crash.

Does it make sense now ? Just keep asking until it does make sense ...

> Does the DWMMC DMA really trigger PL310 operations?

No

> I thought DMA would only affect cache if it goes through ACP/SCU?

Correct

I suspect shuffling data between the SD card (DWMMC puts data directly
in DRAM using DMA) and CPU (which looks at the DRAM through multiple
levels of caches, so it has to invalidate/refill/flush the caches to
access the data delivered by DWMMC DMA in DRAM) just primes the L2 cache
with the right kind of entries to trigger the issue.

[...]

-- 
Best regards,
Marek Vasut