[U-Boot] [Resend RFC PATCH 1/2] armv8: Fix dcache disable function

Thu Oct 20 07:06:51 CEST 2016

On 10/19/2016 06:01 PM, Stephen Warren wrote:
> On 10/19/2016 04:32 PM, york sun wrote:
>> On 10/19/2016 12:18 PM, Stephen Warren wrote:
>>> On 10/19/2016 09:25 AM, Stephen Warren wrote:
>>>> On 10/14/2016 02:17 PM, York Sun wrote:
>>>>> Current code turns off d-cache first, then flush all levels of cache.
>>>>> This results data loss. As soon as d-cache is off, the dirty cache
>>>>> is discarded according to the test on LS2080A. This issue was not
>>>>> seen as long as external L3 cache was flushed to push the data to
>>>>> main memory. However, external L3 cache is not guaranteed to have
>>>>> the data. To fix this, flush the d-cache by way/set first to make
>>>>> sure cache is clean before turning it off.
>>>>
>>>>> diff --git a/arch/arm/cpu/armv8/cache_v8.c
>>>>> b/arch/arm/cpu/armv8/cache_v8.c
>>>>
>>>>> @@ -478,9 +478,9 @@ void dcache_disable(void)
>>>>
>>>>> +    flush_dcache_all();
>>>>>      set_sctlr(sctlr & ~(CR_C|CR_M));
>>>>>
>>>>> -    flush_dcache_all();
>>>>>      __asm_invalidate_tlb_all();
>>>>
>>>> I talked to Mark Rutland at ARM, and I believe the current code is
>>>> correct. Here's my interpretation of what he said:
>>>>
>>>> The dcache must be disabled first. This prevents allocation of new
>>>> entries in the cache during the flush operation, which prevents the race
>>>> conditions that were mentioned in the other thread.
>>>>
>>>> Then, the flush operation must be invoked. Since the cache is now
>>>> disabled, this can fully flush the cache without worrying about racing
>>>> with things being added to the cache.
>>>>
>>>> This all implies that the implementation of dcache_disable(),
>>>> set_sctlr(), flush_dcache_all(), and any code they call must not access
>>>> data in DRAM at all; since because the dcache is off, any DRAM access
>>>> will[1] read potentially stale data from DRAM, rather than any dirty
>>>> data that might be in the cache.
>>>>
>>>> [1] I'm not sure if that's "will" or "may", i.e. whether this is
>>>> architecturally guaranteed in ARMv8 or is implementation defined. At
>>>> least the Cortex A72 TRM says "will" for that CPU; not sure about others.
>>>>
>>>> Perhaps the most obvious upshot of this is that the stack can't be used.
>>>> This implies to me that we need to recode all those functions purely in
>>>> assembly, or just possibly play some tricks to 100% force gcc not to
>>>> touch memory anywhere inside dcache_disable() or the functions it calls.
>>>> We're just getting lucky here right now since everything happens to be
>>>> inlined, but I don't think we're doing anything to 100% guarantee this.
>>>>
>>>> What worries me here is that at least on Tegra, a "flush the entire
>>>> dcache" operation requires an SMC call to the secure monitor. That will
>>>> certainly access DRAM when the secure monitor runs, but perhaps this
>>>> doesn't matter since that's at a different exception level, and we know
>>>> the secure monitor accesses DRAM regions that are separate from U-Boot's
>>>> DRAM? I suspect life isn't that convenient. I'm wondering if this all
>>>> implies that, like patch 2 in this series, we need to get 100% away from
>>>> flush-by-set/way, even with SoC-specific hooks to make that work
>>>> reliably, and just flush everything by VA, which IIRC is architecturally
>>>> guaranteed to work without SoC-specific logic. That way, we can
>>>> encapsulate everything into an assembly function without worrying about
>>>> calling SMCs or SoC-specific hook functions without using DRAM. Of
>>>> course, how that assembly function knows which VAs to flush without
>>>> decoding the page tables or other data structure is another matter:-(
>>>
>>> Re: the last paragraph there:
>>>
>>> After reading the ARMv8 ARM, I see that EL1, EL2, and EL3 all have
>>> separate cache enable bits in SCTLR_ELx. I believe U-Boot only needs to
>>> flush/... its own RAM out of the dcache, since that's all that's
>>> relevant at the EL U-Boot is running at, and doesn't care about anything
>>> EL3 might have mapped cached. So, it's safe to invoke SMCs from the
>>> cache flush code in U-Boot even if the EL3 code touches its own DRAM.
>>> There might be a corner case where this isn't true if EL3 has some
>>> EL1/EL2-owned RAM mapped, but I don't expect that to be the case here.
>>>
>>> Re: my other series to add more cache hooks: I'll re-implement the Tegra
>>> hook in assembly so it's guaranteed not to touch RAM, retest, and resumbit.
>>>
>>> If it turns out that dcache_disable() ever starts touching DRAM at the
>>> wrong time, we can deal with that then; it doesn't now at least.
>>>
>>
>> Stephen,
>>
>> I think one of our difference is whether the data is returned correctly
>> with dirty dcache being disabled. With current code flow, the dcache is
>> disabled, followed by flushing/invalidating by way/set, then L3 cache is
>> flushed. I see correct stack after these steps. During my recent attempt
>> to remove flushing L3, I added code to flush the stack by VA (which I
>> already confirmed the data will end up in main memory) to replace
>> flushing L3. I found as soon as the dcache is disabled, the dirty cache
>> data is lost when trying to access from the core (debugged by JTAG
>> tool). This makes me to think the order may be wrong. By reverting the
>> order, i.e. flushing dcache first then disable it, I can see correct
>> data in main memory. I understand the proposed new code requires not
>> touching stack or any data after the first flush. I prefer to not to
>> make this change if we have an explanation of what I saw.
>
> I think the confusion is: what does "lost" mean?
>
> On ARMv8 at least:
>
> Starting off with dcache enabled and containing some dirty data:
>
> CPU reads will read the dirty data from the cache.
>
> Now if dcache is disabled:
>
> CPU reads will be of the uncached/uncachable type since dcache is off.
> The dcache will not respond. So, the CPU will receive (potentially)
> stale data directly from RAM.
>
> In this state, it is not possible to access the dirty data in the
> caches. However, it's not (permanently) lost...
>
> Now if the dcache is fully flushed:
>
> All dirty data that was in the dcache will be written to RAM. After this
> process is complete, any reads from RAM will return the most up-to-date
> possible data; the temporarily "lost" dirty data is now no longer lost.
>
>
> The cache disable process must proceed as above; we can't flush the
> dcache and then disable it. Attempting to do things in that order would
> introduce race conditions. At all times before the dcache is disabled,
> any CPU writes to memory will allocate new lines in the dcache. Also,
> the dcache is fully responsive to the cache coherency protocol, so some
> other agent on the bus (another CPU, cluster, DMA agent, ...) could
> cause a (clean or dirty) line to be brought into the cache. This might
> happen after that part of the cache would have been flushed. In other
> words, it's impossible to fully flush the cache (by set/way) while it's
> enabled.
>
>
> I'm not sure that flushing by VA instead of by set/way solves anything
> much here. I believe flushing by VA runs less risk of interference due
> to the coherency protocol bringing lines back into the cache, since
> flush by VA will affect other caches? However this still doesn't allow
> use of the CPU's own stack or other DRAM writes; if we:
>
> a) Flush by VA
> (this step is assumed to perform some function calls/returns, or other
> DRAM writes)
>
> b) Disable cache
>
> ... then irrespective of how step (a) was performed, since the DRAM
> writes performed in that step are likely still in the cache as dirty
> data, so step (b) will cause them to be "lost" until a full flush
> operation is performed. Thus we must always flush after disabling the
> cache, so there's no point also flushing before.
>
>
Stephen,

Sorry for late response. I am still traveling...
I understand the data in dirty cache is not lost when the dcache is 
disabled. It is just not accessible. In my case, after flushing L1/L2 by 
way/set, the data is probably in L3 cache. Without flushing L3, I have 
stalled data (including the stack) in main memory.

My previous test was trying to prove I can skip flushing L3 if I flush 
the necessary VA. Now I when recall it carefully, I think I made a 
mistake by flushing by VA _after_ flushing the cache by way/set. I might 
have a positive result if I flushed the cache by VA first. I will repeat 
this test when I get back to prove this theory.

That being said, do you or anyone see any value by skipping flushing L3? 
My target was to avoid making a SMC call for EL2/EL1, and to avoid 
implementing flushing L3 function in U-Boot (even we already have one 
for CCN-504). I understand this is different from what dcache_disable() 
implies.

York