[U-Boot] [PATCH 1/2] armv8: add hooks for all cache-wide operations

Wed Oct 19 17:36:52 CEST 2016

On 10/18/2016 08:41 PM, Simon Glass wrote:
> Hi Stephen,
>
> On 18 October 2016 at 17:33, Stephen Warren <swarren at wwwdotorg.org> wrote:
>> On 10/18/2016 05:08 PM, Simon Glass wrote:
>>>
>>> Hi Stephen,
>>>
>>> On 18 October 2016 at 16:54, Stephen Warren <swarren at wwwdotorg.org> wrote:
>>>>
>>>> On 10/18/2016 01:56 PM, Simon Glass wrote:
>>>>>
>>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> On 18 October 2016 at 13:10, Stephen Warren <swarren at wwwdotorg.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 18 October 2016 at 12:58, Stephen Warren <swarren at wwwdotorg.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Stephen,
>>>>>>>>>
>>>>>>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren at wwwdotorg.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> From: Stephen Warren <swarren at nvidia.com>
>>>>>>>>>>
>>>>>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>>>>>> operations; invalidate and flush of both dcache and icache (note
>>>>>>>>>> that
>>>>>>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>>>>>>> never
>>>>>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>>>>>> implemented cache-wide operations, and renames the one existing
>>>>>>>>>> hook
>>>>>>>>>> to
>>>>>>>>>> better represent exactly which operation it is implementing. A
>>>>>>>>>> dummy
>>>>>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>>>>>> implementations are moved into C code, since there's no need to
>>>>>>>>>> implement them in assembly.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Stephen Warren <swarren at nvidia.com>
>>>>>>>>>> ---
>>>>>>>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>>>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>>>>>>> ++++++++++++++++++++---
>>>>>>>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>>>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>>>>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>>>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we should have a proper interface for this stuff rather than
>>>>>>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>>>>>>> uclass?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What's improper about this interface? Presumably we could argue that
>>>>>>>> no
>>>>>>>> function in the entire of U-Boot be called by symbol name, which
>>>>>>>> doesn't
>>>>>>>> seem useful.
>>>>>>>>
>>>>>>>> I'm not sure exactly what you envisage by a linker-list approach. Can
>>>>>>>> you
>>>>>>>> provide some background? I understand how the linker can construct
>>>>>>>> list
>>>>>>>> of
>>>>>>>> objects/implementations/..., but that doesn't seem useful here since
>>>>>>>> there's
>>>>>>>> a known-ahead-of-time single implementation of these functions in a
>>>>>>>> single
>>>>>>>> build of U-Boot.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Your own commit messages says that this is SoC-specific. I'm
>>>>>>> suggesting that we define an interface (which I think you have already
>>>>>>> done with your header file additions), and allow SoCs to implement it
>>>>>>> via a linker list.
>>>>>>>
>>>>>>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>>>>>>
>>>>>>>> A cache uclass seems like /massive/ overkill, especially since I'd
>>>>>>>> expect
>>>>>>>> these very low-level functions to be required well before any higher
>>>>>>>> level
>>>>>>>> code like DM/classes/... to be available.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> DM is available very early. But it's not clear from your patch when
>>>>>>> this code is actually called.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I believe that weak functions are a perfectly acceptable approach here.
>>>>>>
>>>>>> Yes, the implementation of these functions is SoC-specific. The
>>>>>> Makefiles
>>>>>> will pull in the appropriate implementation for that SoC whenever
>>>>>> U-Boot
>>>>>> is
>>>>>> built, just like every other board- or SoC-specific function in the
>>>>>> entire
>>>>>> of U-Boot.
>>>>>>
>>>>>> There's no need for linker lists since there is only ever one
>>>>>> implementation.
>>>>>
>>>>>
>>>>>
>>>>> If there is only ever one implementation, why do you need weak
>>>>> functions?
>>>>
>>>>
>>>>
>>>> As I explicitly stated above, each SoC can have a different
>>>> implementation,
>>>> yet only a single implementation is ever needed for a particular U-Boot
>>>> build.
>>>>
>>>>> Just call them directly.
>>>>
>>>>
>>>>
>>>> The code is doing that, both before and after my patch.
>>>
>>>
>>> I mean call them without needing weak functions.
>>>
>>>>
>>>>> I think in fact you mean that
>>>>> there can be no implementation (but perhaps an empty one), or one
>>>>> implementation. You are effectively using multiple weak functions to
>>>>> provide default code. I think it would be better if this were
>>>>> explicit.
>>>>>
>>>>> I still think that the cache functions could do with a rethink.
>>>>
>>>>
>>>>
>>>> In my opinion, this patch doesn't change the code structure at all. There
>>>> is
>>>> already an interface between the core (L1/L2) cache management code and
>>>> the
>>>> SoC-specific cache management code. That interface already uses a weak
>>>> function to provide the default no-op implementation. This patch doesn't
>>>> change any of that. All this patch does is fix that existing interface to
>>>> cover all 3 combinations of dcache_flush, dcache_invalidate, and
>>>> icache_invalidate, rather than just one of those combinations. It's more
>>>> of
>>>> a bug-fix than anything else.
>>>
>>>
>>> Yes I see that.
>>>
>>>>
>>>> If you want to rework this interface sometime, be my guest. However, I
>>>> don't
>>>> think it's fair to require that someone who simply wants to fix the
>>>> existing
>>>> code (in a way that is orthogonal to your desired interface refactoring)
>>>> do
>>>> that refactoring first, rather than doing it yourself.
>>>
>>>
>>> I understand what you are saying, but isn't that how open source
>>> software works? Believe me, I have done my fair share of refactoring
>>> :-)
>>>
>>> At least can you look at not making it any harder to fix up later? The
>>> more we pile onto this interface, the harder it will be later. We
>>> should aim to make ARMv8 really nice as it is the new thing.
>>
>>
>> I believe moving one or three functions into any replacement scheme will
>> have an identical level of difficulty. So, I believe the patch already
>> satisfies the "no harder" requirement.
>
> Well it seems your mind is already made up!

Well, I don't believe there are any viable or reasonable alternatives. 
I'm not prolonging this thread because I find it enjoyable, but because 
of the lack of alternatives to what this patch does.

Doing this via DM doesn't simplify anything or make it more 
maintainable; it's just using DM for the sake of it. DM is great when it 
makes life simpler and code more maintainable, but we use it because of 
those benefits, not just for the sake of it. Using DM adds complexity, 
and so there has to be a benefit of doing so. I don't believe there is here.

Doing this by linker scripts simply doesn't make sense:

Any given U-Boot binary will only contain a single implementation of 
these functions, so there's no need for any kind of runtime lookup, and 
if there was a runtime lookup, what would be the key and where would it 
come from? I believe we'd still have some compile-time-seledted 
function/data to drive the runtime lookup process, and so we'd be in 
exactly the same situation as we already are, just with more complexity 
added on top.

Using linker scripts to switch between implementations at compile time 
is much more complex than just letting the linker link stuff together 
directly. That's what it's good at. Using linker scripts would just add 
extra complexity without any benefit. We'd still end up with a single 
implementation in the binary, yet to call it code would have to indirect 
through the linker-defined table, rather than simply calling the same 
code by symbol name. Uggh!

Note that per the discussions in other forks of this thread, it's likely 
we'll need to code all these cache operations in assembly to avoid 
accessing DRAM while turning the cache on/off. This implies to me that 
we'll need to keep all the cache code extremely simple and direct, 
without any form of runtime or compile time indirection.