ARM A53 and initial MMU mapping for EL0/1/2/3 ?

Andre Przywara andre.przywara at arm.com
Thu Feb 17 16:13:36 CET 2022


On Fri, 11 Feb 2022 17:00:48 +0000
Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:

Hi,

> On Fri, 2022-02-11 at 15:00 +0100, Joakim Tjernlund wrote:
> > On Fri, 2022-02-11 at 01:26 +0000, Andre Przywara wrote:  
> > > On Fri, 11 Feb 2022 00:22:25 +0000
> > > Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> > >   
> > > > On Thu, 2022-02-10 at 22:43 +0000, Andre Przywara wrote:  
> > > > > On Thu, 10 Feb 2022 21:58:30 +0000
> > > > > Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> > > > > 
> > > > > Hi,
> > > > >     
> > > > > > On Thu, 2022-02-10 at 10:22 +0000, Andre Przywara wrote:    
> > > > > > > On Wed, 9 Feb 2022 12:03:47 +0000
> > > > > > > Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> > > > > > > 
> > > > > > > Hi,
> > > > > > >       
> > > > > > > > On Wed, 2022-02-09 at 10:45 +0000, Andre Przywara wrote:      
> > > > > > > > > On Wed, 9 Feb 2022 08:35:04 +0000
> > > > > > > > > Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> > > > > > > > > 
> > > > > > > > > Hi,
> > > > > > > > >         
> > > > > > > > > > On Wed, 2022-02-09 at 00:33 +0000, Andre Przywara wrote:        
> > > > > > > > > > > On Tue, 8 Feb 2022 22:05:00 +0000
> > > > > > > > > > > Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> > > > > > > > > > > 
> > > > > > > > > > > Hi Joakim,
> > > > > > > > > > >           
> > > > > > > > > > > > > Trying to figure out how I should map the MMU for normal RAM so it is accessible
> > > > > > > > > > > > from all ELx security states.          
> > > > > > > > > > > 
> > > > > > > > > > >        ^^^^^^^
> > > > > > > > > > > 
> > > > > > > > > > > This does not make much sense. U-Boot is typically running in one
> > > > > > > > > > > exception level only, and sets up the page table for exactly that EL.
> > > > > > > > > > > Each EL uses a separate translation regime (with some twists for stage
> > > > > > > > > > > 2 EL2 and combined EL1/0, plus VHE). If you map your memory in EL3, then
> > > > > > > > > > > drop to EL2, the EL3 page tables become irrelevant.
> > > > > > > > > > > 
> > > > > > > > > > > So in U-Boot we just set up the page tables for the EL we are running
> > > > > > > > > > > in, and leave the paging for the lower exception levels to be set up at
> > > > > > > > > > > the discretion of our payloads (kernels, hypervisors).
> > > > > > > > > > > 
> > > > > > > > > > > > Please note that *secure* memory is a separate concept, and handled by
> > > > > > > > > > > external hardware, typically using regions, not page tables.          
> > > > > > > > > > 
> > > > > > > > > > I am a beginner w.r.t ARM and Secure/Non secure so thank you for above.
> > > > > > > > > > 
> > > > > > > > > > The problem I have is that I boot a custom SOC into u-boot and when u-boot tries
> > > > > > > > > > to boot linux I get an error exception when u-boot calls armv8_switch_to_el2 to enter linux.        
> > > > > > > > > 
> > > > > > > > > So that means that U-Boot runs in EL3, is that the first and only firmware
> > > > > > > > > that you run? I think the EL3 part of U-Boot is not widely used and tested
> > > > > > > > > beyond the very few platforms that use it.        
> > > > > > > > 
> > > > > > > > Yes, u-boot is first firmware and runs in EL3(ATM, may change once initial bringup is complete) 
> > > > > > > > Maybe u-boot then lacks some critical init? Do you have an example of a board in u-boot
> > > > > > > > that starts in EL3(from reset) using an A53 cpu?       
> > > > > > > 
> > > > > > > As you have probably figured out by now, the whole Layerscape family uses
> > > > > > > that approach. However most other platforms go with Trusted-Firmware as the
> > > > > > > EL3 setup and secure runtime service provider, so the U-Boot EL3 code in
> > > > > > > here is not well tested or looked after. For initial bringup it might be
> > > > > > > OK, but maybe the problems you run into are due to issues in this code.
> > > > > > >       
> > > > > > > > > Do you have the exact address that fails? That should be in ELR, it would
> > > > > > > > > be great if you can pinpoint the exact instruction in macro.h that fails.        
> > > > > > > > 
> > > > > > > > Yes, the address is the first address where kernel is loaded and you can branch there without problems.      
> > > > > > > 
> > > > > > > You mean if you load the kernel and branch to the entry point, it starts
> > > > > > > > running, but crashes as soon as it realises that it runs in EL3?
> > > > > > >       
> > > > > > > > It is the eret instruction(last insn in macro armv8_switch_to_el2_m) that fails.      
> > > > > > > 
> > > > > > > Interesting. Maybe there is something missing in the EL2 setup, but my
> > > > > > > understanding is that this is the part that is actually used by
> > > > > > > Layerscape, for instance.
> > > > > > >       
> > > > > > > > > > I think the exception means "Instruction Abort taken without a change in Exception level."
> > > > > > > > > > I was thinking it could be some privilege missing in MMU map.        
> > > > > > > > > 
> > > > > > > > > Could be. One thing that made me wonder is your rather miserly mapping of
> > > > > > > > > only 32MB, which sounds a bit on the small side. Typically we just map the        
> > > > > > > > 
> > > > > > > > We only have 32 MB ATM :( a bit small but it may increase to 64MB      
> > > > > > > 
> > > > > > > That sounds very miserly. Can you actually run an arm64 Linux kernel with
> > > > > > > that little RAM? IIRC for QEMU we need at least 128 MB, and I haven't seen
> > > > > > > an ARMv8 hardware platform with less than 512MB (maybe 256MB) DRAM yet.
> > > > > > >       
> > > > > > > > > whole first DRAM bank, regardless of whether you actually have memory
> > > > > > > > > there or not. U-Boot should know how much DRAM you have, so will not go
> > > > > > > > > beyond that. Having page tables covering more address space does not
> > > > > > > > > really hurt, but avoids all kind of problems.
> > > > > > > > > And please note that U-Boot loves to move things around: itself from the
> > > > > > > > > load address to the end of DRAM (that it knows of); possibly the kernel,
> > > > > > > > > when the alignment is not right, or the DT and initrd if it sees fit.
> > > > > > > > > So there is little point in mapping just portions of the memory.        
> > > > > > > > 
> > > > > > > > U-boot moves around a lot, I know :) In this case u-boot lives
> > > > > > > > > in its own 4MB SRAM but kernel lives in a 32MB HyperRAM.
> > > > > > > 
> > > > > > > Interesting. I wonder if this works well with U-Boot's memory management,
> > > > > > > which assumes it has quite some DRAM to play with.      
> > > > > > 
> > > > > > Found it, all memory spaces were set to secure mode, the req. spec does not agree :(    
> > > > > 
> > > > > Ah, yes, if the DRAM is configured as secure only, running in EL2
> > > > > (always non-secure on the A53) will not end well.
> > > > >     
> > > > > > Anyhow, now kernel enters into EL2 then EL1 to EL0, all is well until kernel tries
> > > > > > to do simple cache ops like dc ivac, x0 or mrs x3,ctr_el0 when I again just get an error exception:
> > > > > >   EXC [0x400] Synchronous Lower EL using AArch64    
> > > > > 
> > > > > Was this with Linux, or some other kernel? IIRC cache maintenance    
> > > > 
> > > > Yes, 5.14.x  
> > > 
> > > Ah, I see. And that really runs with 32MB? I think we need at least
> > > 64MB. Maybe the issues you see are related to that? IIRC the effects can
> > > look rather random.
> > >   
> > > > > instructions in EL0 need to be enabled in SCTLR_EL1 (.UCI and .DZE, for
> > > > > instance, plus maybe more registers), and those and other operations
> > > > > should not be trapped to EL2 as well.    
> > > > 
> > > > SCTLR_EL1 is 0x30500800 and does not seem to match with above. looks like it is kernel that sets this reg?
> > > > how can kernel get that wrong ?  
> > > 
> > > That can't be really the kernel value, because the MMU needs to be on
> > > (bit 0). Is this the reset value, read in U-Boot? The kernel sets those
> > > bits, check the definition of INIT_SCTLR_EL1_MMU_ON in the kernel
> > > source.
> > > Maybe (the generic EL3) U-Boot code fails to set some EL3 registers,
> > > so some stuff is blocked already there, and the kernel is helpless?  
> > 
> > This is before MMU is on, kernel has forced SCTLR_EL1 to ENDIAN_SET_EL1 | SCTLR_EL1_RES1 via INIT_SCTLR_EL1_MMU_OFF
> > I hacked the define to:
> >  #define INIT_SCTLR_EL1_MMU_OFF \
> > -       (ENDIAN_SET_EL1 | SCTLR_EL1_RES1)
> > +       (ENDIAN_SET_EL1 | SCTLR_EL1_RES1 | SCTLR_EL1_DZE | SCTLR_EL1_UCI)
> >  
> > but that didn't change anything. The only thing I can think of is some prep
> > u-boot must do while in EL3 or maybe the A53 core has been oddly wired into the ASIC(own custom ASIC)
> > and changed some default setting in HW ?
> >   
> 
> Found it! A kernel bug actually:
> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> index 3198acb2aad8..7f3c87f7a0ce 100644
> --- a/arch/arm64/include/asm/el2_setup.h
> +++ b/arch/arm64/include/asm/el2_setup.h
> @@ -106,7 +106,7 @@
>         msr_s   SYS_ICC_SRE_EL2, x0
>         isb                                     // Make sure SRE is now set
>         mrs_s   x0, SYS_ICC_SRE_EL2             // Read SRE back,
> -       tbz     x0, #0, 1f                      // and check that it sticks
> +       tbz     x0, #0, .Lskip_gicv3_\@         // and check that it sticks
>         msr_s   SYS_ICH_HCR_EL2, xzr            // Reset ICC_HCR_EL2 to defaults
>  .Lskip_gicv3_\@:
>  .endm
> 
> branching to 1f got you way off and into el0 when you were supposed to be in el2/el1 still.

Well, as the list confirmed, that is indeed a bug, but the more
important question is why you hit it. The bug went unnoticed because
this branch is only taken on an error path (the SRE bit not sticking),
which no sane setup would trigger.

So do you actually enable the EL3 GICv3 setup in your U-Boot build?
That would be CONFIG_GICV3, and you need to define the GICD base
address, I believe.
But as Marc hinted already, this code is not well tested. Not sure it
covers the WAKER setup that the GIC500 requires.
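
For reference, the WAKER setup boils down to clearing ProcessorSleep
(bit 1) in the redistributor's GICR_WAKER register and then waiting for
ChildrenAsleep (bit 2) to read as zero. A minimal C sketch of that
sequence follows. It is not taken from U-Boot: gicr_wake() and max_polls
are made-up names, and the caller is assumed to pass a pointer to the
GICR_WAKER register of this CPU's redistributor frame.

```c
#include <stddef.h>
#include <stdint.h>

#define GICR_WAKER_PROCESSOR_SLEEP  (1u << 1)   /* GICR_WAKER.ProcessorSleep */
#define GICR_WAKER_CHILDREN_ASLEEP  (1u << 2)   /* GICR_WAKER.ChildrenAsleep */

/*
 * Hypothetical helper: mark the CPU interface as awake and wait for the
 * redistributor to acknowledge it.
 * Returns 0 once ChildrenAsleep reads as 0, or -1 if it is still set
 * after max_polls reads (or if waker is NULL).
 */
int gicr_wake(volatile uint32_t *waker, unsigned int max_polls)
{
	if (waker == NULL)
		return -1;

	/* Clear ProcessorSleep, leave all other bits untouched. */
	*waker &= ~GICR_WAKER_PROCESSOR_SLEEP;

	/* Bounded poll, so broken hardware does not hang the boot. */
	while (max_polls--) {
		if (!(*waker & GICR_WAKER_CHILDREN_ASLEEP))
			return 0;
	}

	return -1;
}
```

The bounded poll is a design choice for bringup: on a misconfigured
GIC an unbounded loop would just hang silently, whereas a -1 return can
be reported on the console.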

> Not sure why GIC init fails there, we got a GIC-500v4 but I think it should pass this test still ?
> If so I guess we need to do something with GIC in uboot before booting Linux?
> Any idea what I might be missing?

There is a list of requirements that Linux expects to be fulfilled by
the firmware, check the section "system registers" under
https://www.kernel.org/doc/html/v5.15/arm64/booting.html#call-the-kernel-image

In general the GIC resets to be accessible from secure state only, and
needs to be set up to be usable from non-secure lower ELs. This affects
some GICD registers and some GICv3 system registers. U-Boot *should* do
the basics, but either it's not enabled, or it's missing something.
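
To illustrate one of those system register requirements: for a GICv3
used in v3 mode with EL3 present, the booting document asks for
ICC_SRE_EL3.Enable (bit 3) and ICC_SRE_EL3.SRE (bit 0) to both be set
before the kernel is entered. Below is a small C sketch of that check.
It works on a raw register value so it can be tried anywhere; the
function name is made up for this example, and reading the real
register (an mrs at EL3) is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICC_SRE_EL3_SRE     (UINT64_C(1) << 0)  /* system register interface on */
#define ICC_SRE_EL3_ENABLE  (UINT64_C(1) << 3)  /* lower ELs may access ICC_SRE_ELx */

/*
 * Hypothetical helper: true if the given ICC_SRE_EL3 value has both
 * bits set that the Linux booting document asks firmware to initialise.
 */
bool icc_sre_el3_ready_for_linux(uint64_t icc_sre_el3)
{
	const uint64_t required = ICC_SRE_EL3_SRE | ICC_SRE_EL3_ENABLE;

	return (icc_sre_el3 & required) == required;
}
```

Dumping ICC_SRE_EL3 in U-Boot just before the switch to EL2 and running
the value through a check like this would show whether that part of the
setup was done. A failing value is only one candidate explanation for
the SRE bit not sticking in EL2, not a confirmed diagnosis.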

Cheers,
Andre

