[PATCH] ARM: Prevent the compiler from using NEON registers

Andre Przywara andre.przywara at arm.com
Mon Aug 16 13:34:06 CEST 2021


On Sun, 15 Aug 2021 22:14:37 -0500
Samuel Holland <samuel at sholland.org> wrote:

Hi,

in general I think the patch makes sense, and we should use that option
since we also specify -msoft-float.

> For ARMv8-A, NEON is standard,

It should be noted that the ARMv8-A architecture itself treats FP and
AdvSIMD as optional, and little cores like the Cortex-A53 even make this
an integration option [1]. This gives another reason for this patch,
as we cannot assume NEON support for *every* core we are compiling for
(even though most A53s out there seem to include NEON).
 
Anyway, GCC includes both +fp and +simd when the basic (and
probably default) "armv8-a" is used for -march [2], so we must indeed
restrict it explicitly if we want to avoid it.

[1] https://developer.arm.com/documentation/ddi0500/j/Introduction/Implementation-options
[2] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html (see -march=name)
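
To check what a given toolchain actually enables, the ACLE feature
macros can be inspected. The following is only a minimal sketch. It
assumes the compiler follows ACLE and defines __ARM_NEON and __ARM_FP
only when AdvSIMD and FP code generation are allowed, which would be
expected with plain -march=armv8-a and not with -mgeneral-regs-only;
that is worth verifying on the compiler in use. The function names are
made up for illustration.

```c
/* Report which FP/SIMD features the compiler believes it may use. */

/* Returns 1 if the compiler advertises Advanced SIMD (NEON), else 0. */
int compiler_may_use_simd(void)
{
#ifdef __ARM_NEON
	return 1;
#else
	return 0;
#endif
}

/* Returns the ACLE __ARM_FP bitmask, or 0 if no hardware FP is advertised. */
int compiler_fp_mask(void)
{
#ifdef __ARM_FP
	return __ARM_FP;
#else
	return 0;
#endif
}
```

The same macros can be dumped without compiling a file by running the
cross compiler with "-dM -E" on an empty input and searching the output
for __ARM_NEON and __ARM_FP.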

> so the compiler can use it even when no
> special target flags are provided. For example, it can use stores from
> NEON registers to zero-initialize large structures. GCC 11 decides to
> do this inside the DRAM init code for the Allwinner H6, which breaks
> boot on that platform, as NEON is not available in SPL.

And that brings up the question: why? The Cortex cores in all Allwinner
SoCs support NEON, and we always clear CPTR_EL3 in start.S, so it
should be usable.
So I did some experiments, and I guess it's our old friend
"unaligned access" again, because the SIMD instructions themselves work
(movi, str q0). But the SPL runs with the MMU off, so all memory is
treated as device memory, and natural alignment is mandatory, even with
SCTLR.A cleared. "stp q0, q0, [x0]" worked when x0 was 16-byte aligned,
but hung when it was not. The same applied to "stur q0, [x0]", which is
used with an unaligned offset in the generated code
(https://tpaste.us/qPEw).
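
The alignment condition from these experiments can be written down as a
small check. This is only a sketch, assuming the required alignment
equals the access size (16 bytes for a Q register store), which matches
the behaviour observed above. The helper name and the example addresses
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if an access of 'size' bytes at 'base + offset' is
 * naturally aligned. 'size' must be a non-zero power of two; any other
 * value makes the function return false.
 */
bool access_is_aligned(uintptr_t base, ptrdiff_t offset, size_t size)
{
	uintptr_t addr = base + (uintptr_t)offset;

	if (size == 0 || (size & (size - 1)) != 0)
		return false;

	return (addr & (size - 1)) == 0;
}
```

For example, a 16-byte store to base 0x1000 with offset 0 passes the
check, while the same store with offset 8 does not, which corresponds
to the "stur q0" case with an unaligned offset.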

So this deserves some more research, for instance to find out whether
GCC ignores -mstrict-align here.
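
A small reproducer might help with that. The following is a sketch
only: the structure layout and names are hypothetical and not taken
from the H6 DRAM code, and whether a particular GCC version emits
vector stores for it is not guaranteed. The idea is to compile it for
AArch64 with -O2 -S, once with and once without -mstrict-align, and to
look for "stp q0" or "stur q0" in the assembly output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block, large enough to invite wide stores. */
struct example_para {
	uint32_t clk;
	uint32_t table[23];
};

/* Zero-initialize a large local structure, set one field, copy it out. */
void example_init(struct example_para *out, uint32_t clk)
{
	struct example_para para = { 0 };

	para.clk = clk;
	*out = para;
}
```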

Cheers,
Andre

> Fix this by
> restricting the compiler to using GPRs only, not vector registers.
> 
> Signed-off-by: Samuel Holland <samuel at sholland.org>
> ---
>  arch/arm/config.mk | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/config.mk b/arch/arm/config.mk
> index 16c63e12667..964c6b026ec 100644
> --- a/arch/arm/config.mk
> +++ b/arch/arm/config.mk
> @@ -25,6 +25,7 @@ endif
>  
>  PLATFORM_RELFLAGS += -fno-common -ffixed-r9
>  PLATFORM_RELFLAGS += $(call cc-option, -msoft-float) \
> +		     $(call cc-option,-mgeneral-regs-only) \
>        $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,))
>  
>  # LLVM support


