[U-Boot] [PATCH 5/7] riscv: add support for multi-hart systems
Rick Chen
rickchen36 at gmail.com
Tue Mar 12 01:15:02 UTC 2019
Hi Lukas
Auer, Lukas <lukas.auer at aisec.fraunhofer.de> 於 2019年3月11日 週一 上午2:12寫道:
>
> On Sun, 2019-03-10 at 20:24 +0530, Anup Patel wrote:
> > On Sun, Mar 10, 2019 at 7:28 PM Auer, Lukas
> > <lukas.auer at aisec.fraunhofer.de> wrote:
> > > Hi Rick,
> > >
> > > On Thu, 2019-03-07 at 17:30 +0800, Rick Chen wrote:
> > > > Hi Lukas
> > > >
> > > > > > From: Lukas Auer [mailto:lukas.auer at aisec.fraunhofer.de]
> > > > > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > > > > To: u-boot at lists.denx.de
> > > > > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer
> > > > > > Dabbelt;
> > > > > > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi
> > > > > > Chen(陳建志);
> > > > > > Baruch Siach;
> > > > > > Stefan Roese
> > > > > > Subject: [PATCH 5/7] riscv: add support for multi-hart
> > > > > > systems
> > > > > >
> > > > > > On RISC-V, all harts boot independently. To be able to run on
> > > > > > a
> > > > > > multi-hart system,
> > > > > > U-Boot must be extended with the functionality to manage all
> > > > > > harts in the
> > > > > > system. A new config option, CONFIG_MAIN_HART, is used to
> > > > > > select
> > > > > > the hart
> > > > > > U-Boot runs on. All other harts are halted.
> > > > > > U-Boot can delegate functions to them using
> > > > > > smp_call_function().
> > > > > >
> > > > > > Every hart has a valid pointer to the global data structure
> > > > > > and a
> > > > > > 8KiB stack by
> > > > > > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> > > > > >
> > > > > > Signed-off-by: Lukas Auer <lukas.auer at aisec.fraunhofer.de>
> > > > > > ---
> > > > > >
> > > > > > arch/riscv/Kconfig | 12 +++++
> > > > > > arch/riscv/cpu/start.S | 102
> > > > > > ++++++++++++++++++++++++++++++++++-
> > > > > > arch/riscv/include/asm/csr.h | 1 +
> > > > > > 3 files changed, 114 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > > > > > 3a51339c4d..af8d0f8d67
> > > > > > 100644
> > > > > > --- a/arch/riscv/Kconfig
> > > > > > +++ b/arch/riscv/Kconfig
> > > > > > @@ -140,4 +140,16 @@ config SBI_IPI
> > > > > > default y if RISCV_SMODE
> > > > > > depends on SMP
> > > > > >
> > > > > > +config MAIN_HART
> > > > > > + int "Main hart in system"
> > > > > > + default 0
> > > > > > + help
> > > > > > + Some SoCs include harts of various sizes, some of
> > > > > > which
> > > > > > might not
> > > > > > + be suitable for running U-Boot. CONFIG_MAIN_HART is
> > > > > > used
> > > > > > to select
> > > > > > + the hart U-Boot runs on.
> > > > > > +
> > > > > > +config STACK_SIZE_SHIFT
> > > > > > + int
> > > > > > + default 13
> > > > > > +
> > > > > > endmenu
> > > > > > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> > > > > > index
> > > > > > a30f6f7194..ce7230df37 100644
> > > > > > --- a/arch/riscv/cpu/start.S
> > > > > > +++ b/arch/riscv/cpu/start.S
> > > > > > @@ -13,6 +13,7 @@
> > > > > > #include <config.h>
> > > > > > #include <common.h>
> > > > > > #include <elf.h>
> > > > > > +#include <asm/csr.h>
> > > > > > #include <asm/encoding.h>
> > > > > > #include <generated/asm-offsets.h>
> > > > > >
> > > > > > @@ -45,6 +46,23 @@ _start:
> > > > > > /* mask all interrupts */
> > > > > > csrw MODE_PREFIX(ie), zero
> > > > > >
> > > > > > +#ifdef CONFIG_SMP
> > > > > > + /* check if hart is within range */
> > > > > > + /* s0: hart id */
> > > > > > + li t0, CONFIG_NR_CPUS
> > > > > > + bge s0, t0, hart_out_of_bounds_loop
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > + /* set xSIE bit to receive IPIs */
> > > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > > + li t0, MIE_MSIE
> > > > > > +#else
> > > > > > + li t0, SIE_SSIE
> > > > > > +#endif
> > > > > > + csrs MODE_PREFIX(ie), t0
> > > > > > +#endif
> > > > > > +
> > > > > > /*
> > > > > > * Set stackpointer in internal/ex RAM to call board_init_f
> > > > > > */
> > > > > > @@ -56,7 +74,25 @@ call_board_init_f:
> > > > > > call_board_init_f_0:
> > > > > > mv a0, sp
> > > > > > jal board_init_f_alloc_reserve
> > > > > > +
> > > > > > + /*
> > > > > > + * Set global data pointer here for all harts,
> > > > > > uninitialized at this
> > > > > > + * point.
> > > > > > + */
> > > > > > + mv gp, a0
> > > > > > +
> > > > > > + /* setup stack */
> > > > > > +#ifdef CONFIG_SMP
> > > > > > + /* s0: hart id */
> > > > > > + slli t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > + sub sp, a0, t0
> > > > > > +#else
> > > > > > mv sp, a0
> > > > > > +#endif
> > > > > > +
> > > > > > + /* Continue on main hart, others branch to
> > > > > > secondary_hart_loop */
> > > > > > + li t0, CONFIG_MAIN_HART
> > > > > > + bne s0, t0, secondary_hart_loop
> > > > > >
> > > > > > la t0, prior_stage_fdt_address
> > > > > > SREG s1, 0(t0)
> > > > > > @@ -95,7 +131,14 @@ relocate_code:
> > > > > > *Set up the stack
> > > > > > */
> > > > > > stack_setup:
> > > > > > +#ifdef CONFIG_SMP
> > > > > > + /* s0: hart id */
> > > > > > + slli t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > + sub sp, s2, t0
> > > > > > +#else
> > > > > > mv sp, s2
> > > > > > +#endif
> > > > > > +
> > > > > > la t0, _start
> > > > > > sub t6, s4, t0 /* t6 <- relocation
> > > > > > offset
> > > > > > */
> > > > > > beq t0, s4, clear_bss /* skip relocation */
> > > > > > @@ -175,13 +218,30 @@ clear_bss:
> > > > > > add t0, t0, t6 /* t0 <- rel
> > > > > > __bss_start in
> > > > > > RAM */
> > > > > > la t1, __bss_end /* t1 <- rel __bss_end
> > > > > > in
> > > > > > FLASH */
> > > > > > add t1, t1, t6 /* t1 <- rel __bss_end
> > > > > > in
> > > > > > RAM */
> > > > > > - beq t0, t1, call_board_init_r
> > > > > > + beq t0, t1, relocate_secondary_harts
> > > > > >
> > > > > > clbss_l:
> > > > > > SREG zero, 0(t0) /* clear loop... */
> > > > > > addi t0, t0, REGBYTES
> > > > > > bne t0, t1, clbss_l
> > > > > >
> > > > > > +relocate_secondary_harts:
> > > > > > +#ifdef CONFIG_SMP
> > > > > > + /* send relocation IPI */
> > > > > > + la t0, secondary_hart_relocate
> > > > > > + add a0, t0, t6
> > > > > > +
> > > > > > + /* store relocation offset */
> > > > > > + mv s5, t6
> > > > > > +
> > > > > > + mv a1, s2
> > > > > > + mv a2, s3
> > > > > > + jal smp_call_function
> > > > > > +
> > > > > > + /* restore relocation offset */
> > > > > > + mv t6, s5
> > > > > > +#endif
> > > > > > +
> > > > > > /*
> > > > > > * We are done. Do not return, instead branch to second part
> > > > > > of
> > > > > > board
> > > > > > * initialization, now running from RAM.
> > > > > > @@ -202,3 +262,43 @@ call_board_init_r:
> > > > > > * jump to it ...
> > > > > > */
> > > > > > jr t4 /* jump to
> > > > > > board_init_r()
> > > > > > */
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +hart_out_of_bounds_loop:
> > > > > > + /* Harts in this loop are out of bounds, increase
> > > > > > CONFIG_NR_CPUS. */
> > > > > > + wfi
> > > > > > + j hart_out_of_bounds_loop
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +/* SMP relocation entry */
> > > > > > +secondary_hart_relocate:
> > > > > > + /* a1: new sp */
> > > > > > + /* a2: new gd */
> > > > > > + /* s0: hart id */
> > > > > > +
> > > > > > + /* setup stack */
> > > > > > + slli t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > + sub sp, a1, t0
> > > > > > +
> > > > > > + /* update global data pointer */
> > > > > > + mv gp, a2
> > > > > > +#endif
> > > > > > +
> > > > > > +secondary_hart_loop:
> > > > > > + wfi
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > + csrr t0, MODE_PREFIX(ip)
> > > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > > + andi t0, t0, MIE_MSIE
> > > > > > +#else
> > > > > > + andi t0, t0, SIE_SSIE
> > > > > > +#endif
> > > > > > + beqz t0, secondary_hart_loop
> > > > > > +
> > > > > > + mv a0, s0
> > > > > > + jal handle_ipi
> > > >
> > > > I found that s0 maybe corrupted after execute handle_ipi.
> > > > Because smp_function will be treated as a return function by
> > > > compiler,
> > > > so compiler will generate codes to execute restore after
> > > > smp_function().
> > > >
> > > > But actually it is a no-return function. So there maybe no chance
> > > > to
> > > > execute
> > > > restore. And s0 will be corrupted somehow.
> > > >
> > > > The usage of s0 in v2 flow seem the same as v1.
> > > > So I reply mail in v1 patch.
> > > >
> > >
> > > Thanks for reporting this issue!
> > > What compiler version are you using? I never saw this issue with
> > > GCC
> > > 8.2.0, because optimization removes the ret following the call to
> > > smp_function(), it is called using jr. The registers are therefore
> > > restored before jumping to smp_function().
> > >
I found this issue with Andestech's toolchain (GCC version is 7.3.0)
And it's code gen will be as below:
00000e94 <handle_ipi>:
e94: 4785 c.li a5,1
e96: 04a7e663 bltu a5,a0,ee2 <handle_ipi+0x4e>
e9a: 456362ef jal t0,372f0 <__riscv_save_0>
e9e: 84aa c.mv s1,a0
ea0: 3539 c.jal cae <riscv_clear_ipi>
ea2: cd19 c.beqz a0,ec0 <handle_ipi+0x2c>
ea4: 0003d517 auipc a0,0x3d
ea8: 7c450513 addi a0,a0,1988 # 3e668
<freqs.8638+0x430>
eac: 17c320ef jal ra,33028 <printf>
eb0: 0003d517 auipc a0,0x3d
eb4: 7b850513 addi a0,a0,1976 # 3e668
<freqs.8638+0x430>
eb8: 170320ef jal ra,33028 <printf>
ebc: 4583606f j 37314 <__riscv_restore_0>
ec0: 47b1 c.li a5,12
ec2: 02f48433 mul s0,s1,a5 s0 : 1 => 0xc
ec6: 003407b3 add a5,s0,gp
eca: 0bc7a903 lw s2,188(a5) # 800000bc
<__bss_end+0x7ff8f718>
ece: 3b21 c.jal be6 <invalidate_icache_all>
ed0: 008187b3 add a5,gp,s0
ed4: 0c47a603 lw a2,196(a5)
ed8: 0c07a583 lw a1,192(a5)
edc: 8526 c.mv a0,s1
ede: 9902 c.jalr s2 s2=0x3ff8f194 , s0=0xcsou
ee0: bff1 c.j ebc <handle_ipi+0x28>
ee2: 8082 c.jr ra
As I said last mail, compiler will treat smp_function() as a return
call, so restore() will be executed after it.
This behavior shall comply with the calling convention.
Thanks for your understanding and tolerant.
> > > This system is probably too fragile. A simple solution would be to
> > > store the hart ID somewhere else, for example register tp. What do
> > > you
> > > think?
> >
> > I think you can also use scratch/mscratch (if it is not used anywhere
> > else).
> >
>
> You are right, sscratch and mscratch are also not currently used in
> U-Boot. Is there an advantage to use the scratch CSRs instead of tp?
>
I will prefer to use tp.
xscratch shall consider mode distinguish additionally.
Rick
> Thanks,
> Lukas
More information about the U-Boot
mailing list