RISCV: the mechanism of available_harts may cause other harts to fail to boot
Nikita Shubin
nikita.shubin at maquefel.me
Mon Sep 5 09:47:35 CEST 2022
Hi Rick!
On Mon, 5 Sep 2022 14:22:41 +0800
Rick Chen <rickchen36 at gmail.com> wrote:
> Hi,
>
> When I free-run an SMP system, I once hit a failure case where some
> harts didn't boot to the kernel shell successfully.
> However, it can't be reproduced anymore, even if I try many times.
>
> But when I set a breakpoint while debugging with GDB, it triggers the
> failure case every time.
If a hart fails to register itself in available_harts before the main
hart reaches send_ipi_many:
https://elixir.bootlin.com/u-boot/v2022.10-rc3/source/arch/riscv/lib/smp.c#L50
it will never exit secondary_hart_loop:
https://elixir.bootlin.com/u-boot/v2022.10-rc3/source/arch/riscv/cpu/start.S#L433
because no IPI will ever be sent to it.
This might be exactly your case.
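To make the ordering problem concrete, here is a minimal single-threaded C model of it. It is a hypothetical sketch, not U-Boot code: available_harts is a plain bitmask, the boot hart is left out, and hart_register(), main_hart_send_ipi_many() and stranded_harts() are made-up names. The one rule taken from the description above is that send_ipi_many silently skips any hart whose bit is not yet set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not U-Boot code: bit N set means secondary
 * hart N has registered itself in available_harts. */
static uint64_t available_harts;

/* Secondary hart registers itself. U-Boot does this under
 * available_harts_lock; the lock is omitted in this sequential model. */
static void hart_register(unsigned int hartid)
{
	assert(hartid < 64);
	available_harts |= UINT64_C(1) << hartid;
}

/* Main hart, modeled on send_ipi_many(): any hart whose bit is not set
 * is skipped. Returns the mask of harts that were sent an IPI. */
static uint64_t main_hart_send_ipi_many(uint64_t harts_in_dt)
{
	uint64_t sent = 0;

	for (unsigned int hartid = 0; hartid < 64; hartid++) {
		uint64_t bit = UINT64_C(1) << hartid;

		if (!(harts_in_dt & bit))
			continue;	/* no such hart in the device tree */
		if (!(available_harts & bit))
			continue;	/* not registered yet: silently skipped */
		sent |= bit;		/* stand-in for actually raising the IPI */
	}
	return sent;
}

/* Replays one boot ordering: harts in `early` register before the main
 * hart calls send_ipi_many(), harts in `late` register afterwards.
 * Returns the harts that would spin in secondary_hart_loop forever. */
static uint64_t stranded_harts(uint64_t early, uint64_t late)
{
	uint64_t sent;

	available_harts = 0;
	for (unsigned int h = 0; h < 64; h++)
		if (early & (UINT64_C(1) << h))
			hart_register(h);

	sent = main_hart_send_ipi_many(early | late);

	for (unsigned int h = 0; h < 64; h++)
		if (late & (UINT64_C(1) << h))
			hart_register(h);	/* too late: no IPI will follow */

	return (early | late) & ~sent;
}
```

For example, stranded_harts(0x1E, 0x1E0) returns 0x1E0: harts 1 to 4 registered in time and are woken, while harts 5 to 8 registered after the IPIs went out and are never woken. A debugger breakpoint that stalls some harts is one easy way to produce such an ordering.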
> I think the mechanism of available_harts does not provide a method
> that guarantees the success of the SMP system.
> Maybe we shall think of a better way for the SMP booting or just
> remove it ?
I haven't experienced any unexplained problems with hart_lottery or
available_harts_lock, except when:
1) harts are not started simultaneously;
2) SPL/U-Boot runs from some kind of TCM, OCRAM, etc. that is not cleared
on reset, which leaves available_harts dirty;
3) something is wrong with the atomics.
Something might also be wrong with IPI send/receive.
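The lottery and the lock-protected registration can be sketched with C11 atomics and pthreads, one thread per hart. This is a hypothetical model loosely patterned on start.S, not the actual assembly; hart_entry(), boot_all_harts() and hweight64() are made-up names. The dirty_lottery and dirty_mask parameters stand in for case 2 above: memory that still holds values from the previous boot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_HARTS 8

/* Hypothetical stand-ins for hart_lottery, available_harts_lock and
 * available_harts; this is not U-Boot's start.S code. */
static atomic_int hart_lottery;
static atomic_int available_harts_lock;
static uint64_t available_harts;	/* protected by available_harts_lock */
static atomic_int winners;

static void lock(atomic_int *l)
{
	/* roughly like an amoswap.w.aq loop: spin until we swap 0 -> 1 */
	while (atomic_exchange_explicit(l, 1, memory_order_acquire))
		;
}

static void unlock(atomic_int *l)
{
	/* roughly like amoswap.w.rl zero, zero */
	atomic_store_explicit(l, 0, memory_order_release);
}

static void *hart_entry(void *arg)
{
	unsigned int hartid = (unsigned int)(uintptr_t)arg;

	/* Lottery: only the hart that swaps in 1 and reads back 0 wins. */
	if (atomic_exchange(&hart_lottery, 1) == 0) {
		atomic_fetch_add(&winners, 1);
		return NULL;	/* the winner would continue as the main hart */
	}

	/* Losers register, then would wait in secondary_hart_loop. */
	lock(&available_harts_lock);
	available_harts |= UINT64_C(1) << hartid;
	unlock(&available_harts_lock);
	return NULL;
}

/* Counts the bits set in a mask. */
static int hweight64(uint64_t m)
{
	int n = 0;

	for (; m; m >>= 1)
		n += (int)(m & 1);
	return n;
}

/* Simulates one reset. Nonzero dirty_lottery / dirty_mask model memory
 * that was not cleared. Returns how many harts won the lottery. */
static int boot_all_harts(int dirty_lottery, uint64_t dirty_mask)
{
	pthread_t t[NR_HARTS];

	atomic_store(&hart_lottery, dirty_lottery);
	atomic_store(&available_harts_lock, 0);
	atomic_store(&winners, 0);
	available_harts = dirty_mask;

	for (unsigned int h = 0; h < NR_HARTS; h++) {
		int rc = pthread_create(&t[h], NULL, hart_entry,
					(void *)(uintptr_t)h);
		assert(rc == 0);
		(void)rc;
	}
	for (unsigned int h = 0; h < NR_HARTS; h++)
		pthread_join(t[h], NULL);

	return atomic_load(&winners);
}
```

After a clean reset, boot_all_harts(0, 0) returns 1 and exactly seven bits are set. With a dirty lottery, boot_all_harts(1, 0) returns 0: nobody wins and every hart registers as a secondary. With a dirty mask, bits from the previous boot survive even for harts that never registered this time.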
>
> Thread 8 hit Breakpoint 1, harts_early_init ()
>
> (gdb) c
> Continuing.
> [Switching to Thread 7]
>
> Thread 7 hit Breakpoint 1, harts_early_init ()
>
> (gdb)
> Continuing.
> [Switching to Thread 6]
>
> Thread 6 hit Breakpoint 1, harts_early_init ()
>
> (gdb)
> Continuing.
> [Switching to Thread 5]
>
> Thread 5 hit Breakpoint 1, harts_early_init ()
>
> (gdb)
> Continuing.
> [Switching to Thread 4]
>
> Thread 4 hit Breakpoint 1, harts_early_init ()
>
> (gdb)
> Continuing.
> [Switching to Thread 3]
>
> Thread 3 hit Breakpoint 1, harts_early_init ()
> (gdb)
> Continuing.
> [Switching to Thread 2]
>
> Thread 2 hit Breakpoint 1, harts_early_init ()
> (gdb)
> Continuing.
> [Switching to Thread 1]
>
> Thread 1 hit Breakpoint 1, harts_early_init ()
> (gdb)
> Continuing.
> [Switching to Thread 5]
>
>
> Thread 5 hit Breakpoint 3, 0x0000000001200000 in ?? ()
> (gdb) info threads
>   Id   Target Id          Frame
>   1    Thread 1 (hart 1)  secondary_hart_loop () at arch/riscv/cpu/start.S:436
>   2    Thread 2 (hart 2)  secondary_hart_loop () at arch/riscv/cpu/start.S:436
>   3    Thread 3 (hart 3)  secondary_hart_loop () at arch/riscv/cpu/start.S:436
>   4    Thread 4 (hart 4)  secondary_hart_loop () at arch/riscv/cpu/start.S:436
> * 5    Thread 5 (hart 5)  0x0000000001200000 in ?? ()
>   6    Thread 6 (hart 6)  0x000000000000b650 in ?? ()
>   7    Thread 7 (hart 7)  0x000000000000b650 in ?? ()
>   8    Thread 8 (hart 8)  0x0000000000005fa0 in ?? ()
> (gdb) c
> Continuing.
Do all the "offline" harts remain in the SPL/U-Boot secondary_hart_loop?
>
>
>
> [ 0.175619] smp: Bringing up secondary CPUs ...
> [ 1.230474] CPU1: failed to come online
> [ 2.282349] CPU2: failed to come online
> [ 3.334394] CPU3: failed to come online
> [ 4.386783] CPU4: failed to come online
> [ 4.427829] smp: Brought up 1 node, 4 CPUs
>
>
> /root # cat /proc/cpuinfo
> processor : 0
> hart : 4
> isa : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu : sv39
>
> processor : 5
> hart : 5
> isa : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu : sv39
>
> processor : 6
> hart : 6
> isa : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu : sv39
>
> processor : 7
> hart : 7
> isa : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu : sv39
>
> /root #
>
> Thanks,
> Rick