[PATCH v5 27/33] riscv: Fix race conditions when initializing IPI
Rick Chen
rickchen36 at gmail.com
Thu Mar 5 03:18:01 CET 2020
Hi Sean
> Hi Sean
>
> > On Mon, 2020-03-02 at 10:43 -0500, Sean Anderson wrote:
> >
> > > On 3/2/20 4:08 AM, Rick Chen wrote:
> > > > Hi Sean
> > > >
> > > > > The IPI code could have race conditions in several places.
> > > > > * Several harts could race on the value of gd->arch->clint/plic
> > > > > * Non-boot harts could race with the main hart on the DM subsystem In
> > > > > addition, if an IPI was pending when U-Boot started, it would cause the
> > > > > IPI handler to jump to address 0.
> > > > >
> > > > > To address these problems, a new function riscv_init_ipi is introduced. It
> > > > > is called once during arch_cpu_init_dm. Before this point, no riscv_*_ipi
> > > > > functions may be called. Access is synchronized by gd->arch->ipi_ready.
> > > > >
> > > > > Signed-off-by: Sean Anderson <seanga2 at gmail.com>
> > > > > ---
> > > > >
> > > > > Changes in v5:
> > > > > - New
> > > > >
> > > > > arch/riscv/cpu/cpu.c | 9 ++++
> > > > > arch/riscv/include/asm/global_data.h | 1 +
> > > > > arch/riscv/include/asm/smp.h | 43 ++++++++++++++++++
> > > > > arch/riscv/lib/andes_plic.c | 34 +++++---------
> > > > > arch/riscv/lib/sbi_ipi.c | 5 ++
> > > > > arch/riscv/lib/sifive_clint.c | 33 +++++---------
> > > > > arch/riscv/lib/smp.c | 68 ++++++++--------------------
> > > > > 7 files changed, 101 insertions(+), 92 deletions(-)
> > > > >
> > > > > diff --git a/arch/riscv/cpu/cpu.c b/arch/riscv/cpu/cpu.c
> > > > > index e457f6acbf..a971ec8694 100644
> > > > > --- a/arch/riscv/cpu/cpu.c
> > > > > +++ b/arch/riscv/cpu/cpu.c
> > > > > @@ -96,6 +96,15 @@ int arch_cpu_init_dm(void)
> > > > > csr_write(CSR_SATP, 0);
> > > > > }
> > > > >
> > > > > +#ifdef CONFIG_SMP
> > > > > + ret = riscv_init_ipi();
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + /* Atomically set a flag enabling IPI handling */
> > > > > + WRITE_ONCE(gd->arch.ipi_ready, 1);
> > > >
> > > > I think it shall not have race condition here.
> > > > Can you explain more detail why there will occur race condition ?
> > > >
> > > > Hi Lukas
> > > >
> > > > Do you have any comments ?
> > > >
> > > > Thanks
> > > > Rick
> > >
> > > On the K210, there may already be an IPI pending when U-Boot starts.
> > > (Perhaps the prior stage sends an IPI but does not clear it). As soon as
> > > interrupts are enabled, the hart then tries to call riscv_clear_ipi().
> > > Because the clint/plic has not yet been enabled, the clear_ipi function
> > > will try and bind/probe the device. This can have really nasty effects, since
> > > the boot hart is *also* trying to bind/probe devices.
> > >
> > > In addition, a hart could end up trying to probe the clint/plic because
> > > it could receive the IPI before (from its perspective) gd->arch.clint
> > > (or plic) gets initialized.
> > >
> >
> > We did not have a problem with pending IPIs on other platforms. It
> > should suffice to clear SSIP / MSIP before enabling the interrupts.
> >
>
> Can you try to clear mip/sip in startup flow before secondary_hart_loop:
> Maybe it can help to overcome the problem of calling riscv_clear_ipi()
> before riscv_init_ipi() that you added.
How about the verified result ?
Is this can help to fix your problem without any modification (the
original flow)
Avoid unexpected condition can increase the robustness indeed.
But clarify the root cause more precisely is still necessary here.
Thanks,
Rick
>
> Thanks,
> Rick
>
> > > Aside from the above, I think the macro approach is a bit confusing,
> > > since it's unclear at first glance what function will be initializing
> > > the clint/plic. Given U-Boot's otherwise completely SMP-unsafe design, I
> > > think it's better to be explicit and conservative in these areas.
> > >
> >
> > I agree, the patch makes this more clear and helps make the code more
> > robust.
> >
> > Thanks,
> > Lukas
More information about the U-Boot
mailing list