[PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function

Wed Sep 9 12:16:24 CEST 2020

On 9/9/20 5:01 AM, Rick Chen wrote:
> Hi Sean
> 
>> Hi Sean
>>
>>> Some IPIs may already be pending when U-Boot is started. This could be a
>>> problem if a secondary hart tries to handle an IPI before the boot hart has
>>> initialized the IPI device.
>>>
>>> This commit uses NULL as a sentinel for secondary harts so they know when
>>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
>>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
>>> secondary harts wait in wait_for_gd_init.
>>>
>>> This imposes a minor restriction because harts may no longer jump to NULL.
>>> However, given that the RISC-V debug device is likely to be located at
>>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
>>> located at 0x0.
>>
>> The ram location of AE350 is at 0x0.

Huh. Does it not have a debug device?

>>
>>>
>>> Signed-off-by: Sean Anderson <seanga2 at gmail.com>
>>> ---
>>>
>>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
>>> index ab6d8bd7fa..8c25755330 100644
>>> --- a/arch/riscv/lib/smp.c
>>> +++ b/arch/riscv/lib/smp.c
>>> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>         u32 reg;
>>>         int ret, pending;
>>>
>>> +       /* NULL is used as a sentinel value */
>>> +       if (!ipi->addr) {
>>> +               pr_err("smp_function cannot be set to 0x0\n");
>>> +               return -EINVAL;
>>> +       }
>>> +
>>
>> This conflict with memory configurations of AE350.
>> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
>> find BBL is configured as zero address on AE350 platform.

Ok, that is a strange choice because any accidental NULL-pointer
dereference turns into code modification. In the next revision, I will
add an arch.ipi[reg].valid variable for the same prupose, instead of
re-using addr.

>>>         cpus = ofnode_path("/cpus");
>>>         if (!ofnode_valid(cpus)) {
>>>                 pr_err("Can't find cpus node!\n");
>>> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>                         continue;
>>>  #endif
>>>
>>> -               gd->arch.ipi[reg].addr = ipi->addr;
>>>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
>>>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
>>>
>>> -               __smp_mb();
> 
> Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?

Because conceptually, patch 2 is independent of this patch. It is still
a bug even if this patch is not applied. I think by making this change
over two patches, it is more obvious why the barrier was added, and then
weakened, as opposed to if I made the change in one patch.

> 
> Thanks,
> Rick
> 
>>> +               /*
>>> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
>>> +                * valid. An IPI may already be pending on other harts, so we
>>> +                * need a way to signal that the IPI device has been
>>> +                * initialized, and that it is ok to call the function.
>>> +                */
>>> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
>>
>> It is too tricky and hack by using zero address to be a signal for the
>> other pending harts waiting the IPI device been initialized.
>>
>>>
>>>                 ret = riscv_send_ipi(reg);
>>>                 if (ret) {
>>> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
>>>         if (hart >= CONFIG_NR_CPUS)
>>>                 return;
>>>
>>> -       __smp_mb();
>>> +       smp_function = (void (*)(ulong, ulong, ulong))
>>> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
>>> +       /*
>>> +        * If the function is NULL, then U-Boot has not requested the IPI. The
>>> +        * IPI device may not be initialized, so all we can do is wait for
>>> +        * U-Boot to initialize it and send an IPI
>>> +        */
>>> +       if (!smp_function)
>>> +               return;
>>
>> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
>>
>> Thanks,
>> Rick
>>
>>>
>>> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
>>>         invalidate_icache_all();
>>>
>>>         /*
>>> --
>>> 2.28.0
>>>