[PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function

Wed Sep 9 12:36:33 CEST 2020

On 9/9/20 6:26 AM, Heinrich Schuchardt wrote:
> On 9/9/20 12:16 PM, Sean Anderson wrote:
>> On 9/9/20 5:01 AM, Rick Chen wrote:
>>> Hi Sean
>>>
>>>> Hi Sean
>>>>
>>>>> Some IPIs may already be pending when U-Boot is started. This could be a
>>>>> problem if a secondary hart tries to handle an IPI before the boot hart has
>>>>> initialized the IPI device.
>>>>>
>>>>> This commit uses NULL as a sentinel for secondary harts so they know when
>>>>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
>>>>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
>>>>> secondary harts wait in wait_for_gd_init.
>>>>>
>>>>> This imposes a minor restriction because harts may no longer jump to NULL.
>>>>> However, given that the RISC-V debug device is likely to be located at
>>>>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
>>>>> located at 0x0.
>>>>
>>>> The ram location of AE350 is at 0x0.
>>
>> Huh. Does it not have a debug device?
> 
> Wouldn't it make sense to use an odd value (e.g. (void *)-1UL) instead
> NULL as sentinal value? RISC-V code addresses are always even.

Well, the advantage of NULL is that it gets set when gd gets
initialized, so we don't have to do anything else. On the K210, we are
effectively already using it as a sentinel value for this reason (except
then we jump to it and crash).

To implement your suggestion, we would have to add something like

void riscv_early_ipi_init(void)
{
	int i;

	for (i = 0; i < CONFIG_NR_CPUS; i++)
		gd->arch.ipi[i].addr = -1;
}

and call that after board_init_f_init_reserve but before releasing
available_harts_lock.

--Sean

> 
> Best regards
> 
> Heinrich
> 
>>
>>>>
>>>>>
>>>>> Signed-off-by: Sean Anderson <seanga2 at gmail.com>
>>>>> ---
>>>>>
>>>>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>>>>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
>>>>> index ab6d8bd7fa..8c25755330 100644
>>>>> --- a/arch/riscv/lib/smp.c
>>>>> +++ b/arch/riscv/lib/smp.c
>>>>> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>>>         u32 reg;
>>>>>         int ret, pending;
>>>>>
>>>>> +       /* NULL is used as a sentinel value */
>>>>> +       if (!ipi->addr) {
>>>>> +               pr_err("smp_function cannot be set to 0x0\n");
>>>>> +               return -EINVAL;
>>>>> +       }
>>>>> +
>>>>
>>>> This conflict with memory configurations of AE350.
>>>> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
>>>> find BBL is configured as zero address on AE350 platform.
>>
>> Ok, that is a strange choice because any accidental NULL-pointer
>> dereference turns into code modification. In the next revision, I will
>> add an arch.ipi[reg].valid variable for the same prupose, instead of
>> re-using addr.
>>
>>>>>         cpus = ofnode_path("/cpus");
>>>>>         if (!ofnode_valid(cpus)) {
>>>>>                 pr_err("Can't find cpus node!\n");
>>>>> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>>>                         continue;
>>>>>  #endif
>>>>>
>>>>> -               gd->arch.ipi[reg].addr = ipi->addr;
>>>>>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
>>>>>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
>>>>>
>>>>> -               __smp_mb();
>>>
>>> Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?
>>
>> Because conceptually, patch 2 is independent of this patch. It is still
>> a bug even if this patch is not applied. I think by making this change
>> over two patches, it is more obvious why the barrier was added, and then
>> weakened, as opposed to if I made the change in one patch.
>>
>>>
>>> Thanks,
>>> Rick
>>>
>>>>> +               /*
>>>>> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
>>>>> +                * valid. An IPI may already be pending on other harts, so we
>>>>> +                * need a way to signal that the IPI device has been
>>>>> +                * initialized, and that it is ok to call the function.
>>>>> +                */
>>>>> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
>>>>
>>>> It is too tricky and hack by using zero address to be a signal for the
>>>> other pending harts waiting the IPI device been initialized.
>>>>
>>>>>
>>>>>                 ret = riscv_send_ipi(reg);
>>>>>                 if (ret) {
>>>>> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
>>>>>         if (hart >= CONFIG_NR_CPUS)
>>>>>                 return;
>>>>>
>>>>> -       __smp_mb();
>>>>> +       smp_function = (void (*)(ulong, ulong, ulong))
>>>>> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
>>>>> +       /*
>>>>> +        * If the function is NULL, then U-Boot has not requested the IPI. The
>>>>> +        * IPI device may not be initialized, so all we can do is wait for
>>>>> +        * U-Boot to initialize it and send an IPI
>>>>> +        */
>>>>> +       if (!smp_function)
>>>>> +               return;
>>>>
>>>> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
>>>>
>>>> Thanks,
>>>> Rick
>>>>
>>>>>
>>>>> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
>>>>>         invalidate_icache_all();
>>>>>
>>>>>         /*
>>>>> --
>>>>> 2.28.0
>>>>>
>>
>