[PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Tue Aug 18 11:00:03 CEST 2020

On 11.08.20 12:32, Sean Anderson wrote:
> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>> On 11.08.20 08:20, Rick Chen wrote:
>>> Hi Heinrich
>>>
>>>> From: Heinrich Schuchardt [mailto:xypron.glpk at gmx.de]
>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot at lists.denx.de; Heinrich Schuchardt
>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>
>>>> At least on the Kendryte K210:
>>>>
>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>
>>>> We should not jump to 0x0 to handle an interrupt.
>>>
>>> Can you explain why K210 be affected by it ?
>>
>> I have been running U-Boot on the MaixDuino.
>>
>> Without this patch secondary_hart_loop() is reached only once. With the
>> patch it is reached several thousand times.
>
> Hm, interesting. To me, this is a symptom of something else going
> terribly wrong. I originally had this check in place so that it would
> be easier to detect these sorts of errors. I don't think this is the
> correct fix, however. We should really try and find the root cause of
> the bug.
>
>> I would not expect NULL to contain any code that should be executed by
>> the secondary hart. See doc/board/sipeed/maix.rst:
>>
>> Address    Size      Description
>> ========== ========= ===========
>> 0x00000000 0x1000    debug
>> 0x00001000 0x1000    rom
>> 0x02000000 0xC000    clint
>>
>>>
>>>>
>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>> ---
>>>>  arch/riscv/lib/smp.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>>> --- a/arch/riscv/lib/smp.c
>>>> +++ b/arch/riscv/lib/smp.c
>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>                 return;
>>>>         }
>>>>
>>>> +       if (!smp_function)
>>>> +               return;
>>>>         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>>
>>> I remember Sean add this check in
>>> [v10,14/21] riscv: Clean up IPI initialization code
>>> . And I ask him to remove.
>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
>>
>> Your comment was: "Please remove the sanity check. If zero address is
>> the intended jump point, then system will work abnormal."
>>
>> The only place where gd->arch.ipi[hart].addr is set is:
>>
>> arch/riscv/lib/smp.c:53 send_ipi_many():
>> gd->arch.ipi[reg].addr = ipi->addr;
>>
>> send_ipi_many() is only called in smp_call_function().
>>
>> So the line
>>
>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>
>> can only work if smp_function() has been called before this point at any
>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>
> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
> in [2] as part of a larger revert. The actual revert-of-revert is in
> [3], though it really should be split out into its own patch. The
> original commit making these changes is [4].

Patch series [1] makes not difference. You still have

gd->arch.ipi[0].addr= 0x0000000000000000
gd->arch.ipi[1].addr= 0x0000000000000000

and the secondary hart jumps to NULL and never returns.

>
> Note that the situation before [4] was that the IPI got initialized by
> the first hart to call any IPI function. If that hart was not the boot
> hart, Bad Things started to happen (e.g. race conditions, memory
> corruption, etc). In that patch, I moved the initialization to its own
> function so we would not have any race conditions and instead have
> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
> course, when everything is working properly, we should get neither of
> these.
>
> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
> [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
> [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
> [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/
>
>> Why do we call handle_ipi() on the secondary hart at all?
>
> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
> you're getting at.
>

I cannot see anything to be done by a secondary hart in case of a
software interrupt.

Best regards

Heinrich

>> Where do you expect it to jump if U-Boot is the first binary loaded?
>>
>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
>> address to secondary_hart_relocate(), do we want the secondary hart to
>> execute it again and again?
>
> --Sean
>