[PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Tue Aug 11 12:32:20 CEST 2020

On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
> On 11.08.20 08:20, Rick Chen wrote:
>> Hi Heinrich
>>
>>> From: Heinrich Schuchardt [mailto:xypron.glpk at gmx.de]
>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>> To: Rick Jian-Zhi Chen(陳建志)
>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot at lists.denx.de; Heinrich Schuchardt
>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>
>>> At least on the Kendryte K210:
>>>
>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>
>>> We should not jump to 0x0 to handle an interrupt.
>>
>> Can you explain why K210 be affected by it ?
> 
> I have been running U-Boot on the MaixDuino.
> 
> Without this patch secondary_hart_loop() is reached only once. With the
> patch it is reached several thousand times.

Hm, interesting. To me, this is a symptom of something else going
terribly wrong. I originally had this check in place so that it would
be easier to detect these sorts of errors. I don't think this is the
correct fix, however. We should really try and find the root cause of
the bug.

> I would not expect NULL to contain any code that should be executed by
> the secondary hart. See doc/board/sipeed/maix.rst:
> 
> Address    Size      Description
> ========== ========= ===========
> 0x00000000 0x1000    debug
> 0x00001000 0x1000    rom
> 0x02000000 0xC000    clint
> 
>>
>>>
>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>> ---
>>>  arch/riscv/lib/smp.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>> --- a/arch/riscv/lib/smp.c
>>> +++ b/arch/riscv/lib/smp.c
>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>                 return;
>>>         }
>>>
>>> +       if (!smp_function)
>>> +               return;
>>>         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>
>> I remember Sean add this check in
>> [v10,14/21] riscv: Clean up IPI initialization code
>> . And I ask him to remove.
>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
> 
> Your comment was: "Please remove the sanity check. If zero address is
> the intended jump point, then system will work abnormal."
> 
> The only place where gd->arch.ipi[hart].addr is set is:
> 
> arch/riscv/lib/smp.c:53 send_ipi_many():
> gd->arch.ipi[reg].addr = ipi->addr;
> 
> send_ipi_many() is only called in smp_call_function().
> 
> So the line
> 
> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
> 
> can only work if smp_function() has been called before this point at any
> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().

Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
in [2] as part of a larger revert. The actual revert-of-revert is in
[3], though it really should be split out into its own patch. The
original commit making these changes is [4].

Note that the situation before [4] was that the IPI got initialized by
the first hart to call any IPI function. If that hart was not the boot
hart, Bad Things started to happen (e.g. race conditions, memory
corruption, etc). In that patch, I moved the initialization to its own
function so we would not have any race conditions and instead have
(easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
course, when everything is working properly, we should get neither of
these.

[1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
[2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
[3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
[4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/

> Why do we call handle_ipi() on the secondary hart at all?

Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
you're getting at.

> Where do you expect it to jump if U-Boot is the first binary loaded?
> 
> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
> address to secondary_hart_relocate(), do we want the secondary hart to
> execute it again and again?

--Sean