[PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Sean Anderson seanga2 at gmail.com
Tue Sep 1 16:38:04 CEST 2020


On 9/1/20 7:23 AM, Heinrich Schuchardt wrote:
> On 01.09.20 13:14, Sean Anderson wrote:
>> On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
>>> When resetting the sipeed_maix_bitm_defconfig without the patch I see a
>>> crash:
>>>
>>> => reset
>>> resetting ...
>>> Unhandled exception: Illegal instruction
>>> EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509
>>> ### ERROR ### Please RESET the board ###
>>
>> Hm, interesting. I don't see that error. What commit are you testing with?
> 
> origin/master 23e333a5c083a000d0cabc
> 
> sipeed_maix_bitm_defconfig plus
> 
>     CONFIG_DEBUG_UART=y
>     CONFIG_DEBUG_UART_SIFIVE=y
>     CONFIG_DEBUG_UART_BASE=0x38000000
>     CONFIG_DEBUG_UART_CLOCK=390000000
> 
> on a Maixduino.

Ok, I can reproduce that crash, but only with the debug uart enabled. I
suspect it happens every time, but we only get an error with the debug
uart. I enabled CONFIG_SHOW_REGS, but there was some interference
between the output and the (next) instance of U-Boot. 

> 
> Best regards
> 
> Heinrich
>>
>>>
>>> Objdump shows that it is related to the secondary_hart_loop:
>>>
>>>         j       secondary_hart_loop
>>>     8000021a:   b7ed                    j       80000204
>>> <secondary_hart_loop>

It is actually related to the previous instruction

80000216:   56e000ef                jal     ra,80000784 <handle_ipi>

secondary_hart_loop is never reached because handle_ipi jumps to 0x0.

I decided to do a bit more experimentation with the following patch

diff --git i/arch/riscv/lib/smp.c w/arch/riscv/lib/smp.c
index ac22136314..8be320f6ae 100644
--- i/arch/riscv/lib/smp.c
+++ w/arch/riscv/lib/smp.c
@@ -54,6 +54,10 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
                gd->arch.ipi[reg].arg0 = ipi->arg0;
                gd->arch.ipi[reg].arg1 = ipi->arg1;
 
+               printf("sending %lx(%lx, %lx) to hart %u\n", ipi->addr,
+                      ipi->arg0, ipi->arg1, reg);
+
+               __smp_mb();
                ret = riscv_send_ipi(reg);
                if (ret) {
                        pr_err("Cannot send IPI to hart %d\n", reg);
@@ -77,15 +81,23 @@ void handle_ipi(ulong hart)
 {
        int ret;
        void (*smp_function)(ulong hart, ulong arg0, ulong arg1);
+       int ipi;
 
        if (hart >= CONFIG_NR_CPUS)
                return;
+       riscv_get_ipi(hart, &ipi);
 
        __smp_mb();
 
        smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
        invalidate_icache_all();
 
+       printf("calling %p(%lx, %lx) on hart %lu, ipi = %d\n", smp_function,
+              gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1, hart, ipi);
+
+       if (!ipi)
+               return;
+
        /*
         * Clear the IPI to acknowledge the request before jumping to the
         * requested function.

With this patch I get the following output

=> reset
resetting ...
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 1


U-Boot 2020.10-rc3-00033-g23e333a5c0-dirty (Sep 01 2020 - 09:55:43 -0400)

DRAM:  8 MiB
sending 805cc1fa(80589040, 8058cdd0) to hart 1
In:    serial at 38000000
Out:   serial at 38000000
Err:   serial at 38000000
=>

There is no "calling" line after the "sending" line because hart 1 is
hung from jumping to 0x0.

>From this log, it is clear that something other than smp_call_function
is sending an ipi, and, further, that there is a delay between when the
IRQ_M_SOFT bit is asserted and when the clint has a pending interrupt.

To further investigate this, I enabled debug output. This causes some
garbling in the console, as both harts try and write at the same time.
To alleviate this, I applied the following patch which adds locking to
puts and putc (NB this only works on K210)


diff --git i/common/console.c w/common/console.c
index 8e50af1f9d..31622df9b3 100644
--- i/common/console.c
+++ w/common/console.c
@@ -515,7 +515,35 @@ static inline void pre_console_puts(const char *s) {}
 static inline void print_pre_console_buffer(int flushpoint) {}
 #endif
 
-void putc(const char c)
+/* Static location which will survive across resets */
+u32 *console_spinlock = (void *)0x80400000;
+
+void console_lock(void)
+{
+       u32 tmp = 1;
+
+       /*
+        * We only ever write 1 to the lock, so if anything else is there then
+        * this is the first access to the spinlock, and we have thus acquired
+        * it. This will not work if the spinlock is ever (un)initialized to
+        * exactly 1, but I'll take my chances :)
+        */
+       while (tmp == 1)
+               asm volatile ("amoswap.w.aq %0, %0, %1"
+                             : "+r" (tmp), "+A" (console_spinlock)
+                             :
+                             : "memory");
+}
+
+void console_unlock(void)
+{
+       asm volatile ("amoswap.w.rl x0, x0, %0"
+                     : "+A" (console_spinlock)
+                     :
+                     : "memory");
+}
+
+static void _putc(const char c)
 {
 #ifdef CONFIG_SANDBOX
        /* sandbox can send characters to stdout before it has a console */
@@ -563,7 +591,14 @@ void putc(const char c)
        }
 }
 
-void puts(const char *s)
+void putc(const char c)
+{
+       console_lock();
+       _putc(c);
+       console_unlock();
+}
+
+static void _puts(const char *s)
 {
 #ifdef CONFIG_SANDBOX
        /* sandbox can send characters to stdout before it has a console */
@@ -614,6 +649,13 @@ void puts(const char *s)
        }
 }
 
+void puts(const char *s)
+{
+       console_lock();
+       _puts(s);
+       console_unlock();
+}
+
 #ifdef CONFIG_CONSOLE_RECORD
 int console_record_init(void)
 {

With this patch applied the output is [1] (and also far too long for
this email). Note that hart 1 first recieves a software interrupt on
line 46, which is also the first output after reset. The next line
(initcall: 0000000080005cc0) is from initcall_run_list calling
initf_bootstage, the first initcall after log_init. This indicates to me
that the pending interrupt eithe was triggered by start.S, or was never
cleared. Perhaps line 69 (csrc    MODE_PREFIX(ip), t0) is a no-op.

The clint only indicates an interrupt is present on line 723. Line 722
(initcall: 000000008001ce3a) indicates a call to timer_init. This is the
initcall immediately following arch_cpu_init_dm. Although this function
initializes the IPI, afact it does not make any writes to it. This does
suggest that some kind of effect does happen.

--Sean

[1] https://gist.github.com/Forty-Bot/e010480c34f31f0ecc150bbacd552ede


More information about the U-Boot mailing list