[PATCH v5 27/33] riscv: Fix race conditions when initializing IPI

Sean Anderson seanga2 at gmail.com
Tue Mar 3 22:57:59 CET 2020


On 3/3/20 4:53 PM, Lukas Auer wrote:
> On Mon, 2020-03-02 at 18:43 -0500, Sean Anderson wrote:
>> On 3/2/20 6:17 PM, Lukas Auer wrote:
>>> Don't move this. It is intended to be run before the IPI is cleared.
>>
>> Hm, ok. I think I moved it to after because of the 'if (!smp_function)'
>> check, but those two don't really need to be done together.
>>
> 
> Thanks! We had problems with code corruption in some situations,
> because some secondary harts entered OpenSBI after the main hart while
> OpenSBI expected all harts to be running OpenSBI by that time. Moving
> this code block was part of the fix for this situation, see [1].
> 
> [1]: 
> https://gitlab.denx.de/u-boot/u-boot/commit/90ae28143700bae4edd23930a7772899ad259058

Ah, this makes a lot more sense why it was located where it was.

>>>>  	/*
>>>>  	 * Clear the IPI to acknowledge the request before jumping to the
>>>>  	 * requested function.
>>>>  	 */
>>>>  	ret = riscv_clear_ipi(hart);
>>>>  	if (ret) {
>>>> -		pr_err("Cannot clear IPI of hart %ld\n", hart);
>>>> +		pr_err("Cannot clear IPI of hart %ld (error %d)\n", hart, ret);
>>>>  		return;
>>>>  	}
>>>>  
>>>> +	__smp_mb();
>>>> +
>>>> +	smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
>>>> +	/*
>>>> +	 * There may be an IPI raised before u-boot begins execution, so check
>>>> +	 * to ensure we actually have a function to call.
>>>> +	 */
>>>> +	if (!smp_function)
>>>> +		return;
>>>> +	log_debug("hart = %lu func = %p\n", hart, smp_function);
>>>
>>> The log messages might be corrupted if multiple harts are calling the
>>> log function here. I have not looked into the details so this might not
>>> be an issue. In that case it is fine to keep, otherwise please remove
>>> it.
>>
>> I ran into this problem a lot when debugging. I ended up implementing a
>> spinlock around puts/putc. I agree it's probably better to remove this,
>> but I worry that concurrency bugs will become much harder to track down
>> without some kind of feedback. (This same criticism applies to the log
>> message above as well).
>>
> 
> Especially with your changes, I hope we already have or will soon reach
> a code robustness level where we won't have too many concurrency bugs
> in the future. :)
> Let's remove it for now until the logging backend can handle this
> cleanly.

Ok. Should the error message above ("Cannot clear IPI of hart...") also
be removed? I found it tended to corrupt the log output if it was ever
triggered.

--Sean


More information about the U-Boot mailing list