[PATCH] sandbox: Support signal handling only when requested

Sean Anderson seanga2 at gmail.com
Sun Jun 6 20:07:31 CEST 2021


On 6/6/21 1:57 PM, Heinrich Schuchardt wrote:
> On 6/6/21 7:52 PM, Sean Anderson wrote:
>> On 6/6/21 1:28 PM, Heinrich Schuchardt wrote:
>>> On 6/6/21 6:44 PM, Simon Glass wrote:
>>>> Hi Heinrich,
>>>>
>>>> On Mon, 22 Mar 2021 at 18:56, Simon Glass <sjg at chromium.org> wrote:
>>>>>
>>>>> Hi Heinrich,
>>>>>
>>>>> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt
>>>>> <xypron.glpk at gmx.de> wrote:
>>>>>>
>>>>>> On 22.03.21 06:21, Simon Glass wrote:
>>>>>>> At present if sandbox crashes it prints a message and tries to
>>>>>>> exit. But
>>>>>>> with the recently introduced signal handler, it often seems to get
>>>>>>> stuck
>>>>>>> in a loop until the stack overflows:
>>>>>>>
>>>>>>> Segmentation violation
>>>>>>>
>>>>>>> Segmentation violation
>>>>>>>
>>>>>>> Segmentation violation
>>>>>>>
>>>>>>> Segmentation violation
>>>>>>>
>>>>>>> Segmentation violation
>>>>>>>
>>>>>>> Segmentation violation
>>>>>>>
>>>>>>> Segmentation violation
>>>>>>> ...
>>>>>>
>>>>>> Hello Simon,
>>>>>>
>>>>>> do you have a reproducible example? I never have seen this.
>>>>>
>>>>> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
>>>>>
>>>>> You need to run that commit with pytest though...it does not happen
>>>>> when run directly.
>>>>>
>>>>> BTW this sems to expose some rather nasty bug in dlmalloc or how it is
>>>>> used. I notice that as soon as the first test is run, the 'top' value
>>>>> in dlmalloc is outside the range of the malloc pool, which seems
>>>>> wrong. I wonder if there is something broken with how
>>>>> dm_test_pre_run() and dm_test_post_run() work.
>>>>>
>>>>>>
>>>>>> Corrupting gd could cause an endless recursive loop, as these lines
>>>>>> follow printing the observed string:
>>>>>>
>>>>>>          printf("pc = 0x%lx, ", pc);
>>>>>>          printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off);
>>>>>
>>>>> Yes I suspect printf() is dead.
>>>>>
>>>>>>
>>>>>> If we remove SA_NODEFER from the signal mask in arch/sandbox/cpu/os.c,
>>>>>> recursion cannot occur anymore. If a segmentation violation occurs
>>>>>> inside the handler it will be delegated to the default handler.
>>>>>>
>>>>>> Furthermore we could consider removing the signal handler at the start
>>>>>> of os_signal_action().
>>>>>
>>>>> The issue is that if you get a segfault you really don't know if you
>>>>> can continue and do anything else.
>>>>>
>>>>> What is the goal with the signal handler? I don't think the user can
>>>>> do anything about it.
>>>
>>> Hello Simon,
>>>
>>> the signal handler prints out the crash location and this makes
>>> analyzing problems much easier. It proved valuable to me several times.
>>
>> Can't you just rerun with gdb?
> 
> This would require that the problem is easily reproducible which may not
> be the case.

Hm, perhaps you could keep track of how many times we've tried to catch a signal, and bail if this is the second time around. E.g. something like

static void os_signal_handler(int sig, siginfo_t *info, void *con)
{
	/* other variables */
	static int level = 0;

	switch (level++) {
	case 0:
		break;
	case 1:
		sandbox_exit();
	default:
		os_exit(0);
	}

	/* rest of the handler */
}

--Sean

> 
> Best regards
> 
> Heinrich
> 
>>
>>>
>>>>
>>>> I keep hitting this problem during development with sandbox, so I
>>>> think I need to apply this patch.
>>>>
>>>> Does anything need to be updated in the tests?
>>>>
>>>> Regards,
>>>> Simon
>>>>
>>>
>>> Did you try removing SA_NODEFER as proposed?
>>>
>>> Best regards
>>>
>>> Heinrich
>>
>>
> 



More information about the U-Boot mailing list