[PATCH] sandbox: Support signal handling only when requested

Heinrich Schuchardt xypron.glpk at gmx.de
Sun Jun 6 23:35:35 CEST 2021


Am 6. Juni 2021 20:07:31 MESZ schrieb Sean Anderson <seanga2 at gmail.com>:
>On 6/6/21 1:57 PM, Heinrich Schuchardt wrote:
>> On 6/6/21 7:52 PM, Sean Anderson wrote:
>>> On 6/6/21 1:28 PM, Heinrich Schuchardt wrote:
>>>> On 6/6/21 6:44 PM, Simon Glass wrote:
>>>>> Hi Heinrich,
>>>>>
>>>>> On Mon, 22 Mar 2021 at 18:56, Simon Glass <sjg at chromium.org>
>wrote:
>>>>>>
>>>>>> Hi Heinrich,
>>>>>>
>>>>>> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt
>>>>>> <xypron.glpk at gmx.de> wrote:
>>>>>>>
>>>>>>> On 22.03.21 06:21, Simon Glass wrote:
>>>>>>>> At present if sandbox crashes it prints a message and tries to
>>>>>>>> exit. But
>>>>>>>> with the recently introduced signal handler, it often seems to
>get
>>>>>>>> stuck
>>>>>>>> in a loop until the stack overflows:
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>> ...
>>>>>>>
>>>>>>> Hello Simon,
>>>>>>>
>>>>>>> do you have a reproducible example? I never have seen this.
>>>>>>
>>>>>> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
>>>>>>
>>>>>> You need to run that commit with pytest though...it does not
>happen
>>>>>> when run directly.
>>>>>>
>>>>>> BTW this sems to expose some rather nasty bug in dlmalloc or how
>it is
>>>>>> used. I notice that as soon as the first test is run, the 'top'
>value
>>>>>> in dlmalloc is outside the range of the malloc pool, which seems
>>>>>> wrong. I wonder if there is something broken with how
>>>>>> dm_test_pre_run() and dm_test_post_run() work.
>>>>>>
>>>>>>>
>>>>>>> Corrupting gd could cause an endless recursive loop, as these
>lines
>>>>>>> follow printing the observed string:
>>>>>>>
>>>>>>>          printf("pc = 0x%lx, ", pc);
>>>>>>>          printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off);
>>>>>>
>>>>>> Yes I suspect printf() is dead.
>>>>>>
>>>>>>>
>>>>>>> If we remove SA_NODEFER from the signal mask in
>arch/sandbox/cpu/os.c,
>>>>>>> recursion cannot occur anymore. If a segmentation violation
>occurs
>>>>>>> inside the handler it will be delegated to the default handler.
>>>>>>>
>>>>>>> Furthermore we could consider removing the signal handler at the
>start
>>>>>>> of os_signal_action().
>>>>>>
>>>>>> The issue is that if you get a segfault you really don't know if
>you
>>>>>> can continue and do anything else.
>>>>>>
>>>>>> What is the goal with the signal handler? I don't think the user
>can
>>>>>> do anything about it.
>>>>
>>>> Hello Simon,
>>>>
>>>> the signal handler prints out the crash location and this makes
>>>> analyzing problems much easier. It proved valuable to me several
>times.
>>>
>>> Can't you just rerun with gdb?
>> 
>> This would require that the problem is easily reproducible which may
>not
>> be the case.
>
>Hm, perhaps you could keep track of how many times we've tried to catch
>a signal, and bail if this is the second time around. E.g. something
>like
>

Removing SA_NODEFER from the signal mask will let the OS kick in if an exception occurs in a signal handler.

No counter is needed.

Best regards

Heinrich

>static void os_signal_handler(int sig, siginfo_t *info, void *con)
>{
>	/* other variables */
>	static int level = 0;
>
>	switch (level++) {
>	case 0:
>		break;
>	case 1:
>		sandbox_exit();
>	default:
>		os_exit(0);
>	}
>
>	/* rest of the handler */
>}
>
>--Sean
>
>> 
>> Best regards
>> 
>> Heinrich
>> 
>>>
>>>>
>>>>>
>>>>> I keep hitting this problem during development with sandbox, so I
>>>>> think I need to apply this patch.
>>>>>
>>>>> Does anything need to be updated in the tests?
>>>>>
>>>>> Regards,
>>>>> Simon
>>>>>
>>>>
>>>> Did you try removing SA_NODEFER as proposed?
>>>>
>>>> Best regards
>>>>
>>>> Heinrich
>>>
>>>
>> 



More information about the U-Boot mailing list