[PATCH] sandbox: Support signal handling only when requested
Heinrich Schuchardt
xypron.glpk at gmx.de
Sun Jun 6 23:35:35 CEST 2021
Am 6. Juni 2021 20:07:31 MESZ schrieb Sean Anderson <seanga2 at gmail.com>:
>On 6/6/21 1:57 PM, Heinrich Schuchardt wrote:
>> On 6/6/21 7:52 PM, Sean Anderson wrote:
>>> On 6/6/21 1:28 PM, Heinrich Schuchardt wrote:
>>>> On 6/6/21 6:44 PM, Simon Glass wrote:
>>>>> Hi Heinrich,
>>>>>
>>>>> On Mon, 22 Mar 2021 at 18:56, Simon Glass <sjg at chromium.org>
>wrote:
>>>>>>
>>>>>> Hi Heinrich,
>>>>>>
>>>>>> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt
>>>>>> <xypron.glpk at gmx.de> wrote:
>>>>>>>
>>>>>>> On 22.03.21 06:21, Simon Glass wrote:
>>>>>>>> At present if sandbox crashes it prints a message and tries to
>>>>>>>> exit. But
>>>>>>>> with the recently introduced signal handler, it often seems to
>get
>>>>>>>> stuck
>>>>>>>> in a loop until the stack overflows:
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>>
>>>>>>>> Segmentation violation
>>>>>>>> ...
>>>>>>>
>>>>>>> Hello Simon,
>>>>>>>
>>>>>>> do you have a reproducible example? I never have seen this.
>>>>>>
>>>>>> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
>>>>>>
>>>>>> You need to run that commit with pytest though...it does not
>happen
>>>>>> when run directly.
>>>>>>
>>>>>> BTW this sems to expose some rather nasty bug in dlmalloc or how
>it is
>>>>>> used. I notice that as soon as the first test is run, the 'top'
>value
>>>>>> in dlmalloc is outside the range of the malloc pool, which seems
>>>>>> wrong. I wonder if there is something broken with how
>>>>>> dm_test_pre_run() and dm_test_post_run() work.
>>>>>>
>>>>>>>
>>>>>>> Corrupting gd could cause an endless recursive loop, as these
>lines
>>>>>>> follow printing the observed string:
>>>>>>>
>>>>>>> printf("pc = 0x%lx, ", pc);
>>>>>>> printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off);
>>>>>>
>>>>>> Yes I suspect printf() is dead.
>>>>>>
>>>>>>>
>>>>>>> If we remove SA_NODEFER from the signal mask in
>arch/sandbox/cpu/os.c,
>>>>>>> recursion cannot occur anymore. If a segmentation violation
>occurs
>>>>>>> inside the handler it will be delegated to the default handler.
>>>>>>>
>>>>>>> Furthermore we could consider removing the signal handler at the
>start
>>>>>>> of os_signal_action().
>>>>>>
>>>>>> The issue is that if you get a segfault you really don't know if
>you
>>>>>> can continue and do anything else.
>>>>>>
>>>>>> What is the goal with the signal handler? I don't think the user
>can
>>>>>> do anything about it.
>>>>
>>>> Hello Simon,
>>>>
>>>> the signal handler prints out the crash location and this makes
>>>> analyzing problems much easier. It proved valuable to me several
>times.
>>>
>>> Can't you just rerun with gdb?
>>
>> This would require that the problem is easily reproducible which may
>not
>> be the case.
>
>Hm, perhaps you could keep track of how many times we've tried to catch
>a signal, and bail if this is the second time around. E.g. something
>like
>
Removing SA_NODEFER from the signal mask will let the OS kick in if an exception occurs in a signal handler.
No counter is needed.
Best regards
Heinrich
>static void os_signal_handler(int sig, siginfo_t *info, void *con)
>{
> /* other variables */
> static int level = 0;
>
> switch (level++) {
> case 0:
> break;
> case 1:
> sandbox_exit();
> default:
> os_exit(0);
> }
>
> /* rest of the handler */
>}
>
>--Sean
>
>>
>> Best regards
>>
>> Heinrich
>>
>>>
>>>>
>>>>>
>>>>> I keep hitting this problem during development with sandbox, so I
>>>>> think I need to apply this patch.
>>>>>
>>>>> Does anything need to be updated in the tests?
>>>>>
>>>>> Regards,
>>>>> Simon
>>>>>
>>>>
>>>> Did you try removing SA_NODEFER as proposed?
>>>>
>>>> Best regards
>>>>
>>>> Heinrich
>>>
>>>
>>
More information about the U-Boot
mailing list