[PATCH] sandbox: Support signal handling only when requested

Heinrich Schuchardt xypron.glpk at gmx.de
Mon Jul 5 19:30:54 CEST 2021


On 7/4/21 10:15 PM, Simon Glass wrote:
> Hi Heinrich,
>
> On Sun, 6 Jun 2021 at 15:35, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>
>> Am 6. Juni 2021 20:07:31 MESZ schrieb Sean Anderson <seanga2 at gmail.com>:
>>> On 6/6/21 1:57 PM, Heinrich Schuchardt wrote:
>>>> On 6/6/21 7:52 PM, Sean Anderson wrote:
>>>>> On 6/6/21 1:28 PM, Heinrich Schuchardt wrote:
>>>>>> On 6/6/21 6:44 PM, Simon Glass wrote:
>>>>>>> Hi Heinrich,
>>>>>>>
>>>>>>> On Mon, 22 Mar 2021 at 18:56, Simon Glass <sjg at chromium.org>
>>> wrote:
>>>>>>>>
>>>>>>>> Hi Heinrich,
>>>>>>>>
>>>>>>>> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt
>>>>>>>> <xypron.glpk at gmx.de> wrote:
>>>>>>>>>
>>>>>>>>> On 22.03.21 06:21, Simon Glass wrote:
>>>>>>>>>> At present if sandbox crashes it prints a message and tries to
>>>>>>>>>> exit. But
>>>>>>>>>> with the recently introduced signal handler, it often seems to
>>> get
>>>>>>>>>> stuck
>>>>>>>>>> in a loop until the stack overflows:
>>>>>>>>>>
>>>>>>>>>> Segmentation violation
>>>>>>>>>>
>>>>>>>>>> Segmentation violation
>>>>>>>>>>
>>>>>>>>>> Segmentation violation
>>>>>>>>>>
>>>>>>>>>> Segmentation violation
>>>>>>>>>>
>>>>>>>>>> Segmentation violation
>>>>>>>>>>
>>>>>>>>>> Segmentation violation
>>>>>>>>>>
>>>>>>>>>> Segmentation violation
>>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> Hello Simon,
>>>>>>>>>
>>>>>>>>> do you have a reproducible example? I never have seen this.
>>>>>>>>
>>>>>>>> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
>>>>>>>>
>>>>>>>> You need to run that commit with pytest though...it does not
>>> happen
>>>>>>>> when run directly.
>>>>>>>>
>>>>>>>> BTW this sems to expose some rather nasty bug in dlmalloc or how
>>> it is
>>>>>>>> used. I notice that as soon as the first test is run, the 'top'
>>> value
>>>>>>>> in dlmalloc is outside the range of the malloc pool, which seems
>>>>>>>> wrong. I wonder if there is something broken with how
>>>>>>>> dm_test_pre_run() and dm_test_post_run() work.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Corrupting gd could cause an endless recursive loop, as these
>>> lines
>>>>>>>>> follow printing the observed string:
>>>>>>>>>
>>>>>>>>>           printf("pc = 0x%lx, ", pc);
>>>>>>>>>           printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off);
>>>>>>>>
>>>>>>>> Yes I suspect printf() is dead.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> If we remove SA_NODEFER from the signal mask in
>>> arch/sandbox/cpu/os.c,
>>>>>>>>> recursion cannot occur anymore. If a segmentation violation
>>> occurs
>>>>>>>>> inside the handler it will be delegated to the default handler.
>>>>>>>>>
>>>>>>>>> Furthermore we could consider removing the signal handler at the
>>> start
>>>>>>>>> of os_signal_action().
>>>>>>>>
>>>>>>>> The issue is that if you get a segfault you really don't know if
>>> you
>>>>>>>> can continue and do anything else.
>>>>>>>>
>>>>>>>> What is the goal with the signal handler? I don't think the user
>>> can
>>>>>>>> do anything about it.
>>>>>>
>>>>>> Hello Simon,
>>>>>>
>>>>>> the signal handler prints out the crash location and this makes
>>>>>> analyzing problems much easier. It proved valuable to me several
>>> times.
>>>>>
>>>>> Can't you just rerun with gdb?
>>>>
>>>> This would require that the problem is easily reproducible which may
>>> not
>>>> be the case.
>>>
>>> Hm, perhaps you could keep track of how many times we've tried to catch
>>> a signal, and bail if this is the second time around. E.g. something
>>> like
>>>
>>
>> Removing SA_NODEFER from the signal mask will let the OS kick in if an exception occurs in a signal handler.
>>
>> No counter is needed.
>
> Yes that is correct.
>
> However I am still going to apply this patch, since I would prefer
> that the default behaviour is to exit with a signal, rather than
> continue. It indicates some sort of error (except if we are running a
> strange test), so it seems wrong to try to continue. It may produce
> other issues and sandbox is in an unknown state.
>
> I am happy to discuss another way / patch for doing what you need.


Your patch is not needed to exit if a signal occurs.
If the U-Boot sandbox resets or exits is controlled by SANDBOX_CRASH_RESET.

As said, to solve your problem just remove SA_NODEFER from the signal mask.

Best regards

Heinrich


More information about the U-Boot mailing list