[PATCH] sandbox: Support signal handling only when requested

Heinrich Schuchardt xypron.glpk at gmx.de
Sun Jun 6 19:50:04 CEST 2021


On 6/6/21 7:37 PM, Simon Glass wrote:
> Hi Heinrich,
>
> On Sun, 6 Jun 2021 at 11:28, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>
>> On 6/6/21 6:44 PM, Simon Glass wrote:
>>> Hi Heinrich,
>>>
>>> On Mon, 22 Mar 2021 at 18:56, Simon Glass <sjg at chromium.org> wrote:
>>>>
>>>> Hi Heinrich,
>>>>
>>>> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>>>
>>>>> On 22.03.21 06:21, Simon Glass wrote:
>>>>>> At present if sandbox crashes it prints a message and tries to exit. But
>>>>>> with the recently introduced signal handler, it often seems to get stuck
>>>>>> in a loop until the stack overflows:
>>>>>>
>>>>>> Segmentation violation
>>>>>>
>>>>>> Segmentation violation
>>>>>>
>>>>>> Segmentation violation
>>>>>>
>>>>>> Segmentation violation
>>>>>>
>>>>>> Segmentation violation
>>>>>>
>>>>>> Segmentation violation
>>>>>>
>>>>>> Segmentation violation
>>>>>> ...
>>>>>
>>>>> Hello Simon,
>>>>>
>>>>> do you have a reproducible example? I never have seen this.
>>>>
>>>> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
>>>>
>>>> You need to run that commit with pytest though...it does not happen
>>>> when run directly.
>>>>
>>>> BTW this sems to expose some rather nasty bug in dlmalloc or how it is
>>>> used. I notice that as soon as the first test is run, the 'top' value
>>>> in dlmalloc is outside the range of the malloc pool, which seems
>>>> wrong. I wonder if there is something broken with how
>>>> dm_test_pre_run() and dm_test_post_run() work.
>>>>
>>>>>
>>>>> Corrupting gd could cause an endless recursive loop, as these lines
>>>>> follow printing the observed string:
>>>>>
>>>>>           printf("pc = 0x%lx, ", pc);
>>>>>           printf("pc_reloc = 0x%lx\n\n", pc - gd->reloc_off);
>>>>
>>>> Yes I suspect printf() is dead.
>>>>
>>>>>
>>>>> If we remove SA_NODEFER from the signal mask in arch/sandbox/cpu/os.c,
>>>>> recursion cannot occur anymore. If a segmentation violation occurs
>>>>> inside the handler it will be delegated to the default handler.
>>>>>
>>>>> Furthermore we could consider removing the signal handler at the start
>>>>> of os_signal_action().
>>>>
>>>> The issue is that if you get a segfault you really don't know if you
>>>> can continue and do anything else.
>>>>
>>>> What is the goal with the signal handler? I don't think the user can
>>>> do anything about it.
>>
>> Hello Simon,
>>
>> the signal handler prints out the crash location and this makes
>> analyzing problems much easier. It proved valuable to me several times.
>
> Well I think we are at a draw on that point, as the patch has caused
> me pain many times!
>
>>
>>>
>>> I keep hitting this problem during development with sandbox, so I
>>> think I need to apply this patch.
>>>
>>> Does anything need to be updated in the tests?
>>>
>>> Regards,
>>> Simon
>>>
>>
>> Did you try removing SA_NODEFER as proposed?
>
> But what is the goal here...do you mean you want it to crash later? I
> just want it to crash immediately.

If you have not messed up gd, U-Boot will print the program counter
address and the program will exit. You can try this with the exception
command.

If you have messed up gd, you will still exit but you will get a SIGSEGV
warning from the operating system.

>
> What's actually wrong with putting this behaviour behind a flag? You
> could always run with the flag enabled if needed. But I just don't
> think it makes sense for the default behaviour to be to try to
> continue operation.

I want to know the relocated PC counter when the sandbox crashes.
Why do you want to hide it by default?

Best regards

Heinrich


More information about the U-Boot mailing list