[PATCH] sandbox: Support signal handling only when requested

Simon Glass sjg at chromium.org
Sun Jun 6 19:37:43 CEST 2021


Hi Heinrich,

On Sun, 6 Jun 2021 at 11:28, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>
> On 6/6/21 6:44 PM, Simon Glass wrote:
> > Hi Heinrich,
> >
> > On Mon, 22 Mar 2021 at 18:56, Simon Glass <sjg at chromium.org> wrote:
> >>
> >> Hi Heinrich,
> >>
> >> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> >>>
> >>> On 22.03.21 06:21, Simon Glass wrote:
> >>>> At present if sandbox crashes it prints a message and tries to exit. But
> >>>> with the recently introduced signal handler, it often seems to get stuck
> >>>> in a loop until the stack overflows:
> >>>>
> >>>> Segmentation violation
> >>>>
> >>>> Segmentation violation
> >>>>
> >>>> Segmentation violation
> >>>>
> >>>> Segmentation violation
> >>>>
> >>>> Segmentation violation
> >>>>
> >>>> Segmentation violation
> >>>>
> >>>> Segmentation violation
> >>>> ...
> >>>
> >>> Hello Simon,
> >>>
> >>> do you have a reproducible example? I never have seen this.
> >>
> >> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
> >>
> >> You need to run that commit with pytest though...it does not happen
> >> when run directly.
> >>
> >> BTW this sems to expose some rather nasty bug in dlmalloc or how it is
> >> used. I notice that as soon as the first test is run, the 'top' value
> >> in dlmalloc is outside the range of the malloc pool, which seems
> >> wrong. I wonder if there is something broken with how
> >> dm_test_pre_run() and dm_test_post_run() work.
> >>
> >>>
> >>> Corrupting gd could cause an endless recursive loop, as these lines
> >>> follow printing the observed string:
> >>>
> >>>          printf("pc = 0x%lx, ", pc);
> >>>          printf("pc_reloc = 0x%lx\n\n", pc - gd->reloc_off);
> >>
> >> Yes I suspect printf() is dead.
> >>
> >>>
> >>> If we remove SA_NODEFER from the signal mask in arch/sandbox/cpu/os.c,
> >>> recursion cannot occur anymore. If a segmentation violation occurs
> >>> inside the handler it will be delegated to the default handler.
> >>>
> >>> Furthermore we could consider removing the signal handler at the start
> >>> of os_signal_action().
> >>
> >> The issue is that if you get a segfault you really don't know if you
> >> can continue and do anything else.
> >>
> >> What is the goal with the signal handler? I don't think the user can
> >> do anything about it.
>
> Hello Simon,
>
> the signal handler prints out the crash location and this makes
> analyzing problems much easier. It proved valuable to me several times.

Well I think we are at a draw on that point, as the patch has caused
me pain many times!

>
> >
> > I keep hitting this problem during development with sandbox, so I
> > think I need to apply this patch.
> >
> > Does anything need to be updated in the tests?
> >
> > Regards,
> > Simon
> >
>
> Did you try removing SA_NODEFER as proposed?

But what is the goal here...do you mean you want it to crash later? I
just want it to crash immediately.

What's actually wrong with putting this behaviour behind a flag? You
could always run with the flag enabled if needed. But I just don't
think it makes sense for the default behaviour to be to try to
continue operation.

Regards,
Simon


More information about the U-Boot mailing list