[PATCH] sandbox: Support signal handling only when requested

Simon Glass sjg at chromium.org
Tue Jul 6 16:42:28 CEST 2021


Hi Heinrich,

On Mon, 5 Jul 2021 at 11:31, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>
> On 7/4/21 10:15 PM, Simon Glass wrote:
> > Hi Heinrich,
> >
> > On Sun, 6 Jun 2021 at 15:35, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> >>
> >> Am 6. Juni 2021 20:07:31 MESZ schrieb Sean Anderson <seanga2 at gmail.com>:
> >>> On 6/6/21 1:57 PM, Heinrich Schuchardt wrote:
> >>>> On 6/6/21 7:52 PM, Sean Anderson wrote:
> >>>>> On 6/6/21 1:28 PM, Heinrich Schuchardt wrote:
> >>>>>> On 6/6/21 6:44 PM, Simon Glass wrote:
> >>>>>>> Hi Heinrich,
> >>>>>>>
> >>>>>>> On Mon, 22 Mar 2021 at 18:56, Simon Glass <sjg at chromium.org>
> >>> wrote:
> >>>>>>>>
> >>>>>>>> Hi Heinrich,
> >>>>>>>>
> >>>>>>>> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt
> >>>>>>>> <xypron.glpk at gmx.de> wrote:
> >>>>>>>>>
> >>>>>>>>> On 22.03.21 06:21, Simon Glass wrote:
> >>>>>>>>>> At present if sandbox crashes it prints a message and tries to
> >>>>>>>>>> exit. But
> >>>>>>>>>> with the recently introduced signal handler, it often seems to
> >>> get
> >>>>>>>>>> stuck
> >>>>>>>>>> in a loop until the stack overflows:
> >>>>>>>>>>
> >>>>>>>>>> Segmentation violation
> >>>>>>>>>>
> >>>>>>>>>> Segmentation violation
> >>>>>>>>>>
> >>>>>>>>>> Segmentation violation
> >>>>>>>>>>
> >>>>>>>>>> Segmentation violation
> >>>>>>>>>>
> >>>>>>>>>> Segmentation violation
> >>>>>>>>>>
> >>>>>>>>>> Segmentation violation
> >>>>>>>>>>
> >>>>>>>>>> Segmentation violation
> >>>>>>>>>> ...
> >>>>>>>>>
> >>>>>>>>> Hello Simon,
> >>>>>>>>>
> >>>>>>>>> do you have a reproducible example? I never have seen this.
> >>>>>>>>
> >>>>>>>> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
> >>>>>>>>
> >>>>>>>> You need to run that commit with pytest though...it does not
> >>> happen
> >>>>>>>> when run directly.
> >>>>>>>>
> >>>>>>>> BTW this sems to expose some rather nasty bug in dlmalloc or how
> >>> it is
> >>>>>>>> used. I notice that as soon as the first test is run, the 'top'
> >>> value
> >>>>>>>> in dlmalloc is outside the range of the malloc pool, which seems
> >>>>>>>> wrong. I wonder if there is something broken with how
> >>>>>>>> dm_test_pre_run() and dm_test_post_run() work.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Corrupting gd could cause an endless recursive loop, as these
> >>> lines
> >>>>>>>>> follow printing the observed string:
> >>>>>>>>>
> >>>>>>>>>           printf("pc = 0x%lx, ", pc);
> >>>>>>>>>           printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off);
> >>>>>>>>
> >>>>>>>> Yes I suspect printf() is dead.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> If we remove SA_NODEFER from the signal mask in
> >>> arch/sandbox/cpu/os.c,
> >>>>>>>>> recursion cannot occur anymore. If a segmentation violation
> >>> occurs
> >>>>>>>>> inside the handler it will be delegated to the default handler.
> >>>>>>>>>
> >>>>>>>>> Furthermore we could consider removing the signal handler at the
> >>> start
> >>>>>>>>> of os_signal_action().
> >>>>>>>>
> >>>>>>>> The issue is that if you get a segfault you really don't know if
> >>> you
> >>>>>>>> can continue and do anything else.
> >>>>>>>>
> >>>>>>>> What is the goal with the signal handler? I don't think the user
> >>> can
> >>>>>>>> do anything about it.
> >>>>>>
> >>>>>> Hello Simon,
> >>>>>>
> >>>>>> the signal handler prints out the crash location and this makes
> >>>>>> analyzing problems much easier. It proved valuable to me several
> >>> times.
> >>>>>
> >>>>> Can't you just rerun with gdb?
> >>>>
> >>>> This would require that the problem is easily reproducible which may
> >>> not
> >>>> be the case.
> >>>
> >>> Hm, perhaps you could keep track of how many times we've tried to catch
> >>> a signal, and bail if this is the second time around. E.g. something
> >>> like
> >>>
> >>
> >> Removing SA_NODEFER from the signal mask will let the OS kick in if an exception occurs in a signal handler.
> >>
> >> No counter is needed.
> >
> > Yes that is correct.
> >
> > However I am still going to apply this patch, since I would prefer
> > that the default behaviour is to exit with a signal, rather than
> > continue. It indicates some sort of error (except if we are running a
> > strange test), so it seems wrong to try to continue. It may produce
> > other issues and sandbox is in an unknown state.
> >
> > I am happy to discuss another way / patch for doing what you need.
>
>
> Your patch is not needed to exit if a signal occurs.
> If the U-Boot sandbox resets or exits is controlled by SANDBOX_CRASH_RESET.
>
> As said, to solve your problem just remove SA_NODEFER from the signal mask.

Perhaps we should discuss this on a contributor call. This thread is
getting too long...!

Regards,
Simon


More information about the U-Boot mailing list