[Bug] Buidling sandbox_vpl_defconfig is unstable

Simon Glass sjg at chromium.org
Sun Jul 31 20:41:22 CEST 2022


Hi,

On Sun, 31 Jul 2022 at 06:29, Tom Rini <trini at konsulko.com> wrote:
>
> On Sun, Jul 31, 2022 at 02:08:13PM +0200, Heinrich Schuchardt wrote:
> > On 7/31/22 13:47, Tom Rini wrote:
> > > On Sun, Jul 31, 2022 at 12:51:51PM +0200, Heinrich Schuchardt wrote:
> > > > Hello Simon,
> > > >
> > > > Something is wrong with building sandbox_vpl_defconfig on Gitlab:
> > > >
> > > > The following build job failed:
> > > > https://source.denx.de/u-boot/custodians/u-boot-efi/-/jobs/475640
> > > >
> > > > The following build job succeeded:
> > > > https://source.denx.de/u-boot/custodians/u-boot-efi/-/jobs/475673
> > >
> > > It's not vpl related, it's just one of the odd gitlab specific races
> > > that pops up from time to time.
> >
> > I can't imagine that building TPL leading to a race condition is Gitlab
> > specific. There must be some fundamental issue in our build system.
>
> Never seen it outside of Gitlab, unlike the RTC race.  Not sure what
> trips it up but any of the sandbox builds can do it.  I mean gitlab
> specific here in that whatever conditions need to occur for the race to
> happen, it's only seen under gitlab.  Almost seems like some stale
> object directory problem, which shouldn't happen.

Yes I've not seen it either.

We did have a problem of building with and without
SPL_OF_PLATDATA_INST. With that symbol defined, we have struct udevice
objects in the linker list, otherwise we don't. There was a race on
this, if we built sandbox_spl and sandbox_noinst in the same build
directory.

The race happened because in one build the dt-plat.c file had devices
and in another it was empty. I believe I fixed it by separating out
the files, so that a separate dt-device.c file is used for the
SPL_OF_PLATDATA_INST case.

But perhaps my fix was not sufficient.

I don't have any good ideas. It might be possible to track what was
previously built in that thread's directory (.bm-work/00 etc.) and get
some info that way.

The missing symbols in the failed build are supposed to be in
dt-uclass.c but perhaps that file is empty. See Makefile.spl around
line 143 for this. Also see the multiple-file rule at 361 which is
what actually calls dtoc to all generate the files at once. In fact if
I try building sandbox_spl and then sandbox_noinst in the same output
directory (with a 'make sandbox_noinst_defconfig' in between) then
files in that build rule are not actually removed. Also I believe
there is supposed to be a FORCE on that rule, not that it helps in
this case.

Looking at it again now, the 'rm' part is clearly not working since
the 'all' variable isn't present. I'll send a patch for that in the
hope that it might help.

More broadly I wonder if we can test for this sort of thing. Is there
a way to run make with infinite threads, so that absolutely everything
happens in parallel, or in every combination of ordering? I suppose
that would be infeasible.

A 'make' expert could usefully spend some time digging into how things
work at present. Obviously the breakages are very rare, but even one
every month can be a pain.

Regards,
Simon

143:
# Files we need to generate
u-boot-spl-platdata_c := $(patsubst %.o,%.c,$(u-boot-spl-platdata))

# Files we won't generate and should remove
u-boot-spl-old-platdata_c := $(patsubst %.o,%.c,$(u-boot-spl-old-platdata))

361:
# Don't use dts_dir here, since it forces running this expensive rule every time
$(platdata-hdr) $(u-boot-spl-platdata_c) &: $(obj)/$(SPL_BIN).dtb
   @[ -d $(obj)/dts ] || mkdir -p $(obj)/dts
   @# Remove old files since which ones we generate depends on the setting
   @# of OF_PLATDATA_INST and this might change between builds. Leaving old
   @# ones around is confusing and it is possible that switching the
   @# setting again will use the old one instead of regenerating it.
   @rm -f $(u-boot-spl-all-platdata_c) $(u-boot-spl-all-platdata)
   $(call if_changed,dtoc)


More information about the U-Boot mailing list