buildman stops (crashed) on current master

Simon Glass sjg at chromium.org
Tue Oct 19 17:52:19 CEST 2021


Hi Stefano,

On Tue, 19 Oct 2021 at 09:39, Stefano Babic <sbabic at denx.de> wrote:
>
> Hi Simon,
>
> On 07.10.21 15:43, Simon Glass wrote:
> > Hi Stefano,
> >
> > On Thu, 7 Oct 2021 at 04:37, Stefano Babic <sbabic at denx.de> wrote:
> >>
> >> Hi all,
> >>
> >> CI stops while building aarch64, without notice. For reference:
> >>
> >> https://source.denx.de/u-boot/custodians/u-boot-imx/-/jobs/332319
> >>
> >> There is no error, the process is just killed. It looks like it stops at
> >> xilinx_zynqmp_virt,
> >>
> >> ./tools/buildman/buildman -o /tmp -P -E -W aarch64
> >>
> >> but the board can be built without issues.
> >>
> >> If I build on my host (not in docker), it generally builds fine
> >> - but it crashes sometimes, too. On the gitlab instance, it crashes.
> >> The issue does not seem to depend on the merged patches, and the newly
> >> introduced boards were already built successfully. Any hint? I also have
> >> no idea what I should look at, as all I see is just
> >>
> >> "usr/bin/bash: line 104:    24 Killed
> >> ./tools/buildman/buildman -o /tmp -P -E -W aarch64"
> >
> > I cannot see that link. I am not sure what is going on. Does it say
> > what signal killed it?
>
> Pipelines on our server were not public - I have enabled them now for u-boot-imx.
>
> >
> > Does it sit there for an hour and time out? If so, then I did see that
> > myself once recently, when the Kconfig needed stdin, but I could not
> > quite tie it down. I think buildman would provide it, but sometimes
> > not, apparently. So it can happen when there is an existing build
> > there and your new one adds Kconfig options that don't have
> > defaults, or something like that?
> >
>
> I have investigated further, and I can reproduce it on my host outside
> the gitlab server. buildman causes an OOM, but I cannot find the cause.
>
> Strangely enough, this happens with the "aarch64" target, and I cannot
> reproduce it with Tom's master. So it seems that -master is ok, and
> something on u-boot-imx generates the OOM.
>
> However....
>
> The OOM always happens when -2 (two boards remain) appears. I can see
> with htop that buildman keeps allocating memory until it is exhausted
> (64GB RAM + 8 GB swap). Then the kernel decides that it is enough and
> kills buildman - this is what I see on CI.
>
> You can see now the pipelines:
>
> https://source.denx.de/u-boot/custodians/u-boot-imx/-/pipelines/9520
>
> I have then split aarch64 and built imx8 separately - same result. The
> pipeline stops at a xilinx board, but those boards have nothing to do
> with it. In fact, I can build all xilinx boards separately. If I run
> buildman -W aarch64 -x xilinx, the OOM shows up at another board.
>
> Strangely enough, I can build every single board with buildman without
> issues, with neither errors nor warnings. Only when buildman runs them
> all together (aarch64, 308 boards) is the OOM generated.
>
> Bisect does not help: I started bisect, and at the end this commit was
> presented:
>
> commit 53a24dee86fb72ae41e7579607bafe13442616f2
> Author: Fabio Estevam <festevam at denx.de>
> Date:   Mon Aug 23 21:11:09 2021 -0300
>
>      imx8mm-cl-iot-gate: Split the defconfigs
>
>
> But it is a false positive: even if I revert it, I get the issue again.
> And the patch has nothing to do with it.
>
> It looks to me it is something in binman, maybe triggered by some
> changes in tree, but all boards can be built separately without issues.
> I expected to find the cause in the code changed by the applied patches,
> but because each board can be built and bisect is no help, I am quite
> puzzled. I am holding off on sending a PR to Tom, else I guess the problem
> goes into -master, but I do not know how to proceed, and I have a lot of
> patches to be applied.
>
> What can be done?
>
> > If that is it, you can repeat it by clearing out your .bm-work
>
> On gitlab, the build starts from scratch.

Can you check that there is definitely nothing around from the previous build?
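
One way to check is a small script like the one below. It is only a
sketch: it assumes the work directory is called .bm-work and sits
directly under the directory given to -o (/tmp in the command quoted
above). The function name leftover_entries is made up for this example
and is not part of buildman.

```python
from pathlib import Path
import sys


def leftover_entries(output_dir, work_name=".bm-work"):
    """Return the sorted names found in <output_dir>/<work_name>.

    An empty list means the work directory is missing or empty, i.e.
    nothing is left over from a previous build.
    The ".bm-work" location is an assumption, adjust it if needed.
    """
    work = Path(output_dir) / work_name
    if not work.is_dir():
        return []
    return sorted(entry.name for entry in work.iterdir())


if __name__ == "__main__":
    # Default to /tmp because the quoted command line uses "-o /tmp".
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    found = leftover_entries(out_dir)
    if found:
        print("leftovers in %s/.bm-work:" % out_dir)
        for name in found:
            print("  " + name)
    else:
        print("nothing left over in %s/.bm-work" % out_dir)
```

If this prints anything right before the CI job starts buildman, then
the build is not really starting from scratch.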

>
> > directory then building just that board for one commit, then the next
> > (with the Kconfig change).
>
> I have run buildman for each single board, and all of them were successful.
> With aarch64, I get an OOM from buildman.
>
> >
> > Buildman is supposed to handle this, of course. I'm not sure what has changed.
> >

I still believe this is due to the reason I said, but I'm happy to be
proved wrong.
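
To get a bit more data than the "Killed" line from bash, the command
can be run through a small wrapper that reports which signal ended it
and how much memory its children used. This is a rough sketch, not
something that exists in the tree: run_and_report is a hypothetical
helper, and it relies on the Python standard library only (a negative
returncode from subprocess means "killed by that signal number", and
ru_maxrss is reported in kilobytes on Linux).

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (returncode, signal_name, peak_rss).

    signal_name is None unless the process was killed by a signal.
    peak_rss is the largest resident set size of any child that has
    been waited for so far (not the sum over all children); on Linux
    the unit is kilobytes.
    """
    proc = subprocess.run(cmd)
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    sig_name = None
    if proc.returncode < 0:
        # subprocess reports death-by-signal as the negated signal number
        sig_name = signal.Signals(-proc.returncode).name
    return proc.returncode, sig_name, peak_rss


if __name__ == "__main__":
    if len(sys.argv) > 1:
        rc, sig, peak = run_and_report(sys.argv[1:])
        print("returncode=%d signal=%s peak_rss=%d" % (rc, sig, peak))
    else:
        print("usage: wrapper.py <command> [args...]")
```

For example, assuming the script is saved as wrapper.py (the file name
is arbitrary):

  python3 wrapper.py ./tools/buildman/buildman -o /tmp -P -E -W aarch64

A signal of SIGKILL together with a very large peak would be consistent
with the kernel OOM killer; a timeout would look different.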

Regards,
Simon

