buildman stops (crashed) on current master

Stefano Babic sbabic at denx.de
Tue Oct 19 22:10:12 CEST 2021


Hi Simon,

On 19.10.21 17:52, Simon Glass wrote:
> Hi Stefano,
> 
> On Tue, 19 Oct 2021 at 09:39, Stefano Babic <sbabic at denx.de> wrote:
>>
>> Hi Simon,
>>
>> On 07.10.21 15:43, Simon Glass wrote:
>>> Hi Stefano,
>>>
>>> On Thu, 7 Oct 2021 at 04:37, Stefano Babic <sbabic at denx.de> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> CI stops while building aarch64 without notice, for reference:
>>>>
>>>> https://source.denx.de/u-boot/custodians/u-boot-imx/-/jobs/332319
>>>>
>>>> There is no error, the process is just killed. It looks like it stops at
>>>> xilinx_zynqmp_virt,
>>>>
>>>> ./tools/buildman/buildman -o /tmp -P -E -W aarch64
>>>>
>>>> but the board can be built without issues.
>>>>
>>>> If I build on my host (not in docker, anyway), it generally builds fine
>>>> - but it crashes sometimes, too. On the gitlab instance, it crashes.
>>>> The issue does not seem to depend on the merged patches, and the newly
>>>> introduced boards were already built successfully. Any hints? I also have
>>>> no idea what I should look at, as all I see is just
>>>>
>>>> "usr/bin/bash: line 104:    24 Killed
>>>> ./tools/buildman/buildman -o /tmp -P -E -W aarch64"
>>>
>>> I cannot see that link. I am not sure what is going on. Does it say
>>> what signal killed it?
>>
>> Pipelines on our server were not public - I have enabled them now for u-boot-imx.
>>
>>>
>>> Does it sit there for an hour and time out? If so, then I did see that
>>> myself once recently, when the Kconfig needed stdin, but I could not
>>> quite tie it down. I think buildman would provide it, but sometimes
>>> not, apparently. So it can happen when there is an existing build
>>> there and your new one which adds Kconfig options that don't have
>>> defaults, or something like that?
>>>
>>
>> I have investigated further, and I can reproduce it on my host outside
>> the gitlab server. buildman causes an OOM, but I cannot find the cause.
>>
>> Strangely enough, this happens with the "aarch64" target, and I cannot
>> reproduce it with Tom's master. So it seems that -master is ok, and
>> something on u-boot-imx generates the OOM.
>>
>> However....
>>
>> The OOM always happens when -2 (two boards remain) appears. I can see
>> with htop that buildman starts to allocate memory until it is exhausted
>> (64 GB RAM + 8 GB swap). Then the kernel decides that it is enough and
>> kills buildman - this is what I see on CI.
>>
>> You can see now the pipelines:
>>
>> https://source.denx.de/u-boot/custodians/u-boot-imx/-/pipelines/9520
>>
>> I have then split aarch64 and built imx8 separately - same result. The
>> pipeline stops at a xilinx board, but those boards have nothing to do
>> with it. In fact, I can build all xilinx boards separately. If I run
>> buildman -W aarch64 -x xilinx, the OOM shows up at another board.
>>
>> Strangely enough, I can build every single board with buildman without
>> issues, with neither errors nor warnings. Only when buildman builds them
>> all together (aarch64, 308 boards) is the OOM generated.
>>
>> Bisect does not help: I started bisect, and at the end this commit was
>> presented:
>>
>> commit 53a24dee86fb72ae41e7579607bafe13442616f2
>> Author: Fabio Estevam <festevam at denx.de>
>> Date:   Mon Aug 23 21:11:09 2021 -0300
>>
>>       imx8mm-cl-iot-gate: Split the defconfigs
>>
>>
>> But it is a false positive: if I revert it, I get the issue again. And
>> the patch has nothing to do with it.
>>
>> It looks to me like it is something in binman, maybe triggered by some
>> changes in the tree, but all boards can be built separately without issues.
>> I expected to find the cause in the code of the applied patches, but because
>> each board can be built and bisect is no help, I am quite puzzled. I am
>> holding off on sending a PR to Tom, otherwise I guess the problem goes into
>> -master, but I do not know how to proceed, and I have a lot of patches to
>> be applied.
>>
>> What can be done?
>>
>>> If that is it, you can repeat it by clearing out your .bm-work
>>
>> On gitlab, the build starts from scratch.
> 
> Can you check that there is definitely nothing around from the previous build?

On my host, there is definitely something. Because I cannot access the
docker image on the server, I installed a local runner on my PC, so I
can take a deeper look and jump into the container.

It runs, and I get the same issue. All boards are built, then it gets stuck
until the OOM happens and the kernel kills it.
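
To narrow down when the allocation starts, something like the following
could be left running next to buildman. It is only a minimal sketch,
assuming Linux with /proc; the helper names (parse_vm_rss_kib,
read_rss_kib, watch) are made up for illustration and are not part of
buildman, and it only samples the one PID it is given, not any child
processes.

```python
import time


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     1234 kB".
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read the resident set size of one process; None if it is gone."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vm_rss_kib(f.read())
    except OSError:
        # Process exited, or /proc is not available on this system.
        return None


def watch(pid, interval=5.0, max_samples=None):
    """Print a timestamped RSS sample every `interval` seconds.

    Stops when the process disappears or after `max_samples` samples,
    and returns the collected (timestamp, rss_kib) pairs.
    """
    samples = []
    while max_samples is None or len(samples) < max_samples:
        rss = read_rss_kib(pid)
        if rss is None:
            break
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} pid={pid} rss={rss / 1024:.1f} MiB", flush=True)
        samples.append((stamp, rss))
        time.sleep(interval)
    return samples
```

Calling watch() with the PID of the running buildman from a second shell
gives timestamps that can be compared against the buildman progress
output, for example the point where -2 appears.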

The job for aarch64 is:

./tools/buildman/buildman -o /tmp -P -E -W aarch64

and it runs in Tom's image. If I jump into the container (without running
buildman), there is no .bm-work at all (this should be in /tmp, right?)

uboot@e4a810aa6d8a:/$ ls -la tmp/
total 8
drwxrwxrwt 1 root root 4096 Sep 30 15:54 .
drwxr-xr-x 1 root root 4096 Oct 19 20:01 ..

So it is certain that there is no .bm-work before buildman runs. Then, of
course, /tmp/.bm-work is present after each job (defined in
.gitlab-ci.yml). But I guess you do not mean to drop .bm-work after each job,
right? Otherwise this should also be put in .gitlab-ci.yml.

Where the build gets stuck... I do not know, but it looks like all boards
were already built successfully (just as an example:
https://source.denx.de/u-boot/custodians/u-boot-imx/-/jobs/337915).

> 
>>
>>> directory then building just that board for one commit, then the next
>>> (with the Kconfig change).
>>
>> I have run buildman for each single board, all of them were successful.
>> With aarch64, I get OOM from buildman.
>>
>>>
>>> Buildman is supposed to handle this, of course. I'm not sure what has changed.
>>>
> 
> I still believe this is due to the reason I said,

I am sure you're right, but I do not see any .bm-work before running
buildman. Is there something I can turn on to get more info?

> but I'm happy to be
> proved wrong.

Thanks,
Stefano

-- 
=====================================================================
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sbabic at denx.de
=====================================================================

