buildman stops (crashed) on current master

Stefano Babic sbabic at denx.de
Wed Oct 20 11:54:57 CEST 2021


On 20.10.21 05:42, Simon Glass wrote:
> Hi,
> 
> On Tue, 19 Oct 2021 at 17:01, Tom Rini <trini at konsulko.com> wrote:
>>
>> On Tue, Oct 19, 2021 at 04:59:15PM -0600, Simon Glass wrote:
>>> Hi Tom,
>>>
>>> On Tue, 19 Oct 2021 at 16:53, Tom Rini <trini at konsulko.com> wrote:
>>>>
>>>> On Tue, Oct 19, 2021 at 05:39:12PM +0200, Stefano Babic wrote:
>>>>> Hi Simon,
>>>>>
>>>>> On 07.10.21 15:43, Simon Glass wrote:
>>>>>> Hi Stefano,
>>>>>>
>>>>>> On Thu, 7 Oct 2021 at 04:37, Stefano Babic <sbabic at denx.de> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> CI stops while building aarch64, without notice; for reference:
>>>>>>>
>>>>>>> https://source.denx.de/u-boot/custodians/u-boot-imx/-/jobs/332319
>>>>>>>
>>>>>>> There is no error, the process is just killed. It looks like it stops at
>>>>>>> xilinx_zynqmp_virt,
>>>>>>>
>>>>>>> ./tools/buildman/buildman -o /tmp -P -E -W aarch64
>>>>>>>
>>>>>>> but the board can be built without issues.
>>>>>>>
>>>>>>> If I build on my host (not in docker, anyway), it generally builds fine
>>>>>>> - but it crashes sometimes, too. On the gitlab instance, it crashes.
>>>>>>> The issue does not seem to depend on the merged patches, and the newly
>>>>>>> introduced boards were already built successfully. Any hint? I also have
>>>>>>> no idea what I should look at, as all I see is just
>>>>>>>
>>>>>>> "usr/bin/bash: line 104:    24 Killed
>>>>>>> ./tools/buildman/buildman -o /tmp -P -E -W aarch64"
>>>>>>
>>>>>> I cannot see that link. I am not sure what is going on. Does it say
>>>>>> what signal killed it?
>>>>>
>>>>> Pipelines on our server were not public - I have now enabled them for
>>>>> u-boot-imx.
>>>>>
>>>>>>
>>>>>> Does it sit there for an hour and time out? If so, then I did see that
>>>>>> myself once recently, when the Kconfig needed stdin, but I could not
>>>>>> quite tie it down. I think buildman would provide it, but sometimes
>>>>>> not, apparently. So it can happen when there is an existing build
>>>>>> there and your new one adds Kconfig options that don't have
>>>>>> defaults, or something like that?
>>>>>>
>>>>>
>>>>> I have investigated further, and I can reproduce it on my host outside the
>>>>> gitlab server. buildman causes an OOM, but I cannot find the cause.
>>>>>
>>>>> Strangely enough, this happens with the "aarch64" target, and I cannot
>>>>> reproduce it with Tom's master. So it seems that -master is ok, and
>>>>> something on u-boot-imx generates the OOM.
>>>>>
>>>>> However....
>>>>>
>>>>> The OOM always happens when -2 (two boards remain) appears. I can see with
>>>>> htop that buildman starts to allocate memory until it is exhausted (64GB RAM
>>>>> + 8 GB swap). Then the kernel decides that it is enough and kills buildman -
>>>>> this is what I see on CI.
>>>>>
>>>>> You can see now the pipelines:
>>>>>
>>>>> https://source.denx.de/u-boot/custodians/u-boot-imx/-/pipelines/9520
>>>>>
>>>>> I have then split aarch64 and built imx8 separately - same result. The
>>>>> pipeline stops at a xilinx board, but they have nothing to do with it. In
>>>>> fact, I can build all xilinx boards separately. If I run buildman -W
>>>>> aarch64 -x xilinx, the OOM shows up with another board.
>>>>>
>>>>> Strangely enough, I can build each single board with buildman without
>>>>> issues, with neither errors nor warnings. Only when buildman runs them all
>>>>> together (aarch64, 308 boards) is the OOM generated.
>>>>>
>>>>> Bisect does not help: I started bisect, and at the end this commit was
>>>>> presented:
>>>>>
>>>>> commit 53a24dee86fb72ae41e7579607bafe13442616f2
>>>>> Author: Fabio Estevam <festevam at denx.de>
>>>>> Date:   Mon Aug 23 21:11:09 2021 -0300
>>>>>
>>>>>      imx8mm-cl-iot-gate: Split the defconfigs
>>>>
>>>> I strongly suspect what's going on here is that these new defconfigs are
>>>> out of sync with changes now in Kconfig.  The build itself will just sit
>>>> there, waiting for the "oldconfig" prompt to be answered.
>>>>
>>>> I want to say the problem here is that stdin is open, rather than
>>>> pointing to something closed, which would lead to the build failing
>>>> immediately, rather than once a timeout is hit, or OOM kicks in due to
>>>> kconfig chewing up all the memory.
>>>
>>> Yes that's exactly what I saw...
>>>
>>> In fact, see this commit:
>>>
>>> e62a24ce27a buildman: Avoid hanging when the config changes
>>>
>>> But that was 3 years ago.
>>
>> Looks like something else needs to be changed then, I've bisected down
>> similar failures here before very recently.
> 
> I dug into this a bit and I think buildman can detect this situation.
> I'll send a little series.
> 

The patches definitely help ;-)

They break the build (on CI when build-tools runs), but I get much more
detail when I build single boards locally. For kontron-sl-mx8mm I can find
several errors due to:

- CONFIG_SYS_LOAD_ADDR not defined in configs, but in header
- CONFIG_SYS_EXTRA_OPTIONS instead of CONFIG_IMX_CONFIG
- CONFIG_SYS_MALLOC_LEN not defined in config, but in header
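
A quick way to spot this kind of gap across several boards is to check
which symbols a defconfig actually sets. The following is only a rough
sketch: the helper name missing_symbols and the sample defconfig text
(including its values) are made up for illustration, and it only looks
at plain "CONFIG_X=value" lines, not at what Kconfig would derive.

```python
import re


def missing_symbols(defconfig_text, symbols):
    """Return the entries of `symbols` that defconfig_text does not set.

    Hypothetical helper: only "CONFIG_X=value" lines count as set;
    "# CONFIG_X is not set" lines and comments are ignored.
    """
    defined = set()
    for line in defconfig_text.splitlines():
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=", line.strip())
        if match:
            defined.add(match.group(1))
    return [sym for sym in symbols if sym not in defined]


# Made-up sample content, not taken from a real defconfig.
sample = """\
CONFIG_ARM=y
CONFIG_SYS_MALLOC_LEN=0x2000000
# CONFIG_SYS_LOAD_ADDR is not set
"""

to_check = ["CONFIG_SYS_LOAD_ADDR", "CONFIG_SYS_MALLOC_LEN", "CONFIG_IMX_CONFIG"]
print(missing_symbols(sample, to_check))
```

In practice the text would be read from the board's file under configs/
instead of the inline sample.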

Your patches are a valuable tool (CI drove me crazy), I can now follow
what happens. I will send a patch for kontron, and I will go on with the
rest (I guess kontron is not the only board causing this deadlock). Many
thanks!
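
To illustrate the stdin point Tom raised: a child process whose stdin
points at /dev/null sees end-of-file at once when it tries to read an
answer, instead of blocking on an open stdin. The sketch below is not
what Simon's series does and does not model kconfig itself; the helper
name run_without_stdin and the stand-in "prompting tool" are assumptions
made for the example.

```python
import subprocess
import sys


def run_without_stdin(cmd, timeout=60):
    """Run cmd with stdin attached to the null device.

    Any read from stdin in the child returns end-of-file immediately,
    so a tool waiting for an answer cannot block on it. The timeout is
    a second safety net (subprocess.run raises TimeoutExpired).
    """
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )


# Stand-in for a tool that asks a question on stdin: it exits with
# status 1 if it gets no answer (end-of-file), 0 otherwise.
prompting_tool = [
    sys.executable,
    "-c",
    "import sys; line = sys.stdin.readline(); sys.exit(0 if line else 1)",
]

result = run_without_stdin(prompting_tool)
print("exit status:", result.returncode)
```

With an inherited, open stdin the same stand-in would wait until somebody
typed a line.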

Tom, I will apply Simon's patches on my tree, I cannot work without them...

Regards,
Stefano

-- 
=====================================================================
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sbabic at denx.de
=====================================================================

