Strange gitlab idea

Simon Glass sjg at chromium.org
Thu Aug 17 18:58:15 CEST 2023


Hi Tom,

On Thu, 17 Aug 2023 at 09:10, Tom Rini <trini at konsulko.com> wrote:
>
> On Thu, Aug 17, 2023 at 07:41:50AM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini at konsulko.com> wrote:
> > >
> > > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini at konsulko.com> wrote:
> > > > >
> > > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > > >
> > > > > > Hi Tom,
> > > > > >
> > > > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > > > since we only run one at a time.
> > > > > >
> > > > > > I wonder if we could improve this, perhaps by using a different tag
> > > > > > for the QEMU ones and then having a machine that only runs those (and
> > > > > > runs 40 in parallel)?
> > > > > >
> > > > > > In general our use of the runners seems a bit primitive, since the
> > > > > > main use of parallelism is in the world builds.
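
(To sketch what I meant here: the QEMU test jobs could carry their own runner
tag in .gitlab-ci.yml, so that only a dedicated machine picks them up. This is
untested and the job/tag names below are made up, but roughly:

    # hypothetical .gitlab-ci.yml fragment, not our current job definition
    qemu_arm test.py:
      stage: test.py
      tags:
        - qemu                # only runners registered with this tag run it
      script:
        - ./test/py/test.py --bd qemu_arm --build

The dedicated machine's gitlab-runner would then be registered with that tag
and a high 'concurrent' limit.)
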
> > > > >
> > > > > I'm honestly not sure. I think there's a few tweaks that we should do,
> > > > > like putting the opensbi and coreboot files into the Dockerfile logic
> > > > > instead.  And maybe seeing if, just as we have a docker registry
> > > > > cache, we can set up a local pypi cache too?  I'm not otherwise sure
> > > > > what's taking 23 seconds or so of
> > > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > > > and run parts aren't much.
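
(On the pypi side, I suppose pointing pip at a local mirror could be as simple
as a CI variable; the URL below is made up and assumes we ran something like a
devpi cache on the local network:

    # hypothetical .gitlab-ci.yml addition
    variables:
      PIP_INDEX_URL: "http://pypi-cache.local:3141/root/pypi/+simple/"

PIP_INDEX_URL is a standard pip environment variable, so nothing else should
need to change in the jobs themselves.)
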
> > > > >
> > > > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > > > host is that any wins we get from a shorter queue will be lost to buildman
> > > > > doing "make -j$(nproc)" 2 or 3 times at once and so we build slower.
> > > >
> > > > Yes, perhaps.
> > > >
> > > > >
> > > > > My second big worry is that getting the right tags on runners will be a
> > > > > little tricky.
> > > >
> > > > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> > > >
> > > > >
> > > > > My third big worry (but this is something you can test easily enough at
> > > > > least) is that running the big sandbox tests 2 or 3 times at once on
> > > > > the same host will get much slower. I think, but profiling would be
> > > > > helpful, that those get slow due to I/O and not CPU.
> > > >
> > > > I suspect it would be fast enough.
> > > >
> > > > But actually the other problem is that I am not sure whether the jobs
> > > > would have their own filesystem?
> > >
> > > Yes, they should be properly sandboxed.  If you want to test some of
> > > these ideas, I think the best path is to temporarily un-register some of
> > > your runners (comment out the token in config.toml) and then
> > > register them with just the DM tree and experiment.
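
(For reference, that presumably means editing the runner's config.toml
(typically /etc/gitlab-runner/config.toml) roughly along these lines; the
runner name and token below are placeholders:

    # sketch of a runner's config.toml, values made up
    concurrent = 10

    [[runners]]
      name = "tui"
      url = "https://source.denx.de/"
      # token = "xxxxxxxx"     # commented out to take the runner out of service
      executor = "docker"

and then re-registering it against just the DM tree for the experiment.)
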
> >
> > OK, thanks for the idea. I tried this on tui.
> >
> > I used a 'concurrent = 10' and it got up to a load of 70 or so every
> > now and then, but mostly it was much less.
> >
> > The whole run (of just the test.py stage) took 8 minutes, with
> > 'sandbox with clang test' taking the longest.
> >
> > I'm not too sure what that tells us...
>
> Well, looking at
> https://source.denx.de/u-boot/u-boot/-/pipelines/17391/builds the whole
> run took 56 minutes, of which 46 minutes was on 32bit ARM world build.
> And the longest test.py stage was sandbox without LTO at just under 8
> minutes.  So I think trying to get more concurrency in this stage is
> likely to be a wash in terms of overall CI run time.

There is quite a lot of variability. Two of the machines take about
15 mins to build 32-bit ARM and another two take under 20 mins, e.g.:

https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/676055

Perhaps we should reserve the big jobs for the fastest machines? But
then what if they all go offline at once?
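
(If we did go that way, I imagine it is just a matter of adding a tag such as
'fast' to the big jobs and putting the same tag on the quicker runners;
hypothetical fragment, the job name is made up:

    # hypothetical .gitlab-ci.yml fragment, showing only the tag addition
    world build arm32:
      tags:
        - fast              # only the quicker machines carry this tag

If every 'fast' runner went offline, those jobs would just sit pending until
one came back, which is the risk above.)
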

Regards,
Simon

