Strange gitlab idea

Fri Aug 18 05:10:22 CEST 2023

Hi Tom,

On Thu, 17 Aug 2023 at 11:07, Tom Rini <trini at konsulko.com> wrote:
>
> On Thu, Aug 17, 2023 at 10:58:15AM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Thu, 17 Aug 2023 at 09:10, Tom Rini <trini at konsulko.com> wrote:
> > >
> > > On Thu, Aug 17, 2023 at 07:41:50AM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini at konsulko.com> wrote:
> > > > >
> > > > > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > > > > Hi Tom,
> > > > > >
> > > > > > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini at konsulko.com> wrote:
> > > > > > >
> > > > > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > > > > >
> > > > > > > > Hi Tom,
> > > > > > > >
> > > > > > > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > > > > > > since we only run one at a time.
> > > > > > > >
> > > > > > > > > I wonder if we could improve this, perhaps by using a different tag
> > > > > > > > > for the QEMU ones and then having a machine that only runs those (and
> > > > > > > > > runs 40 in parallel)?
> > > > > > > >
> > > > > > > > > In general our use of the runners seems a bit primitive, since the
> > > > > > > > > main use of parallelism is in the world builds.
> > > > > > >
> > > > > > > > I'm honestly not sure. I think there's a few tweaks that we should do,
> > > > > > > > like putting the opensbi and coreboot files in to the Dockerfile logic
> > > > > > > > instead.  And maybe seeing if just like we can have a docker registry
> > > > > > > > cache, if we can setup local pypi cache too?  I'm not otherwise sure
> > > > > > > > what's taking 23 seconds or so of
> > > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > > > > > > and run parts aren't much.
> > > > > > >
> > > > > > > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > > > > > > host is that any wins we get from a shorter queue will be lost to buildman
> > > > > > > > doing "make -j$(nproc)" 2 or 3 times at once and so we build slower.
> > > > > >
> > > > > > Yes, perhaps.
> > > > > >
> > > > > > >
> > > > > > > > My second big worry is that getting the right tags on runners will be a
> > > > > > > > little tricky.
> > > > > >
> > > > > > > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> > > > > >
> > > > > > >
> > > > > > > > My third big worry (but this is something you can test easy enough at
> > > > > > > > least) is that running the big sandbox tests, 2 or 3 times at once on
> > > > > > > > the same host will get much slower. I think, but profiling would be
> > > > > > > > helpful, that those get slow due to I/O and not CPU.
> > > > > >
> > > > > > I suspect it would be fast enough.
> > > > > >
> > > > > > > But actually the other problem is that I am not sure whether the jobs
> > > > > > > would have their own filesystem?
> > > > >
> > > > > > Yes, they should be properly sandboxed.  If you want to test some of
> > > > > > these ideas, I think the best path is to just un-register temporarily
> > > > > > (comment out the token in config.toml) some of your runners and then
> > > > > > register them with just the DM tree and experiment.
> > > >
> > > > OK thanks for the idea. I tried this on tui
> > > >
> > > > I used a 'concurrent = 10' and it got up to a load of 70 or so every
> > > > now and then, but mostly it was much less.
> > > >
> > > > The whole run (of just the test.py stage) took 8 minutes, with
> > > > 'sandbox with clang test' taking the longest.
> > > >
> > > > I'm not too sure what that tells us...
> > >
> > > Well, looking at
> > > > https://source.denx.de/u-boot/u-boot/-/pipelines/17391/builds the whole
> > > > run took 56 minutes, of which 46 minutes was on 32bit ARM world build.
> > > And the longest test.py stage was sandbox without LTO at just under 8
> > > minutes.  So I think trying to get more concurrency in this stage is
> > > likely to be a wash in terms of overall CI run time.
> >
> > There is quite a lot of variability. Two of the machines take about
> > 15mins to build 32-bit ARM and another two take under 20mins, e.g.:
> >
> > https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/676055
> >
> > Perhaps we should reserve the big jobs for the fastest machines? But
> > then what if they all go offline at once?
>
> Barring some significant donation of resources, we probably are just
> going to have to live with enough variation in build time that "about an
> hour" is what we'll end up with.  I see that overall the pipeline the
> above example is from took 50 minutes.

Yes I think so.

Regards,
Simon