Strange gitlab idea

Thu Aug 17 16:28:35 CEST 2023

Hi Tom,

On Thu, 17 Aug 2023 at 07:41, Simon Glass <sjg at chromium.org> wrote:
>
> Hi Tom,
>
> On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini at konsulko.com> wrote:
> >
> > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini at konsulko.com> wrote:
> > > >
> > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > >
> > > > > Hi Tom,
> > > > >
> > > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > > since we only run one at a time.
> > > > >
> > > > > I wonder if we could improve this, perhaps by using a different tag
> > > > > for the QEMU ones and then having a machine that only runs those (and
> > > > > runs 40 in parallel)?
> > > > >
> > > > > In general our use of the runners seems a bit primitive, since the
> > > > > main use of parallelism is in the world builds.
> > > >
> > > > I'm honestly not sure. I think there are a few tweaks that we should
> > > > make, like putting the opensbi and coreboot files into the Dockerfile
> > > > logic instead.  And maybe, just as we can have a docker registry cache,
> > > > we could see if we can set up a local pypi cache too?  I'm not otherwise
> > > > sure what's taking 23 seconds or so of
> > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > > and run parts aren't much.
> > > >
> > > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > > host is that any wins we get from a shorter queue will be lost to buildman
> > > > doing "make -j$(nproc)" 2 or 3 times at once and so we build slower.
> > >
> > > Yes, perhaps.
> > >
> > > >
> > > > My second big worry is that getting the right tags on runners will be a
> > > > little tricky.
> > >
> > > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> > >
> > > >
> > > > My third big worry (but this is something you can test easily enough,
> > > > at least) is that running the big sandbox tests 2 or 3 times at once on
> > > > the same host will get much slower. I think, but profiling would be
> > > > helpful, that those get slow due to I/O and not CPU.
> > >
> > > I suspect it would be fast enough.
> > >
> > > But actually the other problem is that I am not sure whether the jobs
> > > would have their own filesystem?
> >
> > Yes, they should be properly sandboxed.  If you want to test some of
> > these ideas, I think the best path is to just un-register temporarily
> > (comment out the token in config.toml) some of your runners and then
> > register them with just the DM tree and experiment.
>
> OK, thanks for the idea. I tried this on tui.
>
> I used 'concurrent = 10' and it got up to a load of 70 or so every
> now and then, but mostly it was much less.
>
> The whole run (of just the test.py stage) took 8 minutes, with
> 'sandbox with clang test' taking the longest.
>
> I'm not too sure what that tells us...

After a bit more thought, perhaps we should:

- Give everything except the world builds a special tag like 'single',
meaning it is somewhat single-threaded
- Adjust some runners to have a second registration which only accepts
'single' jobs, with a concurrency of 10, say
- Consider running everything in a single stage

That might be easy to maintain?
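
As a rough sketch of the second point, the config.toml on such a machine
might look something like the fragment below. This assumes the docker
executor; the names, tokens and limits are placeholders rather than the
real config, and the [runners.docker] sections are left out. As far as I
know the tags are attached when a runner is registered rather than in
this file, so 'single' only shows up in the comments:

```toml
# Sketch only; names, tokens and limits are placeholders.

# Upper bound on jobs running at once across all registrations in this file
concurrent = 11

# Existing registration: takes the world builds, one job at a time
[[runners]]
  name = "example-world"
  url = "https://source.denx.de/"
  token = "TOKEN_FOR_WORLD_REGISTRATION"
  executor = "docker"
  limit = 1

# Second registration on the same machine: registered so that it only
# accepts jobs tagged 'single', up to 10 at a time
[[runners]]
  name = "example-single"
  url = "https://source.denx.de/"
  token = "TOKEN_FOR_SINGLE_REGISTRATION"
  executor = "docker"
  limit = 10
```

The per-registration 'limit' is what would keep the world builds at one
job while letting the 'single' jobs run in parallel; the global
'concurrent' just needs to be large enough to cover both.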

Regards,
Simon