[PATCH v3 1/2] CI: Move default image under global defaults
Tom Rini
trini at konsulko.com
Thu Feb 27 18:03:52 CET 2025
On Thu, Feb 27, 2025 at 09:26:10AM -0700, Simon Glass wrote:
> Hi Tom,
>
> On Mon, 24 Feb 2025 at 16:14, Tom Rini <trini at konsulko.com> wrote:
> >
> > On Sat, Feb 22, 2025 at 05:24:05PM -0700, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Sat, 22 Feb 2025 at 14:37, Tom Rini <trini at konsulko.com> wrote:
> > > >
> > > > On Sat, Feb 22, 2025 at 10:23:59AM -0700, Simon Glass wrote:
> > > > > Hi Tom,
> > > > >
> > > > > On Fri, 21 Feb 2025 at 17:08, Tom Rini <trini at konsulko.com> wrote:
> > > > > >
> > > > > > On Fri, Feb 21, 2025 at 04:42:09PM -0700, Simon Glass wrote:
> > > > > > > Hi Tom,
> > > > > > >
> > > > > > > On Mon, 17 Feb 2025 at 07:14, Tom Rini <trini at konsulko.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Feb 17, 2025 at 06:14:06AM -0700, Simon Glass wrote:
> > > > > > > > > Hi Tom,
> > > > > > > > >
> > > > > > > > > On Sun, 16 Feb 2025 at 14:52, Tom Rini <trini at konsulko.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Sun, Feb 16, 2025 at 12:39:34PM -0700, Simon Glass wrote:
> > > > > > > > > > > Hi Tom,
> > > > > > > > > > >
> > > > > > > > > > > On Sun, 16 Feb 2025 at 09:07, Tom Rini <trini at konsulko.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Feb 16, 2025 at 07:10:12AM -0700, Simon Glass wrote:
> > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, 15 Feb 2025 at 11:12, Tom Rini <trini at konsulko.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Feb 15, 2025 at 10:21:16AM -0700, Simon Glass wrote:
> > > > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, 15 Feb 2025 at 07:41, Tom Rini <trini at konsulko.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sat, Feb 15, 2025 at 04:59:40AM -0700, Simon Glass wrote:
> > > > > > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, 10 Feb 2025 at 09:25, Tom Rini <trini at konsulko.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, Feb 06, 2025 at 03:38:55PM -0700, Simon Glass wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This is a global default, so put it under 'default' like the tags.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Signed-off-by: Simon Glass <sjg at chromium.org>
> > > > > > > > > > > > > > > > > > > Suggested-by: Tom Rini <trini at konsulko.com>
> > > > > > > > > > > > > > > > > > > Reviewed-by: Tom Rini <trini at konsulko.com>
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please make v4 include the way you redid the second patch and be on top
> > > > > > > > > > > > > > > > > > of mainline, thanks.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > That's enough versions for me, so I'll let you do that, if you'd like.
> > > > > > > > > > > > > > > > > It probably doesn't affect your tree as not as much is done in
> > > > > > > > > > > > > > > > > parallel.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I am disappointed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm sorry to disappoint you.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The background is that I looked at the difference between our trees
> > > > > > > > > > > > > > > and the gitlab files are quite different. My CI runs take about 35
> > > > > > > > > > > > > > > mins and it seems that yours is around 90 mins. I would like to reduce
> > > > > > > > > > > > > > > / remove the delta (for time and patch diff), but I'm not sure how.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > My goal is to get CI runs to below 20 minutes, best case.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm sure CI could be quicker still with a number of faster runners. But
> > > > > > > > > > > > > > if you can't be bothered to make changes against mainline, what is the
> > > > > > > > > > > > > > point?
> > > > > > > > > > > > >
> > > > > > > > > > > > > If you recall, I was working with your tree and had various ideas to
> > > > > > > > > > > > > speed things up, but you didn't like it. So I've had to do it in my
> > > > > > > > > > > > > tree. This is not about more runners (although I might have another
> > > > > > > > > > > > > one soon). It is about running jobs in parallel.
> > > > > > > > > > > >
> > > > > > > > > > > > And I wasn't sure more runners in parallel would help (as it would slow
> > > > > > > > > > > > down the fast runner which is what keeps the long jobs from being even
> > > > > > > > > > > > longer) as much as adding more regular runners would (which we've done)
> > > > > > > > > > > > and noted that in the end it's a configuration on the runner side so to
> > > > > > > > > > > > go ahead. And I reviewed and ack'd the patches here which exposed the
> > > > > > > > > > > > issues your path revealed. I just can't apply them because they need to
> > > > > > > > > > > > be rebased (and squashed).
> > > > > > > > > > >
> > > > > > > > > > > You have already added tags for things, but (IIUC) they are around the
> > > > > > > > > > > other way from what I have added.
> > > > > > > > > > >
> > > > > > > > > > > I have a tag called 'single' which means that the machine is only
> > > > > > > > > > > > allowed to run one of those jobs. The world-build jobs are marked with
> > > > > > > > > > > 'single'.
> > > > > > > > > > >
> > > > > > > > > > > For other jobs, I allow the runners to pick up some in parallel
> > > > > > > > > > > depending on their performance (for moa and tui that is 10).
> > > > > > > > > > >
> > > > > > > > > > > So at most, there is a 'world build' and 10 test.py jobs running on
> > > > > > > > > > > the same machine. It seems to work fine in practice, although I would
> > > > > > > > > > > rather be able to make these two types of jobs mutually exclusive, so
> > > > > > > > > > > that a runner is either running 10 parallel jobs or 1 'single' job,
> > > > > > > > > > > but not both. I'm not sure how to do that.
> > > > > > > > > >
> > > > > > > > > > So unless I'm missing something, in both cases the bottleneck is that
> > > > > > > > > > for world build jobs you don't want anything else going on with the
> > > > > > > > > > underlying build host. You could register 10 "all" runners and 1 "fast
> > > > > > > > > > amd64" runner (and something similar but smaller for alexandra). If you
> > > > > > > > > > update the registrations on source.denx.de can you then shut down your
> > > > > > > > > > gitlab instance?
> > > > > > > > >
> > > > > > > > > I've put a tag of 'single' on things that should run on the single-job
> > > > > > > > > runner. Everything else can run concurrently, e.g. up to 10 jobs. So I
> > > > > > > > > have two runners on the same host. E.g. tui-single has 'limit = 1',
> > > > > > > > > but 'tui' has no limit and is just governed by the 'concurrent = 10'
> > > > > > > > > at the top of the file.
> > > > > > > >
> > > > > > > > Yes. And you could move those runners to the mainline gitlab. There is
> > > > > > > > no "single" tag, that would be the "all" tag. And "tui-single" would be
> > > > > > > > "fast amd64".
> > > > > > >
> > > > > > > They are still attached to the Denx gitlab. Nothing has changed on my
> > > > > > > side. I'm not sure that your new tags are working though. I have a
> > > > > > > feeling something broke along the way when you made all your tag
> > > > > > > changes. One of my servers makes a bit of noise and I haven't heard it
> > > > > > > in quite a while.
> > > > > >
> > > > > > There's a few of your runners that are "stale" and haven't contacted
> > > > > > gitlab in a long time. I'll double check the tags tho.
> > > > > >
> > > > > > > If Denx would like to give me access to their gitlab instances, I'd be
> > > > > > > happy to play around and figure out how to get it going as fast as my
> > > > > > > tree does, and send a patch.
> > > > > >
> > > > > > I'm not sure what you mean by that? The instance itself?
> > > > >
> > > > > Yes. I can fiddle with tags on my runners and try to figure it out.
> > > >
> > > > I'm not sure what you're getting at here. If you mean "tags" in
> > > > /etc/gitlab-runner/config.toml those aren't relevant here I believe.
> > >
> > > No, I mean the tags in CI. If I fiddle with them I can probably come
> > > up with a way to run your CI much faster. Mine is about 35mins.
> >
> > I'm not so sure about that. Yours runs faster because it tests less. Now
> > that we've got some of your other fast runners showing up again, this is
> > more instructive of current times I think:
> > https://source.denx.de/u-boot/u-boot/-/pipelines/24802
>
> But not this? :

You forgot a link. But presumably to some run yesterday which took
longer. And because Ilias was tweaking the currently donated arm64
runners (that have other jobs to run) and also we had two or three
custodians at a time preparing trees, things ran slower.

> > If you want to make mainline CI run faster you will need to catch up
> > with the missing coverage or argue that some things are redundant.
>
> Or perhaps I can actually just make it faster without dropping coverage?

I mean, I don't know how that's physically possible, outside of adding
many more expensive build hosts. We have two or three fast arm64 hosts,
and that world build takes between 30 and 45 minutes. That's the biggest
time bottleneck.

The next biggest is that unless sandbox tests are run on a fast host,
they take upwards of 10 minutes, rather than 5.

But please, rebase your work to next and see what you can do. There are
likely some speed-ups possible if we allow for failures to take longer
to happen (and don't gate world builds on all of the test.py stage
completing, just, say, sandbox). And if you do the work on source.denx.de
(as there is *NOTHING* stopping you from registering more runners to
your tree and using whatever tagging scheme you like) you might even see
more of the time variability due to load from other custodians.

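As a rough illustration of the "don't gate world builds on all of
test.py" idea: GitLab CI's "needs:" keyword lets a job start once the
jobs it lists have finished, instead of waiting for the whole previous
stage. The sketch below assumes that approach. The job names, the image
and the script lines are placeholders and are not taken from the actual
.gitlab-ci.yml.

```yaml
# Sketch only: job names, image and script lines are hypothetical.
stages:
  - test.py
  - world build

default:
  image: registry.example.com/ci-image:latest   # placeholder image
  tags:
    - all

sandbox test.py:
  stage: test.py
  script:
    - ./run-sandbox-tests.sh      # placeholder command

qemu test.py:
  stage: test.py
  script:
    - ./run-qemu-tests.sh         # placeholder command

build the world:
  stage: world build
  tags:
    - fast amd64
  # Without "needs", this job waits for every job in the test.py stage.
  # With it, the job may start as soon as the sandbox job has succeeded.
  needs:
    - sandbox test.py
  script:
    - ./build-all-boards.sh       # placeholder command
```

The trade-off is the one already noted: a failure in one of the other
test.py jobs no longer stops the world build from starting, so a failing
pipeline can take longer to report.
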
> > > > > > > I also have another runner to add.
> > > > > >
> > > > > > I'll contact you off-list with the token.
> > > > > >
> > > > > > > > > From my side, I have found it helpful and refreshing to have a gitlab
> > > > > > > > > instance which I can control, e.g. it runs in half the time and if my
> > > > > > > > > patches are completely blocked by Linaro, etc., I have an escape
> > > > > > > > > valve.
> > > > > > > >
> > > > > > > > Yes, and I have no idea what any of that has to do with anything other
> > > > > > > > than leading to confusion about what tree is or is not mainline. Since
> > > > > > > > you own u-boot.org and ci.u-boot.org is your gitlab and
> > > > > > > > https://ci.u-boot.org/u-boot/u-boot/ is your personal tree.
> > > > > > >
> > > > > > > For now I am working with my tree, so that I am not blocked by Linaro,
> > > > > > > etc. but as you have seen I can rebase series for your tree as needed.
> > > > > >
> > > > > > And you're not addressing my point about using the project domain for
> > > > > > your personal tree. That's my big huge "are you forking the project or
> > > > > > what" problem.
> > > > >
> > > > > I'm just making sure that my work is not blocked or lost, as that has
> > > > > happened too many times in the past few years.
> > > >
> > > > Again, are you intending to fork the project? Putting your personal tree
> > > > in as "https://ci.u-boot.org/u-boot/u-boot.git" is not OK. I keep asking
> > > > you to stop it.
> > >
> > > No, I'm not intending to fork anything. But I need a tree that I can
> > > control and push things into.
> >
> > I don't know how you can call your personal tree being at
> > "https://ci.u-boot.org/u-boot/u-boot.git" and saying it's somewhere you
> > control and can push to while not also saying it's a fork. If you want
> > to close down your gitlab and CNAME ci.u-boot.org to source.denx.de, you
> > can still push things to u-boot-dm. Or if that's too constrained of a
> > namespace you can also get a contributors/sjg/ namespace. But what
> > you're doing today WILL lead to confusion.
>
> I believe I've answered this question before. It is simply that I
> cannot get certain patches (bloblist, EFI, devicetree) into your tree.
> There really isn't any other reason.

Yes, that's still not an answer to my question.

Or is the answer to my question "Yes, I'm trying to confuse people into
thinking my tree is mainline"?

> At the moment your CI seems to be flaky as well:
>
> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/1038174

[aside, I think you meant to link to the pipeline itself, which also
passed, but had some retries]

Funny story. Ilias needed to tweak the fast arm64 hosts and also wanted
to explore "What if we have concurrency higher?" and ran into the
problems you also ran into with respect to git seeing an existing clone
in progress and bailing. That was followed by the problem of multiple
non-trivial jobs running concurrently.

All of which is why I keep trying to tell you that while "single" and
concurrent runners work fine for you on a single-user instance, that
setup will not scale.

--
Tom