[PATCH] CI: Add automatic retry for test.py jobs

Simon Glass sjg at google.com
Sun Jul 16 01:40:25 CEST 2023


Hi Tom,

On Thu, 13 Jul 2023 at 15:57, Tom Rini <trini at konsulko.com> wrote:
>
> On Thu, Jul 13, 2023 at 03:03:57PM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Wed, 12 Jul 2023 at 14:38, Tom Rini <trini at konsulko.com> wrote:
> > >
> > > On Wed, Jul 12, 2023 at 02:32:18PM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Wed, 12 Jul 2023 at 11:09, Tom Rini <trini at konsulko.com> wrote:
> > > > >
> > > > > On Wed, Jul 12, 2023 at 08:00:23AM -0600, Simon Glass wrote:
> > > > > > Hi Tom,
> > > > > >
> > > > > > On Tue, 11 Jul 2023 at 20:33, Tom Rini <trini at konsulko.com> wrote:
> > > > > > >
> > > > > > > It is not uncommon for some of the QEMU-based jobs to fail not because
> > > > > > > of a code issue but rather because of a timing issue or similar problem
> > > > > > > that is out of our control. Make use of the keywords that Azure and
> > > > > > > GitLab provide so that these jobs are automatically re-run, up to 2
> > > > > > > times, when they fail. If they still fail after that, it is likely
> > > > > > > we have found a real issue to investigate.
> > > > > > >
> > > > > > > Signed-off-by: Tom Rini <trini at konsulko.com>
> > > > > > > ---
> > > > > > >  .azure-pipelines.yml | 1 +
> > > > > > >  .gitlab-ci.yml       | 1 +
> > > > > > >  2 files changed, 2 insertions(+)
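The diff body is not reproduced in this archive. As an illustration only, the retry behaviour described in the commit message can be sketched in Python, the language of test.py. This assumes a retry count of 2, as with GitLab's `retry: 2` job keyword or Azure Pipelines' `retryCountOnTaskFailure: 2` step property; `run_with_retry` is a hypothetical helper that models the semantics and is not part of U-Boot or either CI system.

```python
from typing import Callable, Tuple


def run_with_retry(job: Callable[[], bool], max_retries: int = 2) -> Tuple[bool, int]:
    """Run `job`, re-running it up to `max_retries` extra times on failure.

    Hypothetical helper: it only models what CI retry keywords do, not how
    GitLab or Azure implement them. Returns (passed, attempts).
    """
    attempts = 0
    # One initial run plus at most `max_retries` re-runs.
    for _ in range(1 + max_retries):
        attempts += 1
        if job():
            return True, attempts
    # Still failing after the last retry: likely a real issue to investigate.
    return False, attempts


# A flaky job that times out once, then passes on the first retry.
flaky = iter([False, True])
print(run_with_retry(lambda: next(flaky)))  # (True, 2)

# A job that fails every time is reported as failed after 3 attempts.
print(run_with_retry(lambda: False))  # (False, 3)
```

Under this reading, a job that keeps failing runs three times in total before it is reported as a failure.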
> > > > > >
> > > > > > This seems like a slippery slope. Do we know why things fail? I wonder
> > > > > > if we should disable the tests / builders instead, until it can be
> > > > > > corrected?
> > > > >
> > > > > It happens in Azure, so it's not just the broken runner problem we have
> > > > > in GitLab. And the problem is timing, as I said in the commit.
> > > > > Sometimes we still get the RTC test failing. Other times we don't get
> > > > > QEMU + U-Boot spawned in time (most often m68k, but sometimes x86).
> > > >
> > > > How do we keep this list from growing?
> > >
> > > Do we need to? In essence, the problem is that since we rely on free
> > > resources, some heavy lifts sometimes take longer.  That's what this
> > > flag is for.
> >
> > I'm fairly sure the RTC thing could be made deterministic.
>
> We've already tried that once, and it happens a lot less often. If we
> make it even looser we risk making the test itself useless.

For sleep, yes, but for RTC it should be deterministic now. Next time
you get a failure, could you send me the trace?

>
> > The spawning thing...is there a timeout for that? What actually fails?
>
> It doesn't spawn in time for the framework to get to the prompt.  We
> could maybe increase the timeout value.  It's always the version test
> that fails.

Ah OK, yes increasing the timeout makes sense.
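As a sketch of the timeout idea only: the helper below waits until a prompt string appears in a console's output or a deadline passes, so a slow QEMU start can be tolerated by raising `timeout_s`. Everything here is hypothetical and stdlib-only: `read_output` stands in for whatever reads the console, `make_slow_console` simulates a board that is slow to print anything, and `=> ` is assumed as the prompt (U-Boot's default). This is not the test.py implementation.

```python
import time
from typing import Callable


def wait_for_prompt(read_output: Callable[[], str], prompt: str,
                    timeout_s: float, poll_s: float = 0.05) -> bool:
    """Return True once `prompt` appears in the output, False on timeout.

    Hypothetical helper: `read_output` returns whatever new console text is
    available (possibly ""), and must not block.
    """
    deadline = time.monotonic() + timeout_s
    seen = ""
    while True:
        seen += read_output()
        if prompt in seen:
            return True
        # Check the deadline only after reading, so output that arrived
        # during the last sleep is still seen once before giving up.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def make_slow_console(empty_reads: int) -> Callable[[], str]:
    """Simulate a board that prints nothing for `empty_reads` polls."""
    n = 0

    def read() -> str:
        nonlocal n
        n += 1
        return "" if n <= empty_reads else "U-Boot\n=> "
    return read


# With a short timeout the slow console is reported as not ready...
print(wait_for_prompt(make_slow_console(10), "=> ", timeout_s=0.1))  # False
# ...while a longer timeout lets the same kind of console reach the prompt.
print(wait_for_prompt(make_slow_console(10), "=> ", timeout_s=5.0))  # True
```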

>
> > > > > > I'll note that we don't have this problem with sandbox tests.
> > > > >
> > > > > OK, but that's not relevant?
> > > >
> > > > It is relevant to the discussion about using QEMU instead of sandbox,
> > > > e.g. with the TPM. I recall a discussion with Ilias a while back.
> > >
> > > I'm sure we could make sandbox take too long to start as well, if enough
> > > other things are going on with the system.  And sandbox has its own set
> > > of super frustrating issues instead, so I don't think this is a great
> > > argument to have right here. (I have to run it in Docker to get around
> > > some application version requirements, and I have to exclude the
> > > event_dump, bootmgr, abootimg and gpt tests, which could otherwise run
> > > but fail for me.)
> >
> > I haven't heard about this before. Is there anything that could be done?
>
> I have no idea what could be done about it since I believe all of them
> run fine in CI, including on this very host, when GitLab invokes it
> rather than when I invoke it. My point here is that sandbox tests are
> just a different kind of picky about things and need their own kind of
> "just hit retry".

Perhaps this is due to Python dependencies? I'm not sure, but if you see it
again, please let me know in case we can actually fix this.

Regards,
Simon

