[PATCH 1/2] gitlab: Move the n900 test into its own section

Simon Glass sjg at chromium.org
Sun Jan 31 18:10:56 CET 2021


Hi Pali,

On Sun, 31 Jan 2021 at 10:05, Pali Rohár <pali at kernel.org> wrote:
>
> On Sunday 31 January 2021 09:51:44 Simon Glass wrote:
> > Hi Pali,
> >
> > On Sun, 31 Jan 2021 at 08:52, Pali Rohár <pali at kernel.org> wrote:
> > >
> > > On Sunday 31 January 2021 08:43:19 Simon Glass wrote:
> > > > Hi Pali,
> > > >
> > > > On Sun, 31 Jan 2021 at 08:04, Pali Rohár <pali at kernel.org> wrote:
> > > > >
> > > > > On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> > > > > > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > > > > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > > > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > > > > > a retry succeeds.
> > > > > > >
> > > > > > > This test should work. Are there any logs with issues?
> > > > > >
> > > > > > I don't see it failing any more often than other tests do, due to
> > > > > > network connectivity issues.  That may be helped by, now that we've
> > > > > > dropped Travis, having the container be pre-populated with more of the
> > > > > > downloaded files and pre-building the special QEMU.
> > > > >
> > > > > If there are just network issue problems then pre-downloading required
> > > > > files into cache / container should resolve them.
> > > >
> > > > The flake issues I see are like this:
> > > >
> > > > https://gitlab.denx.de/u-boot/custodians/u-boot-dm/-/jobs/202441
> > > >
> > > > I am not sure of the cause, but it would be good to fix it!
> > >
> > > Hello Simon! This is not a network issue problem but rather some U-Boot
> > > regression in mmc code. Second test failed with error:
> > >
> > >     "Failed to boot kernel from eMMC"
> > >
> > > Other tests succeed:
> > >
> > >     "Kernel was successfully booted from RAM"
> > >     "Kernel was successfully booted from OneNAND"
> > >
> > > So problem is really with second boot attempt from eMMC. U-Boot log is
> > > also available in output (as second run):
> > >
> > >     Check if pads/pull-ups of bus are properly configured
> > >     Timed out in wait_for_event: status=0000
> > >     ...
> > >     Timed out in wait_for_event: status=0000
> > >     Check if pads/pull-ups of bus are properly configured
> > >     Timed out in wait_for_event: status=0000
> > >     Check if pads/pull-ups of bus are properly configured
> > >     Timed out in wait_for_event: status=0000
> > >     Check if pads/pull-ups of bus are properly configured
> > >     test/nokia_rx51_test.sh: line 233:  5946 Killed                  ./qemu-system-arm -M n900 -mtdblock mtd_emmc.img -sd emmc_emmc.img -serial /dev/stdout -display none > qemu_emmc.log
> > >
> > > After 300s was qemu killed and test marked as failure.
> > >
> > > So this is valid failure and regression in u-boot emmc code. So it would
> > > be needed to identify which commit caused it and revert it...
> >
> > The problem is that it is intermittent. Can you repeat it?
>
> So when you run this test more times from same sources / git commit,
> this error appears only sometimes?

Perhaps 1 time in 5 or 10? Every time I click 'retry' in gitlab it
tries again and passes.
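
To put a number on that rather than my guess, something like the sketch
below could run the test repeatedly from one commit and count failures.
It is only an illustration: the helper name count_failures and the way
it is invoked are my own assumptions, not part of the existing test
setup.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # Discard output; only the exit status matters for the tally.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical invocation: python3 flake.py 20 test/nokia_rx51_test.sh
    runs = int(sys.argv[1])
    failed = count_failures(sys.argv[2:], runs)
    print(f"{failed}/{runs} runs failed")
```

With 20 or more runs this should show whether the rate is closer to
1 in 5 or 1 in 10.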

>
> This particular issue I have not seen in qemu yet when I run tests on my
> local machine. So I cannot reproduce it.
>
> I saw similar errors, but only on real device (not in qemu) and they
> were visible always (not sometimes). And for all my known problems I
> have sent patches to mailing list. including i2c, mmc and usb. Some of
> them are still waiting for review & merge...

So perhaps it has been fixed, but not yet merged?

>
> ===
>
> I know only one error which is not fixed yet and happens "only
> sometimes" which I was not able to debug yet. Probably if u-boot binary
> has particular size then it completely crashes (and with same binary it
> can be reproduced for every run). But recompiling u-boot binary resolves
> this issue and sometimes even without modifying source code. So I
> suspect that time&date string (which changes for every recompilation)
> must have some effect (maybe some +-1 padding?). Adding new random 100
> characters into env variables seems to fix it.

That's not good.

Re the analysis, that seems a bit of a stretch. While the time/date
changes, its length doesn't normally change.
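
As a quick illustration of the length point, here is a sketch that
formats a few different build times and compares the string lengths.
The format string is an assumption modelled on a typical
"Mon DD YYYY - HH:MM:SS" banner, not copied from the build system.

```python
from datetime import datetime

# Assumed banner format; every field here is fixed-width in the C locale.
BUILD_STAMP_FORMAT = "%b %d %Y - %H:%M:%S"


def build_stamp(when):
    """Format a build time the way a version banner might."""
    return when.strftime(BUILD_STAMP_FORMAT)


stamps = [
    build_stamp(datetime(2021, 1, 30, 22, 17, 45)),
    build_stamp(datetime(2021, 1, 31, 18, 10, 56)),
    build_stamp(datetime(2021, 11, 5, 9, 3, 7)),
]

# Collect the distinct lengths; a single entry means the width never varied.
lengths = {len(s) for s in stamps}
print(stamps, lengths)
```

If the set has one entry, the timestamp alone cannot shift the size of
the binary from one build to the next.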

Uninitialised values can cause any behaviour. I assume this is in U-Boot
proper, not SPL? You could check that BSS variables are not used
before relocation, perhaps?

>
> > >
> > > > Re the network issues, I have a persistent DNS problem with my
> > > > network. I am really not sure of the root cause but sometimes it will
> > > > fail to find a host, then succeed 5 seconds later. I spent some time
> > > > on it a few weeks ago but will try again.

Regards,
Simon