an off-by-one error in dm_test_rtc_set_get()?

Thu Oct 28 13:58:47 CEST 2021

On Thu, Oct 28, 2021 at 09:22:30AM +0200, Rasmus Villemoes wrote:
> On 27/10/2021 15.22, Tom Rini wrote:
> > On Wed, Oct 27, 2021 at 12:43:38PM +0800, Bin Meng wrote:
> >> Hi Simon,
> >>
> >> gitlab reported the following test error below:
> >>
> >> =================================== FAILURES ===================================
> >> __________________________ test_ut[ut_dm_rtc_set_get] __________________________
> >> test/py/tests/test_ut.py:43: in test_ut
> >> assert output.endswith('Failures: 0')
> >> E AssertionError: assert False
> >> E + where False = <built-in method endswith of str object at
> >> 0x7f3bb792dcb0>('Failures: 0')
> >> E + where <built-in method endswith of str object at 0x7f3bb792dcb0> =
> >> 'Test: dm_test_rtc_set_get: rtc.c\r\r\nexpected: 27/10/2021
> >> 03:38:15\r\r\nactual: 27/10/2021 03:38:14\r\r\ntest/dm/rtc...w, &cmp,
> >> 1): Expected 0x0 (0), got 0xffffffea (-22)\r\r\nTest:
> >> dm_test_rtc_set_get: rtc.c (flat tree)\r\r\nFailures: 1'.endswith
> >> ----------------------------- Captured stdout call -----------------------------
> >> =>
> >>
> >> See https://source.denx.de/u-boot/custodians/u-boot-x86/-/jobs/341905
> >>
> >> But the same branch same commit, azure test results passed:
> >> https://dev.azure.com/bmeng/GitHub/_build/results?buildId=460&view=results
> >>
> >> It looks like the error is an off-by-one where actual time is 1 second
> >> behind the expected time?
> >>
> >> expected: 27/10/2021 03:38:15
> >> actual: 27/10/2021 03:38:14
> >>
> >> Is this a known issue?
> > 
> > Yes, which is why the test checks for a certain amount of "fuzz" around
> > the return value. 
> 
> You said the same thing about dm_test_rtc_reset() in
> https://lore.kernel.org/u-boot/20210831124441.GC858@bill-the-cat/ , but
> I can't find anything about any fuzz in the code. Could you point out
> where you think that's implemented? In both cases, the expected and
> actual values were just 1 from each other, and I fail to see how any
> fuzz value could be smaller than that.
> 
>  I've wondered about if we need to increase that value
> > slightly sometimes, or just live with hitting the re-run failed jobs
> > button on whatever CI system was a bit too slow sometimes.
> 
> It has nothing to do with a CI being slow, it's plain and simple buggy
> test code AFAICT. It's essentially "assert(time(NULL) == time(NULL))".
> If a call to time() takes 1us, do this a million times and it will on
> average fail once. Obviously, a loaded system increases the chance of
> being preempted between the two calls and hence effectively increases
> the delta and proportionally the probability of hitting this.

You're right, I confused this with the sleep test, which does have a bit
of fuzz to it.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: not available
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20211028/bd928885/attachment.sig>