Broken watchdog in u-boot master branch

Pali Rohár pali at kernel.org
Tue Oct 11 09:25:43 CEST 2022


On Tuesday 11 October 2022 09:18:56 Rasmus Villemoes wrote:
> On 10/10/2022 15.55, Tom Rini wrote:
> > On Sun, Oct 09, 2022 at 09:12:25PM +0200, Pali Rohár wrote:
> > 
> >> Hello! Watchdog code seems to be broken in u-boot master branch.
> >> On Nokia N900 I'm getting following message in qemu:
> >>
> >> cyclic function rx51_watchdog took too long: 10000us vs 1000us max, disabling
> >>
> >> Seems that watchdog core code is not prepared for "slower" watchdogs
> >> which communicate over slower i2c bus, like it is the case for N900.
> >>
> >> Disabling slower watchdog is a bad idea as it would result in reboot
> >> loop instead of slower - but working code.
> 
> So, a few thoughts.
> 
> First, I assume that that board has a very coarse-grained tick, probably
> just 100Hz.

IIRC i2c is running at 100kHz. But the u-boot i2c driver contains several
udelay() calls and busy loops, and I was told that there are also some
hw errata. So in the end, if some other driver calls the u-boot dm
function for an i2c transfer, it can take more time than expected...

There is also another issue: udelay() itself calls the watchdog reset
function. So if the watchdog sits on an i2c bus whose driver calls
udelay(), the watchdog reset function may end up calling itself in an
infinite loop. For this reason the n900 watchdog driver has a lock which
prevents this loop (if somebody calls reset while the lock is held, then
reset does nothing).
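
Roughly, the lock works like the sketch below. This is only an
illustration of the idea, not the actual rx51 driver code: every demo_*
name and both variables are made up here, and demo_udelay() and
demo_i2c_write() merely stand in for udelay() and the i2c transfer.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not the real rx51 driver symbols. */
static bool wdt_reset_active;	/* the "lock": true while a reset is running */
static int wdt_kicks;		/* counts completed kicks, for illustration */

static void demo_wdt_reset(void);

/* Stand-in for udelay(): it pokes the watchdog while waiting. */
static void demo_udelay(unsigned long usec)
{
	(void)usec;
	demo_wdt_reset();
}

/* Stand-in for an i2c transfer that needs a delay internally. */
static void demo_i2c_write(void)
{
	demo_udelay(10);
	wdt_kicks++;
}

static void demo_wdt_reset(void)
{
	/* Nested call coming from udelay(): do nothing. */
	if (wdt_reset_active)
		return;

	wdt_reset_active = true;
	demo_i2c_write();
	wdt_reset_active = false;
}
```

Without the early return, demo_wdt_reset() -> demo_i2c_write() ->
demo_udelay() -> demo_wdt_reset() would recurse forever.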

> Otherwise it would be pretty amazing for cpu_time to come
> out as 10ms exactly. That's not the board's fault, of course, just an
> observation, but it is something we need to bear in mind. If the
> resolution is merely 100Hz, so 10ms is simply the granularity, we cannot
> really meaningfully compare the cpu_time to anything less than that,
> because every once in a while it _will_ happen that we sample "now" just
> before the tick, run the function, then sample again just after, and it
> may only have taken 17us, yet the diff comes out as 10ms.
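
The effect described here can be reproduced with a small model. This is
just a sketch under the assumption of a 10ms tick; DEMO_TICK_US and the
demo_* helpers are hypothetical and not U-Boot functions.

```c
#include <stdint.h>

#define DEMO_TICK_US 10000ULL	/* assumed 100Hz tick, i.e. 10ms granularity */

/* Hypothetical coarse clock: the true time rounded down to the last tick. */
static uint64_t demo_coarse_now_us(uint64_t true_us)
{
	return true_us - (true_us % DEMO_TICK_US);
}

/* Runtime as it would be measured by sampling that clock before and after. */
static uint64_t demo_measured_us(uint64_t start_us, uint64_t runtime_us)
{
	return demo_coarse_now_us(start_us + runtime_us) -
	       demo_coarse_now_us(start_us);
}
```

With this model a 17us function started 10us before a tick is measured
as 10000us, while the same function started right on a tick is measured
as 0us.
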
> 
> Second, perhaps the threshold should not be a compile-time constant, but
> instead a fraction of the requested call frequency (say 1.5%, 1/64).
> I.e., if we've registered a function to be called every 10 seconds, we'd
> check if its runtime exceeded (10000000 >> 6) us. Preferably per above
> that bound is rounded up to a multiple of the timer's granularity (we
> can get that, right?)
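
A possible shape for such a bound, as a sketch only: the function name
and the way the granularity is passed in are assumptions, since it is an
open question above how to obtain the timer granularity.

```c
#include <stdint.h>

/*
 * Hypothetical helper: allow 1/64 of the call period, rounded up to a
 * multiple of the timer granularity (granularity_us == 0 means unknown).
 */
static uint64_t demo_max_runtime_us(uint64_t period_us, uint64_t granularity_us)
{
	uint64_t limit = period_us >> 6;	/* 1/64, about 1.56% */

	if (granularity_us == 0)
		return limit;

	/* Round up to the next multiple of the granularity. */
	return (limit + granularity_us - 1) / granularity_us * granularity_us;
}
```

For a 10 second period this gives 156250us, or 160000us once rounded up
to a 10ms granularity.
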
> 
> Third, perhaps we shouldn't disable it, but just print a (one-time)
> warning. Adding a "already-warned" field to struct cyclic_info is
> certainly simple enough.
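
Something along these lines, as a sketch: struct demo_cyclic_info is a
trimmed-down, hypothetical stand-in and not the real struct cyclic_info,
and the field names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for struct cyclic_info. */
struct demo_cyclic_info {
	const char *name;
	uint64_t max_us;	/* allowed runtime per call */
	bool already_warned;	/* set once the warning has been printed */
};

/* Returns true if a warning was printed by this call. */
static bool demo_check_runtime(struct demo_cyclic_info *c, uint64_t cpu_time_us)
{
	if (cpu_time_us <= c->max_us || c->already_warned)
		return false;

	printf("cyclic function %s took too long: %lluus vs %lluus max\n",
	       c->name, (unsigned long long)cpu_time_us,
	       (unsigned long long)c->max_us);
	c->already_warned = true;

	/* The function stays registered; only the warning is suppressed. */
	return true;
}
```
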
> 
> Rasmus

