Broken watchdog in u-boot master branch

Tom Rini trini at konsulko.com
Mon Oct 10 19:56:10 CEST 2022


On Mon, Oct 10, 2022 at 07:44:05PM +0200, Pali Rohár wrote:
> On Monday 10 October 2022 13:40:38 Tom Rini wrote:
> > On Mon, Oct 10, 2022 at 07:22:56PM +0200, Pali Rohár wrote:
> > > On Monday 10 October 2022 12:28:18 Tom Rini wrote:
> > > > On Sun, Oct 09, 2022 at 09:12:25PM +0200, Pali Rohár wrote:
> > > > > Hello! The watchdog code seems to be broken in the u-boot master branch.
> > > > > On the Nokia N900 I'm getting the following message in qemu:
> > > > > 
> > > > > cyclic function rx51_watchdog took too long: 10000us vs 1000us max, disabling
> > > > > 
> > > > > It seems that the watchdog core code is not prepared for "slower" watchdogs
> > > > > which communicate over a slower i2c bus, as is the case for the N900.
> > > > > 
> > > > > Disabling a slower watchdog is a bad idea, as it would result in a reboot
> > > > > loop instead of slower, but working, code.
> > > > 
> > > > So, looking at this in more detail, we have
> > > > CONFIG_CYCLIC_MAX_CPU_TIME_US as a configuration option (which is where
> > > > the too long comes from). And picking a random CI run:
> > > > https://source.denx.de/u-boot/u-boot/-/jobs/511177
> > > > I do see we hit this in CI once, but not on every QEMU run here. Is
> > > > the max time being configurable enough to satisfy your concerns here?
> > > 
> > > We need to investigate how to _properly_ fix this issue, not just
> > > work around it. Other boards are probably affected too.
> > 
> > So it's the cyclic watchdog code that's the reason here. We merged it
> > as early as we could to see if there are problems. Are there
> > problems? We're seeing "system too slow, disabling" on QEMU,
> > sometimes, and the value of too slow is configurable. I know you
> > reported other problems with n900 HW, so we can't see if it's failing
> > there.
> 
> I tested it with the older asm code (as described in that other email,
> via git checkout commit -- file) on n900 HW and the watchdog problem is
> there too. The phone reboots in about 20 seconds. But as I do not have
> a serial console, I do not know if that "disabling" message is printed
> there too (but I guess it is).

I think I'm a bit baffled at this point, honestly. The watchdog timeout
is 60 seconds. If you're confident in it being about 20 seconds,
consistently, changing WATCHDOG_TIMEOUT_MSECS to, say, 10000 (so, 10
seconds) should let you see if U-Boot has configured the watchdog and
it's being tripped, or if it's still at the value set by the prior stage.
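
To spell out the reasoning behind that test, here is a minimal sketch in
C. It is illustrative only: classify_reboot(), the enum and the tolerance
parameter are made-up names, and the 20000 ms "prior stage" figure is just
the reboot time reported above, not a known hardware setting.

```c
#include <assert.h>

/* Hypothetical labels for where the active watchdog timeout came from. */
enum wd_source { WD_UBOOT, WD_PRIOR_STAGE, WD_UNCLEAR };

/* Absolute difference of two unsigned values without underflow. */
static unsigned int absdiff(unsigned int a, unsigned int b)
{
	return a > b ? a - b : b - a;
}

/*
 * Compare the observed time-to-reboot against the timeout U-Boot was
 * built with and against the timeout assumed to be left by the prior
 * stage. A match on exactly one of them identifies the source.
 */
static enum wd_source classify_reboot(unsigned int observed_ms,
				      unsigned int uboot_timeout_ms,
				      unsigned int prior_timeout_ms,
				      unsigned int tolerance_ms)
{
	int near_uboot = absdiff(observed_ms, uboot_timeout_ms) <= tolerance_ms;
	int near_prior = absdiff(observed_ms, prior_timeout_ms) <= tolerance_ms;

	assert(uboot_timeout_ms != 0);

	if (near_uboot && !near_prior)
		return WD_UBOOT;
	if (near_prior && !near_uboot)
		return WD_PRIOR_STAGE;
	return WD_UNCLEAR;
}
```

With the timeout set to 10000, a reboot after roughly 10 seconds would
point at U-Boot having configured the watchdog, while a reboot that stays
at roughly 20 seconds would point at the earlier value still being active.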

I would have expected that QEMU would see problems that real HW doesn't
(the value in your log is much higher than the one in CI), but I could
be wrong here.
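
For reference, the behaviour implied by the "took too long" message quoted
above can be modelled roughly as below. This is a sketch under
assumptions, not the actual U-Boot cyclic code: struct cyclic_fn and
cyclic_account() are hypothetical names, and MAX_CPU_TIME_US merely
stands in for CONFIG_CYCLIC_MAX_CPU_TIME_US with the 1000us value from
the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for CONFIG_CYCLIC_MAX_CPU_TIME_US (value taken from the log). */
#define MAX_CPU_TIME_US 1000ULL

/* Hypothetical bookkeeping for one registered cyclic function. */
struct cyclic_fn {
	const char *name;
	bool enabled;
};

/*
 * Account for one run of a cyclic function. If the run exceeded the
 * budget, print a message in the style seen in the log and disable the
 * function. Returns whether the function is still enabled afterwards.
 */
static bool cyclic_account(struct cyclic_fn *fn,
			   unsigned long long elapsed_us,
			   unsigned long long max_us)
{
	assert(fn != NULL);

	if (fn->enabled && elapsed_us > max_us) {
		printf("cyclic function %s took too long: %lluus vs %lluus max, disabling\n",
		       fn->name, elapsed_us, max_us);
		fn->enabled = false;
	}
	return fn->enabled;
}
```

In this model a single 10000us run is enough to switch the function off
for good, which matches the concern that a slow but working watchdog
handler ends up permanently disabled.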

-- 
Tom