[PATCH v4 07/10] watchdog: wdt-uclass.c: handle all DM watchdogs in watchdog_reset()

Wolfgang Denk wd at denx.de
Thu Aug 12 16:21:29 CEST 2021


Dear Tom,

In message <20210812134833.GU858 at bill-the-cat> you wrote:
> 
> Alright, lets take a look at what kind of area of the code we're talking
> about.  uclass_get is a pretty fundamental thing.  If that fails, your
> system is on fire.  Things are massively corrupt.

Full agreement here.

> So yes, return codes need to be checked and passed.  But no, not every
> single error path needs to print to the user along every part of an
> error path either.

So if "the system is on fire" is one of the cases where an error
message should be omitted to save maybe 50 or 100 bytes of image
size?  This sounds wrong to me.

When a system crashes or hangs, it is extremely helpful to gen an
indication of what happened.

Printing messages only with debug enabled is pretty worthless, as in
the real world the failures will happen in the field when running
the production image with debug not enabled.

And as soon as you do enable debug, you will have a different image,
which may or may not show the problem.  We have all been there
before.

> That would be the next patch in the series where the BSP author isn't
> currently checking the return value, and this series doesn't change
> that.  Perhaps it should, and CC the maintainer.

Indeed.

> But I think has been
> said a few times over the course of this series, what exactly is one
> going to do about the failure?  Getting specific for a moment, if you're
> in the case of "shutdown the watchdog" and the watchdog doesn't shutdown
> like you want it to, do you hang and hope the watchdog is alive to kick
> things still?  hang and lock the system?  Figure the system is on fire
> anyhow but add another message to the failure spew?

Adding a message about the _cause_ of the failure, i. e. at least
information about the place where it was first detected, is what
will be helpful to the engineer who is supposed to analyze the
problem.

And yes, if such a fundamental operation fails, it is wise to abort
the operation of this device - either by hard resetting it or by
hanging it, depending of what the chances are that a reboot might
fix the problem.

IMO it is better to fail a broken device in a reliable mode instead
of letting it run and having more spectacular crashes (likely with
more serious consequences) happen later.

> Again, I think the change that's needed to this patch is to make it
> return the error code to the caller.  Let the caller decide.  And make
> sure to CC the board maintainer on the next go-round so they can chime
> in about that.

If we agree that in the disputed case "the system is on fire", then
there is actually very little to decide.  There should be only one
possible action: stop here, before more damage happens.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
... not meant for the weak-at-heart: /^(?=.*one)(?=.*two)/
If you can follow that, you can use it.
          - Randal L. Schwartz in <ukpw67rhl3.fsf at julie.teleport.com>


More information about the U-Boot mailing list