[PATCH v3] dm: core: Do not stop uclass iteration on error

Simon Glass sjg at chromium.org
Sat Sep 17 17:02:53 CEST 2022


Hi Michal,

On Wed, 31 Aug 2022 at 11:44, Simon Glass <sjg at chromium.org> wrote:
>
> Hi Michal,
>
> On Wed, 31 Aug 2022 at 01:39, Michal Suchánek <msuchanek at suse.de> wrote:
> >
> > Hello,
> >
> > On Tue, Aug 30, 2022 at 09:15:12PM -0600, Simon Glass wrote:
> > > Hi Michal,
> > >
> > > On Tue, 30 Aug 2022 at 10:48, Michal Suchánek <msuchanek at suse.de> wrote:
> > > >
> > > > On Tue, Aug 30, 2022 at 09:56:52AM -0600, Simon Glass wrote:
> > > > > Hi Michal,
> > > > >
> > > > > On Tue, 30 Aug 2022 at 04:23, Michal Suchánek <msuchanek at suse.de> wrote:
> > > > > >
> > > > > > On Sat, Aug 27, 2022 at 07:52:27PM -0600, Simon Glass wrote:
> > > > > > > Hi Michal,
> > > > > > >
> > > > > > > On Fri, 19 Aug 2022 at 14:23, Michal Suchanek <msuchanek at suse.de> wrote:
> > > > > > > >
> > > > > > > > > When probing a device fails, a NULL pointer is returned and other devices
> > > > > > > > cannot be iterated. Skip to next device on error instead.
> > > > > > > >
> > > > > > > > Fixes: 6494d708bf ("dm: Add base driver model support")
> > > > > > >
> > > > > > > I think you should drop this as you are doing a change of behaviour,
> > > > > > > not fixing a bug!
> > > > > >
> > > > > > You can hardly fix a bug without a change in behavior.
> > > > > >
> > > > > > These functions are used for iterating devices, and are not iterating
> > > > > > devices. That's clearly a bug.
> > > > >
> > > > > If it were clear I would have changed this long ago. The new way you
> > > > > have this function ignores errors, so they cannot be reported.
> > > > >
> > > > > We should almost always report errors, which is why I think your
> > > > > methods should be named differently.
> > > > >
> > > > > >
> > > > > > > > Signed-off-by: Michal Suchanek <msuchanek at suse.de>
> > > > > > > > ---
> > > > > > > > v2: - Fix up tests
> > > > > > > > v3: - Fix up API doc
> > > > > > > >     - Correctly forward error from uclass_get
> > > > > > > >     - Do not return an error when last device fails to probe
> > > > > > > >     - Drop redundant initialization
> > > > > > > >     - Wrap at 80 columns
> > > > > > > > ---
> > > > > > > >  drivers/core/uclass.c | 32 ++++++++++++++++++++++++--------
> > > > > > > >  include/dm/uclass.h   | 13 ++++++++-----
> > > > > > > >  test/dm/test-fdt.c    | 20 ++++++++++++++++----
> > > > > > > >  3 files changed, 48 insertions(+), 17 deletions(-)
> > > > > > >
> > > > > > > Unfortunately this still fails one test. Try 'make qcheck' to see it -
> > > > > > > it is ethernet.
> > > > > >
> > > > > > I will look at that.
> > > > > >
> > > > > > > I actually think you should create new functions for this feature,
> > > > > > > > e.g. uclass_first_device_ok(), since it makes it impossible to see what
> > > > > > > > went wrong with a device in the middle.
> > > > > > >
> > > > > > > I have long had all this in my mind. One idea for a future change is
> > > > > > > to return the error, but set dev, so that the caller knows there is a
> > > > > > > device, which failed. When we are at the end, dev is set to NULL.
> > > > > >
> > > > > > We already have uclass_first_device_check() and
> > > > > > uclass_next_device_check() to iterate all devices, including broken
> > > > > > ones, and getting the errors as well.
> > > > > >
> > > > > > That's for the case you want all the details, and these are for the case
> > > > > > you just want to get devices and don't care about the details.
> > > > > >
> > > > > > That's AFAICT as much as this iteration interface can provide, and we
> > > > > > have both cases covered.
> > > > >
> > > > > I see three cases:
> > > > > - want to see the next device, returning the error if it cannot be
> > > > > probed - uclass_first_device()
> > > >
> > > > And the point of this is what exactly?
> > >
> > > > Please can you adjust your tone? It seems too aggressive for this
> > > mailing list. Thank you.
> > >
> > > >
> > > > The device order in the uclass is not well defined - at any time a new
> > > > device which will become the first can be added, fail probe, and block
> > > > what was assumed a loop iterating the uclass from returning any devices
> > > > at all. That's exactly what happened with the new sysreset.
> > >
> > > The order only changes if the device is unbound and rebound. Otherwise
> > > the order set by the device tree is used.
> >
> > So the order is defined by device tree. That does not make it
> > well-defined from the point of view of any kind of code.
> >
> > The point of device tree is that it can be replaced with another device
> > tree describing another board and the code should still work. Otherwise
> > we would not need device trees, and could keep using board files.
>
> We do use the raw ordering in test code, but in general we use the
> sequence number (from DT ordering or aliases) to provide the official
> ordering (the uclass...seq() calls).
>
> >
> > > > What is exactly the point of returning the error and not the pointer to
> > > > the next device?
> > >
> > > Partly, we have existing code which uses the interface, checking 'dev'
> > > to see if the device is valid. I would be happy to change that, so
> > > that the device is always returned. In fact I think it would be
> > > better. But it does need a bit of work with coccinelle, etc.
> >
> > I suppose changing the return type to void would catch the users that do
> > something with the return value but it would still need building all
> > the code.
> >
> > And it does not work for users of uclass_first_device_err which is
> > basically useless after this change but pretty much all users use the
> > return value.
> >
> > > > The only point of these simplified iterators is that the caller can
> > > > check only one value (device pointer) and then not check the error
> > > > because they don't care. If they do care, uclass_first_device_check()
> > > > provides all the details available.
> > >
> > > Yes I think we can have just two sets of iterators, but in that case
> > > it should be:
> > >
> > > - want to see the next device, returning the error if it cannot be
> > > probed, with dev updated to the next device in any case - new version
> > > of uclass_first_device() - basically rename
> > > uclass_first_device_check() to that
> >
> > About 2/3 of users of uclass_first_device don't use the return value at
> > all in current code. Changing uclass_first_device to
> > uclass_first_device_check is counterproductive. The current
> > documentation basically implies the new behavior, and there are a lot of
> > examples in the core code that use uclass_first_device in a for loop
> > without assigning the return value at all.
> >
> > Also renaming uclass_first_device_check would break the 3 existing users
> > of it.
> >
> > > - want to see next device which probes OK - your new function, perhaps
> > > uclass_first_device_ok() ?
> >
> > I don't think any amount of renaming is going to solve the problem at
> > hand: we have a bazillion users of uclass_first_device, and because it
> > was not documented that it does not in fact iterate uclass devices there
> > are users that use it for the purpose. There are also users that expect
> > a meaningful return value which is basically bogus - they do get a return
> > value of something, but not something specific.
> >
> > What can be done is adding the simple iterator under new name, convert
> > the obvious existing users, and mark the old function deprecated in some
> > way so that any code that uses it generates a warning.
>
> I'm OK with that. But let's rename uclass_first_device() to
> uclass_old_first_device() or something like that.

Just wondered if you have had time to respin this?

-next is open and I'd like to apply this soon so we have maximal testing time.
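
For reference while respinning, here is a rough sketch of the three
iteration styles discussed above, using a mock device list. It is only an
illustration: struct mock_dev, the mock_get_*() helpers and the count_*()
loops are made-up names, not the U-Boot API, and "probing" is reduced to a
stored return code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock device; probe_ret stands in for the result of probing it */
struct mock_dev {
	const char *name;
	int probe_ret;		/* 0 on success, negative errno on failure */
	struct mock_dev *next;	/* next device in the mock uclass, or NULL */
};

/*
 * Stop-on-error style: on probe failure return the error and set *devp
 * to NULL, so a simple loop cannot move past the broken device.
 */
static int mock_get_stop(struct mock_dev *dev, struct mock_dev **devp)
{
	assert(devp);
	*devp = NULL;
	if (!dev)
		return 0;
	if (dev->probe_ret)
		return dev->probe_ret;
	*devp = dev;
	return 0;
}

/* "check" style: always hand back the device along with its probe result */
static int mock_get_check(struct mock_dev *dev, struct mock_dev **devp)
{
	assert(devp);
	*devp = dev;
	return dev ? dev->probe_ret : 0;
}

/* Skip style: advance past devices that fail to probe, dropping the error */
static void mock_get_ok(struct mock_dev *dev, struct mock_dev **devp)
{
	assert(devp);
	while (dev && dev->probe_ret)
		dev = dev->next;
	*devp = dev;
}

/* Devices seen by a "for (...; dev; ...)" loop in stop-on-error style */
static int count_stop(struct mock_dev *head)
{
	struct mock_dev *dev;
	int n = 0;

	for (mock_get_stop(head, &dev); dev; mock_get_stop(dev->next, &dev))
		n++;
	return n;
}

/* Devices seen by the same loop shape in skip style */
static int count_ok(struct mock_dev *head)
{
	struct mock_dev *dev;
	int n = 0;

	for (mock_get_ok(head, &dev); dev; mock_get_ok(dev->next, &dev))
		n++;
	return n;
}

/* Probe failures reported while visiting every device in "check" style */
static int count_check_errors(struct mock_dev *head)
{
	struct mock_dev *dev;
	int errs = 0;
	int ret;

	for (ret = mock_get_check(head, &dev); dev;
	     ret = mock_get_check(dev->next, &dev)) {
		if (ret)
			errs++;
	}
	return errs;
}
```

With a list of three mock devices where only the middle one fails, the
stop-on-error loop ends at the failure, the skip loop visits the two good
devices without reporting anything, and the "check" loop visits all three
and can count the one failure.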

Regards,
Simon

