[PATCH] net: eth-uclass: change state before stop() in eth_halt()
Marek Vasut
marex at denx.de
Sun Dec 4 03:17:08 CET 2022
On 12/2/22 11:36, Niel Fourie wrote:
> Hi Marek,
Hi,
> On 01/12/2022 11:44, Marek Vasut wrote:
>> On 12/1/22 09:24, Lukasz Majewski wrote:
>>> On Wed, 30 Nov 2022 17:42:25 +0100
>>> Niel Fourie <lusus at denx.de> wrote:
>>>
>>>> In eth_halt(), change the private uclass state before calling
>>>> stop() instead of afterwards, to avoid writing to memory which
>>>> may have been freed during stop().
>>>>
>>>> In the ethernet gadget implementation, the gadget device gets
>>>> probed during start() and removed during stop(), which includes
>>>> freeing `uclass_priv_` to which `priv` is pointing. Writing to
>>>> `priv` after stop() may corrupt the `fd` member of `struct
>>>> malloc_chunk`, which represents the freed block, and could cause
>>>> hard-to-debug crashes on subsequent calls to malloc()/free().
>>>>
>>>> Signed-off-by: Niel Fourie <lusus at denx.de>
>>>> Cc: Ramon Fried <rfried.dev at gmail.com>
>>>> Cc: Marek Vasut <marex at denx.de>
>>>> Cc: Lukasz Majewski <lukma at denx.de>
>>>> ---
>>>> net/eth-uclass.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/eth-uclass.c b/net/eth-uclass.c
>>>> index f41da4b37b3..bc3b9751e32 100644
>>>> --- a/net/eth-uclass.c
>>>> +++ b/net/eth-uclass.c
>>>> @@ -342,9 +342,9 @@ void eth_halt(void)
>>>> if (!priv || !priv->running)
>>>> return;
>>>> - eth_get_ops(current)->stop(current);
>>>> priv->state = ETH_STATE_PASSIVE;
>>>> priv->running = false;
>>>> + eth_get_ops(current)->stop(current);
>>>> }
>>>> int eth_is_active(struct udevice *dev)
>>>
>>> Reviewed-by: Lukasz Majewski <lukma at denx.de>
>>
>> How come nobody triggered this problem with regular ethernet in U-Boot ?
>>
>> If this is isolated to USB gadget ethernet, then please do not hack
>> around this in core networking code, but rather fix the USB ethernet
>> gadget itself. It seems that gadget code should not unregister the
>> gadget in drivers/usb/gadget/ether.c _usb_eth_halt() , at least not
>> fully.
>
> The reason is simple, the regular ethernet drivers do not get removed on
> stop() like for the gadget ethernet driver, and in their case `priv` is
> still valid.
The suggestion for a proper fix is in the last paragraph above -- do not
unregister the usb ethernet gadget device in halt(), keep the gadget
device registered. Then the priv pointer would be valid. (*)
> I agree on your point that reworking the ethernet gadget code would be
> preferable, but this would be a much bigger effort (and if I were to do
> it, I would probably introduce even more bugs). I am not certain whether
> this would not also affect the non-DM gadget implementation as well,
> which still contain drivers like ci_udc.c which does not appear to have
> been ported to DM gadget yet? (I only see DM USB there...)
>
> That said, I am not certain whether this is not also bug, as I am not
> certain whether the assumption that `priv` should be available after
> stop() is valid or not.
>
> The documentation at
> https://source.denx.de/u-boot/u-boot/-/blob/master/doc/develop/driver-model/ethernet.rst?plain=1#L135 states:
>
> The **stop** function should turn off / disable the hardware and place
> it back in its reset state. It can be called at any time (before any
> call to the related start() function), so make sure it can handle this
> sort of thing.
The ->stop callback is supposed to stop the interface, turn off its DMA,
but NOT deallocate the device (and its associated data) behind it.
> In such a complete reset state I am not certain whether the assumption
> that `priv` should exist is still valid, at least not without another
> call to dev_get_uclass_priv() and revalidating it first?
In case a device is probe()d, its private data are also allocated and
available, so yes, 'priv' pointer should still be valid.
> Lastly, this assumption that priv is still valid is rather new and it
> was introduced here:
>
> https://source.denx.de/u-boot/u-boot/-/commit/fa795f452541ce07b33be603de36cac3c5d7dfcf
I disagree, the device private data are valid during the entire lifespan
of the device. That assumption has been baked into the driver model
itself and far predates that commit.
In case of a usb ethernet, the lifespan of the device starts with the
'bind' command which triggers the ->bind callback, and first use which
triggers its ->probe callback. The lifespan ends with 'unbind' command
or OS boot, which triggers ->remove callback and ->unbind callbacks.
> This commit appears to be a workaround for drivers which cannot deal
> with stop() being called at any time as required in the above quoted
> documentation.
This commit prevents network device ->start() from being called multiple
times, which is a valid precaution, as calling start while the interface
is already up would interfere with existing connection (e.g. the
netconsole as mentioned in the commit message). That does not seem to be
a workaround to me.
> I would consider adding a workaround to a workaround in this case to be
> the lesser evil, as tracking down this bug in the first place was like
> looking for a needle in a haystack. This change would at least save
> everybody else from strange crashes in particular configurations without
> any negative impact. But this is fortunately not my decision. :)
Commit fa795f45254 ("net: eth-uclass: avoid running start() twice
without stop()") is as far as I can tell unrelated to this change and
does not seem to me like a workaround.
It does however show that this patch introduces a bug -- this patch
changes the order in which priv->state = ETH_STATE_PASSIVE; is assigned
from _after_ the ->stop callback to _before_ the -> stop callback. This
breaks drivers/net/ldpaa_eth/ldpaa_eth.c which checks the priv->state in
its ->stop callback, either on its own in non-DM case, or in
eth_is_active() implementation in DM case. With this patch, the
interface would never be stopped in the ->stop callback, because the
condition (net_dev->state == ETH_STATE_PASSIVE) test in the ldpaa stop
callback implementation would always be true.
The proper fix is in the usb ethernet gadget code, see (*) above, let's
not pile workarounds onto already difficult to maintain code.
[...]
More information about the U-Boot
mailing list