[PATCH v3 0/6] Improved sysreset/watchdog uclass integration

Heinrich Schuchardt heinrich.schuchardt at canonical.com
Tue Nov 9 09:00:05 CET 2021



On 11/9/21 02:37, Andre Przywara wrote:
> On Mon, 8 Nov 2021 17:17:33 +0100
> Heinrich Schuchardt <heinrich.schuchardt at canonical.com> wrote:
> 
>> On 11/8/21 17:05, Tom Rini wrote:
>>> On Mon, Nov 08, 2021 at 08:58:33AM -0700, Simon Glass wrote:
>>>> Hi,
>>>>
>>>> On Sun, 7 Nov 2021 at 04:18, Heinrich Schuchardt
>>>> <heinrich.schuchardt at canonical.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 11/6/21 14:53, Tom Rini wrote:
>>>>>> On Sat, Nov 06, 2021 at 04:55:44AM +0100, Heinrich Schuchardt wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/6/21 02:52, Andre Przywara wrote:
>>>>>>>> On Fri, 5 Nov 2021 18:56:34 -0400
>>>>>>>> Tom Rini <trini at konsulko.com> wrote:
>>>>>>>>   
>>>>>>>>> On Fri, Nov 05, 2021 at 09:38:50PM +0100, Heinrich Schuchardt wrote:
>>>>>>>>>> On 11/5/21 20:17, Tom Rini wrote:
>>>>>>>>>>> On Fri, Nov 05, 2021 at 07:37:02PM +0100, Heinrich Schuchardt wrote:
>>>>>>>>>>>> On 11/5/21 17:12, Simon Glass wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, 5 Nov 2021 at 08:21, Tom Rini <trini at konsulko.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 05, 2021 at 12:14:47PM +0100, Stefan Roese wrote:
>>>>>>>>>>>>>>> Hi Andre,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Added Tom to Cc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 05.11.21 11:04, Andre Przywara wrote:
>>>>>>>>>>>>>>>> On Thu, 4 Nov 2021 20:02:41 -0600
>>>>>>>>>>>>>>>> Simon Glass <sjg at chromium.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> On Thu, 4 Nov 2021 at 19:22, Stefan Roese <sr at denx.de> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Andre,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 05.11.21 00:11, Andre Przywara wrote:
>>>>>>>>>>>>>>>>>>> On Thu, 4 Nov 2021 11:37:57 +0100
>>>>>>>>>>>>>>>>>>> Stefan Roese <sr at denx.de> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Stefan,
>>>>>>>>>>>>>>>>>>>> On 04.11.21 04:55, Samuel Holland wrote:
>>>>>>>>>>>>>>>>>>>>> This series hooks up the watchdog uclass to automatically register
>>>>>>>>>>>>>>>>>>>>> watchdog devices for use with sysreset, doing a bit of minor cleanup
>>>>>>>>>>>>>>>>>>>>> along the way.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The goal is for this to replace the sunxi board-level non-DM reset_cpu()
>>>>>>>>>>>>>>>>>>>>> function. I was surprised to find that the wdt_reboot driver requires
>>>>>>>>>>>>>>>>>>>>> its own undocumented device tree node, which references the watchdog
>>>>>>>>>>>>>>>>>>>>> device by phandle. This is problematic for us, because sunxi-u-boot.dtsi
>>>>>>>>>>>>>>>>>>>>> file covers 20 different SoCs with varying watchdog node phandle names.
>>>>>>>>>>>>>>>>>>>>> So it would have required adding a -u-boot.dtsi file for each board.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hooking things up automatically makes sense to me; this is what Linux
>>>>>>>>>>>>>>>>>>>>> does. However, I put the code behind a new option to avoid surprises for
>>>>>>>>>>>>>>>>>>>>> other platforms.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>>>>>>>>>>           - Move condition to wdt-uclass.c to fix build errors.
>>>>>>>>>>>>>>>>>>>>>           - Include watchdog name in error message.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>>>>>>>>>           - Extend the "if SYSRESET" block to the end of the file.
>>>>>>>>>>>>>>>>>>>>>           - Also make gpio_reboot_probe function static.
>>>>>>>>>>>>>>>>>>>>>           - Rebase on top of 492ee6b8d0e7 (now handle all watchdogs).
>>>>>>>>>>>>>>>>>>>>>           - Added patches 5-6 as an example of how the new option will be used.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Samuel Holland (6):
>>>>>>>>>>>>>>>>>>>>>            sysreset: Add uclass Kconfig dependency to drivers
>>>>>>>>>>>>>>>>>>>>>            sysreset: Mark driver probe functions as static
>>>>>>>>>>>>>>>>>>>>>            sysreset: watchdog: Move watchdog reference to plat data
>>>>>>>>>>>>>>>>>>>>>            watchdog: Automatically register device with sysreset
>>>>>>>>>>>>>>>>>>>>>            sunxi: Avoid duplicate reset_cpu with SYSRESET enabled
>>>>>>>>>>>>>>>>>>>>>            sunxi: Use sysreset framework for poweroff/reset
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>           arch/arm/Kconfig                     |  3 +++
>>>>>>>>>>>>>>>>>>>>>           arch/arm/mach-sunxi/board.c          |  2 ++
>>>>>>>>>>>>>>>>>>>>>           drivers/sysreset/Kconfig             | 11 ++++++--
>>>>>>>>>>>>>>>>>>>>>           drivers/sysreset/sysreset_gpio.c     |  2 +-
>>>>>>>>>>>>>>>>>>>>>           drivers/sysreset/sysreset_resetctl.c |  2 +-
>>>>>>>>>>>>>>>>>>>>>           drivers/sysreset/sysreset_syscon.c   |  2 +-
>>>>>>>>>>>>>>>>>>>>>           drivers/sysreset/sysreset_watchdog.c | 40 ++++++++++++++++++++++------
>>>>>>>>>>>>>>>>>>>>>           drivers/watchdog/wdt-uclass.c        |  8 ++++++
>>>>>>>>>>>>>>>>>>>>>           include/sysreset.h                   | 10 +++++++
>>>>>>>>>>>>>>>>>>>>>           9 files changed, 67 insertions(+), 13 deletions(-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Applied to u-boot-marvell
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Mmmh, why u-boot-marvell,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Because I'm handling watchdog related changed since a few years and we
>>>>>>>>>>>>>>>>>> did not create a specific subsystem repo for this and I'm usually
>>>>>>>>>>>>>>>>>> using my "marvell" one for this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And fwiw, there's a few other cases like this.  If it's too confusing,
>>>>>>>>>>>>>> maybe we should just roll out a few more repositories, I think it's
>>>>>>>>>>>>>> easier to do that now than pre-gitlab?
>>>>>>>>>>>>>>>>>>> and why did this end up already in master?
>>>>>>>>>>>>>>>>>>> Isn't that material for the next merge window? After all this changes
>>>>>>>>>>>>>>>>>>> quite a bit, for a lot of boards, and I did not have a closer look at
>>>>>>>>>>>>>>>>>>> the sunxi parts yet.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I was hesitating also a bit. But since this patchset is on the list in
>>>>>>>>>>>>>>>>>> v1 since over 2 months now (2021-08-21) I thought it was "ready" for
>>>>>>>>>>>>>>>>>> inclusion now. We are at -rc1 and I think we still have enough time to
>>>>>>>>>>>>>>>>>> fix any resulting problems in this release cycle.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why do we have the merge window then? This is clearly not a regression or
>>>>>>>>>>>>>>>> general fix.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> AFAIU, we are a bit less strict here in U-Boot. Patches that were posted
>>>>>>>>>>>>>>> before the merge-window and skipped the review process (most likely
>>>>>>>>>>>>>>> because of lack of time) are often still integrated in the early rcX
>>>>>>>>>>>>>>> cycles. At least this is how I handle it usually.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tom, is my understanding here correct?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes.  We are not as strict as the kernel is about what can come in
>>>>>>>>>>>>>> between rc1 and rc2 (and to a certain degree, post rc2).  I leave things
>>>>>>>>>>>>>> up to the discretion of the custodians.  People tend of have less time
>>>>>>>>>>>>>> to handle U-Boot changes than other stuff, so I try and be flexible in
>>>>>>>>>>>>>> picking things up.
>>>>>>>>>>>>>>>>> Yes I agree, that should be plenty of time for people to review it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well, if there would be people to review the sunxi parts :-(
>>>>>>>>>>>>>>>> I am totally fine with the generic patches (as they have been reviewed),
>>>>>>>>>>>>>>>> but the sunxi integration is somewhat risky.
>>>>>>>>>>>>>>>> I was explicitly deprioritising that in my queue, as it really doesn't
>>>>>>>>>>>>>>>> change, add or fix anything, it's mere refactoring, from the user's point
>>>>>>>>>>>>>>>> of view.
>>>>>>>>>>>>>>>>>> Do you see any specific issues?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Patch 6/6 changes the config for all 157 Allwinner boards, so I think that
>>>>>>>>>>>>>>>> deserves at least some testing, *before* merging it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I expect that Samuel did some testing. But still, I agree that it
>>>>>>>>>>>>>>> would be much better, if these patches - especially the Allwinner parts
>>>>>>>>>>>>>>> got more extensive testing.
>>>>>>>>>>>>>>>> I will do as much testing now as possible, but I am not happy about that
>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Understood. Should we revert patch 6/6 for now?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> FWIW, given Samuel has been doing a number of allwinner changes, I had
>>>>>>>>>>>>>> also assumed it was sufficiently tested, which is why I didn't raise a
>>>>>>>>>>>>>> further concern when I saw the widespread nature of the overall changes,
>>>>>>>>>>>>>> just figured it was a few more ready-to-go cleanups that weren't quite
>>>>>>>>>>>>>> picked up in time.  Please do speak up if you want me to revert the last
>>>>>>>>>>>>>> part.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also it is often true that people find problems by testing on master
>>>>>>>>>>>>> so applying it helps to shake the tree a bit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Simon
>>>>>>>>>>>>
>>>>>>>>>>>> We don't actually have a problem with this series but with a previous
>>>>>>>>>>>> watchdog patch. The culprit according to bisecting is:
>>>>>>>>>>>>
>>>>>>>>>>>> b147bd3607f8 ("sunxi: Enable watchdog timer support by default")
>>>>>>>>>>>>
>>>>>>>>>>>> When booting the OrangePi PC the watchdog triggers while Linux is booting,
>>>>>>>>>>>> ca. 16 s after leaving the UEFI subsystem. This matches WDT_MAX_TIMEOUT in
>>>>>>>>>>>> drivers/watchdog/sunxi_wdt.c.
>>>>>>>>>>>>
>>>>>>>>>>>> If I run
>>>>>>>>>>>> => wdt dev watchdog at 1c20ca0
>>>>>>>>>>>> => wdt stop
>>>>>>>>>>>>
>>>>>>>>>>>> before the bootefi command booting succeeds.
>>>>>>>>>>>>
>>>>>>>>>>>> We don't disarm the watchdog and Linux does not do it for us in time.
>>>>>>>>>>>>
>>>>>>>>>>>> The UEFI specification requires that the default watchdog reset time is 300
>>>>>>>>>>>> s. We should never arm the Sunxi hardware watchdog except within the
>>>>>>>>>>>> watchdog reset driver.
>>>>>>>>>>>>
>>>>>>>>>>>> The solution is to disable CONFIG_WATCHDOG_AUTOSTART on SUNXI. See
>>>>>>>>>>>>
>>>>>>>>>>>> [PATCH 1/1] watchdog: don't autostart watchdog on Sunxi boards
>>>>>>>>>>>> https://lists.denx.de/pipermail/u-boot/2021-November/466318.html
>>>>>>>>>>>
>>>>>>>>>>> This means we never did come up with a satisfactory to everyone solution
>>>>>>>>>>> to what UEFI thinks a watchdog should do, and what other types of
>>>>>>>>>>> deployment think a watchdog should do, yes?
>>>>>>>>>>
>>>>>>>>>> Dear Tom,
>>>>>>>>>>
>>>>>>>>>> The issue is *not* UEFI specific.
>>>>>>>>>>
>>>>>>>>>> A watchdog timeout of 16 seconds is too short for Linux to boot no matter
>>>>>>>>>> whether you use the EFI stub or the legacy entry point.
>>>>>>>>>>
>>>>>>>>>> I only referred to the UEFI specification as it indicates what can be
>>>>>>>>>> considered as a reasonable timeout interval: 300 seconds.
>>>>>>>>>
>>>>>>>>> 16 seconds from the last time we pet the watchdog in U-Boot to the
>>>>>>>>> kernel being able to take over is quite reasonable.
>>>>>>>>
>>>>>>>> How do we know that the kernel takes over? What if the kernel/EFI
>>>>>>>> payload doesn't have a watchdog driver? I was assuming that the
>>>>>>>> watchdog would be disabled as soon as we boot a kernel or an EFI app
>>>>>>>> calls ExitBootServices (maybe even earlier).
>>>>>>>> But this sounds like a generic problem, not sunxi specific. So how do
>>>>>>>> other platforms solve this?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Andre
>>>>>>>
>>>>>>> The UEFI specification has this requirement in chapter "3.1.2 Load Option
>>>>>>> Processing":
>>>>>>>
>>>>>>> "... the boot manager must enable the watchdog timer for 5 minutes by using
>>>>>>> the EFI_BOOT_SERVICES.SetWatchdogTimer() boot service prior to calling
>>>>>>> EFI_BOOT_SERVICES.StartImage(). If a boot option returns control to the boot
>>>>>>> manager, the boot manager must disable the watchdog timer with an additional
>>>>>>> call to the SetWatchdogTimer() boot service."
>>>>>>>
>>>>>>> This means that having an armed watchdog when starting the kernel is
>>>>>>> correct.
>>>>>>>
>>>>>>> If you start a watchdog in the firmware which is not disabled or reset by
>>>>>>> the operating system, you are out of luck and won't be able to boot.
>>>>>>>
>>>>>>> Current Linux has driver drivers/watchdog/sunxi_wdt.c compatible to
>>>>>>> "allwinner,sun4i-a10-wdt","allwinner,sun6i-a31-wdt" and enabled by
>>>>>>> CONFIG_SUNXI_WATCHDOG. This driver was introduced in Linux v3.12. It
>>>>>>> originally had compatible "allwinner,sun4i-wdt" only.
>>>>>>>
>>>>>>> Debian Bullseye has the driver enabled as a module. In the bootlog of the
>>>>>>> Orange Pi PC I find:
>>>>>>> [   12.321909] sunxi-wdt 1c20ca0.watchdog: Watchdog enabled (timeout=16 sec,
>>>>>>> nowayout=0)
>>>>>>> This message appears approximately *20 seconds* after the EFI stub hands
>>>>>>> over to the main kernel. Adding the driver to initrd shortens this to *18
>>>>>>> seconds*. The message occurs after file system checks which can be a lengthy
>>>>>>> operation. In Debian systemd manages the watchdog.
>>>>>>>
>>>>>>> As I said: 16 seconds is way too short for a hardware watchdog timeout.
>>>>>>
>>>>>> What's the time if you build it in?
>>>>>>   
>>>>>
>>>>> For sure you will find some board and configuration that is faster.
>>>>>
>>>>> But why should I care? This series breaks booting Debian on my board. So
>>>>> it needs to be fixed. So, please, apply my patch that is doing so.
>>>>
>>>> Five minutes sounds completely unacceptable for embedded platforms.
>>>> The user will surely have packaged the item up and will be just
>>>> heading out to drop it off for return...
>>
>> I have no problem with people switching on the SUNXI hardware watchdog
>> for their specific embedded solution. But here it was switched on for
>> all SUNXI boards and breaks booting into Linux distributions.
> 
> I agree here, and will take Heinrich's patch to keep it disabled on
> sunxi, avoiding the regression.
>   
>> If the Linux distribution resets the watchdog *after* file system checks
>> because it is managed by systemd (as is true for Debian), 5 minutes is
>> realistic.
> 
> I am still puzzled about this, if I read the UEFI spec correctly, the
> 5 minutes watchdog timer is for EFI applications using boot services?
> So grub, for instance. But the description of ExitBootServices tells me
> that the: "boot services watchdog timer is disabled"?
> So it should not affect Linux booting (after the EFI stub is done)?

Currently we only disable the software watchdog (efi_tpl = 
TPL_HIGH_LEVEL;) We should call wdt_stop_all() too. I will create a 
patch for that.

Best regards

Heinrich

> 
> Cheers,
> Andre
> 
>>>
>>> I'm trying to avoid bringing up the long discussion from the previous
>>> thread about this :)
>>>    
>>>> Do we need to add a special case for UEFI here? E.g. bootefi could use
>>>> a hook to lengthen the watchdog?
>>
>> The problem is not UEFI related.
>>
>>>
>>> Well, the problem is that the hardware watchdog has a maximum period of
>>> 16 seconds, I believe.
>>>    
>>
>> Yes, Sunxi is limited to 16 seconds.
>>
>> Best regards
>>
>> Heinrich
> 


More information about the U-Boot mailing list