imx8 regression: cyclic_register for watchdog at 30280000 failed
Tim Harvey
tharvey at gateworks.com
Thu Oct 27 17:34:21 CEST 2022
On Wed, Oct 26, 2022 at 11:32 PM Stefan Roese <sr at denx.de> wrote:
>
> Tim,
>
> On 26.10.22 21:06, Tim Harvey wrote:
> > On Wed, Oct 26, 2022 at 5:33 AM Rasmus Villemoes
> > <rasmus.villemoes at prevas.dk> wrote:
> >>
> >> On 25/10/2022 18.32, Tim Harvey wrote:
> >>> Greetings,
> >>>
> >>> I've noticed a regression since the merge of the cyclic framework use
> >>> for watchdog on my imx8m boards:
> >>>
> >>> cyclic_register for watchdog at 30280000 failed
> >>> WDT: Failed to start watchdog at 30280000
> >>>
> >>> A bisect lead me to the following 3 sequential patches:
> >>> 29caf9305b6f cyclic: Use schedule() instead of WATCHDOG_RESET()
> >>> ^^^ bad
> >>> 881d4108257a cyclic: Introduce schedule() function
> >>> ^^^ unbuildable
> >>> c2fd0ca1a822 watchdog: Integrate watchdog triggering into the cyclic framework
> >>> ^^^ unbootable
> >>
> >> Can you provide some more details on "unbootable" and "unbuildable"?
> >> I.e., what build error do you get for the middle patch, and what do you
> >> see on the console with the "unbootable" image?
> >
> > Rasmus,
> >
> > Sure. I'm building and testing for imx8mm_venice.
> >
> > Here are the patches in question listed in commit order:
> > d219fc06b30d configs: Resync with savedefconfig
> > ^^^ no issues found
> >
> > c2fd0ca1a822 watchdog: Integrate watchdog triggering into the cyclic framework
> > ^^^ unbootable for imx8mm_venice - hangs after SPL banner:
> > U-Boot SPL 2022.10-rc3-00241-gc2fd0ca1a822 (Oct 26 2022 - 10:08:54 -0700)
> > ^^^ this occurs because 'uclass_get_device_by_driver(UCLASS_MISC,
> > DM_DRIVER_GET(gsc), &dev))' in gateworks/venice/spl.c board_init_f
> > hangs and I'm not clear why this patch affects that
> >
> > 881d4108257a cyclic: Introduce schedule() function
> > ^^^ unbuildable for imx8mm_venice_defconfig
> > CC arch/arm/lib/sections.o
> > In file included from include/asm-generic/global_data.h:23,
> > from ./arch/arm/include/asm/global_data.h:102,
> > from include/dm/of.h:11,
> > from include/dm/ofnode.h:12,
> > from include/dm/device.h:13,
> > from include/linux/mtd/mtd.h:26,
> > from include/nand.h:17,
> > from cmd/bootm.c:17:
> > include/cyclic.h:116:19: error: macro "schedule" passed 1 arguments, but takes j
> > ust 0
> > void schedule(void);
> >
> > 29caf9305b6f cyclic: Use schedule() instead of WATCHDOG_RESET()
> > ^^^ build issue resolved, boot issue on imx8mm-venice resolved, but
> > this is where we now encounter watchdog failures in both the SPL and
> > U-Boot:
> >
> > SPL:
> > cyclic_register for watchdog at 30280000 failed
> > WDT: Failed to start watchdog at 30280000
> >
> > The failure here is 'Cyclic IF not ready yet' (which should probably
> > be an error print not a debug print).
>
> Ye, makes sense. I'll change this.
>
> > In this case the watchdog gets
> > started but never reset via cyclic. This is due to cyclic_init never
> > being called in the SPL - where is that supposed to occur?
>
> I did not have many targets with WDT in SPL to test with. Seems that
> I missed to call into cyclic_init() here.
>
> > U-Boot:
> > cyclic function watchdog at 30280000 took too long: 2976us vs 1000us max, disabling
> >
> > Here also the watchdog gets started but never reset via cyclic so the
> > board will reset itself after 60s sitting in U-Boot for example.
>
> This is already changed in master since yesterday. Now only a warning
> will be shown upon long execution time but the function will not
> be disabled. So please re-check with master and report if this
> works better.
confirmed - the cyclic call no longer gets cancelled and its now just a warning
>
> > The issue here is that for some reason the first call to wdt_cyclic()
> > takes about 3ms vs subsequent calls taking about 185us on this
> > platform which exceeds the 1ms default max of
> > CONFIG_CYCLIC_MAX_CPU_TIME_US and thus the watchdog reset is disabled
> > and the board resets in 60s. Setting
> > CONFIG_CYCLIC_MAX_CPU_TIME_US=5000 resolves that issue but this feels
> > like a workaround that perhaps shouldn't be required.
>
> I agree.
Sounds like Rasmus is going to spin a patch for this but at least now
it's just a warning message.
>
> > I assume the
> > extra time in the first call is the probing of the device. So I think
> > the fix for this U-Boot regression is to move the init part of
> > wdt_cyclic to wdt_start() and have wdt_cyclic() only call wdt_reset()
> >
> > Fabio, I assume you see this on other imx8m boards?
> >
> >> Also, the .configs used in each case might be interesting, certainly all
> >> lines containing CYCLIC, WATCHDOG or WDT.
> >>
> >
> > $ grep CYCLIC .config
> > CONFIG_CYCLIC=y
> > CONFIG_CYCLIC_MAX_CPU_TIME_US=1000
> > # CONFIG_CMD_MX_CYCLIC is not set
> > CONFIG_CMD_CYCLIC=y
> > $ grep WATCHDOG .config
> > CONFIG_SPL_WATCHDOG=y
> > CONFIG_SYSRESET_WATCHDOG=y
> > # CONFIG_SYSRESET_WATCHDOG_AUTO is not set
> > CONFIG_WATCHDOG=y
> > CONFIG_WATCHDOG_AUTOSTART=y
> > CONFIG_WATCHDOG_TIMEOUT_MSECS=60000
> > CONFIG_IMX_WATCHDOG=y
> > # CONFIG_WATCHDOG_RESET_DISABLE is not set
> > # CONFIG_ULP_WATCHDOG is not set
> > # CONFIG_DESIGNWARE_WATCHDOG is not set
> > # CONFIG_XILINX_TB_WATCHDOG is not set
> > $ grep WDT .config
> > # CONFIG_CMD_WDT is not set
> > CONFIG_WDT=y
> > # CONFIG_WDT_APPLE is not set
> > # CONFIG_WDT_ASPEED is not set
> > # CONFIG_WDT_AST2600 is not set
> > # CONFIG_WDT_AT91 is not set
> > # CONFIG_WDT_CDNS is not set
> > # CONFIG_WDT_CORTINA is not set
> > # CONFIG_WDT_GPIO is not set
> > # CONFIG_WDT_MAX6370 is not set
> > # CONFIG_WDT_MESON_GXBB is not set
> > # CONFIG_WDT_ORION is not set
> > # CONFIG_WDT_SBSA is not set
> > # CONFIG_WDT_SP805 is not set
> > # CONFIG_WDT_STM32MP is not set
> > CONFIG_SPL_WDT=y
> >
> >> One thing I notice is that I think wdt_stop should unregister the cyclic
> >> function; there's really no point having the cyclic framework keep
> >> calling wdt_cyclic() only to bail out at "if (!priv->running)" -
> >> moreover, it's almost certainly a bug if the device is started again via
> >> another wdt_start(), because then we have two cyclic instances
> >> registered for the same device.
> >>
> >> I also think the cyclic API is somewhat misdesigned. The storage for the
> >> cyclic metadata should be provided by the caller (e.g. in the watchdog
> >> case just embedded as part of struct wdt_priv), so that
> >> cyclic_register() can never fail. Otherwise we have this awkward
> >> situation where ops->start() have already been called and succeeded, but
> >> then perhaps we fail to register the cyclic instance; should we then do
> >> wdt_stop(), and what if _that_ then fails?
> >>
> >> This also allows the callback to be a little more type-safe; let the
> >> callback take a "struct cyclic_info *" as argument, and then the
> >> callback can use e.g. container_of to get the containing wdt_priv.
> >>
> >
> > I really didn't follow the various submissions for the cyclic
> > framework so I don't have any input on that end.
>
> We were aware that this cyclic IF and especially the WDT move to
> cyclic might produce some breakages. But it was impossible, at least
> for me, to foresee all potential issue. So it was merged very early
> in this release cycle, so that we have time to fix all problems.
understood! I haven't done a lot of testing until now because the imx
changes get left until the end of the merge window. I would love to
see that change in the next release.
>
> I've a small patch that might solve the SPL problem you are seeing.
> The U-Boot proper issue should be fixed in master already. Please
> give this attached patch a try and let me know, if it works.
>
Yes, this does resolve the SPL issue.
Tim
More information about the U-Boot
mailing list