imx8 regression: cyclic_register for watchdog at 30280000 failed

Stefan Roese sr at denx.de
Fri Oct 28 07:05:38 CEST 2022


Tim,

On 27.10.22 17:34, Tim Harvey wrote:
> On Wed, Oct 26, 2022 at 11:32 PM Stefan Roese <sr at denx.de> wrote:
>>
>> Tim,
>>
>> On 26.10.22 21:06, Tim Harvey wrote:
>>> On Wed, Oct 26, 2022 at 5:33 AM Rasmus Villemoes
>>> <rasmus.villemoes at prevas.dk> wrote:
>>>>
>>>> On 25/10/2022 18.32, Tim Harvey wrote:
>>>>> Greetings,
>>>>>
>>>>> I've noticed a regression since the merge of the cyclic framework use
>>>>> for watchdog on my imx8m boards:
>>>>>
>>>>> cyclic_register for watchdog at 30280000 failed
>>>>> WDT:   Failed to start watchdog at 30280000
>>>>>
>>>>> A bisect lead me to the following 3 sequential patches:
>>>>> 29caf9305b6f cyclic: Use schedule() instead of WATCHDOG_RESET()
>>>>> ^^^ bad
>>>>> 881d4108257a cyclic: Introduce schedule() function
>>>>> ^^^ unbuildable
>>>>> c2fd0ca1a822 watchdog: Integrate watchdog triggering into the cyclic framework
>>>>> ^^^ unbootable
>>>>
>>>> Can you provide some more details on "unbootable" and "unbuildable"?
>>>> I.e., what build error do you get for the middle patch, and what do you
>>>> see on the console with the "unbootable" image?
>>>
>>> Rasmus,
>>>
>>> Sure. I'm building and testing for imx8mm_venice.
>>>
>>> Here are the patches in question listed in commit order:
>>> d219fc06b30d configs: Resync with savedefconfig
>>> ^^^ no issues found
>>>
>>> c2fd0ca1a822 watchdog: Integrate watchdog triggering into the cyclic framework
>>> ^^^ unbootable for imx8mm_venice - hangs after SPL banner:
>>> U-Boot SPL 2022.10-rc3-00241-gc2fd0ca1a822 (Oct 26 2022 - 10:08:54 -0700)
>>> ^^^ this occurs because 'uclass_get_device_by_driver(UCLASS_MISC,
>>> DM_DRIVER_GET(gsc), &dev))' in gateworks/venice/spl.c board_init_f
>>> hangs and I'm not clear why this patch affects that
>>>
>>> 881d4108257a cyclic: Introduce schedule() function
>>> ^^^ unbuildable for imx8mm_venice_defconfig
>>>    CC      arch/arm/lib/sections.o
>>> In file included from include/asm-generic/global_data.h:23,
>>>                    from ./arch/arm/include/asm/global_data.h:102,
>>>                    from include/dm/of.h:11,
>>>                    from include/dm/ofnode.h:12,
>>>                    from include/dm/device.h:13,
>>>                    from include/linux/mtd/mtd.h:26,
>>>                    from include/nand.h:17,
>>>                    from cmd/bootm.c:17:
>>> include/cyclic.h:116:19: error: macro "schedule" passed 1 arguments, but takes j
>>> ust 0
>>>    void schedule(void);
>>>
>>> 29caf9305b6f cyclic: Use schedule() instead of WATCHDOG_RESET()
>>> ^^^ build issue resolved, boot issue on imx8mm-venice resolved, but
>>> this is where we now encounter watchdog failures in both the SPL and
>>> U-Boot:
>>>
>>> SPL:
>>> cyclic_register for watchdog at 30280000 failed
>>> WDT:   Failed to start watchdog at 30280000
>>>
>>> The failure here is 'Cyclic IF not ready yet' (which should probably
>>> be an error print not a debug print).
>>
>> Ye, makes sense. I'll change this.
>>
>>> In this case the watchdog gets
>>> started but never reset via cyclic. This is due to cyclic_init never
>>> being called in the SPL - where is that supposed to occur?
>>
>> I did not have many targets with WDT in SPL to test with. Seems that
>> I missed to call into cyclic_init() here.
>>
>>> U-Boot:
>>> cyclic function watchdog at 30280000 took too long: 2976us vs 1000us max, disabling
>>>
>>> Here also the watchdog gets started but never reset via cyclic so the
>>> board will reset itself after 60s sitting in U-Boot for example.
>>
>> This is already changed in master since yesterday. Now only a warning
>> will be shown upon long execution time but the function will not
>> be disabled. So please re-check with master and report if this
>> works better.
> 
> confirmed - the cyclic call no longer gets cancelled and its now just a warning

Good. We should improve this, that this warning will not be displayed,
at least in the WDT event, at some point.

>>> The issue here is that for some reason the first call to wdt_cyclic()
>>> takes about 3ms vs subsequent calls taking about 185us on this
>>> platform which exceeds the 1ms default max of
>>> CONFIG_CYCLIC_MAX_CPU_TIME_US and thus the watchdog reset is disabled
>>> and the board resets in 60s. Setting
>>> CONFIG_CYCLIC_MAX_CPU_TIME_US=5000 resolves that issue but this feels
>>> like a workaround that perhaps shouldn't be required.
>>
>> I agree.
> 
> Sounds like Rasmus is going to spin a patch for this but at least now
> it's just a warning message.

Yes. Let's wait for Rasmus'es work on this.

>>> I assume the
>>> extra time in the first call is the probing of the device. So I think
>>> the fix for this U-Boot regression is to move the init part of
>>> wdt_cyclic to wdt_start() and have wdt_cyclic() only call wdt_reset()
>>>
>>> Fabio, I assume you see this on other imx8m boards?
>>>
>>>> Also, the .configs used in each case might be interesting, certainly all
>>>> lines containing CYCLIC, WATCHDOG or WDT.
>>>>
>>>
>>> $ grep CYCLIC .config
>>> CONFIG_CYCLIC=y
>>> CONFIG_CYCLIC_MAX_CPU_TIME_US=1000
>>> # CONFIG_CMD_MX_CYCLIC is not set
>>> CONFIG_CMD_CYCLIC=y
>>> $ grep WATCHDOG .config
>>> CONFIG_SPL_WATCHDOG=y
>>> CONFIG_SYSRESET_WATCHDOG=y
>>> # CONFIG_SYSRESET_WATCHDOG_AUTO is not set
>>> CONFIG_WATCHDOG=y
>>> CONFIG_WATCHDOG_AUTOSTART=y
>>> CONFIG_WATCHDOG_TIMEOUT_MSECS=60000
>>> CONFIG_IMX_WATCHDOG=y
>>> # CONFIG_WATCHDOG_RESET_DISABLE is not set
>>> # CONFIG_ULP_WATCHDOG is not set
>>> # CONFIG_DESIGNWARE_WATCHDOG is not set
>>> # CONFIG_XILINX_TB_WATCHDOG is not set
>>> $ grep WDT .config
>>> # CONFIG_CMD_WDT is not set
>>> CONFIG_WDT=y
>>> # CONFIG_WDT_APPLE is not set
>>> # CONFIG_WDT_ASPEED is not set
>>> # CONFIG_WDT_AST2600 is not set
>>> # CONFIG_WDT_AT91 is not set
>>> # CONFIG_WDT_CDNS is not set
>>> # CONFIG_WDT_CORTINA is not set
>>> # CONFIG_WDT_GPIO is not set
>>> # CONFIG_WDT_MAX6370 is not set
>>> # CONFIG_WDT_MESON_GXBB is not set
>>> # CONFIG_WDT_ORION is not set
>>> # CONFIG_WDT_SBSA is not set
>>> # CONFIG_WDT_SP805 is not set
>>> # CONFIG_WDT_STM32MP is not set
>>> CONFIG_SPL_WDT=y
>>>
>>>> One thing I notice is that I think wdt_stop should unregister the cyclic
>>>> function; there's really no point having the cyclic framework keep
>>>> calling wdt_cyclic() only to bail out at "if (!priv->running)" -
>>>> moreover, it's almost certainly a bug if the device is started again via
>>>> another wdt_start(), because then we have two cyclic instances
>>>> registered for the same device.
>>>>
>>>> I also think the cyclic API is somewhat misdesigned. The storage for the
>>>> cyclic metadata should be provided by the caller (e.g. in the watchdog
>>>> case just embedded as part of struct wdt_priv), so that
>>>> cyclic_register() can never fail. Otherwise we have this awkward
>>>> situation where ops->start() have already been called and succeeded, but
>>>> then perhaps we fail to register the cyclic instance; should we then do
>>>> wdt_stop(), and what if _that_ then fails?
>>>>
>>>> This also allows the callback to be a little more type-safe; let the
>>>> callback take a "struct cyclic_info *" as argument, and then the
>>>> callback can use e.g. container_of to get the containing wdt_priv.
>>>>
>>>
>>> I really didn't follow the various submissions for the cyclic
>>> framework so I don't have any input on that end.
>>
>> We were aware that this cyclic IF and especially the WDT move to
>> cyclic might produce some breakages. But it was impossible, at least
>> for me, to foresee all potential issue. So it was merged very early
>> in this release cycle, so that we have time to fix all problems.
> 
> understood! I haven't done a lot of testing until now because the imx
> changes get left until the end of the merge window. I would love to
> see that change in the next release.

We will definitely fix this cyclic WDT integration in this release
cycle.

>> I've a small patch that might solve the SPL problem you are seeing.
>> The U-Boot proper issue should be fixed in master already. Please
>> give this attached patch a try and let me know, if it works.
>>
> 
> Yes, this does resolve the SPL issue.

Good. Thanks for testing.

Thanks,
Stefan


More information about the U-Boot mailing list