[PATCH 0/7] misc: introduce STATUS LED activity function

Thu Jun 6 15:32:14 CEST 2024

Hi Christian,

On 6/6/24 1:52 PM, Christian Marangi wrote:
> On Thu, Jun 06, 2024 at 12:55:37PM +0200, Quentin Schulz wrote:
>> Hi Christian,
>>
>> On 6/6/24 11:52 AM, Christian Marangi wrote:
>>> On Thu, Jun 06, 2024 at 11:12:11AM +0200, Quentin Schulz wrote:
>>>> Hi Christian,
>>>>
>>>> On 6/5/24 9:21 PM, Christian Marangi wrote:
>>>>> This series expand the STATUS LED framework with a new color
>>>>> and a big new feature. One thing that many device need is a way
>>>>> to communicate to the user that the device is actually doing
>>>>> something.
>>>>>
>>>>> This is especially useful for recovery steps where an
>>>>> user (for example) insert an USB drive, keep a button pressed
>>>>> and the device autorecover.
>>>>>
>>>>> There is currently no way to signal the user externally that
>>>>> the bootloader is processing/recoverying aside from setting
>>>>> a LED on.
>>>>>
>>>>> A solid LED on is not enough and won't actually signal any
>>>>> kind of progress.
>>>>> Solution is the good old blinking LED but uboot doesn't
>>>>> suggest (and support) interrupts and almost all the LED
>>>>> are usually GPIO LED that doesn't support HW blink.
>>>>>
>>>>
>>>> I haven't used it yet but we do have a cyclic framework now for things
>>>> happening in the background. I think this is a good use-case for this
>>>> framework? Something would set the blinking frequency (could be from CLI
>>>> directly, or as part of board files, or architecture, etc...) and the LED
>>>> would just blink then. This would allow to highlight stages in the boot
>>>> process, though that is not like an activity LED so if you're stuck in a
>>>> stage, you wouldn't know if something is still happening or if you're really
>>>> stuck (e.g. no packet on TFTP or TFTP very slow). The benefit is that it
>>>> would be way less intrusive than patching all commands that could make use
>>>> of that LED. Right now, this only adds support to MTD, SPI and TFTP, but
>>>> what about MMC, NVMe, USB, other net stuff (e.g. wget), etc...
>>>>
>>>
>>> Can you hint me on where is this framework? Checking the tftp code i
>>> couldn't find extra call to it. Maybe it's attached to the schedule()
>>> function?
>>>
>>
>> https://docs.u-boot.org/en/latest/develop/cyclic.html
>>
> 
> Thanks looks very interesting and looks handy to make use of the
> watchdog for it. I will try now to rework the implementation for the sw
> blink to make use of cyclic thing.
> 
>>> Also notice that it's really not a one setting since almost all device
>>> have GPIO LEDs and doesn't have a way to support HW Blink so the
>>> "activity" function needs to be called multiple time to increase the
>>> counter and toggle the LED.
>>>
>>
>> Cyclic callback would be called twice per expected blink period, where you
>> would toggle the GPIO (essentially making it 50% duty cycle, but could be
>> more fine-grained if you want a different duty cycle).
>>
> 
> Well status LED already have CONFIG_STATUS_LED_FREQ where you can set a
> value. I will just use this.
> 

This actually only appears in the rST doc, nothing actually makes use of 
this right now, so it's not something we **need** to use.

What I meant is, if you only provide a frequency, a specific, hardcoded, 
pattern is expected. E.g. for 1KHz, you enable the LED for 0.5ms and 
disable it for 0.5ms (or 1ms and 1ms, depending on how you see 
LED_STATUS_FREQ working). Could be 0.2ms and 0.8ms but it would always 
be this. How do you differentiate between "something is happening on 
NAND" and "TFTP is being used" if you don't have the ability to change 
the duty cycle? Or are you expecting people to have multiple LED of 
different colors for that?

>>> Also this have the extra feature that you can check if something gets
>>> stuck in the process. The lifecycle is:
>>> - Turn on the ACTIVITY LED at the start of the thing
>>> - Blink once in a while (for very little task this might not happen)
>>> - Turn off the ACTIVITY LED at the end of the thing
>>>
>>> Soo if something goes wrong the LED would never turn OFF but would stay
>>> solid ON.
>>>
>>
>> Yes, that's something that wouldn't be covered by cyclic framework here. It
>> all depends what you want to do, if it's an activity LED, then we need to
>> hook ourselves deep into frameworks where stuff is actually happening. If
>> it's just to specify which stage of the boot we reached, then cyclic would
>> be good enough probably (register for stage 1, unregister stage1+register
>> for stage2 for different frequency, etc...).
>>
> 
> The cyclic framework can reduce the implementation to just START and
> STOP. We would lose the ability to know if there is an actual progress
> or not tho... So maybe that is bad but honestly a TFTP transfer can be
> tracked by the other machine and MTD write/erase won't magically stop
> and get stalled... (and even with that they will timeout and the status
> LED stop will be called anyway)
> 
> So a dumb blinking with the watchdog is O.K. This is really a simple
> thing to show that something is happening (use case of recovering the
> device without actually using serial)
> 

Then it's not so much an activity LED anymore, rather a "i'm still alive 
and doing X thing right now, but maybe I'm stuck who knows", e.g. a 
little bit like a glorified heartbeat (I'm not saying it's bad, it's 
just a different use case :) ).

>>> More than happy to rework this to a less intrusive implementation.
>>>
>>> Maybe this can be generalized to some generic API like task_start(),
>>> task_processing() and task_end()? Might make more sense than having to
>>> add specific LED function to each function?
>>>
>>
>> This also likely would introduce a hit in performance if we need to toggle
>> the GPIO in the same thread that we do TFTP/storage medium reading/writing?
>> I assume we could still adapt cyclic to make it spawn a one time event
>> instead of looping (e.g. by unregistering itself at the end of its own
>> callback?).
> 
> But the penality is that bad? Unless you have the crazy idea of an
> absurt low value for the freq, it would be triggered once every 200
> iteration. If you are transfering MB of data you are probably on x86 or
> high end device where a GPIO bit set doesn't really affect anything.
> 

The kernel on Aarch64 is usually a few MB, sometimes a few tens of MB, 
then you may have an initramfs which is also in the tens of MB.

People are trying to get U-Boot to boot into Linux ASAP, so maybe 
they'll be bothered by this. If they are, they can always disable the 
support for this LED status feature or improve it, so I wouldn't be too 
worried about it for now since it won't be on by default anyway?

> My current idea is that start() will register a cyclic and LED will
> blink with watchdog. Stop will deregister it.
> 

Sounds good :)

>>
>>> (AFAIK linux kernel have something similar used for all the trace
>>> framework so having something in uboot to trace these kind of operation
>>> might be interesting)
>>>
>>
>> Indeed, that's what's being done with ledtrig_.* functions, they are however
>> scheduled on a workqueue and called from the subsystem directly.
>>
>> I'm a bit confused also as to why we control the LED blinking from the cmd/
>> ? E.g. for cmd/mtd.c I would assume that the changes made to the mtd
>> subsystem should be enough to handle those? Similarly, since UBI is for use
>> over NAND/MTD, shouldn't that already be handled by the MTD subsystem, and
>> if not, why not in the UBI subsystem instead of the CMD_UBI? One of the
>> issues is that you may not necessarily go through the cmd/ to do stuff with
>> storage medium or network (e.g. directly from board files).
> 
> It's a mix of cmd and subsystem cause you need to turn the LED on. Use
> the activity function to increase the conunter and toggle the LED at
> actual activity (example a mtd write command, a single block received from
> TFTP)
> 
> If the cyclic thing works as I think MAYBE we can just add this to the
> cmd part without having to disturb all the subsystem. And that is a much
> cleaner approach.
> 

Why isn't the subsystem turning the LED on when writing to the counter, 
why does it have to go through the CMD?

If I'm not mistaken, I have my disk-activity LED in Linux working just 
fine without having to rely on a userspace tool to activate it for me? 
Maybe I missed something there though :)

>>
>> For the net protocols, why not hook this to the net_[sg]et_*_handler for
>> example so it's protocol-agnostic? No clue how difficult this would be, or
>> if you'd rather have something like per-protocol activity?
> 
> Attaching to the net handler would provide a way too generic activity
> thing and would result in a dump NIC LED. We want to show activity when
> we actually want to signal stuff not when the device is executin ASM
> instructions ahahah
> 

How does one define "when we actually want to signal stuff"? The further 
away you're from the subsystem, the more consumers we're going to have 
to patch to add this logic.

> So yes I feel adding this way in way too generic subsystem might
> deviates from the idea of this. Even attaching to the write and erase
> function was a bit problematic and too generic.
> 
>>
>> Finally, maybe we also want to have a Kconfig symbol per "type" of activity
>> to control what should be "monitored", and I would also suggest if we go
>> this route to have a Kconfig symbol for the frequency per "type" of activity
>> as well, so that one can know which activity is happening right now.
>>
> 
> Yes that might be good but I have some fear we might explode with
> Kconfig. I think it's better to first have a solid idea on how this
> works and then we can think of the configuration part.
> 

Oh, we've seen much worse with the VPL_/TPL_/SPL_// symbols :) Can come 
later though if someone has a usecase for it.

Cheers,
Quentin