[u-boot-test-hooks PATCH v4 3/3] Provide some basic scripts for Labgrid integration

Sun Sep 1 22:09:47 CEST 2024

Hi Tom,

On Fri, 30 Aug 2024 at 09:13, Tom Rini <trini at konsulko.com> wrote:
>
> On Thu, Aug 29, 2024 at 07:09:22PM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Thu, 29 Aug 2024 at 09:20, Tom Rini <trini at konsulko.com> wrote:
> > >
> > > On Thu, Aug 29, 2024 at 09:02:38AM -0600, Simon Glass wrote:
> > > > Hi Neil,
> > > >
> > > > On Thu, 29 Aug 2024 at 08:44, <neil.armstrong at linaro.org> wrote:
> > > > >
> > > > > On 29/08/2024 14:17, Simon Glass wrote:
> > > > > > Hi Peter,
> > > > > >
> > > > > > On Thu, 29 Aug 2024 at 04:43, Peter Robinson <pbrobinson at gmail.com> wrote:
> > > > > >>
> > > > > >> On Wed, 28 Aug 2024 at 22:25, Simon Glass <sjg at chromium.org> wrote:
> > > > > >>>
> > > > > >>> Hi Peter,
> > > > > >>>
> > > > > >>> On Wed, 28 Aug 2024 at 12:14, Peter Robinson <pbrobinson at gmail.com> wrote:
> > > > > >>>>
> > > > > >>>> Hi Simon,
> > > > > >>>>
> > > > > >>>>> With Labgrid we don't need to specify the various methods, except for
> > > > > >>>>> the console, which simply calls labgrid-client.
> > > > > >>>>>
> > > > > >>>>> This allows supporting any boards in your lab, without adding per-board
> > > > > >>>>> configuration to these hooks.
> > > > > >>>>>
> > > > > >>>>> Provide ellesmere files as an example.
> > > > > >>>>
> > > > > >>>> What's ellesmere?
> > > > > >>>
> > > > > >>> It is a lake but also the name of a computer.
> > > > > >>>
> > > > > >>>>
> > > > > >>>>> Signed-off-by: Simon Glass <sjg at chromium.org>
> > > > > >>>>> ---
> > > > > >>>>>
> > > > > >>>>> Changes in v4:
> > > > > >>>>> - Support pytest fully with dual-build boards like Beagleplay
> > > > > >>>>>
> > > > > >>>>> Changes in v3:
> > > > > >>>>> - Update scripts for latest version of Labgrid integration
> > > > > >>>>> - Add poweroff.none and poweron.none
> > > > > >>>>> - Provide -n flag when querying board info
> > > > > >>>>> - Target the grpc version of Labgrid which is now in -master
> > > > > >>>>> - Update README to cover the changes
> > > > > >>>>>
> > > > > >>>>> Changes in v2:
> > > > > >>>>> - Make use of the common script (only) to set bin_dir
> > > > > >>>>>
> > > > > >>>>>   README.md                    | 50 ++++++++++++++++++++++++++++++++++++
> > > > > >>>>
> > > > > >>>> Maybe that should be in a separate labsgrid readme?
> > > > > >>>
> > > > > >>> My hope is that Labgrid becomes the normal way of running these tests,
> > > > > >>
> > > > > >> Generally I agree with automated testing platforms, I think <insert
> > > > > >> you're preferred platform here> makes sense, there's a bunch like
> > > > > >> Linaro and a bunch in the arm ecosystem use LAVA [1] and  there's a
> > > > > >> bunch more that I'm aware of.
> > > > > >
> > > > > > I am somewhat familiar with LAVA and I believe it can be used to test
> > > > > > U-Boot, although I need to learn how. Looking at a test run [2] for
> > > > > > beaglebone black I see that it is using a recent kernel but the U-Boot
> > > > > > seems to be older.
> > > > > >
> > > > > >>
> > > > > >> Does it make sense, of course at some point in the future post this
> > > > > >> being merged, make sense to look at a general way of making it easier
> > > > > >> to plugin these sort of HW test platforms using this as a basis? I ask
> > > > > >> mostly because putting a bunch of my devices into some sort of
> > > > > >> platform can auto test things and of course everyone has an opinion to
> > > > > >> which is the best one :-P
> > > > > >
> > > > > > Yes. I had heard from Tom that Labgrid is the new hotness for now.
> > > > > > Having dug into it I believe it is a good solution, although it can
> > > > > > certainly be improved to handle scale better.
> > > > > >
> > > > > > Anyway, IMO the current test hooks are not a great solution, just
> > > > > > because the configuration is spread all over the place and it relies
> > > > > > on lots of little shell scripts. So I believe that the Labgrid
> > > > > > integration is a closer to where we want to be with others that come
> > > > > > along.
> > > > >
> > > > > I'd say all those scripts are actually here to ease integration with any
> > > > > system, booting U-Boot and Linux are two different beasts.
> > > >
> > > > That's fine, go ahead and use the scripts. My point is that Labgrid
> > > > doesn't need them and in fact it makes everything pretty painful if we
> > > > try to use all of them.
> > >
> > > I guess I really need to clean-up and post my former co-workers scripts
> > > as I strongly disagree with that statement.
> >
> > See some examples below. Bear in mind also that my goal has been to
> > get my lab fully running, that includes interactive access to boards,
> > as well as running tests. For that I have been using tbot (with
> > integrated build and software-loading features). WIth Labgrid I have
> > been able to replace most of the pytest scripts, tbot and Labman with
> > Labgrid + 60 patches and some configuration files.
>
> That's certainly good and useful progress. And I feel like one of the
> lessons of this thread is that every lab ends up a bit different and so
> there's just not going to be as much standardization as we'd all like.
>
> > Let's look through the scripts:
> >
> > u-boot-test-common - new script
> >
> > u-boot-test-console - needed
> >
> > u-boot-test-flash - here's an example:
> >
> > . poweroff.${power_impl}
> > sleep 0.1
> > . flash.sdwire_common_mount
> > . poweron.${power_impl}
> >
> > Here's another:
> >
> > # Handles the common SDwire mounting (caller does power control)
> >
> > mount_dir=/media/${mount_point}
> >
> > # Switch over to get USB card access
> > sd-mux-ctrl --device-serial ${sdwire_serial} --ts
> >
> > complete=false
> > for i in {0..9}; do
> >     if out="$(mount UUID=${mount_uuid} 2>&1)"; then
> >         complete=true
> >         break
> >     fi
> >     echo $out
> >
> >     # If it is already mounted, try to unmount it first. It may have been
> >     # mounted by another user so we won't have the access we need. If this gives
> >     # an error then we know we cannot continue
> >     if [[ $out == *"already mounted"* ]]; then
> >         umount UUID=${mount_uuid}
> >     fi
> >     sleep 1
> > done
> > if [[ $complete = false ]]; then
> >     echo "Failed to mount UUID ${mount_uuid} after 10 tries"
> >     exit 1
> > fi
> >
> > # Sanity check
> > if ! mountpoint -q ${mount_dir}; then
> >     echo "Mount ${mount_dir} not available after 'mount'"
> >     exit 1
> > fi
> >
> > # Perform the write, pass along as much environment as possible
> > . writer.${flash_writer}
> >
> > complete=false
> > for i in {0..9}; do
> >     if out="$(umount ${mount_dir} 2>&1)"; then
> >         complete=true
> >         break
> >     fi
> >     echo $out
> >     sleep 1
> > done
> >
> > if [[ $complete = false ]]; then
> >     echo "Failed to umount UUID ${mount_uuid} after 10 tries"
> >     exit 1
> > fi
> >
> > # Sanity check
> > if mountpoint -q ${mount_dir}; then
> >     echo "Mount ${mount_dir} still available after 'umount'"
> >     exit 1
> > fi
> >
> > # Back to card access for the DUT
> > sd-mux-ctrl --device-serial ${sdwire_serial} --dut
> >
> >
> > Of course that assumes an SDwire which may not always be the case.
>
> Or some other piece of hardware that sd-mux-ctrl supports, but yes.
>
> > Abstracting all of this into shell scripts is just too painful.
>
> Abstracting it to something else sounds even more painful however.

I see this a little bit like the situation with Binman. We now have a
description for the image and can build for any SoC which has added
its peculiarities in a Python script.

In a similar way, I believe it is valuable to have each SoC encode (in
some sort of description) how it writes images to an SD card, over
USB, etc.

Now I am not a huge fan of yaml but it does the job and it is what
Labgrid uses. So my vision of all this is that if someone wants to set
up a lab with 5 boards, they can get the gear together, wire it all
up, then copy and modify some yaml and it should all work.

>
> > Also I have found it to be error-prone.
> >
> > There is also bin/flash.sdwire_relay_mount:
> >
> > . poweroff.${power_impl}
> >
> > sleep 0.1
> >
> > . flash.sdwire_common_mount
> >
> > . poweron.${power_impl}
>
> Yup. It's possible we could do some slightly nicer check/sleep loop but
> it probably wouldn't really get us much.
>
> > u-boot-test-getrole
> > - New script
>
> Which is for how you abstracted hardware from software configurations it
> can run (for example, the Pi 4 case of rpi_4, rpi_4_32b and rpi_arm64
> all being valid defconfigs).

Yes it maps a 'role' (like an 'identity' in test.py) to a particular
DUT and the corresponding U-Boot board. It can be many-to-one.

>
> > u-boot-test-power-off
> > - This one fits with Labgrid quite well, but of course there may not
> > actually be power control
> >
> > u-boot-test-power-on
> > - This one also fits with Labgrid, but since serial output can come
> > out as soon as it is called, jumping back into Labgrid later to bring
> > up a console results in lots of missing output. Also there may not
> > actually be power control
>
> Right, but in both cases, poweron/poweroff.labgrid could just do:
> labgrid-client power $state || true
> if power isn't supported (which sounds odd, but, maybe makes sense in
> some setup).

Yes it can. I have quite a few boards with no power control (they are
just on all the time). I find a reset line more useful than power in
some cases.

>
> > u-boot-test-release
> > - New script. Actually this isn't really needed, since I added
> > automatic release to Labgrid. I was just worried that Labgrid might
> > crash
>
> What I saw in working on my scripts was that if you don't release you
> can't acquire later and there's not an option for "you already have
> acquired it". On my scripts, I should update the cleanup function to
> release the current LG_PLACE.

Yes I hit the same problem. I added an '--auto' option to acquire and
release, so they are more forgiving.

>
> > u-boot-test-reset
> > - Unfortunately labgrid doesn't provide a reset option, although I'm
> > sure it could be added. You still need to know whether to use power or
> > reset or both, though, so again you have shell scripts with this stuff
>
> Perhaps this is something to talk with upstream about, for the cases
> where "reset" is not just another name for "power cycle" ? I know we
> have scripts today for a number of cases but the only ones I wrote are
> just power cycle (digital-loggers) or power cycle via USB (ykush, the
> USB hubs that support per-port power control intentionally).

If you did down a bit it ends up being intertwined with the U-Boot
strategy...e.g. you for USB you often need to hold reset down while
holding recovery, then release reset, then write U-Boot and wait for
it to come up.

>
> > reset.recovery_download - another thing I had to add, BTW
> >
> > . "${bin_dir}/recovery.${recovery_impl}"
> >
> > # USB enumeration delay
> > for ((i = 0; i <= 100; i++)); do
> >     if [ -e "${board_usb_dev}" ]; then
> >         break
> >     fi
> >     sleep 0.1
> > done
> > if [ ! -e "${board_usb_dev}" ]; then
> >     echo "Cannot find device ${board_usb_dev}"
> >     exit 1
> > fi
> > sleep 1
> >
> > . "${bin_dir}/download.${download_impl}"
> >
> > or another case, this is Zynq which tries to wait a bit before calling
> > picocom. It mostly works, but it loses output
> >
> > # Some boards have a UART built into them and it doesn't appear until the
> > # board is powered on. An example is zynq_zybo. Try to handle this by waiting
> > # for the console and then continuing as soon as possible (so we don't miss
> > # the U-Boot banner).
> > if [ -n "${console_wait}" ]; then
> >     for i in {0..99}; do
> >         if [ -e "${console_dev}" ]; then
> >             break
> >         fi
> >         sleep .1
> >     done
> >     if [ ! -e "${console_dev}" ]; then
> >         echo "Console ${console_dev} not found"
> >         exit 1
> >     fi
> > fi
> >
> > exec picocom -b "${console_baud}" "${console_dev}"
> >
> > So that's about a third of the reason why I don't think the scripts scale.
>
> This is where I think looking at https://github.com/bootlin/snagboot
> which Kevin Hillman pointed me at a while ago might come in handy,
> instead.

Oh yes I ran into that at a show a year or two back, very nice! Better
than having a lot of little tools.

>
> > > The only "painful" part is
> > > the shared pain, regardless of framework, for "put blob $X at location
> > > $Y", which I believe you abstract out in to python?
> >
> > Sort-of. The Python knows how to handle all the different writing
> > methods and is at [2], but it is entirely controlled by yaml[3] (the
> > example_env.yaml). See the ti,am625 method for example and the 'play'
> > board right at the bottom of the end file.
> >
> > I very much like the configuration files (although I hope one day
> > Labgrid will allow the two files to be combined). It makes it easier
> > to add a new board, since there is much less to think about. It really
> > did take a long time to fiddle with all those hook scripts and I got
> > tired of it.
> >
> > Regards,
> > Simon
> >
> > [1] https://github.com/sjg20/u-boot/tree/lab6a/tools/labman
> > [2] https://github.com/labgrid-project/labgrid/pull/1411/commits/c7ea03987362af70bbbb94d14ff74c54a1ee5ed8#diff-e994502495dc8083c0b70ab20ace18fb5f85217805d01a54270a69c84ec31023R107
> > [3] https://github.com/labgrid-project/labgrid/pull/1411/commits/6d3b4a0cfc7e35e0547dcd14065328df4b4b5e1f#diff-66f82743402c95b2eac8f9937c61c582bd0aca2b35e841477d62122c811b7750R1187
>
> Yeah, personally I don't care for that. I know I'm not going to convince
> people and it's just one of those "goodness, I'm the old guy now"
> things, but, no. It would be easier if you had comments for each SoC as
> to what / why each of the magic values, but then shouldn't it be
> abstracted more? Perhaps to a file per SoC, and at that point how is
> this easier to follow than a commented shell script? But in sum, I'm
> just cranky about yaml and I know your opinion is the opposite.

As above I don't like it either, but it is what Labgrid has chosen. In
fact I used it with Labman too...it really needs proper phandles like
devicetree. At least is isn't JSON though...:-)

>
> As an aside, please review the ti,am625 stuff with some TI folks, that's
> both far too specific (you're going to need to add different am62 and
> am64 and am69 and j7... and on and on cases) and I forget if there's
> some work to have a slightly easier way to know what file to use going
> on.

Well, perhaps they can take it over and get everything working? I have
done the hard yards, e.g. supporting two builds together.

Regards,
Simon