[PATCH 0/8] efi_loader: Complete the bootflow_efi() test

Sat Jan 18 05:35:42 CET 2025

Hi Tom,

On Mon, 13 Jan 2025 at 13:20, Tom Rini <trini at konsulko.com> wrote:
>
> On Mon, Jan 13, 2025 at 12:01:36PM -0700, Simon Glass wrote:
> > Hi Tom,
> >
> > On Fri, 10 Jan 2025 at 09:48, Tom Rini <trini at konsulko.com> wrote:
> > >
> > > On Fri, Jan 10, 2025 at 06:40:37AM -0700, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Thu, 9 Jan 2025 at 09:51, Tom Rini <trini at konsulko.com> wrote:
> > > > >
> > > > > On Thu, Jan 09, 2025 at 08:02:01AM -0700, Simon Glass wrote:
> > > > > > Hi Tom,
> > > > > >
> > > > > > On Wed, 8 Jan 2025 at 12:15, Tom Rini <trini at konsulko.com>
wrote:
> > > > > > >
> > > > > > > On Wed, Jan 08, 2025 at 10:02:52AM -0700, Simon Glass wrote:
> > > > > > > > Hi Heinrich, Tom,
> > > > > > > >
> > > > > > > > On Tue, 7 Jan 2025 at 08:47, Heinrich Schuchardt <
xypron.glpk at gmx.de> wrote:
> > > > > > > > >
> > > > > > > > > On 07.01.25 16:11, Tom Rini wrote:
> > > > > > > > > > On Tue, Jan 07, 2025 at 06:57:50AM -0700, Simon Glass
wrote:
> > > > > > > > > >> Hi Heinrich,
> > > > > > > > > >>
> > > > > > > > > >> On Tue, 7 Jan 2025 at 06:11, Heinrich Schuchardt <
xypron.glpk at gmx.de> wrote:
> > > > > > > > > >>>
> > > > > > > > > >>> On 07.01.25 13:15, Simon Glass wrote:
> > > > > > > > > >>>> Hi Heinrich,
> > > > > > > > > >>>>
> > > > > > > > > >>>> On Mon, 6 Jan 2025 at 10:00, Heinrich Schuchardt <
xypron.glpk at gmx.de> wrote:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> On 06.01.25 15:47, Simon Glass wrote:
> > > > > > > > > >>>>>> This test was hamstrung in code review so this
series is an attempt to
> > > > > > > > > >>>>>> complete the intended functionality:
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> - Check memory allocations look correct
> > > > > > > > > >>>>>> - Check that exit-boot-services removes active-DMA
devices
> > > > > > > > > >>>>>> - Check that the bootflow is still present after
testapp finishes
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> The EFI functionality duplicates
bootm_announce_and_cleanup() and still
> > > > > > > > > >>>>>> uses the defunct board_quiesce_devices() so a nice
cleanup would be to
> > > > > > > > > >>>>>> call the bootm function instead, with suitable
modifications. That would
> > > > > > > > > >>>>>> allow bootstage to work too.
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> This series is based on sjg/master since the EFI
logging was rejected so
> > > > > > > > > >>>>>> far.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Yes, it was rejected because a solution at the
lib/log.c level would be
> > > > > > > > > >>>>> more generic.
> > > > > > > > > >>>>
> > > > > > > > > >>>> As I mentioned, that idea isn't suitable for
programmatic use.
> > > > > > > > > >>>
> > > > > > > > > >>> What can be done with show_addr("mem", rec->memory);
that log_debug()
> > > > > > > > > >>> does not offer or which you could not do with a new
log function in
> > > > > > > > > >>> lib/log.c that takes variadic arguments?
> > > > > > > > > >>
> > > > > > > > > >> There are asserts in [1], for example. How do you
propose to handle
> > > > > > > > > >> that? See [2] for my previous explanation, quoted here:
> > > > > > > > > >>
> > > > > > > > > >>> CONFIG_LOG with a bloblist option would be a great
idea, but it's hard
> > > > > > > > > >>> to programmatically scan text...plus only the
external call sites are
> > > > > > > > > >>> actually logged.
> > > > > > > > > >>
> > > > > > > > > >> Also see the discussion on the original patch [3].
There was also your
> > > > > > > > > >> reply at [4], but I think you missed that this is
intended for use in
> > > > > > > > > >> unit tests (i.e. with ut_assert()).
> > > > > > > > > >>
> > > > > > > > > >> You also requested that this be generalised, rather
than being
> > > > > > > > > >> EFI-loader-specific. I have no objection to that, but
don't have a use
> > > > > > > > > >> case for it yet, so have deferred that to later. It's
a fairly simple
> > > > > > > > > >> change, if/when needed. If the series was not NAKed,
I'd be happy to
> > > > > > > > > >> do it now.
> > > > > > > > > >>
> > > > > > > > > >>>>
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Tom suggested not to send patches that are for
private enjoyment to the
> > > > > > > > > >>>>> mailing list.
> > > > > > > > > >>>>
> > > > > > > > > >>>> My contributions to U-Boot are only ever about
private enjoyment :-)
> > > > > > > > > >>>>
> > > > > > > > > >>>> Do you have any comments on the patches?
> > > > > > > > > >>
> > > > > > > > > >> Regards,
> > > > > > > > > >> Simon
> > > > > > > > > >>
> > > > > > > > > >> [1]
https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-6-sjg@chromium.org/
> > > > > > > > > >> [2]
https://lore.kernel.org/u-boot/CAFLszTjxOE_037+kR0jgdax80sBombYo_k0YgiuVnP=KZCOvuA@mail.gmail.com/
> > > > > > > > > >> [3]
https://lore.kernel.org/u-boot/CAC_iWjKtaN54B98OKbkoXkC_GmKJ=x+M4=UY_O6roSOpZaDxag@mail.gmail.com/
> > > > > > > > > >> [4]
https://lore.kernel.org/u-boot/D513D326-41A6-425E-B11F-85958065BCD2@gmx.de/
> > > > > > > > > >
> > > > > > > > > > Looking at the logging portions of the original series
again, especially
> > > > > > > > > > if this was made generic, we probably don't want to
print to actual
> > > > > > > > > > console every time we're making a note of some memory
allocation for
> > > > > > > > > > example, that would be unreadable outside of a debug
context. The point
> > > > > > > > > > of this really seems to be "log things for verifying in
tests later".
> > > > > > > > > > Does that end up being useful? I don't know. Heinrich
or Ilias, do the
> > > > > > > > > > tests in [1] look generally useful?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The tests in [1] are not documented, not even in the
commit message. So
> > > > > > > > > the reasoning behind the tests remains Simon's secret.
> > > > > > > >
> > > > > > > > Are you asking for code comments in the test? If so, I can
add some.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > At first sight the tests in [1] don't make much sense.
E.g. that only a
> > > > > > > > > subset of memory types have been used does not tell that
the right
> > > > > > > > > memory type has been used for the right object.
> > > > > > > >
> > > > > > > > It is a pretty good start, though. It makes sure that the
memory types
> > > > > > > > are sane, checks addresses are within DRAM, etc. With [5]
it makes
> > > > > > > > sure that devices are removed.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Implementing a specific tracing functionality for EFI is
definitively
> > > > > > > > > the wrong way forward as it will lead to code duplication.
> > > > > > > >
> > > > > > > > We can cross that bridge when we come to it.
> > > > > > >
> > > > > > > Well, no. It's backwards to make a bridge in one place when
everyone
> > > > > > > agrees it needs to be moved somewhere else. I mean [5] is a
generic
> > > > > > > issue and test/py/tests/test_net_boot.py or some other test
we already
> > > > > > > have which tests booting an OS should confirm that we've
quiesced
> > > > > > > devices before moving on. And as a bonus it's in python where
dealing
> > > > > > > with strings doesn't suck.
> > > > > >
> > > > > > I really don't want to write C tests in Python. CI is slow
enough as
> > > > > > it is, something realy want to fix. I'm also not sure how you
can tell
> > > > > > if a device has been removed. Run 'dm tree' and look for the
missing
> > > > > > 'star' in the resulting 300 lines of text?
> > > > >
> > > > > As I'm in a bisect-hell in our C tests you'll have to forgive me
for not
> > > > > thinking the C tests are noticeably faster than python tests. Or
that
> > > > > they aren't their own potential source of corner-case bugs. But I
> > > > > digress..
> > > >
> > > > Welcome to my world. I bisected my lab devices so many times to try
to
> > > > isolate all the breakages that have crept in. What is the problem,
> > > > maybe I can help?
> > >
> > > Sure. test/cmd/hash.c::dm_test_cmd_hash_md5 fails randomly, in maybe 1
> > > out of 100 runs, via pytest, in sandbox. Not via "./u-boot -T -c 'ut
dm
> > > dm_test_cmd_hash_md5'" however (I stopped checking after 1000
> > > iterations). I was iterating over "and built with clang" but I think
it
> > > happens with gcc too, from the actual failures in CI. And you can use
> > > "-k ut" to limit to just what's matched there, so it's a quicker
> > > iteration.
> >
> > Hmmm do you have a link? It's hard to imagine what it is, but perhaps
> > a dependency on a previous test.
>
> Sure:
> https://source.denx.de/u-boot/u-boot/-/jobs/993200#L286
>
> My current gut feeling is that we've got some section overlap issue
> somewhere. And, FWIW, if I turn off EFI_LOADER I still see it. And if
> you use the same binary it won't always fail.

The only way I think the hash command can return CMD_RET_USAGE is if there
are not enough arguments.

The only way that can happen with this code:

ut_assertok(run_command("hash md5 $loadaddr 0", 0));

is if loadaddr is undefined.

But (like you, I suspect) I cannot find how that could be.

In fact, I can't even build with clang:

$ buildman -O clang-17 --bo sandbox -w -o /tmp/b/sandbox-clang
Building current source for 1 boards (1 thread, 32 jobs per thread)
   sandbox:  +   sandbox
+/usr/bin/llvm-ar: error: arch/sandbox/cpu/built-in.o: Opaque pointers are
only supported in -opaque-pointers mode (Producer: 'LLVM17.0.6' Reader:
'LLVM 14.0.0')
+make[2]: *** [scripts/Makefile.build:333: arch/sandbox/cpu/built-in.o]
Error 1
+make[1]: *** [Makefile:1906: arch/sandbox/cpu] Error 2
+make: *** [Makefile:177: sub-make] Error 2
    0    0    1 /1              sandbox

>
> > At present 'ut all' fails so I am going to take a look at that. Quite
> > a bit of clean-up needed in test system, though. Ideally we could run
> > the tests in random order so we can find and fix the dependencies. For
> > driver model we reinit as needed, but that's not the case for EFI, for
> > example.
>
> Personally, for making pytest faster I'd look at the general
> recommendations various blog posts about "make your pytest run faster"
> and then go from there.

I think the problem is that you are looking at the C tests through a Python
lens, so everything seems a bit slow.

'ut all' takes about 18 seconds for me.

>
> > > > > And yes, taking a bunch of text and parsing it, is what python is
fast
> > > > > at. And easier to write.
> > > > >
> > > > > > But actually [5] is not generic, since EFI uses its own code to
remove
> > > > > > devices. This test is solely focussed on EFI.
> > > > >
> > > > > Yes, you're testing the EFI version of the code in
> > > > > arch/$(ARCH)/lib/bootm.c. The remove devices functions being
called in
> > > > > both cases are generic.
> > > >
> > > > The code in EFI is:
> > > >
> > > > if (!efi_st_keep_devices) {
> > > > bootm_disable_interrupts();
> > > > if (IS_ENABLED(CONFIG_USB_DEVICE))
> > > > udc_disconnect();
> > > > board_quiesce_devices();
> > > > dm_remove_devices_active();
> > > > }
> > > >
> > > > It does call somewhat the same functions, but is doing its own
thing,
> > > > not even using the arch-specific code. As I mentioned, a nice
clean-up
> > > > would be to make bootm_announce_and_cleanup() common.
> > >
> > > Yes, we almost agree? Both the EFI code, and arch/$(ARCH)/lib/bootm.c
> > > have functions that make the above calls. A nice clean-up would be to
> > > have something common.
> >
> > Yes indeed. It still does not provide a test for the EFI bootmeth,
> > though, where I found half a dozen bugs.
>
> Yes, the bootmeth code is it's own thing.

No I mean there were bugs in the EFI implementation.

>
> > > > Actually, now that I see efi_st_keep_devices, I wonder why Heinrich
> > > > didn't want my ANSI patch[6] which serves a similar function.
> > >
> > > No? Your patch disables ANSI output in those tests, that variable is
for
> > > making sure those tests can accomplish (if I skim things right)
similar
> > > kinds of tests you've asked for before, but with an EFI app instead?
But
> > > perhaps better to not start yet another tangent here...
> >
> > I wouldn't know where to start, anyway...
> >
> > > > > > If you want the logging to be renamed and placed centrally I
don't
> > > > > > mind doing it now. But note that only EFI will use it for now.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > We already have function _log() which is variadic.
> > > > > > > > >
> > > > > > > > > Simon could write a new log driver that parses the
`format` parameter
> > > > > > > > > and saves the binary data in an appropriate format for
analysis by the
> > > > > > > > > unit tests:
> > > > > > > > >
> > > > > > > > > * For %s the driver should save the string and not the
address of the
> > > > > > > > > string.
> > > > > > > > > * For %pD the driver should save the device path instead
of the pointer.
> > > > > > > > > * ...
> > > > > > > > >
> > > > > > > > > Some changes to the log driver interface will be needed
to pass the
> > > > > > > > > variadic arguments instead of the formatted message.
> > > > > > > >
> > > > > > > > Perhaps the word 'log' is confusing people. But the above
suggestion
> > > > > > > > is quite a complicated way of handling things. We have no
way to
> > > > > > > > decode printf() strings in this way. See log_dispatch() for
how this
> > > > > > > > is handled today. It uses sprintf(). Trying to test based
on text
> > > > > > > > output would be very clumsy (lots of regexes and sscan()
calls?) and
> > > > > > > > result in a huge amount of parsing code, highly dependent
on the
> > > > > > > > printf() format, etc.
> > > > > > > >
> > > > > > > > I very-much doubt that would produce a useful
implementation, but if
> > > > > > > > you would like to try it out then I would be happy to look
at it.
> > > > > > > >
> > > > > > > > I mentioned this several times, but even if we did go that
way, we
> > > > > > > > only have logging on the external calls, so much of the
EFI-memory
> > > > > > > > allocation in U-Boot would not be logged.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Simon
> > > > > > > >
> > > > > > > > [5]
https://patchwork.ozlabs.org/project/uboot/patch/20250106144755.3054780-9-sjg@chromium.org/
> > > > > > >
> > > > > > > Yes, calling this a "log" when it's intended for capturing
information
> > > > > > > for tests got some of this off on the wrong track. But that
also helps
> > > > > > > explain now that this is still on the wrong track and should
instead be
> > > > > > > following normal design practices for testing and expanding
existing
> > > > > > > infrastructure and not inventing a new everything. So if you
don't like
> > > > > > > Heinrich's suggestion, take a look at Caleb's suggestion.
> > > > > >
> > > > > > I don't have the energy to port the tracing framework from
Linux to
> > > > > > U-Boot, although I agree it would be useful. Still, function
tracing
> > > > > > is quite fragile and confusing to work with when refactoring
code. I
> > > > > > don't like that idea much for this use case, although if
function
> > > > > > tracing did exist in U-Boot I would likely have used it.
> > > > >
> > > > > I mean yes, it would be good if you went back and expanded on the
trace
> > > > > functionality you did before.
> > > >
> > > > I still don't believe it is the best solution and seems like yet
> > > > another ocean I should avoid sticking my heater into.
> > >
> > > I strongly disagree. If you go back to the trace code you brought in
to
> > > start with and make it more useful / include newer features existing
> > > elsewhere you're not going to end up in conflict with everyone asking
> > > why you're doing something subsystem specific.
> >
> > Perhaps someone else could do this? It would be a substantial amount
> > of work to bring runtime tooling into U-Boot, bpf and the like. It
> > would be quite a pain to use, I suspect, and certainly not possible to
> > write a simple C test as I have done here.
>
> I'm not sure where bpf and the like comes from in what I said here. But
> again, if you find that logging thing you wrote handy, keep it in a
> topic branch somewhere and then later on you can "Told You So" the rest
> of us later on.

I'm not sure what you are suggesting re tracing, but I assumed it was that
we patch the code at runtime and set tracepoints to check for particular
calls in a test. I'm not sure how else we would get access to the function
arguments. Then, presumably we need a bpf rule to collect the function
args, etc.? It wasn't my idea, so perhaps we should let Caleb explain the
intent.

>
> > > > > > > And if you
> > > > > > > don't like Caleb's suggestion, go put this in a topic branch
you can
> > > > > > > merge when you need to debug some problem that seemingly
nothing else
> > > > > > > will catch.
> > > > > >
> > > > > > Here we are over a year after I reported the bug and we still
don't
> > > > > > have a test to cover it. This series is better than the
available
> > > > > > alternatives, IMO.
> > > > >
> > > > > Well, no. We have commit dabaa4ae3206 ("dm: Add
> > > > > dm_remove_devices_active() for ordered device removal") we have a
test
> > > > > for the underlying problem. We need more functional boot tests,
but we
> > > > > need those to be in python too, and not more C code.
> > > >
> > > > That is a nice improvement, but did not fix the underlying problem.
> > > > The underlying problem was that EFI was calling exit-boot-services,
> > > > causing U-Boot to free up data structures which were needed to boot.
> > > > This was on x86_64. I never quite figured out which one (very hard
> > > > when you cannot get back to U-Boot to check).
> > > >
> > > > There were quite a lot of problems, actually. There v2 series is at
[7]
> > > >
> > > > Only a C test can check what actually happens inside U-Boot.
> > >
> > > Yes, I think now we get back to disagreeing on which symptoms lead to
> > > which code problems and then what to do about them.
> >
> > OK
> >
> > >
> > > > > And you're not just coming up with a test, you're refactoring a
bunch of
> > > > > code and introducing new subsystems in order to do that. When as
I keep
> > > > > pointing out, we don't need that. We could easily extend the
existing OS
> > > > > boot tests we have to script booting an ISO. And we only run
those when
> > > > > say "ENABLE_SLOW_TESTS" is set, and only do that on tagged
releases.
> > > >
> > > > Yes of course we need to refactor to make tests work. This is not
> > > > necessarily a bad thing, as it helps us break code down into
testable
> > > > chunks. We cannot rely only on large functional-tests, not that you
> > > > are suggesting that. See [8], but they are too slow, too hard to
debug
> > > > when they fail. They also tend to devolve into chaos as people get
> > > > lazy and stop writing unit/smaller tests.
> > >
> > > I'll just note that I don't ever even think to use "make tests" or
> > > "qcheck" or any of the others since they never work for me.
> >
> > Would you mind filing an issue on that? I use 'make pcheck' all the
time.
>
> Done:
> https://source.denx.de/u-boot/custodians/u-boot-dm/-/issues/33
>
> I didn't file one for "qcheck" which fails differently in that same
> environment.

OK, I'll reply there, thanks.

Regards,
Simon