[U-Boot] [PATCH] Implement pytest-based test infrastructure
Stephen Warren
swarren at wwwdotorg.org
Tue Nov 24 05:44:03 CET 2015
On 11/23/2015 06:45 PM, Simon Glass wrote:
> Hi Stephen,
>
> On 22 November 2015 at 10:30, Stephen Warren <swarren at wwwdotorg.org> wrote:
>> On 11/21/2015 09:49 AM, Simon Glass wrote:
>>> Hi Stephen,
>>>
>>> On 19 November 2015 at 12:09, Stephen Warren <swarren at wwwdotorg.org> wrote:
>>>>
>>>> On 11/19/2015 10:00 AM, Stephen Warren wrote:
>>>>>
>>>>> On 11/19/2015 07:45 AM, Simon Glass wrote:
>>>>>>
>>>>>> Hi Stephen,
>>>>>>
>>>>>> On 14 November 2015 at 23:53, Stephen Warren <swarren at wwwdotorg.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> This tool aims to test U-Boot by executing U-Boot shell commands
>>>>>>> using the
>>>>>>> console interface. A single top-level script exists to execute or attach
>>>>>>> to the U-Boot console, run the entire script of tests against it, and
>>>>>>> summarize the results. Advantages of this approach are:
>>>>>>>
>>>>>>> - Testing is performed in the same way a user or script would interact
>>>>>>> with U-Boot; there can be no disconnect.
>>>>>>> - There is no need to write or embed test-related code into U-Boot
>>>>>>> itself. It is asserted that writing test-related code in Python is
>>>>>>> simpler and more flexible than writing it all in C.
>>>>>>> - It is reasonably simple to interact with U-Boot in this way.
>>>>>>>
>>>>>>> A few simple tests are provided as examples. Soon, we should convert as
>>>>>>> many as possible of the other tests in test/* and test/cmd_ut.c too.
>>>>>>
>>>>>>
>>>>>> It's great to see this and thank you for putting in the effort!
>>>>>>
>>>>>> It looks like a good way of doing functional tests. I still see a role
>>>>>> for unit tests and things like test/dm. But if we can arrange to call
>>>>>> all U-Boot tests (unit and functional) from one 'test.py' command that
>>>>>> would be a win.
>>>>>>
>>>>>> I'll look more when I can get it to work - see below.
>>>>
>>>> ...
>>>>>
>>>>> made it print a message about checking the docs for missing
>>>>> requirements. I can probably patch the top-level test.py to do the same.
>>>>
>>>>
>>>> I've pushed such a patch to:
>>>>
>>>> git://github.com/swarren/u-boot.git tegra_dev
>>>> (the separate pytests branch has now been deleted)
>>>>
>>>> There are also a variety of other patches there related to this testing
>>>> infrastructure. I guess I'll hold off sending them to the list until
>>>> there's been some general feedback on the patches I've already posted,
>>>> but feel free to pull the branch down and play with it. Note that it's
>>>> likely to get rebased as I work.
>>>
>>> OK I got it working thank you. It is horribly slow though - do you
>>> know what is holding it up? For me it takes 12 seconds to run the
>>> (very basic) tests.
>>
>> It looks like pexpect includes a default delay to simulate human
>> interaction. If you edit test/py/uboot_console_base.py ensure_spawned()
>> and add the following somewhere soon after the assignment to self.p:
>>
>> self.p.delaybeforesend = 0
>>
>> ... that will more than halve the execution time. (8.3 -> 3.5s on my
>> 5-year-old laptop).
>>
>> That said, even your 12s or my 8.3s doesn't seem like a bad price to pay
>> for some easy-to-use automated testing.
>
> Sure, but my point of reference is the difference between a native C test
> and this framework. As we add more and more tests the overhead will be
> significant. If it takes 8 seconds to run the current (fairly trivial)
> tests, it might take a minute to run a larger suite, and to me that is
> too long (e.g. to bisect for a failing commit).
>
> I wonder what is causing the delay?
I actually hope the opposite.
Most of the tests supported today are the most trivial possible tests,
i.e. they take very little CPU time on the target to execute. I would
naively expect that once we implement more interesting tests (USB Mass
Storage, USB enumeration, eMMC/SD/USB data reading, Ethernet DHCP/TFTP,
...) the command invocation overhead will rapidly become insignificant.
This certainly seems to be true for the UMS test I have locally, but who
knows whether this will be more generally true.
I put a bit of time measurement into run_command() and found that on my
system at work, p.send("the shell command to execute") was actually
(marginally) slower on sandbox than on real HW, despite real HW being a
115200 baud serial port, and the code splitting the shell commands into
chunks that are sent and waited for synchronously to avoid overflowing
UART FIFOs. I'm not sure why this is. Looking at U-Boot's console, it
seems to be non-blocking, so I don't think termios VMIN/VTIME come into
play (setting them to 0 made no difference), and the two raw modes took
the same time. I meant to look into pexpect's termios settings to see if
there was anything to tweak there, but forgot today.
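Concretely, the sort of stand-alone measurement I mean looks roughly like
this (a sketch outside the framework; the "./u-boot" sandbox binary path
and the "=> " prompt are assumptions):

  import time
  import pexpect

  # Stand-alone timing sketch, not the series' code. The "./u-boot"
  # sandbox binary path and the "=> " prompt are assumptions.
  p = pexpect.spawn('./u-boot')
  p.delaybeforesend = 0      # disable pexpect's simulated typing delay
  p.expect('=> ')            # wait for the initial shell prompt

  start = time.time()
  p.send('echo hello\n')
  sent = time.time()
  p.expect('=> ')            # command echo, output, then the next prompt
  done = time.time()
  print('send took %.3fs, full round-trip took %.3fs'
        % (sent - start, done - start))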
I did do one experiment to compare expect (the Tcl version) and pexpect.
If I do roughly the following in both:
spawn u-boot (sandbox)
wait for prompt
100 times:
send "echo $foo\n"
wait for "echo $foo"
wait for shell prompt
send "reset"
wait for "reset"
send "\n"
... then Tcl is about 3x faster on my system (IIRC 0.5 vs. 1.5s). If I
remove all the "wait"s, then IIRC Tcl was about 15x faster or more.
That's a pity. Still, I'm sure as heck not going to rewrite all this in
Tcl :-( I wonder if something similar to pexpect but more targeted at
simple "interactive shell" cases would remove any of that overhead.
>>> Also please see dm_test_usb_tree() which uses a console buffer to
>>> check command output.
>>
>> OK, I'll take a look.
>>
>>> I wonder if we should use something like that
>>> for simple unit tests, and use python for the more complicated
>>> functional tests?
>>
>> I'm not sure that's a good idea; it'd be best to settle on a single way
>> of executing tests so that (a) people don't have to run/implement
>> different kinds of tests in different ways (b) we can leverage test code
>> across as many tests as possible.
>>
>> (Well, doing unit tests and system level tests differently might be
>> necessary since one calls functions and the other uses the shell "user
>> interface", but having multiple ways of doing e.g. system tests doesn't
>> seem like a good idea.)
>
> As you found with some of the tests, it is convenient/necessary to be
> able to call U-Boot C functions in some tests. So I don't see this as
> a one-size-fits-all solution.
Yes, although I expect the split would be: anything that needs to call a
C function goes into U-Boot itself, vs. anything else in Python via the
shell prompt.
> I think it is perfectly reasonable for the python framework to run the
> existing C tests
Yes.
> - there is no need to rewrite them in Python.
Probably not as an absolute mandate. Still, consistency would be nice.
One advantage of having things as individual pytests is that the status
of separate tests doesn't get aggregated; you can see that of 1000
tests, 10 failed, rather than seeing that 1000 logical tests were
executed as part of 25 pytests and 2 of those pytests failed, each
failing only because of a single subtest while its other subtests passed.
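For example, one small test written directly as its own pytest might look
something like this (the console fixture name and run_command() behaviour
here are illustrative, not necessarily exactly what the series provides):

  # Illustrative only: the fixture name and run_command() semantics
  # (returning the command's output with echo/prompt stripped) are
  # assumptions for the sake of the example.
  def test_echo(uboot_console):
      response = uboot_console.run_command('echo hello')
      assert response == 'hello'

  def test_env_echo(uboot_console):
      uboot_console.run_command('setenv foo bar')
      response = uboot_console.run_command('echo ${foo}')
      assert response == 'bar'

Each of those gets its own pass/fail entry in the pytest report, which is
the granularity I'm arguing for above.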
> Also
> for the driver model tests - we can just run the tests from some sort
> of python wrapper and get the best of both worlds, right?
I expect so, yes. I haven't looked at those yet.
> Please don't take this to indicate any lack of enthusiasm for what you
> are doing - it's a great development and I'm sure it will help a lot!
> We really need to unify all the tests so we can run them all in one
> step.
Thanks:-)
> I just think we should aim to have the automated tests run in a few
> seconds (let's say 5-10 at the outside). We need to make sure that the
> python framework will allow this even when running thousands of tests.
I'd be happy with something that took minutes, or longer. Given "build
all boards" takes a very long time (and I'm sure we'd like everyone to
do that, although I imagine few do), something of the same order of
magnitude might even be reasonable? Thousands of tests sounds like rather
a lot; perhaps that number makes sense for tiny unit tests. I was
thinking of testing fewer, larger user-visible features that generally
will have disk/network/... IO rates as the limiting factor. Perhaps one
of those tests could indeed be "run 1000 tiny C-based unit tests via a
single shell command".
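Something along these lines (the "ut" command, fixture name and output
format are all assumptions here) would keep the heavy lifting in C while
still giving one pass/fail entry in the pytest report:

  # Hypothetical wrapper around the C unit tests in test/cmd_ut.c: run
  # them via a single shell command and fail if any of them failed.
  # The command name and summary format are assumptions.
  def test_c_unit_tests(uboot_console):
      output = uboot_console.run_command('ut all')
      assert 'Failures: 0' in output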