[U-Boot] [PATCH] Implement pytest-based test infrastructure

Simon Glass sjg at chromium.org
Tue Nov 24 20:04:45 CET 2015


Hi Stephen,

On 23 November 2015 at 21:44, Stephen Warren <swarren at wwwdotorg.org> wrote:
> On 11/23/2015 06:45 PM, Simon Glass wrote:
>> Hi Stephen,
>>
>> On 22 November 2015 at 10:30, Stephen Warren <swarren at wwwdotorg.org> wrote:
>>> On 11/21/2015 09:49 AM, Simon Glass wrote:
>>>> Hi Stephen,
>>>>
>>>> On 19 November 2015 at 12:09, Stephen Warren <swarren at wwwdotorg.org> wrote:
>>>>>
>>>>> On 11/19/2015 10:00 AM, Stephen Warren wrote:
>>>>>>
>>>>>> On 11/19/2015 07:45 AM, Simon Glass wrote:
>>>>>>>
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 14 November 2015 at 23:53, Stephen Warren <swarren at wwwdotorg.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> This tool aims to test U-Boot by executing U-Boot shell commands
>>>>>>>> using the console interface. A single top-level script exists to
>>>>>>>> execute or attach to the U-Boot console, run the entire script of
>>>>>>>> tests against it, and summarize the results. Advantages of this
>>>>>>>> approach are:
>>>>>>>>
>>>>>>>> - Testing is performed in the same way a user or script would interact
>>>>>>>>    with U-Boot; there can be no disconnect.
>>>>>>>> - There is no need to write or embed test-related code into U-Boot
>>>>>>>>    itself. It is asserted that writing test-related code in Python
>>>>>>>>    is simpler and more flexible than writing it all in C.
>>>>>>>> - It is reasonably simple to interact with U-Boot in this way.
>>>>>>>>
>>>>>>>> A few simple tests are provided as examples. Soon, we should convert as
>>>>>>>> many as possible of the other tests in test/* and test/cmd_ut.c too.
>>>>>>>
>>>>>>>
>>>>>>> It's great to see this and thank you for putting in the effort!
>>>>>>>
>>>>>>> It looks like a good way of doing functional tests. I still see a role
>>>>>>> for unit tests and things like test/dm. But if we can arrange to call
>>>>>>> all U-Boot tests (unit and functional) from one 'test.py' command that
>>>>>>> would be a win.
>>>>>>>
>>>>>>> I'll look more when I can get it to work - see below.
>>>>>
>>>>> ...
>>>>>>
>>>>>> made it print a message about checking the docs for missing
>>>>>> requirements. I can probably patch the top-level test.py to do the same.
>>>>>
>>>>>
>>>>> I've pushed such a patch to:
>>>>>
>>>>> git://github.com/swarren/u-boot.git tegra_dev
>>>>> (the separate pytests branch has now been deleted)
>>>>>
>>>>> There are also a variety of other patches there related to this
>>>>> testing infrastructure. I guess I'll hold off sending them to the list
>>>>> until there's been some general feedback on the patches I've already
>>>>> posted, but feel free to pull the branch down and play with it. Note
>>>>> that it's likely to get rebased as I work.
>>>>
>>>> OK, I got it working - thank you. It is horribly slow though - do you
>>>> know what is holding it up? For me it takes 12 seconds to run the
>>>> (very basic) tests.
>>>
>>> It looks like pexpect includes a default delay to simulate human
>>> interaction. If you edit test/py/uboot_console_base.py ensure_spawned()
>>> and add the following somewhere soon after the assignment to self.p:
>>>
>>>             self.p.delaybeforesend = 0
>>>
>>> ... that will more than halve the execution time. (8.3 -> 3.5s on my
>>> 5-year-old laptop).
>>>
>>> That said, even your 12s or my 8.3s doesn't seem like a bad price to pay
>>> for some easy-to-use automated testing.
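
For anyone else trying this, the tweak is roughly of this shape (a
standalone sketch, not the actual ensure_spawned() code; the sandbox path
and prompt string are assumptions):

    import pexpect

    # Standalone sketch only: spawn the sandbox binary, drop pexpect's
    # simulated human-typing delay, then talk to the U-Boot shell.
    p = pexpect.spawn('./u-boot')    # path to a sandbox build (assumption)
    p.delaybeforesend = 0            # avoid the default per-send delay
    p.expect('=> ')                  # wait for the shell prompt
    p.sendline('version')
    p.expect('=> ')
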
>>
>> Sure, but my reference point is the difference between a native C test
>> and this framework. As we add more and more tests the overhead will
>> become significant. If it takes 8 seconds to run the current (fairly
>> trivial) tests, it might take a minute to run a larger suite, and to me
>> that is too long (e.g. when bisecting for a failing commit).
>>
>> I wonder what is causing the delay?
>
> I actually hope the opposite.
>
> Most of the tests supported today are the most trivial possible tests,
> i.e. they take very little CPU time on the target to execute. I would
> naively expect that once we implement more interesting tests (USB Mass
> Storage, USB enumeration, eMMC/SD/USB data reading, Ethernet DHCP/TFTP,
> ...) the command invocation overhead will rapidly become insignificant.
> This certainly seems to be true for the UMS test I have locally, but who
> knows whether this will be more generally true.

We do have a USB enumeration and storage test including data reading.
We have some simple 'ping' Ethernet tests. These run in close to no
time (they fudge the timer).

I think you are referring to tests running on real hardware. In that
case I'm sure you are right - e.g. the USB or Ethernet PHY delays will
dwarf the framework time.

I should have been clear that I am most concerned about sandbox tests
running quickly. To me that is where we have the most to gain or lose.

>
> I put a bit of time measurement into run_command() and found that on my
> system at work, p.send("the shell command to execute") was actually
> (marginally) slower on sandbox than on real HW, despite real HW being a
> 115200 baud serial port, and the code splitting the shell commands into
> chunks that are sent and waited for synchronously to avoid overflowing
> UART FIFOs. I'm not sure why this is. Looking at U-Boot's console, it
> seems to be non-blocking, so I don't think termios VMIN/VTIME come into
> play (setting them to 0 made no difference), and the two raw modes took
> the same time. I meant to look into pexpect's termios settings to see if
> there was anything to tweak there, but forgot today.
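
Just to check I follow - is the measurement roughly of this shape (a
sketch, not your actual instrumentation; the binary path and prompt are
guesses)?

    import time
    import pexpect

    p = pexpect.spawn('./u-boot')        # sandbox build (assumption)
    p.delaybeforesend = 0
    p.expect('=> ')

    # Time just the send(), which is the part described as slower above,
    # then consume the echoed command and the next prompt.
    start = time.time()
    p.send('echo hello\n')
    send_time = time.time() - start
    p.expect('hello')
    p.expect('=> ')
    print('p.send() took %.4fs' % send_time)
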
>
> I did do one experiment to compare expect (the Tcl version) and pexpect.
> If I do roughly the following in both:
>
> spawn u-boot (sandbox)
> wait for prompt
> 100 times:
>     send "echo $foo\n"
>     wait for "echo $foo"
>     wait for shell prompt
> send "reset"
> wait for "reset"
> send "\n"
>
> ... then Tcl is about 3x faster on my system (IIRC 0.5 vs. 1.5s). If I
> remove all the "wait"s, then IIRC Tcl was about 15x faster or more.
> That's a pity. Still, I'm sure as heck not going to rewrite all this in
> Tcl :-( I wonder if something similar to pexpect but more targeted at
> simple "interactive shell" cases would remove any of that overhead.

It is possible that we should use sandbox in 'cooked' mode so that
lines are entered synchronously. The -t option might help here, or we
may need something else.
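
If the -t route works, a quick experiment might look like this (assuming
sandbox's -t option accepts a 'cooked' argument - I have not checked):

    import pexpect

    # Sketch: spawn sandbox with the terminal in cooked mode and rerun the
    # timing comparison.  The '-t cooked' argument is an assumption about
    # the sandbox command line.
    p = pexpect.spawn('./u-boot', ['-t', 'cooked'])
    p.delaybeforesend = 0
    p.expect('=> ')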

>
>>>> Also please see dm_test_usb_tree() which uses a console buffer to
>>>> check command output.
>>>
>>> OK, I'll take a look.
>>>
>>>> I wonder if we should use something like that
>>>> for simple unit tests, and use python for the more complicated
>>>> functional tests?
>>>
>>> I'm not sure that's a good idea; it'd be best to settle on a single way
>>> of executing tests so that (a) people don't have to run/implement
>>> different kinds of tests in different ways and (b) we can leverage
>>> test code across as many tests as possible.
>>>
>>> (Well, doing unit tests and system level tests differently might be
>>> necessary since one calls functions and the other uses the shell "user
>>> interface", but having multiple ways of doing e.g. system tests doesn't
>>> seem like a good idea.)
>>
>> As you found, for some tests it is convenient or even necessary to be
>> able to call U-Boot C functions. So I don't see this as a
>> one-size-fits-all solution.
>
> Yes, although I expect the split would be: if a test needs to call a C
> function, put the code in U-Boot; anything else goes in Python via the
> shell prompt.
>
>> I think it is perfectly reasonable for the python framework to run the
>> existing C tests
>
> Yes.
>
>> - there is no need to rewrite them in Python.
>
> Probably not as an absolute mandate. Still, consistency would be nice.
> One advantage of having things as individual pytests is that the status
> of separate tests doesn't get aggregated; you can see that of 1000
> tests, 10 failed, rather than seeing that 1000 logical tests were
> executed as part of 25 pytests, and 2 of those failed, each only because
> of 1 subtest, with all the other subtests passing.

Indeed. As things stand we would want the framework to 'understand'
driver model tests and integrate the results of calling out to those
into its own report.
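
For the simpler cases, writing one pytest per check (e.g. via parametrize)
keeps the granularity you describe in the summary line. A rough sketch,
with fixture and method names that may not match your branch exactly:

    import pytest

    # Sketch: 100 separate pytest cases, so a failure shows up as e.g.
    # "1 failed, 99 passed" rather than one aggregated failure.  The
    # u_boot_console fixture and run_command() are assumed names.
    @pytest.mark.parametrize('value', range(100))
    def test_echo(u_boot_console, value):
        response = u_boot_console.run_command('echo %d' % value)
        assert response.strip() == str(value)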

>
>> Also
>> for the driver model tests - we can just run the tests from some sort
>> of python wrapper and get the best of both worlds, right?
>
> I expect so, yes. I haven't looked at those yet.
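
Roughly what I have in mind is a thin wrapper along these lines (the
unit-test command name, the 'Failures: N' summary line and the console
fixture are all assumptions that would need checking against the real
C tests):

    import re

    # Sketch: run the existing C unit tests over the console and fold the
    # aggregate result into a pytest pass/fail.  'ut dm', the 'Failures: N'
    # line and the u_boot_console fixture are assumptions.
    def test_c_unit_tests(u_boot_console):
        output = u_boot_console.run_command('ut dm')
        m = re.search(r'Failures: (\d+)', output)
        assert m, 'could not find a test summary in the output'
        assert int(m.group(1)) == 0, output
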
>
>> Please don't take this to indicate any lack of enthusiasm for what you
>> are doing - it's a great development and I'm sure it will help a lot!
>> We really need to unify all the tests so we can run them all in one
>> step.
>
> Thanks:-)
>
>> I just think we should aim to have the automated tests run in a few
>> seconds (let's say 5-10 at the outside). We need to make sure that the
>> python framework will allow this even when running thousands of tests.
>
> I'd be happy with something that took minutes, or longer. Given "build
> all boards" takes a very long time (and I'm sure we'd like everyone to
> do that, although I imagine few do), something of the same order of
> magnitude might even be reasonable? Thousands of tests sounds like rather
> a lot; perhaps that number makes sense for tiny unit tests. I was
> thinking of testing fewer, larger user-visible features that generally
> will have disk/network/... IO rates as the limiting factor. Perhaps one
> of those tests could indeed be "run 1000 tiny C-based unit tests via a
> single shell command".

We have a few hundred tests at present and our coverage is poor, so I
don't think 1000 tests is out of the question within a year or two.

Just because tests are complex does not mean they need to be slow. At
least with sandbox, even a complex test should be able to run in a few
milliseconds in most cases.

Regards,
Simon

