[tbot] How to debug acquire_linux

Harald Seiler hws at denx.de
Fri Nov 23 09:46:57 UTC 2018


Hello Stefano,

first of all, so we are all on the same page.  The implementation
for login is here:

	https://github.com/Rahix/tbot/blob/master/tbot/machine/board/linux.py#L62

To break it down:

1. Wait for the login prompt ("login: ") to appear at the END
   of the stream (L67)
2. Send the username + "\n" (L68)
3. If a password is configured, wait for the password prompt,
   at the END of the stream (L73)
4. Send the password + "\n"
5. Wait for `login_wait` seconds for the shell to come up (L77)
6. Initialize the shell. Sends a bunch of commands to get the
   shell into a known state.  Then wait for the TBot prompt. (L79)

This process has 3 places that can block: 1, 3, and 6.

I can get behind allowing 1 and 3 to search for there prompt in the middle of
the stream as well, to circumvent the kernel-log clobber issue.

Number 6 needs a different solution though.  The reason is that once
TBot has acquired the shell, additional input from the serial console might
potentially mess up command outputs.  This will lead to a whole load of new issues,
so TBot should somehow prevent that in a reasonable way.

On Fri, 2018-11-23 at 10:02 +0100, Stefano Babic wrote:
> Hi Harald,
> 
> On 23/11/18 09:48, Harald Seiler wrote:
> > Hi Stefano,
> > 
> > On Thu, 2018-11-22 at 23:47 +0100, Stefano Babic wrote:
> > > Hi,
> > > 
> > > On 22/11/18 19:24, Stefano Babic wrote:
> > > > Hi,
> > > > 
> > > > I have a board that cannot be got in linux. It looks like that
> > > > tbot.acquire_linux() hangs forever (can I set a tiimeout ?)
> > > > 
> > > > I ran a testcase that works fine for another board - I guess that the
> > > > issue is due to the fact that this board has a lot of output in part
> > > > caused by systemd and in part due to kernel messages that are printed to
> > > > the console. When I start the testcase, tbot remains in <board>-linux. I
> > > > can understand that board is up and works because software activates
> > > > some relays, but it is not recognized by tbot.
> > > > 
> > > > The weird thing is that "interactive_linux" works. How can I debug this
> > > > ? Or do we have to block kernel messages to the console via kernel
> > > > parameter ?
> > > > 
> > > 
> > > I have some more details. The not working board has "bash" as shell and
> > > TERM is set to "xterm". Running tbot with -vvv, after "root" as user,
> > > the bord answers with some terminal control characters. It looks like
> > > that tbot gets confused. If I replace "bash" with "busybox", it works.
> > 
> > Hmm, can you send me the output of -vvv? The only cases I know don't
> > work is when something is sent after the password prompt or once the
> > shell comes up ...
> > 
> 
> Ouch..I have already done, but I see that the ML was removed in the
> answer. I attach Heiko's answer, too.

Hmm, your output doesn't look bad to me at first glance ...

> 
> The thing is:
> 
>    ├─'login: '

The login prompt was successfully detected here, TBot sent the
username (so step 1 didn't block in this case):

> │   │    <> CCU-1 login: root
> │   ├─'root\n\x1b7\x1b[r\x1b[999;999H\x1b[6n'
> │   ├─'\x1b'
> │   ├─'8'

These control sequenced are ignored

> │   ├─'root at CCU-1:~'
> │   ├─'# '

This prompt is also ignored ... If tbot hangs here, I can think of two things
that might have happened:

1. You set a password and tbot rightfully waits for the password prompt
2. Something went wrong with the shell initialization

	https://github.com/Rahix/tbot/blob/master/tbot/machine/channel/channel.py#L229

The first blocking call happens in line 240, where the prompt is set. `raw_command`
sends something like `PS1="SOME_TBOT_PROMPT> "\n` and then waits for the prompt to
appear again.  Maybe some kernel message happend just after the prompt returned?

What irritates me is that we should definitely see this.  `raw_command` calls
`read_until_prompt`, which is the place where the -vvv debug output is generated ...

> So there are some control characters (I have not decoded them) just if
> the target runs the bash shell. With busybox, it works and tbot completes.
> 
> 
> > You could try defining `login_wait`[1] in your BoardLinuxMachine
> > and see if that helps ...
> 
> This does not help, I have already tried.
> 
> IMHO the login should be done more resistant - I proposed to try to send
> some CR in a loop to check if login was successful in my previous mail.

Yes, it definitely should.  The solution to just send ^C in a loop is not
one I like, however.  In my opinion, we should avoid "brute-forcing" the
board into a certain state, as this can have lots of uncertainties.  As mentioned
before, as soon as we have the shell, we can't allow clobber anyway ...

> > 
> > > The second not working case seems if the board prints some data on the
> > > console after the login propt is sent. This is the case when a daemon
> > > has the stdout set to the console. Also in this case, it seems that tbot
> > > gets confused.
> > 
> > Yes, this is an open issue:
> > 
> > 	https://github.com/Rahix/tbot/issues/1
> > 
> > I don't know how we best handle this case ... Do you have any ideas?
> 
> I think there are more use cases. One is the kernel messages. Kernel
> messages can be turned off at boot time with loglevel=0 (but they can be
> read via dmesg). This is a suitable way.
> 
> My use case is an application (SWUpdate) in a small system (rescue
> system). It makes sense to redirect application stdout to the console,
> becaus a rescue does not have many options. It should at least possible
> to login and to collect the output, while a testcase will push an image
> from network (and then, there is no exec on the board).

Is shell and application output interleaved in this case or is
it in a well defined order?

> Regards,
> Stefano
> 

-- 
Harald

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-62  Fax: +49-8142-66989-80   Email: hws at denx.de


More information about the tbot mailing list