[tbot] How to debug acquire_linux
Harald Seiler
hws at denx.de
Fri Nov 23 09:46:57 UTC 2018
Hello Stefano,
first of all, so we are all on the same page. The implementation
for login is here:
https://github.com/Rahix/tbot/blob/master/tbot/machine/board/linux.py#L62
To break it down:
1. Wait for the login prompt ("login: ") to appear at the END
of the stream (L67)
2. Send the username + "\n" (L68)
3. If a password is configured, wait for the password prompt,
at the END of the stream (L73)
4. Send the password + "\n"
5. Wait for `login_wait` seconds for the shell to come up (L77)
6. Initialize the shell. Sends a bunch of commands to get the
shell into a known state. Then wait for the TBot prompt. (L79)
This process has 3 places that can block: 1, 3, and 6.
I can get behind allowing 1 and 3 to search for there prompt in the middle of
the stream as well, to circumvent the kernel-log clobber issue.
Number 6 needs a different solution though. The reason is that once
TBot has acquired the shell, additional input from the serial console might
potentially mess up command outputs. This will lead to a whole load of new issues,
so TBot should somehow prevent that in a reasonable way.
On Fri, 2018-11-23 at 10:02 +0100, Stefano Babic wrote:
> Hi Harald,
>
> On 23/11/18 09:48, Harald Seiler wrote:
> > Hi Stefano,
> >
> > On Thu, 2018-11-22 at 23:47 +0100, Stefano Babic wrote:
> > > Hi,
> > >
> > > On 22/11/18 19:24, Stefano Babic wrote:
> > > > Hi,
> > > >
> > > > I have a board that cannot be got in linux. It looks like that
> > > > tbot.acquire_linux() hangs forever (can I set a tiimeout ?)
> > > >
> > > > I ran a testcase that works fine for another board - I guess that the
> > > > issue is due to the fact that this board has a lot of output in part
> > > > caused by systemd and in part due to kernel messages that are printed to
> > > > the console. When I start the testcase, tbot remains in <board>-linux. I
> > > > can understand that board is up and works because software activates
> > > > some relays, but it is not recognized by tbot.
> > > >
> > > > The weird thing is that "interactive_linux" works. How can I debug this
> > > > ? Or do we have to block kernel messages to the console via kernel
> > > > parameter ?
> > > >
> > >
> > > I have some more details. The not working board has "bash" as shell and
> > > TERM is set to "xterm". Running tbot with -vvv, after "root" as user,
> > > the bord answers with some terminal control characters. It looks like
> > > that tbot gets confused. If I replace "bash" with "busybox", it works.
> >
> > Hmm, can you send me the output of -vvv? The only cases I know don't
> > work is when something is sent after the password prompt or once the
> > shell comes up ...
> >
>
> Ouch..I have already done, but I see that the ML was removed in the
> answer. I attach Heiko's answer, too.
Hmm, your output doesn't look bad to me at first glance ...
>
> The thing is:
>
> ├─'login: '
The login prompt was successfully detected here, TBot sent the
username (so step 1 didn't block in this case):
> │ │ <> CCU-1 login: root
> │ ├─'root\n\x1b7\x1b[r\x1b[999;999H\x1b[6n'
> │ ├─'\x1b'
> │ ├─'8'
These control sequenced are ignored
> │ ├─'root at CCU-1:~'
> │ ├─'# '
This prompt is also ignored ... If tbot hangs here, I can think of two things
that might have happened:
1. You set a password and tbot rightfully waits for the password prompt
2. Something went wrong with the shell initialization
https://github.com/Rahix/tbot/blob/master/tbot/machine/channel/channel.py#L229
The first blocking call happens in line 240, where the prompt is set. `raw_command`
sends something like `PS1="SOME_TBOT_PROMPT> "\n` and then waits for the prompt to
appear again. Maybe some kernel message happend just after the prompt returned?
What irritates me is that we should definitely see this. `raw_command` calls
`read_until_prompt`, which is the place where the -vvv debug output is generated ...
> So there are some control characters (I have not decoded them) just if
> the target runs the bash shell. With busybox, it works and tbot completes.
>
>
> > You could try defining `login_wait`[1] in your BoardLinuxMachine
> > and see if that helps ...
>
> This does not help, I have already tried.
>
> IMHO the login should be done more resistant - I proposed to try to send
> some CR in a loop to check if login was successful in my previous mail.
Yes, it definitely should. The solution to just send ^C in a loop is not
one I like, however. In my opinion, we should avoid "brute-forcing" the
board into a certain state, as this can have lots of uncertainties. As mentioned
before, as soon as we have the shell, we can't allow clobber anyway ...
> >
> > > The second not working case seems if the board prints some data on the
> > > console after the login propt is sent. This is the case when a daemon
> > > has the stdout set to the console. Also in this case, it seems that tbot
> > > gets confused.
> >
> > Yes, this is an open issue:
> >
> > https://github.com/Rahix/tbot/issues/1
> >
> > I don't know how we best handle this case ... Do you have any ideas?
>
> I think there are more use cases. One is the kernel messages. Kernel
> messages can be turned off at boot time with loglevel=0 (but they can be
> read via dmesg). This is a suitable way.
>
> My use case is an application (SWUpdate) in a small system (rescue
> system). It makes sense to redirect application stdout to the console,
> becaus a rescue does not have many options. It should at least possible
> to login and to collect the output, while a testcase will push an image
> from network (and then, there is no exec on the board).
Is shell and application output interleaved in this case or is
it in a well defined order?
> Regards,
> Stefano
>
--
Harald
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-62 Fax: +49-8142-66989-80 Email: hws at denx.de
More information about the tbot
mailing list