[tbot] How to debug acquire_linux

Fri Nov 23 10:22:01 UTC 2018

Hi Harald,

On 23/11/18 10:46, Harald Seiler wrote:
> Hello Stefano,
> 
> first of all, so we are all on the same page.  The implementation
> for login is here:
> 
> 	https://github.com/Rahix/tbot/blob/master/tbot/machine/board/linux.py#L62
> 
> To break it down:
> 
> 1. Wait for the login prompt ("login: ") to appear at the END
>    of the stream (L67)

Prompt was detected

> 2. Send the username + "\n" (L68)
> 3. If a password is configured, wait for the password prompt,
>    at the END of the stream (L73)

There is no password

> 4. Send the password + "\n"
> 5. Wait for `login_wait` seconds for the shell to come up (L77)

With 1-2 seconds, it waits indefinetely.

I set now login_wait=10, it works. It is a long time, but it is still a
work-around.

> 6. Initialize the shell. Sends a bunch of commands to get the
>    shell into a known state.  Then wait for the TBot prompt. (L79)
> 
> This process has 3 places that can block: 1, 3, and 6.
> 

It looks to me that 1,3 are fine

> I can get behind allowing 1 and 3 to search for there prompt in the middle of
> the stream as well, to circumvent the kernel-log clobber issue.
> 
> Number 6 needs a different solution though.  The reason is that once
> TBot has acquired the shell, additional input from the serial console might
> potentially mess up command outputs.  This will lead to a whole load of new issues,
> so TBot should somehow prevent that in a reasonable way.

ok - that means it is reasonable to add "loglevel=0" (or at least =2) to
all boards to be tested ?

> 
> On Fri, 2018-11-23 at 10:02 +0100, Stefano Babic wrote:
>> Hi Harald,
>>
>> On 23/11/18 09:48, Harald Seiler wrote:
>>> Hi Stefano,
>>>
>>> On Thu, 2018-11-22 at 23:47 +0100, Stefano Babic wrote:
>>>> Hi,
>>>>
>>>> On 22/11/18 19:24, Stefano Babic wrote:
>>>>> Hi,
>>>>>
>>>>> I have a board that cannot be got in linux. It looks like that
>>>>> tbot.acquire_linux() hangs forever (can I set a tiimeout ?)
>>>>>
>>>>> I ran a testcase that works fine for another board - I guess that the
>>>>> issue is due to the fact that this board has a lot of output in part
>>>>> caused by systemd and in part due to kernel messages that are printed to
>>>>> the console. When I start the testcase, tbot remains in <board>-linux. I
>>>>> can understand that board is up and works because software activates
>>>>> some relays, but it is not recognized by tbot.
>>>>>
>>>>> The weird thing is that "interactive_linux" works. How can I debug this
>>>>> ? Or do we have to block kernel messages to the console via kernel
>>>>> parameter ?
>>>>>
>>>>
>>>> I have some more details. The not working board has "bash" as shell and
>>>> TERM is set to "xterm". Running tbot with -vvv, after "root" as user,
>>>> the bord answers with some terminal control characters. It looks like
>>>> that tbot gets confused. If I replace "bash" with "busybox", it works.
>>>
>>> Hmm, can you send me the output of -vvv? The only cases I know don't
>>> work is when something is sent after the password prompt or once the
>>> shell comes up ...
>>>
>>
>> Ouch..I have already done, but I see that the ML was removed in the
>> answer. I attach Heiko's answer, too.
> 
> Hmm, your output doesn't look bad to me at first glance ...
> 
>>
>> The thing is:
>>
>>    ├─'login: '
> 
> The login prompt was successfully detected here, TBot sent the
> username (so step 1 didn't block in this case):

Right

> 
>> │   │    <> CCU-1 login: root
>> │   ├─'root\n\x1b7\x1b[r\x1b[999;999H\x1b[6n'
>> │   ├─'\x1b'
>> │   ├─'8'
> 
> These control sequenced are ignored

Well, this is the more evident difference with busybox. With busybox,
just '#' is returned.

> 
>> │   ├─'root at CCU-1:~'
>> │   ├─'# '
> 
> This prompt is also ignored ... If tbot hangs here, I can think of two things
> that might have happened:
> 
> 1. You set a password and tbot rightfully waits for the password prompt

No.

> 2. Something went wrong with the shell initialization

Yes.

> 
> 	https://github.com/Rahix/tbot/blob/master/tbot/machine/channel/channel.py#L229
> 
> The first blocking call happens in line 240, where the prompt is set. `raw_command`
> sends something like `PS1="SOME_TBOT_PROMPT> "\n` and then waits for the prompt to
> appear again.  Maybe some kernel message happend just after the prompt returned?

But then we should see them in the log, or not ?

But yes, in last tests I remove again loglevel=0, and kernel messages
can appear.

This could be a reason why increasing login_wait helps. After some
seconds, board is surely in a idle state without kernel messages.

> 
> What irritates me is that we should definitely see this.  `raw_command` calls
> `read_until_prompt`, which is the place where the -vvv debug output is generated ...

Agree.

> 
>> So there are some control characters (I have not decoded them) just if
>> the target runs the bash shell. With busybox, it works and tbot completes.
>>
>>
>>> You could try defining `login_wait`[1] in your BoardLinuxMachine
>>> and see if that helps ...
>>
>> This does not help, I have already tried.

Note: I tried with login_wait=1 and login_wait=2. login_wait=10 helps.

>>
>> IMHO the login should be done more resistant - I proposed to try to send
>> some CR in a loop to check if login was successful in my previous mail.
> 
> Yes, it definitely should.  The solution to just send ^C in a loop is not
> one I like, however.  In my opinion, we should avoid "brute-forcing" the
> board into a certain state, as this can have lots of uncertainties.  As mentioned
> before, as soon as we have the shell, we can't allow clobber anyway ...

ok

> 
>>>
>>>> The second not working case seems if the board prints some data on the
>>>> console after the login propt is sent. This is the case when a daemon
>>>> has the stdout set to the console. Also in this case, it seems that tbot
>>>> gets confused.
>>>
>>> Yes, this is an open issue:
>>>
>>> 	https://github.com/Rahix/tbot/issues/1
>>>
>>> I don't know how we best handle this case ... Do you have any ideas?
>>
>> I think there are more use cases. One is the kernel messages. Kernel
>> messages can be turned off at boot time with loglevel=0 (but they can be
>> read via dmesg). This is a suitable way.
>>
>> My use case is an application (SWUpdate) in a small system (rescue
>> system). It makes sense to redirect application stdout to the console,
>> becaus a rescue does not have many options. It should at least possible
>> to login and to collect the output, while a testcase will push an image
>> from network (and then, there is no exec on the board).
> 
> Is shell and application output interleaved

They are interleaved.

> in this case or is
> it in a well defined order?

Regards,
Stefano

-- 
=====================================================================
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sbabic at denx.de
=====================================================================