[tbot] How to debug acquire_linux
Stefano Babic
sbabic at denx.de
Fri Nov 23 10:22:01 UTC 2018
Hi Harald,
On 23/11/18 10:46, Harald Seiler wrote:
> Hello Stefano,
>
> first of all, so we are all on the same page. The implementation
> for login is here:
>
> https://github.com/Rahix/tbot/blob/master/tbot/machine/board/linux.py#L62
>
> To break it down:
>
> 1. Wait for the login prompt ("login: ") to appear at the END
> of the stream (L67)
Prompt was detected
> 2. Send the username + "\n" (L68)
> 3. If a password is configured, wait for the password prompt,
> at the END of the stream (L73)
There is no password
> 4. Send the password + "\n"
> 5. Wait for `login_wait` seconds for the shell to come up (L77)
With 1-2 seconds, it waits indefinetely.
I set now login_wait=10, it works. It is a long time, but it is still a
work-around.
> 6. Initialize the shell. Sends a bunch of commands to get the
> shell into a known state. Then wait for the TBot prompt. (L79)
>
> This process has 3 places that can block: 1, 3, and 6.
>
It looks to me that 1,3 are fine
> I can get behind allowing 1 and 3 to search for there prompt in the middle of
> the stream as well, to circumvent the kernel-log clobber issue.
>
> Number 6 needs a different solution though. The reason is that once
> TBot has acquired the shell, additional input from the serial console might
> potentially mess up command outputs. This will lead to a whole load of new issues,
> so TBot should somehow prevent that in a reasonable way.
ok - that means it is reasonable to add "loglevel=0" (or at least =2) to
all boards to be tested ?
>
> On Fri, 2018-11-23 at 10:02 +0100, Stefano Babic wrote:
>> Hi Harald,
>>
>> On 23/11/18 09:48, Harald Seiler wrote:
>>> Hi Stefano,
>>>
>>> On Thu, 2018-11-22 at 23:47 +0100, Stefano Babic wrote:
>>>> Hi,
>>>>
>>>> On 22/11/18 19:24, Stefano Babic wrote:
>>>>> Hi,
>>>>>
>>>>> I have a board that cannot be got in linux. It looks like that
>>>>> tbot.acquire_linux() hangs forever (can I set a tiimeout ?)
>>>>>
>>>>> I ran a testcase that works fine for another board - I guess that the
>>>>> issue is due to the fact that this board has a lot of output in part
>>>>> caused by systemd and in part due to kernel messages that are printed to
>>>>> the console. When I start the testcase, tbot remains in <board>-linux. I
>>>>> can understand that board is up and works because software activates
>>>>> some relays, but it is not recognized by tbot.
>>>>>
>>>>> The weird thing is that "interactive_linux" works. How can I debug this
>>>>> ? Or do we have to block kernel messages to the console via kernel
>>>>> parameter ?
>>>>>
>>>>
>>>> I have some more details. The not working board has "bash" as shell and
>>>> TERM is set to "xterm". Running tbot with -vvv, after "root" as user,
>>>> the bord answers with some terminal control characters. It looks like
>>>> that tbot gets confused. If I replace "bash" with "busybox", it works.
>>>
>>> Hmm, can you send me the output of -vvv? The only cases I know don't
>>> work is when something is sent after the password prompt or once the
>>> shell comes up ...
>>>
>>
>> Ouch..I have already done, but I see that the ML was removed in the
>> answer. I attach Heiko's answer, too.
>
> Hmm, your output doesn't look bad to me at first glance ...
>
>>
>> The thing is:
>>
>> ├─'login: '
>
> The login prompt was successfully detected here, TBot sent the
> username (so step 1 didn't block in this case):
Right
>
>> │ │ <> CCU-1 login: root
>> │ ├─'root\n\x1b7\x1b[r\x1b[999;999H\x1b[6n'
>> │ ├─'\x1b'
>> │ ├─'8'
>
> These control sequenced are ignored
Well, this is the more evident difference with busybox. With busybox,
just '#' is returned.
>
>> │ ├─'root at CCU-1:~'
>> │ ├─'# '
>
> This prompt is also ignored ... If tbot hangs here, I can think of two things
> that might have happened:
>
> 1. You set a password and tbot rightfully waits for the password prompt
No.
> 2. Something went wrong with the shell initialization
Yes.
>
> https://github.com/Rahix/tbot/blob/master/tbot/machine/channel/channel.py#L229
>
> The first blocking call happens in line 240, where the prompt is set. `raw_command`
> sends something like `PS1="SOME_TBOT_PROMPT> "\n` and then waits for the prompt to
> appear again. Maybe some kernel message happend just after the prompt returned?
But then we should see them in the log, or not ?
But yes, in last tests I remove again loglevel=0, and kernel messages
can appear.
This could be a reason why increasing login_wait helps. After some
seconds, board is surely in a idle state without kernel messages.
>
> What irritates me is that we should definitely see this. `raw_command` calls
> `read_until_prompt`, which is the place where the -vvv debug output is generated ...
Agree.
>
>> So there are some control characters (I have not decoded them) just if
>> the target runs the bash shell. With busybox, it works and tbot completes.
>>
>>
>>> You could try defining `login_wait`[1] in your BoardLinuxMachine
>>> and see if that helps ...
>>
>> This does not help, I have already tried.
Note: I tried with login_wait=1 and login_wait=2. login_wait=10 helps.
>>
>> IMHO the login should be done more resistant - I proposed to try to send
>> some CR in a loop to check if login was successful in my previous mail.
>
> Yes, it definitely should. The solution to just send ^C in a loop is not
> one I like, however. In my opinion, we should avoid "brute-forcing" the
> board into a certain state, as this can have lots of uncertainties. As mentioned
> before, as soon as we have the shell, we can't allow clobber anyway ...
ok
>
>>>
>>>> The second not working case seems if the board prints some data on the
>>>> console after the login propt is sent. This is the case when a daemon
>>>> has the stdout set to the console. Also in this case, it seems that tbot
>>>> gets confused.
>>>
>>> Yes, this is an open issue:
>>>
>>> https://github.com/Rahix/tbot/issues/1
>>>
>>> I don't know how we best handle this case ... Do you have any ideas?
>>
>> I think there are more use cases. One is the kernel messages. Kernel
>> messages can be turned off at boot time with loglevel=0 (but they can be
>> read via dmesg). This is a suitable way.
>>
>> My use case is an application (SWUpdate) in a small system (rescue
>> system). It makes sense to redirect application stdout to the console,
>> becaus a rescue does not have many options. It should at least possible
>> to login and to collect the output, while a testcase will push an image
>> from network (and then, there is no exec on the board).
>
> Is shell and application output interleaved
They are interleaved.
> in this case or is
> it in a well defined order?
Regards,
Stefano
--
=====================================================================
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sbabic at denx.de
=====================================================================
More information about the tbot
mailing list