
Issue 881425

Issue metadata

Status: Fixed
Closed: Sep 24
OS: Chrome
Pri: 2
Type: Bug

Host cleanup() fails in reboot() because querying the DUT for information fails

Reported by pprabhu@chromium.org (project member), Sep 6

Issue description

Example: https://stainless.corp.google.com/browse/chromeos-autotest-results/swarming-3fc662e5bb899111/

Relevant log lines are:
09/06 00:37:36.574 WARNI|         cros_host:1001| Unable to restart ui, rebooting device.
09/06 00:37:36.586 DEBUG|          ssh_host:0310| Running (ssh) 'cat /etc/lsb-release' from 'cleanup|reboot|get_board|parse_cmd_output|run|run_very_slowly'
09/06 00:37:36.592 INFO |     ssh_multiplex:0079| Master ssh connection to chromeos6-row4-rack10-host7 is down.
09/06 00:37:36.592 DEBUG|     ssh_multiplex:0125| Nuking ssh master_job
09/06 00:37:36.592 DEBUG|     ssh_multiplex:0130| Cleaning ssh master_tempdir
09/06 00:37:36.593 INFO |     ssh_multiplex:0096| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_z366PVssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos6-row4-rack10-host7'
09/06 00:37:36.593 DEBUG|             utils:0219| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_z366PVssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos6-row4-rack10-host7'
09/06 00:37:41.456 ERROR|             utils:2765| Timed out waiting for condition: Wait for a socket file to exist
09/06 00:37:41.457 INFO |     ssh_multiplex:0113| Timed out waiting for master-ssh connection to be established.
09/06 00:38:44.700 ERROR|             reset:0037| Reset failed due to Exception.
Traceback (most recent call last):
  File "/usr/local/autotest/server/control_segments/reset", line 29, in reset
    target.cleanup()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1004, in cleanup
    super(CrosHost, self).cleanup()
  File "/usr/local/autotest/server/hosts/remote.py", line 249, in cleanup
    self.reboot()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1030, in reboot
    board_fullname = self.get_board()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1773, in get_board
    run_method=self.run)
  File "/usr/local/autotest/client/bin/utils.py", line 1530, in parse_cmd_output
    cmd_result = run_method(command, stdout_tee=None, stderr_tee=None)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 335, in run
    return self.run_very_slowly(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 324, in run_very_slowly
    ssh_failure_retry_ok)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 260, in _run
    raise error.AutoservSSHTimeout("ssh timed out", result)
AutoservSSHTimeout: ('ssh timed out', * Command: 
    /usr/bin/ssh -a -x  -o ControlPath=/tmp/_autotmp_z366PVssh-master/socket
    -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
    -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
    ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22
    chromeos6-row4-rack10-host7 "export LIBC_FATAL_STDERR_=1; if type
    \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
    \"server[stack::get_board|parse_cmd_output|run] -> ssh_run(cat /etc/lsb-
    release)\";fi; cat /etc/lsb-release"
Exit status: 255


Nobody should be surprised: we were calling reboot() because cleanup() failed to restart the UI, so the DUT is already in a bad state. Asking it for information at that point (here, get_board() running 'cat /etc/lsb-release' over SSH) is likely to time out and fail. So this wastes time, and then the reboot fails anyway.
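One way to avoid this is to make the board lookup in reboot() best-effort. The sketch below assumes a few things. It prefers a board name the server already knows (hypothetically, from host labels). If it has to ask the DUT and the SSH query times out, it logs the failure and reboots anyway.

This is not autotest's actual code, and it is not the change that closed this issue. `AutoservSSHTimeout` here is a local stand-in for autotest's `error.AutoservSSHTimeout`. `resolve_board`, `reboot`, `cached_board`, `query_dut_board` and `do_reboot` are hypothetical names.

```python
import logging


class AutoservSSHTimeout(Exception):
    """Hypothetical local stand-in for autotest's error.AutoservSSHTimeout."""


def resolve_board(cached_board, query_dut_board):
    """Return a board name without letting an unhealthy DUT abort reboot().

    cached_board: board name already known on the server side (hypothetical,
        e.g. from host labels); may be None.
    query_dut_board: callable that asks the DUT, like get_board() reading
        /etc/lsb-release over SSH; may raise AutoservSSHTimeout.
    """
    if cached_board:
        # No SSH round-trip, so a wedged DUT cannot make this step fail.
        return cached_board
    try:
        return query_dut_board()
    except AutoservSSHTimeout as e:
        # The DUT is already misbehaving (that is why we are rebooting),
        # so log and carry on instead of failing cleanup().
        logging.warning('Could not read board from DUT: %s', e)
        return None


def reboot(cached_board, query_dut_board, do_reboot):
    """Hypothetical reboot(): resolve the board best-effort, then reboot."""
    board = resolve_board(cached_board, query_dut_board)
    do_reboot(board)
    return board


if __name__ == '__main__':
    def unreachable_dut():
        raise AutoservSSHTimeout('ssh timed out')

    rebooted_with = []
    # The DUT query times out, but the reboot still happens with board unknown.
    reboot(None, unreachable_dut, rebooted_with.append)
    print(rebooted_with)  # [None]
```

Checking the cached value first means a wedged DUT costs no SSH timeout at all. The try/except only limits the damage when no cached value is available, so the reboot callback must tolerate an unknown (None) board.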
 
Labels: -Hotlist-Deputy
Status: Fixed (was: Started)
CL incorrectly referenced some other bug...