Example: https://stainless.corp.google.com/browse/chromeos-autotest-results/swarming-3fc662e5bb899111/
Relevant log lines are:
09/06 00:37:36.574 WARNI| cros_host:1001| Unable to restart ui, rebooting device.
09/06 00:37:36.586 DEBUG| ssh_host:0310| Running (ssh) 'cat /etc/lsb-release' from 'cleanup|reboot|get_board|parse_cmd_output|run|run_very_slowly'
09/06 00:37:36.592 INFO | ssh_multiplex:0079| Master ssh connection to chromeos6-row4-rack10-host7 is down.
09/06 00:37:36.592 DEBUG| ssh_multiplex:0125| Nuking ssh master_job
09/06 00:37:36.592 DEBUG| ssh_multiplex:0130| Cleaning ssh master_tempdir
09/06 00:37:36.593 INFO | ssh_multiplex:0096| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_z366PVssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos6-row4-rack10-host7'
09/06 00:37:36.593 DEBUG| utils:0219| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_z366PVssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos6-row4-rack10-host7'
09/06 00:37:41.456 ERROR| utils:2765| Timed out waiting for condition: Wait for a socket file to exist
09/06 00:37:41.457 INFO | ssh_multiplex:0113| Timed out waiting for master-ssh connection to be established.
09/06 00:38:44.700 ERROR| reset:0037| Reset failed due to Exception.
Traceback (most recent call last):
  File "/usr/local/autotest/server/control_segments/reset", line 29, in reset
    target.cleanup()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1004, in cleanup
    super(CrosHost, self).cleanup()
  File "/usr/local/autotest/server/hosts/remote.py", line 249, in cleanup
    self.reboot()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1030, in reboot
    board_fullname = self.get_board()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1773, in get_board
    run_method=self.run)
  File "/usr/local/autotest/client/bin/utils.py", line 1530, in parse_cmd_output
    cmd_result = run_method(command, stdout_tee=None, stderr_tee=None)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 335, in run
    return self.run_very_slowly(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 324, in run_very_slowly
    ssh_failure_retry_ok)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 260, in _run
    raise error.AutoservSSHTimeout("ssh timed out", result)
AutoservSSHTimeout: ('ssh timed out', * Command:
/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_z366PVssh-master/socket
-o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
-o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22
chromeos6-row4-rack10-host7 "export LIBC_FATAL_STDERR_=1; if type
\"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
\"server[stack::get_board|parse_cmd_output|run] -> ssh_run(cat /etc/lsb-
release)\";fi; cat /etc/lsb-release"
Exit status: 255
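Nobody should be surprised: we were calling reboot() because cleanup() failed to restart ui, so the DUT is already bonkers. Asking the DUT for information at that point (get_board() runs 'cat /etc/lsb-release' over ssh) is likely to time out and fail. So this wastes time (about a minute here, 00:37:36 to 00:38:44), then fails the reset anyway.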
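
One possible shape for avoiding this, as a minimal sketch only: treat the pre-reboot board lookup as best-effort, so an ssh timeout there does not abort the reboot. The names below (SSHTimeout, reboot_tolerating_dead_dut, and the get_board / do_reboot callables) are hypothetical stand-ins for illustration, not the actual autotest API, and whether reboot() can really proceed without the board name is an assumption.

```python
class SSHTimeout(Exception):
    """Hypothetical stand-in for autotest's error.AutoservSSHTimeout."""


def reboot_tolerating_dead_dut(get_board, do_reboot):
    """Reboot even if the pre-reboot board lookup times out.

    get_board and do_reboot are hypothetical callables standing in for
    CrosHost.get_board() and the underlying reboot step.
    """
    try:
        # Best-effort: the DUT may already be unreachable over ssh.
        board = get_board()
    except SSHTimeout:
        # Assumption: the reboot path can cope with an unknown board.
        board = None
    do_reboot(board)
    return board
```

With a get_board that raises SSHTimeout, do_reboot is still called (with None) instead of the whole reset failing on the information query.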
Comment 1 by pprabhu@chromium.org, Sep 7