kefka release failing HWTest sanity provision: ABORT: Host did not return from reboot
Issue description
This happened twice in a row.
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934797447039826176
https://stainless.corp.google.com/browse/chromeos-autotest-results/240273698-chromeos-test/
...
09/21 03:46:13.504 ERROR| utils:0287| [stderr] [0921/092155:INFO:update_engine_client.cc(508)] Querying Update Engine status...
09/21 03:48:27.504 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:49:31.049 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:50:34.601 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:51:04.595 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:52:08.457 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:52:14.346 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:53:17.901 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:54:21.449 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:54:51.473 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:55:55.018 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:56:00.905 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:57:16.182 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:57:39.319 ERROR| autoupdater:0998| Failure preparing host prior to update.
Traceback (most recent call last):
  File "/usr/local/autotest/server/cros/autoupdater.py", line 994, in run_update
    self._prepare_host()
  File "/usr/local/autotest/server/cros/autoupdater.py", line 819, in _prepare_host
    self.host.reboot(timeout=self.host.REBOOT_TIMEOUT)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1046, in reboot
    super(CrosHost, self).reboot(**dargs)
  File "/usr/local/autotest/server/hosts/remote.py", line 169, in reboot
    self.log_op(self.OP_REBOOT, reboot)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 556, in log_op
    self.job.run_op(op, op_func, self.get_kernel_ver)
  File "/usr/local/autotest/server/server_job.py", line 946, in run_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 165, in reboot
    **dargs)
  File "/usr/local/autotest/server/hosts/remote.py", line 243, in wait_for_restart
    self.log_op(self.OP_REBOOT, op_func)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 560, in log_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 242, in op_func
    super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
    raise error.AutoservRebootError("Host did not return from reboot")
AutoservRebootError: Host did not return from reboot
09/21 03:57:45.234 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:58:53.652 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:59:57.177 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:59:57.179 ERROR| test:0252| Error cleaning up the sysinfo autotest/host objects, ignoring it
Traceback (most recent call last):
  File "/usr/local/autotest/server/test.py", line 245, in cleanup
    self.host.close()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 742, in close
    super(CrosHost, self).close()
  File "/usr/local/autotest/server/hosts/abstract_ssh.py", line 808, in close
    super(AbstractSSHHost, self).close()
  File "/usr/local/autotest/server/hosts/remote.py", line 55, in close
    self.run('rm -rf "%s"' % (utils.sh_escape(dir)))
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 335, in run
    return self.run_very_slowly(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 324, in run_very_slowly
    ssh_failure_retry_ok)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 260, in _run
    raise error.AutoservSSHTimeout("ssh timed out", result)
AutoservSSHTimeout: ('ssh timed out', * Command:
    /usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_ew7YXOssh-master/socket -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22 chromeos2-row4-rack8-host13 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::close|close|run] -> ssh_run(rm -rf \\\"/tmp/sysinfo/autoserv-BBLXHU\\\")\";fi; rm -rf \"/tmp/sysinfo/autoserv-BBLXHU\""
Exit status: 255
Duration: 63.4932498932
stderr:
ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out)
09/21 03:59:57.183 ERROR| control:0074| Provision failed due to Exception.
Traceback (most recent call last):
  File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 53, in provision_machine
    provision.Provision)
  File "/usr/local/autotest/server/cros/provision.py", line 400, in run_special_task_actions
    task.run_task_actions(job, host, labels)
  File "/usr/local/autotest/server/cros/provision.py", line 173, in run_task_actions
    raise SpecialTaskActionException()
SpecialTaskActionException
09/21 03:59:57.184 ERROR| server_job:0807| Exception escaped control file, job aborting:
Traceback (most recent call last):
  File "/usr/local/autotest/server/server_job.py", line 799, in run
    self._execute_code(server_control_file, namespace)
  File "/usr/local/autotest/server/server_job.py", line 1322, in _execute_code
    execfile(code_file, namespace, namespace)
  File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 108, in <module>
    job.parallel_simple(provision_machine, machines)
  File "/usr/local/autotest/server/server_job.py", line 607, in parallel_simple
    log=log, timeout=timeout, return_results=return_results)
  File "/usr/local/autotest/server/subcommand.py", line 98, in parallel_simple
    function(arg)
  File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 99, in provision_machine
    raise Exception('')
Exception
09/21 03:59:57.188 ERROR| logging_manager:0626| tko parser: {'drone': 'cros-full-0020.mtv.corp.google.com', 'user': 'chromeos-test', 'job_started': 1537526760, 'hostname': 'chromeos2-row4-rack8-host13', 'status_version': 1, 'label': ''}
09/21 03:59:57.189 ERROR| logging_manager:0626| tko parser: MACHINE NAME: chromeos2-row4-rack8-host13
09/21 03:59:57.190 ERROR| logging_manager:0626| tko parser: MACHINE GROUP: kefka
09/21 03:59:57.190 ERROR| logging_manager:0626| tko parser: parsing partial test ---- SERVER_JOB
09/21 03:59:57.190 ERROR| logging_manager:0626| tko parser: parsing partial test None provision
09/21 03:59:57.191 ERROR| logging_manager:0626| tko parser: RUNNING: RUNNING
09/21 03:59:57.191 ERROR| logging_manager:0626| Subdir: None
09/21 03:59:57.191 ERROR| logging_manager:0626| Testname: provision
09/21 03:59:57.191 ERROR| logging_manager:0626|
09/21 03:59:57.191 ERROR| logging_manager:0626| tko parser: update RUNNING reason: Host did not return from reboot
09/21 03:59:57.191 ERROR| logging_manager:0626| tko parser: The following lines were ignored:
09/21 03:59:57.192 ERROR| logging_manager:0626| tko parser: Traceback (most recent call last):
09/21 03:59:57.192 ERROR| logging_manager:0626|
09/21 03:59:57.192 ERROR| logging_manager:0626| tko parser: File "/usr/local/autotest/server/server_job.py", line 946, in run_op
09/21 03:59:57.192 ERROR| logging_manager:0626|
09/21 03:59:57.192 ERROR| logging_manager:0626| tko parser: op_func()
09/21 03:59:57.193 ERROR| logging_manager:0626|
09/21 03:59:57.193 ERROR| logging_manager:0626| tko parser: File "/usr/local/autotest/server/hosts/remote.py", line 165, in reboot
09/21 03:59:57.193 ERROR| logging_manager:0626|
09/21 03:59:57.193 ERROR| logging_manager:0626| tko parser: **dargs)
09/21 03:59:57.193 ERROR| logging_manager:0626|
09/21 03:59:57.194 ERROR| logging_manager:0626| tko parser: File "/usr/local/autotest/server/hosts/remote.py", line 243, in wait_for_restart
09/21 03:59:57.194 ERROR| logging_manager:0626|
09/21 03:59:57.194 ERROR| logging_manager:0626| tko parser: self.log_op(self.OP_REBOOT, op_func)
09/21 03:59:57.194 ERROR| logging_manager:0626|
09/21 03:59:57.194 ERROR| logging_manager:0626| tko parser: File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 560, in log_op
09/21 03:59:57.195 ERROR| logging_manager:0626|
09/21 03:59:57.195 ERROR| logging_manager:0626| tko parser: op_func()
09/21 03:59:57.195 ERROR| logging_manager:0626|
09/21 03:59:57.195 ERROR| logging_manager:0626| tko parser: File "/usr/local/autotest/server/hosts/remote.py", line 242, in op_func
09/21 03:59:57.195 ERROR| logging_manager:0626|
09/21 03:59:57.195 ERROR| logging_manager:0626| tko parser: super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
09/21 03:59:57.196 ERROR| logging_manager:0626|
09/21 03:59:57.196 ERROR| logging_manager:0626| tko parser: File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
09/21 03:59:57.196 ERROR| logging_manager:0626|
09/21 03:59:57.196 ERROR| logging_manager:0626| tko parser: raise error.AutoservRebootError("Host did not return from reboot")
09/21 03:59:57.196 ERROR| logging_manager:0626|
09/21 03:59:57.197 ERROR| logging_manager:0626| tko parser: AutoservRebootError: Host did not return from reboot
09/21 03:59:57.197 ERROR| logging_manager:0626|
09/21 03:59:57.197 ERROR| logging_manager:0626| tko parser: ---------------------------------
09/21 03:59:57.197 ERROR| logging_manager:0626| tko parser: parsing test provision_AutoUpdate provision
09/21 03:59:57.198 ERROR| logging_manager:0626| tko parser: ADD: ABORT
09/21 03:59:57.198 ERROR| logging_manager:0626| Subdir: provision_AutoUpdate
09/21 03:59:57.198 ERROR| logging_manager:0626| Testname: provision
09/21 03:59:57.198 ERROR| logging_manager:0626| Host did not return from reboot
09/21 03:59:57.198 ERROR| logging_manager:0626| tko parser: parsing test ---- SERVER_JOB
09/21 04:00:14.612 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:02:37.132 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:03:36.877 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:04:40.425 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 04:05:43.993 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 04:06:13.991 ERROR| utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:07:17.529 ERROR| utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 04:07:24.394 ERROR| traceback:0013| Traceback (most recent call last):
09/21 04:07:24.394 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 547, in run_autoserv
09/21 04:07:24.394 ERROR| traceback:0013| job.provision(job_labels)
09/21 04:07:24.394 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 526, in provision
09/21 04:07:24.394 ERROR| traceback:0013| self.run(control=control, job_labels=labels)
09/21 04:07:24.395 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 799, in run
09/21 04:07:24.395 ERROR| traceback:0013| self._execute_code(server_control_file, namespace)
09/21 04:07:24.395 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 1322, in _execute_code
09/21 04:07:24.395 ERROR| traceback:0013| execfile(code_file, namespace, namespace)
09/21 04:07:24.395 ERROR| traceback:0013| File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 108, in <module>
09/21 04:07:24.395 ERROR| traceback:0013| job.parallel_simple(provision_machine, machines)
09/21 04:07:24.395 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 607, in parallel_simple
09/21 04:07:24.396 ERROR| traceback:0013| log=log, timeout=timeout, return_results=return_results)
09/21 04:07:24.396 ERROR| traceback:0013| File "/usr/local/autotest/server/subcommand.py", line 98, in parallel_simple
09/21 04:07:24.396 ERROR| traceback:0013| function(arg)
09/21 04:07:24.396 ERROR| traceback:0013| File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 99, in provision_machine
09/21 04:07:24.396 ERROR| traceback:0013| raise Exception('')
09/21 04:07:24.396 ERROR| traceback:0013| Exception
09/21 04:07:25.853 ERROR| autoserv:0801| Uncaught SystemExit with code 1
Traceback (most recent call last):
  File "/usr/local/autotest/server/autoserv", line 797, in main
    use_ssp)
  File "/usr/local/autotest/server/autoserv", line 607, in run_autoserv
    sys.exit(exit_code)
SystemExit: 1
...
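For readers unfamiliar with this failure mode, the log above is consistent with a "reboot, then poll until the host answers on port 22, else give up" loop. The following is only a minimal sketch of that general pattern, not autotest's actual `wait_for_restart` implementation. The names `wait_for_port`, `wait_for_restart`, and `RebootError`, and the 480-second default timeout, are hypothetical placeholders chosen for illustration; the real value of `REBOOT_TIMEOUT` is not shown in the log.

```python
import socket
import time


class RebootError(Exception):
    """Hypothetical stand-in for autotest's AutoservRebootError."""


def wait_for_port(host, port=22, timeout=480.0, interval=5.0,
                  connect_timeout=3.0, connect=socket.create_connection,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True if the port became reachable, False otherwise.
    connect, clock and sleep are injectable so the loop can be tested
    without a real network.
    """
    deadline = clock() + timeout
    while True:
        try:
            conn = connect((host, port), connect_timeout)
        except OSError:
            # Covers refused connections, timeouts and DNS failures.
            pass
        else:
            conn.close()
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def wait_for_restart(host, timeout=480.0, **poll_kwargs):
    """Raise RebootError if the host does not come back within timeout."""
    if not wait_for_port(host, timeout=timeout, **poll_kwargs):
        raise RebootError("Host did not return from reboot")
```

A caller would reboot the device and then call `wait_for_restart("chromeos2-row4-rack8-host13")`. In the log above, the equivalent check never saw the host come back, which is what produced the ABORT.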
,
Sep 24
The error "ABORT: Host did not return from reboot" is also happening on some other boards:
* coral-release R70 during HWTest [bvt-installer] [babymega]: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934525092588475168
* nautilus-release R69 during HWTest [bvt-installer]: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934529586792986688
It also continues to happen on kefka:
* kefka-release during HWTest [bvt-installer]: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934525070338930288
,
Sep 24
Changing the priority according to comment #1
,
Sep 25
,
Oct 3
There was one successful result on Sep 25, but the same errors have been occurring again since then. https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=kefka-release&buildBranch=release-R70-11021.B
,
Oct 5
,
Oct 5
guocb@, what's the progress on fixing this?
,
Oct 5
We have had four consecutive good builds since yesterday, so I am closing this bug as obsolete.
Comment 1 by kkan...@chromium.org, Sep 24