
Issue 887973

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
OOO
Closed: Oct 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




kefka release failing HWTest sanity provision: ABORT: Host did not return from reboot

Project Member Reported by bhthompson@google.com, Sep 21

Issue description

This happened twice in a row.

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934797447039826176

https://stainless.corp.google.com/browse/chromeos-autotest-results/240273698-chromeos-test/

...
09/21 03:46:13.504 ERROR|             utils:0287| [stderr] [0921/092155:INFO:update_engine_client.cc(508)] Querying Update Engine status...
09/21 03:48:27.504 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:49:31.049 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:50:34.601 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:51:04.595 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:52:08.457 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:52:14.346 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:53:17.901 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:54:21.449 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:54:51.473 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:55:55.018 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:56:00.905 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:57:16.182 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:57:39.319 ERROR|       autoupdater:0998| Failure preparing host prior to update.
Traceback (most recent call last):
  File "/usr/local/autotest/server/cros/autoupdater.py", line 994, in run_update
    self._prepare_host()
  File "/usr/local/autotest/server/cros/autoupdater.py", line 819, in _prepare_host
    self.host.reboot(timeout=self.host.REBOOT_TIMEOUT)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1046, in reboot
    super(CrosHost, self).reboot(**dargs)
  File "/usr/local/autotest/server/hosts/remote.py", line 169, in reboot
    self.log_op(self.OP_REBOOT, reboot)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 556, in log_op
    self.job.run_op(op, op_func, self.get_kernel_ver)
  File "/usr/local/autotest/server/server_job.py", line 946, in run_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 165, in reboot
    **dargs)
  File "/usr/local/autotest/server/hosts/remote.py", line 243, in wait_for_restart
    self.log_op(self.OP_REBOOT, op_func)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 560, in log_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 242, in op_func
    super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
    raise error.AutoservRebootError("Host did not return from reboot")
AutoservRebootError: Host did not return from reboot
09/21 03:57:45.234 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:58:53.652 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 03:59:57.177 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 03:59:57.179 ERROR|              test:0252| Error cleaning up the sysinfo autotest/host objects, ignoring it
Traceback (most recent call last):
  File "/usr/local/autotest/server/test.py", line 245, in cleanup
    self.host.close()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 742, in close
    super(CrosHost, self).close()
  File "/usr/local/autotest/server/hosts/abstract_ssh.py", line 808, in close
    super(AbstractSSHHost, self).close()
  File "/usr/local/autotest/server/hosts/remote.py", line 55, in close
    self.run('rm -rf "%s"' % (utils.sh_escape(dir)))
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 335, in run
    return self.run_very_slowly(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 324, in run_very_slowly
    ssh_failure_retry_ok)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 260, in _run
    raise error.AutoservSSHTimeout("ssh timed out", result)
AutoservSSHTimeout: ('ssh timed out', * Command: 
    /usr/bin/ssh -a -x  -o ControlPath=/tmp/_autotmp_ew7YXOssh-master/socket
    -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
    -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
    ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22
    chromeos2-row4-rack8-host13 "export LIBC_FATAL_STDERR_=1; if type
    \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
    \"server[stack::close|close|run] -> ssh_run(rm -rf \\\"/tmp/sysinfo
    /autoserv-BBLXHU\\\")\";fi; rm -rf \"/tmp/sysinfo/autoserv-BBLXHU\""
Exit status: 255
Duration: 63.4932498932

stderr:
ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out)
09/21 03:59:57.183 ERROR|           control:0074| Provision failed due to Exception.
Traceback (most recent call last):
  File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 53, in provision_machine
    provision.Provision)
  File "/usr/local/autotest/server/cros/provision.py", line 400, in run_special_task_actions
    task.run_task_actions(job, host, labels)
  File "/usr/local/autotest/server/cros/provision.py", line 173, in run_task_actions
    raise SpecialTaskActionException()
SpecialTaskActionException
09/21 03:59:57.184 ERROR|        server_job:0807| Exception escaped control file, job aborting:
Traceback (most recent call last):
  File "/usr/local/autotest/server/server_job.py", line 799, in run
    self._execute_code(server_control_file, namespace)
  File "/usr/local/autotest/server/server_job.py", line 1322, in _execute_code
    execfile(code_file, namespace, namespace)
  File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 108, in <module>
    job.parallel_simple(provision_machine, machines)
  File "/usr/local/autotest/server/server_job.py", line 607, in parallel_simple
    log=log, timeout=timeout, return_results=return_results)
  File "/usr/local/autotest/server/subcommand.py", line 98, in parallel_simple
    function(arg)
  File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 99, in provision_machine
    raise Exception('')
Exception
09/21 03:59:57.188 ERROR|   logging_manager:0626| tko parser: {'drone': 'cros-full-0020.mtv.corp.google.com', 'user': 'chromeos-test', 'job_started': 1537526760, 'hostname': 'chromeos2-row4-rack8-host13', 'status_version': 1, 'label': ''}
09/21 03:59:57.189 ERROR|   logging_manager:0626| tko parser: MACHINE NAME: chromeos2-row4-rack8-host13
09/21 03:59:57.190 ERROR|   logging_manager:0626| tko parser: MACHINE GROUP: kefka
09/21 03:59:57.190 ERROR|   logging_manager:0626| tko parser: parsing partial test ---- SERVER_JOB
09/21 03:59:57.190 ERROR|   logging_manager:0626| tko parser: parsing partial test None provision
09/21 03:59:57.191 ERROR|   logging_manager:0626| tko parser: RUNNING: RUNNING
09/21 03:59:57.191 ERROR|   logging_manager:0626| Subdir: None
09/21 03:59:57.191 ERROR|   logging_manager:0626| Testname: provision
09/21 03:59:57.191 ERROR|   logging_manager:0626| 
09/21 03:59:57.191 ERROR|   logging_manager:0626| tko parser: update RUNNING reason: Host did not return from reboot
09/21 03:59:57.191 ERROR|   logging_manager:0626| tko parser: The following lines were ignored:
09/21 03:59:57.192 ERROR|   logging_manager:0626| tko parser:   Traceback (most recent call last):
09/21 03:59:57.192 ERROR|   logging_manager:0626| 
09/21 03:59:57.192 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/server_job.py", line 946, in run_op
09/21 03:59:57.192 ERROR|   logging_manager:0626| 
09/21 03:59:57.192 ERROR|   logging_manager:0626| tko parser:       op_func()
09/21 03:59:57.193 ERROR|   logging_manager:0626| 
09/21 03:59:57.193 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/hosts/remote.py", line 165, in reboot
09/21 03:59:57.193 ERROR|   logging_manager:0626| 
09/21 03:59:57.193 ERROR|   logging_manager:0626| tko parser:       **dargs)
09/21 03:59:57.193 ERROR|   logging_manager:0626| 
09/21 03:59:57.194 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/hosts/remote.py", line 243, in wait_for_restart
09/21 03:59:57.194 ERROR|   logging_manager:0626| 
09/21 03:59:57.194 ERROR|   logging_manager:0626| tko parser:       self.log_op(self.OP_REBOOT, op_func)
09/21 03:59:57.194 ERROR|   logging_manager:0626| 
09/21 03:59:57.194 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 560, in log_op
09/21 03:59:57.195 ERROR|   logging_manager:0626| 
09/21 03:59:57.195 ERROR|   logging_manager:0626| tko parser:       op_func()
09/21 03:59:57.195 ERROR|   logging_manager:0626| 
09/21 03:59:57.195 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/hosts/remote.py", line 242, in op_func
09/21 03:59:57.195 ERROR|   logging_manager:0626| 
09/21 03:59:57.195 ERROR|   logging_manager:0626| tko parser:       super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
09/21 03:59:57.196 ERROR|   logging_manager:0626| 
09/21 03:59:57.196 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
09/21 03:59:57.196 ERROR|   logging_manager:0626| 
09/21 03:59:57.196 ERROR|   logging_manager:0626| tko parser:       raise error.AutoservRebootError("Host did not return from reboot")
09/21 03:59:57.196 ERROR|   logging_manager:0626| 
09/21 03:59:57.197 ERROR|   logging_manager:0626| tko parser:   AutoservRebootError: Host did not return from reboot
09/21 03:59:57.197 ERROR|   logging_manager:0626| 
09/21 03:59:57.197 ERROR|   logging_manager:0626| tko parser: ---------------------------------
09/21 03:59:57.197 ERROR|   logging_manager:0626| tko parser: parsing test provision_AutoUpdate provision
09/21 03:59:57.198 ERROR|   logging_manager:0626| tko parser: ADD: ABORT
09/21 03:59:57.198 ERROR|   logging_manager:0626| Subdir: provision_AutoUpdate
09/21 03:59:57.198 ERROR|   logging_manager:0626| Testname: provision
09/21 03:59:57.198 ERROR|   logging_manager:0626| Host did not return from reboot
09/21 03:59:57.198 ERROR|   logging_manager:0626| tko parser: parsing test ---- SERVER_JOB
09/21 04:00:14.612 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:02:37.132 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:03:36.877 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:04:40.425 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 04:05:43.993 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 04:06:13.991 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist
09/21 04:07:17.529 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out
09/21 04:07:24.394 ERROR|         traceback:0013| Traceback (most recent call last):
09/21 04:07:24.394 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 547, in run_autoserv
09/21 04:07:24.394 ERROR|         traceback:0013|     job.provision(job_labels)
09/21 04:07:24.394 ERROR|         traceback:0013|   File "/usr/local/autotest/server/server_job.py", line 526, in provision
09/21 04:07:24.394 ERROR|         traceback:0013|     self.run(control=control, job_labels=labels)
09/21 04:07:24.395 ERROR|         traceback:0013|   File "/usr/local/autotest/server/server_job.py", line 799, in run
09/21 04:07:24.395 ERROR|         traceback:0013|     self._execute_code(server_control_file, namespace)
09/21 04:07:24.395 ERROR|         traceback:0013|   File "/usr/local/autotest/server/server_job.py", line 1322, in _execute_code
09/21 04:07:24.395 ERROR|         traceback:0013|     execfile(code_file, namespace, namespace)
09/21 04:07:24.395 ERROR|         traceback:0013|   File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 108, in <module>
09/21 04:07:24.395 ERROR|         traceback:0013|     job.parallel_simple(provision_machine, machines)
09/21 04:07:24.395 ERROR|         traceback:0013|   File "/usr/local/autotest/server/server_job.py", line 607, in parallel_simple
09/21 04:07:24.396 ERROR|         traceback:0013|     log=log, timeout=timeout, return_results=return_results)
09/21 04:07:24.396 ERROR|         traceback:0013|   File "/usr/local/autotest/server/subcommand.py", line 98, in parallel_simple
09/21 04:07:24.396 ERROR|         traceback:0013|     function(arg)
09/21 04:07:24.396 ERROR|         traceback:0013|   File "/usr/local/autotest/results/hosts/chromeos2-row4-rack8-host13/1315752-provision/20182109034559/control.srv", line 99, in provision_machine
09/21 04:07:24.396 ERROR|         traceback:0013|     raise Exception('')
09/21 04:07:24.396 ERROR|         traceback:0013| Exception
09/21 04:07:25.853 ERROR|          autoserv:0801| Uncaught SystemExit with code 1
Traceback (most recent call last):
  File "/usr/local/autotest/server/autoserv", line 797, in main
    use_ssp)
  File "/usr/local/autotest/server/autoserv", line 607, in run_autoserv
    sys.exit(exit_code)
SystemExit: 1
...
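
For triage, a rough tally of a log excerpt like the one above can help separate the repeated SSH/socket timeouts from the single reboot failure that actually aborts provisioning. The following is only a sketch under stated assumptions, not part of autotest: `LINE_RE`, `summarize`, and `SAMPLE` are hypothetical names, and the line layout (`MM/DD HH:MM:SS.mmm LEVEL| module:line| message`) is inferred from the excerpt rather than from any documented log format. Traceback lines that lack that prefix are skipped.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt in this report:
#   "09/21 03:49:31.049 ERROR|             utils:0287| <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+)\|\s*(?P<module>\w+):(?P<line>\d+)\| ?(?P<msg>.*)$"
)


def summarize(log_text):
    """Count log lines per (module, message); ignore lines without the prefix."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if match is None:
            # Bare traceback lines and blank lines do not carry the prefix.
            continue
        counts[(match.group("module"), match.group("msg"))] += 1
    return counts


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = "\n".join([
    "09/21 03:48:27.504 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist",
    "09/21 03:49:31.049 ERROR|             utils:0287| [stderr] ssh: connect to host chromeos2-row4-rack8-host13 port 22: Connection timed out",
    "09/21 03:51:04.595 ERROR|             utils:2769| Timed out waiting for condition: Wait for a socket file to exist",
    "09/21 03:57:39.319 ERROR|       autoupdater:0998| Failure preparing host prior to update.",
    "Traceback (most recent call last):",
])

if __name__ == "__main__":
    # Print the most frequent messages first.
    for (module, msg), count in summarize(SAMPLE).most_common():
        print(count, module, msg)
```

Grouping by message rather than by timestamp makes it easier to see that the host stayed unreachable for the whole window, instead of reading each timeout as a separate failure.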
 
Cc: kinaba@chromium.org rohi...@chromium.org
This bug is blocking CTS Test results on M69.
This error "ABORT: Host did not return from reboot" is also happening on some other boards:
* coral-release R70 during HWTest [bvt-installer] [babymega]: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934525092588475168
* nautilus-release R69 during HWTest [bvt-installer]: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934529586792986688

It also continues to happen on kefka:
* kefka-release during HWTest [bvt-installer]: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934525070338930288
Labels: -Pri-3 Pri-1
Changing the priority according to comment #1
Cc: -kitching@google.com
Owner: gu...@chromium.org
There was one successful result on Sep 25, but the same errors have been occurring again since then.
https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=kefka-release&buildBranch=release-R70-11021.B

Status: Assigned (was: Untriaged)
guocb@, what's the progress on fixing this?
Status: WontFix (was: Assigned)
We have had four consecutive good builds since yesterday, so I am closing this bug as obsolete.
