guado_moblab-paladin fails moblab_RunSuite: FAIL: Unhandled AutoservRunError: command execution error
Issue description
https://luci-milo.appspot.com/buildbot/chromeos/guado_moblab-paladin/9598

Potentially interesting logs from https://storage.cloud.google.com/chromeos-autotest-results/205522606-chromeos-test/chromeos2-row1-rack8-host7/debug/autoserv.DEBUG:

06/04 12:18:41.139 INFO | server_job:0218| FAIL moblab_RunSuite moblab_RunSuite timestamp=1528139921 localtime=Jun 04 12:18:41 Unhandled AutoservRunError: command execution error
* Command:
/usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22 chromeos2-row1-rack8-host7 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|run_as_moblab|run] -> ssh_run(su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R66-10452.74.0 --suite_name=dummy_server --retry=True --max_retries=1')\";fi; su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R66-10452.74.0 --suite_name=dummy_server --retry=True --max_retries=1'"
Exit status: 4
Duration: 1121.69987607

dummy_PassServer_nossp [ FAILED ]
dummy_PassServer_nossp ABORT: Timed out, did not run.
06-04-2018 [12:17:49] Attempting to display pool info:
No hosts found for board:cyan in pool:
Reason: Tests were aborted before running; suite must have timed out.
Jun 4 2018
Can we make it more obvious where people should look? Even just adding something to the yaqs entry (https://yaqs.googleplex.com/eng/q/6532316467036160) would help. https://bugs.chromium.org/p/chromium/issues/detail?id=747056 is related to this, but I honestly would never have known to go and look in this sysinfo file. On the cyan front, there might be light at the end of the tunnel for the reboot issues: crbug.com/639301
Jun 7 2018
I am closing this as a flake: the sub-DUT failed to provision, which happens occasionally. I added some documentation to the YAQ about how I found the logs.
Comment 1 by haddowk@chromium.org, Jun 4 2018
One of the sub-DUTs failed to return from reboot after provision. From sysinfo.tgz, /sysinfo/mnt/moblab/results/3-moblab/192.168.231.101/status.log:

START ---- provision timestamp=1528138767 localtime=Jun 04 11:59:27
START provision_AutoUpdate provision_AutoUpdate timestamp=1528138768 localtime=Jun 04 11:59:28
START ---- ---- timestamp=1528138774 localtime=Jun 04 11:59:34
GOOD ---- sysinfo.before timestamp=1528138774 localtime=Jun 04 11:59:34
END GOOD ---- ---- timestamp=1528138774 localtime=Jun 04 11:59:34
START ---- reboot timestamp=1528138843 localtime=Jun 04 12:00:43
GOOD ---- reboot.start timestamp=1528138843 localtime=Jun 04 12:00:43
ABORT ---- reboot.verify timestamp=1528139567 localtime=Jun 04 12:12:47 Host did not return from reboot
END FAIL ---- reboot timestamp=1528139567 localtime=Jun 04 12:12:47 Host did not return from reboot
Traceback (most recent call last):
  File "/usr/local/autotest/server/server_job.py", line 952, in run_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 160, in reboot
    **dargs)
  File "/usr/local/autotest/server/hosts/remote.py", line 229, in wait_for_restart
    self.log_op(self.OP_REBOOT, op_func)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 566, in log_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 228, in op_func
    super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
    raise error.AutoservRebootError("Host did not return from reboot")
AutoservRebootError: Host did not return from reboot
FAIL provision_AutoUpdate provision_AutoUpdate timestamp=1528139668 localtime=Jun 04 12:14:28 Unhandled AutoservRebootError: Host did not return from reboot
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 126, in run_once
    with_cheets=with_cheets)
  File "/usr/local/autotest/server/afe_utils.py", line 126, in machine_install_and_update_labels
    image_name, host_attributes = host.machine_install(update_url)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 743, in machine_install
    return updater.run_update()
  File "/usr/local/autotest/server/cros/autoupdater.py", line 716, in run_update
    self.host.reboot(timeout=self.host.REBOOT_TIMEOUT)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1255, in reboot
    super(CrosHost, self).reboot(**dargs)
  File "/usr/local/autotest/server/hosts/remote.py", line 164, in reboot
    self.log_op(self.OP_REBOOT, reboot)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 562, in log_op
    self.job.run_op(op, op_func, self.get_kernel_ver)
  File "/usr/local/autotest/server/server_job.py", line 952, in run_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 160, in reboot
    **dargs)
  File "/usr/local/autotest/server/hosts/remote.py", line 229, in wait_for_restart
    self.log_op(self.OP_REBOOT, op_func)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 566, in log_op
    op_func()
  File "/usr/local/autotest/server/hosts/remote.py", line 228, in op_func
    super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
    raise error.AutoservRebootError("Host did not return from reboot")
AutoservRebootError: Host did not return from reboot
END FAIL provision_AutoUpdate provision_AutoUpdate timestamp=1528139668 localtime=Jun 04 12:14:28
END FAIL ---- provision timestamp=1528139668 localtime=Jun 04 12:14:28
INFO ---- ---- timestamp=1528139668 job_abort_reason= localtime=Jun 04 12:14:28

I will go check the device in the lab, but it seems to be up at the moment, so perhaps it was just very slow to reboot.
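
For anyone repeating this triage, the status.log excerpt above can be scanned mechanically for the first failing step and for how long the reboot wait lasted. The following is only a rough sketch: the line shape (status, subdir, operation, then `timestamp=`) is inferred from this excerpt rather than from any format specification, and the names `parse_status_lines`, `first_failure`, and `seconds_between` are hypothetical helpers, not part of autotest.

```python
import re

# Statuses treated as failures here; only ABORT and FAIL appear in the
# excerpt, so this set is an assumption, not an exhaustive list.
BAD_STATUSES = {"ABORT", "FAIL"}

# Assumed line shape, inferred from the excerpt:
#   [END ]STATUS <subdir> <operation> timestamp=<epoch> ...
LINE_RE = re.compile(
    r"^\s*(?P<end>END\s+)?(?P<status>[A-Z]+)\s+(?P<subdir>\S+)\s+(?P<op>\S+)\s+"
    r"timestamp=(?P<ts>\d+)"
)

# A few lines copied from the status.log quoted in this comment.
SAMPLE = """\
START ---- reboot timestamp=1528138843 localtime=Jun 04 12:00:43
GOOD ---- reboot.start timestamp=1528138843 localtime=Jun 04 12:00:43
ABORT ---- reboot.verify timestamp=1528139567 localtime=Jun 04 12:12:47 Host did not return from reboot
END FAIL ---- reboot timestamp=1528139567 localtime=Jun 04 12:12:47 Host did not return from reboot
"""


def parse_status_lines(text):
    """Return one dict per status line; traceback lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append({
                "end": match.group("end") is not None,
                "status": match.group("status"),
                "op": match.group("op"),
                "timestamp": int(match.group("ts")),
            })
    return entries


def first_failure(entries):
    """Return the first entry whose status is in BAD_STATUSES, or None."""
    for entry in entries:
        if entry["status"] in BAD_STATUSES:
            return entry
    return None


def seconds_between(entries, start_op, end_op):
    """Seconds between the first entries for start_op and end_op."""
    start = next(e for e in entries if e["op"] == start_op)
    end = next(e for e in entries if e["op"] == end_op)
    return end["timestamp"] - start["timestamp"]


if __name__ == "__main__":
    parsed = parse_status_lines(SAMPLE)
    print(first_failure(parsed))
    print(seconds_between(parsed, "reboot.start", "reboot.verify"))
```

On the sample lines, the first failing entry is `reboot.verify`, and the gap between `reboot.start` and `reboot.verify` is 724 seconds, which matches the 12:00:43 to 12:12:47 window in the log.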