New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 729944 link

Starred by 1 user

Issue metadata

Status: Duplicate
Owner: ----
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Provision failure which took a long time to time out

Project Member Reported by mojahsu@chromium.org, Jun 6 2017

Issue description

It happened on host chromeos4-row6-rack9-host17(100.115.197.91). 
https://viceroy.corp.google.com/chromeos/suite_details?job_id=121389064

Link to build or pfq page.
https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-paladin/builds/2783

build # for that buildbot.
2783

Snippet of log that contains the failure.
The DUT was unreachable after updated image and rebooted.
------------------------------
From 
2017/06/04 12:28:56.922 DEBUG|      auto_updater:1159| Start pre-setup for rootfs update...
2017/06/04 12:28:56.923 INFO |     remote_access:0395| Rebooting chromeos4-row6-rack9-host17...
2017/06/04 12:28:56.924 DEBUG|    cros_build_lib:0584| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmp8Qe_1z/testing_rsa root@chromeos4-row6-rack9-host17 -- cat /proc/sys/kernel/random/boot_id
2017/06/04 12:28:57.196 DEBUG|    cros_build_lib:0633| (stdout):
a68ba8ed-2c03-45f2-84bd-69bce5aced45
.....
...
2017/06/04 12:29:12.610 DEBUG|    cros_build_lib:0584| RunCommand: ssh -p 22 '-oConnectionAttempts=2' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=2' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmp8Qe_1z/testing_rsa root@chromeos4-row6-rack9-host17 -- cat /proc/sys/kernel/random/boot_id
2017/06/04 12:29:17.685 DEBUG|    cros_build_lib:0635| (stderr):
ssh: connect to host chromeos4-row6-rack9-host17 port 22: Connection timed out
................
..
2017/06/04 12:37:01.371 ERROR|    cros_build_lib:0681| Reboot has not completed after 480 seconds; giving up.
2017/06/04 12:37:01.372 DEBUG|    cros_build_lib:0584| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmp8Qe_1z/testing_rsa root@chromeos4-row6-rack9-host17 -- rm -rf /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.Tm1OnoVZ4a
2017/06/04 12:39:04.563 ERROR|     remote_access:0881| Error connecting to device chromeos4-row6-rack9-host17
2017/06/04 12:39:04.563 DEBUG|       cros_update:0260| Error happens in CrOS auto-update: SSHConnectionError('ssh: connect to host chromeos4-row6-rack9-host17 port 22: Connection timed out\r\n',)
----------------------------------------------

It took 34 minutes to know the failure.
---------------------------------------
From provision_AutoUpdate/status.log

	START	provision_AutoUpdate	provision_AutoUpdate	timestamp=1496604336	localtime=Jun 04 12:25:36			
             ...............
    File "/usr/local/autotest/server/hosts/cros_host.py", line 812, in machine_install_by_devserver
      'No answer to ping from %s' % self.hostname)
  AutoservError: No answer to ping from chromeos4-row6-rack9-host17	
END FAIL	----	provision	timestamp=1496606398	localtime=Jun 04 12:59:58	
--------------------------------------------

But it raised exception at 12:42:11.
So it took more (12:59:58 - 12:42:11) = 17 minutes to finish the test.
------------------------------------------------------------------
From provision_AutoUpdate/debug/provision_AutoUpdate.DEBUG

06/04 12:42:11.164 WARNI|              test:0616| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 610, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 824, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled AutoservError: No answer to ping from chromeos4-row6-rack9-host17
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 818, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 471, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 348, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 381, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 113, in run_once
    force_full_update=force)
  File "/usr/local/autotest/server/afe_utils.py", line 206, in machine_install_and_update_labels
    *args, **dargs)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 812, in machine_install_by_devserver
    'No answer to ping from %s' % self.hostname)
AutoservError: No answer to ping from chromeos4-row6-rack9-host17

06/04 12:42:11.166 DEBUG|              test:0621| Running cleanup for test.
---------------------------------------------------------------------------------------



 
Mergedinto: 730061
Status: Duplicate (was: Untriaged)
Merging into mine because it contains stack trace from the kernel and such.

Also, see blocking bug there for a separate complaint about how long it took to both notice and recover this failure.

Sign in to add a comment