Issue 665600

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




test_push: Failed because powerwash timed out on chromeos4-row10-rack9-host15

Project Member Reported by pprabhu@chromium.org, Nov 15 2016

Issue description

Failure from /var/log/test_push.log:
[chromeos-autotest.hot.corp.google.com] out: Original stack trace of the exception:
[chromeos-autotest.hot.corp.google.com] out: [('./site_utils/test_push.py', 496, 'test_suite_wrapper', 'create_and_return, testbed_test)'), ('./site_utils/test_push.py', 396, 'test_suite', 'create_and_return, testbed_test)'), ('./site_utils/test_push.py', 308, 'do_run_suite', 'powerwash_dut_to_test_repair(host.hostname, timeout=300)'), ('./site_utils/test_push.py', 177, 'powerwash_dut_to_test_repair', '(hostname, timeout))')]
[chromeos-autotest.hot.corp.google.com] out: Test for pushing to prod failed:
[chromeos-autotest.hot.corp.google.com] out:
[chromeos-autotest.hot.corp.google.com] out: Powerwash test on chromeos4-row10-rack9-host15 timeout after 300s, abort it.
[chromeos-autotest.hot.corp.google.com] out: Traceback (most recent call last):
[chromeos-autotest.hot.corp.google.com] out:   File "./site_utils/test_push.py", line 637, in <module>
[chromeos-autotest.hot.corp.google.com] out:     sys.exit(main())
[chromeos-autotest.hot.corp.google.com] out:   File "./site_utils/test_push.py", line 593, in main
[chromeos-autotest.hot.corp.google.com] out:     check_queue(queue)
[chromeos-autotest.hot.corp.google.com] out:   File "./site_utils/test_push.py", line 514, in check_queue
[chromeos-autotest.hot.corp.google.com] out:     raise exc_info[0](exc_info[1])
[chromeos-autotest.hot.corp.google.com] out: __main__.TestPushException: Powerwash test on chromeos4-row10-rack9-host15 timeout after 300s, abort it.


The failing server job on the test server: http://chromeos-autotest.hot.corp.google.com/afe/#tab_id=view_job&object_id=1679
 
Cc: shuqianz@chromium.org
+shuqianz: Is this another instance of failure that I should ignore?
cf: https://bugs.chromium.org/p/chromium/issues/detail?id=641177

Cc: -shuqianz@chromium.org akes...@chromium.org pprabhu@chromium.org
Labels: -Pri-1 Pri-2
Owner: shuqianz@chromium.org
Figured this out with shuqianz@

The sequence of events was:
- The 9:00 AM test_push failed for an unrelated reason (dynamic_suite bug).
- [Mystery] For some unknown reason, no Verify special task was created against DUT chromeos4-row10-rack9-host15 afterwards. The last job on it was a powerwash, which leaves the DUT without python installed.

- The 1:00 PM test_push created a powerwash job on this DUT. In the usual case, this triggers a Reset task, which succeeds, followed by the platform_PowerWash test.
- In this case, because the DUT had no python, Reset failed, triggering a Repair task. The timeout that test_push sets for this job (300s, per the log above) is not long enough for a Repair, so the job failed.


We're trusting the previous test_push to clean up after itself: to run Verify on all DUTs, and any subsequent Repair tasks if needed. But if the last test_push failed, we can't depend on it to have left the DUTs in a state that lets the current test_push finish.

We currently check that all DUTs are in state Ready before starting. How about we Verify all DUTs at the start, _then_ check that they reach the Ready state before starting the testing?
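
To make the proposal concrete, here is a rough sketch of "Verify first, then check Ready". It is not the actual test_push.py code: `verify_then_check_ready`, `FakeLab`, the `run_verify`/`get_status` callables, and the "Repair Failed" status string are all hypothetical names made up for illustration, and `run_verify` is assumed to block until Verify (and any Repair it triggers) has finished.

```python
from typing import Callable, Dict, Iterable, List, Set

READY = "Ready"  # status name used in the discussion above


def verify_then_check_ready(
    hostnames: Iterable[str],
    run_verify: Callable[[str], None],
    get_status: Callable[[str], str],
) -> List[str]:
    """Run Verify on every DUT, then return the hosts that are not Ready.

    Both callables are hypothetical hooks; a real implementation would
    talk to the scheduler instead.
    """
    hosts = list(hostnames)
    for host in hosts:
        # Assumed to block until Verify (and any follow-up Repair) is done.
        run_verify(host)
    return [host for host in hosts if get_status(host) != READY]


class FakeLab:
    """In-memory stand-in for the lab, for illustration only."""

    def __init__(self, statuses: Dict[str, str], broken: Set[str],
                 repairable: Set[str]) -> None:
        self.statuses = dict(statuses)
        self.broken = set(broken)          # e.g. DUTs left without python
        self.repairable = set(repairable)  # broken DUTs that Repair can fix

    def run_verify(self, host: str) -> None:
        if host not in self.broken:
            self.statuses[host] = READY
        elif host in self.repairable:
            # Verify fails, Repair runs and succeeds.
            self.broken.discard(host)
            self.statuses[host] = READY
        else:
            # Verify fails and Repair cannot fix the DUT (assumed status name).
            self.statuses[host] = "Repair Failed"

    def get_status(self, host: str) -> str:
        return self.statuses[host]


if __name__ == "__main__":
    # host15 is marked Ready but is actually broken, as in this bug.
    lab = FakeLab(
        statuses={"host15": READY, "host16": READY},
        broken={"host15"},
        repairable={"host15"},
    )
    not_ready = verify_then_check_ready(
        ["host15", "host16"], lab.run_verify, lab.get_status)
    print("DUTs not ready:", not_ready)
```

The point of the ordering is that a stale Ready status left over from a failed run gets re-validated before any test job with a short timeout is scheduled, so a needed Repair happens up front instead of inside the 300s powerwash window.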

I've started the test_push again (and it should succeed now in theory, because that DUT is now Repaired), so lowering priority.
Status: Assigned (was: Started)

Comment 4 by autumn@chromium.org, Nov 15 2016

Labels: -current-issue
Status: Fixed (was: Assigned)
Add verify at the beginning of test_push
https://chromium-review.googlesource.com/#/c/418490/
Summary: test_push: Failed because powerwash timed out on chromeos4-row10-rack9-host15 (was: push-to-prod: Failed because powerwash timed out on chromeos4-row10-rack9-host15)

Comment 7 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 8 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 9 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 11 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
