test_push: Failed because powerwash timed out on chromeos4-row10-rack9-host15
Issue description

Failure from /var/log/test_push.log:

[chromeos-autotest.hot.corp.google.com] out: Original stack trace of the exception:
[chromeos-autotest.hot.corp.google.com] out: [('./site_utils/test_push.py', 496, 'test_suite_wrapper', 'create_and_return, testbed_test)'), ('./site_utils/test_push.py', 396, 'test_suite', 'create_and_return, testbed_test)'), ('./site_utils/test_push.py', 308, 'do_run_suite', 'powerwash_dut_to_test_repair(host.hostname, timeout=300)'), ('./site_utils/test_push.py', 177, 'powerwash_dut_to_test_repair', '(hostname, timeout))')]
[chromeos-autotest.hot.corp.google.com] out: Test for pushing to prod failed:
[chromeos-autotest.hot.corp.google.com] out:
[chromeos-autotest.hot.corp.google.com] out: Powerwash test on chromeos4-row10-rack9-host15 timeout after 300s, abort it.
[chromeos-autotest.hot.corp.google.com] out: Traceback (most recent call last):
[chromeos-autotest.hot.corp.google.com] out: File "./site_utils/test_push.py", line 637, in <module>
[chromeos-autotest.hot.corp.google.com] out: sys.exit(main())
[chromeos-autotest.hot.corp.google.com] out: File "./site_utils/test_push.py", line 593, in main
[chromeos-autotest.hot.corp.google.com] out: check_queue(queue)
[chromeos-autotest.hot.corp.google.com] out: File "./site_utils/test_push.py", line 514, in check_queue
[chromeos-autotest.hot.corp.google.com] out: raise exc_info[0](exc_info[1])
[chromeos-autotest.hot.corp.google.com] out: __main__.TestPushException: Powerwash test on chromeos4-row10-rack9-host15 timeout after 300s, abort it.

The failing server job on the test server: http://chromeos-autotest.hot.corp.google.com/afe/#tab_id=view_job&object_id=1679
,
Nov 15 2016
Figured this out with shuqianz@. The sequence of events was:
- The 9:00 AM test_push failed for an unrelated reason (dynamic_suite bug).
- [Mystery] For some reason, DUT chromeos4-row10-rack9-host15 did not get a Verify special task created against it afterwards. The last job on it was powerwash, which leaves the DUT in a state where it doesn't have python installed.
- The 1:00 PM test_push created a powerwash job on this DUT. In the usual case, this causes a Reset task that succeeds, followed by the platform_PowerWash test.
- In this case, because the DUT had no python, Reset failed, triggering a Repair task. But the timeout that test_push sets for this job (300s) is not enough for a Repair, so the job failed.

We're trusting a previous test_push to clean up after itself: to run Verify on all DUTs, and subsequent Repair tasks if needed. If the last test_push failed, it's not ideal to depend on it to have cleaned up the DUTs so that the current test_push can finish.

We currently check that all DUTs are in state Ready before starting. How about we Verify all DUTs at the start, _then_ check that they go to the Ready state before starting the testing?

I've started the test_push again (it should succeed now in theory, because that DUT is now Repaired), so lowering priority.
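
To make the proposal concrete, here is a rough sketch of "Verify everything first, then wait for Ready". It is only an illustration under assumptions: `FakeScheduler`, `schedule_verify`, `get_status`, and `verify_then_wait_ready` are hypothetical names, not the real autotest or test_push.py API, and the status strings other than Ready are made up for the example.

```python
import time


class FakeScheduler(object):
    """Hypothetical in-memory stand-in for the scheduler; NOT the real AFE client."""

    def __init__(self, polls_until_ready):
        # Maps hostname -> number of status polls before the DUT reports Ready.
        self._remaining = dict(polls_until_ready)
        self.verified = []

    def schedule_verify(self, hostname):
        # Assumption: this queues a Verify task (and any Repair it triggers).
        self.verified.append(hostname)

    def get_status(self, hostname):
        if hostname not in self.verified:
            return 'Unknown'
        if self._remaining[hostname] > 0:
            self._remaining[hostname] -= 1
            return 'Verifying'
        return 'Ready'


def verify_then_wait_ready(scheduler, hostnames, timeout_s, poll_s=5.0):
    """Schedule Verify on every DUT, then block until all of them are Ready.

    Raises RuntimeError if some DUT is still not Ready once timeout_s elapses.
    """
    # Step 1: do not trust the previous run's cleanup; verify every DUT now.
    for hostname in hostnames:
        scheduler.schedule_verify(hostname)

    # Step 2: only start testing once every DUT has reached Ready.
    deadline = time.monotonic() + timeout_s
    while True:
        not_ready = [h for h in hostnames
                     if scheduler.get_status(h) != 'Ready']
        if not not_ready:
            return
        if time.monotonic() >= deadline:
            raise RuntimeError('DUTs not Ready after %ss: %s'
                               % (timeout_s, ', '.join(not_ready)))
        time.sleep(poll_s)
```

The point of this ordering is that the wait budget for Verify (and a possible Repair) is separate from the 300s powerwash timeout, so a DUT left broken by an earlier run is caught before the test jobs are created.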
,
Nov 15 2016
,
Nov 15 2016
,
Jan 12 2017
Add verify at the beginning of test_push https://chromium-review.googlesource.com/#/c/418490/
,
Mar 2 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
Comment 1 by pprabhu@chromium.org, Nov 15 2016