run_suite --retry does not retry tests with no JOB_RETRIES set in the control file
Issue description

Builder: veyron_speedy-paladin
Build #: https://chromegw.corp.google.com/i/chromeos/builders/veyron_speedy-paladin/builds/5550

Error messages from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122066343-chromeos-test/chromeos4-row4-rack11-host16/debug/:

    06/07 16:27:59.119 DEBUG| test:0390| Test failed due to No answer to ping from chromeos4-row4-rack11-host16. Exception log follows the after_iteration_hooks.
    06/07 16:27:59.119 DEBUG| test:0393| starting after_iteration_hooks
    06/07 16:27:59.119 DEBUG| test:0396| after_iteration_hooks completed
    06/07 16:27:59.121 WARNI| test:0616| The test failed with the following exception
    Traceback (most recent call last):
      File "/usr/local/autotest/client/common_lib/test.py", line 610, in _exec
        _call_test_function(self.execute, *p_args, **p_dargs)
      File "/usr/local/autotest/client/common_lib/test.py", line 824, in _call_test_function
        raise error.UnhandledTestFail(e)
    UnhandledTestFail: Unhandled AutoservError: No answer to ping from chromeos4-row4-rack11-host16
    Traceback (most recent call last):
      File "/usr/local/autotest/client/common_lib/test.py", line 818, in _call_test_function
        return func(*args, **dargs)
      File "/usr/local/autotest/client/common_lib/test.py", line 471, in execute
        dargs)
      File "/usr/local/autotest/client/common_lib/test.py", line 348, in _call_run_once_with_retry
        postprocess_profiled_run, args, dargs)
      File "/usr/local/autotest/client/common_lib/test.py", line 381, in _call_run_once
        self.run_once(*args, **dargs)
      File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 113, in run_once
        force_full_update=force)
      File "/usr/local/autotest/server/afe_utils.py", line 208, in machine_install_and_update_labels
        *args, **dargs)
      File "/usr/local/autotest/server/hosts/cros_host.py", line 809, in machine_install_by_devserver
        'No answer to ping from %s' % self.hostname)
    AutoservError: No answer to ping from chromeos4-row4-rack11-host16
Jun 8 2017
https://viceroy.corp.google.com/chromeos/suite_details?job_id=122066331

The provision job on chromeos4-row4-rack11-host16 failed after 36 minutes. That is a pain in its own right, but the bigger problem is that the affected test, generic_RebootTest, wasn't retried.

I've been seeing the same misbehaviour on moblab while testing my CL to turn on retries there: https://chromium-review.googlesource.com/c/522926/

Either I'm missing something, or test retries in a suite are simply broken right now.
Jun 8 2017
OK, I finally understand job retries. A test is retried only if its own control file says it wants to be retried (by setting JOB_RETRIES); a suite-level retry request is not enough on its own. This calls for either an audit of all tests used in important suites, or simply relaxing this requirement. As it stands, we may be retrying some tests but not others.

https://chromium-review.googlesource.com/c/527935/
Jun 9 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7295bf332df608b1d4e9cdc9f0c769b71ffbae46

commit 7295bf332df608b1d4e9cdc9f0c769b71ffbae46
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri Jun 09 14:10:42 2017

[autotest] Bump all tests to retry at least once in a suite.

When a suite requests job retries, we retry a test only if the test itself also request retries. For important suites (running on CQ / BVT), we would like tests that fail as a result of their provision job failing to get at least one more chance to run.

This CL is a short-term fix. It bumps up the individual test retry limit to at least 1, so that each test is protected from its DUT failing provision.

BUG= chromium:730885
BUG= chromium:729099
TEST=- run test_that with a test that doesn't request retries
     - inject a bug in the provision code so that the DUT fails provision.
     - watch the DUT fail provision, and the test get retried (of course that retry will again due to the same injected bug).
TEST=(updated) unittests.

Change-Id: I59b3ae36bb78c94fce234976d81297245cedd661
Reviewed-on: https://chromium-review.googlesource.com/528313
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/7295bf332df608b1d4e9cdc9f0c769b71ffbae46/server/cros/dynamic_suite/suite.py
[modify] https://crrev.com/7295bf332df608b1d4e9cdc9f0c769b71ffbae46/server/cros/dynamic_suite/suite_unittest.py
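The behaviour change described in the commit message can be illustrated roughly as follows. This is a sketch of the idea only, not the code in suite.py: the function names and parameters (`effective_retry_limit`, `should_retry`, `test_job_retries`, `suite_wants_retries`, `retries_used`) are assumptions made up for this example.

```python
def effective_retry_limit(test_job_retries, suite_wants_retries):
    """Retry limit for one test in a suite, with the 'at least 1' bump applied.

    test_job_retries: retries requested by the test itself (0 if it requests none).
    suite_wants_retries: whether the suite asked for job retries at all.
    """
    if not suite_wants_retries:
        return 0
    # Before the fix the limit was effectively test_job_retries, so a test
    # that requested no retries was never retried, even after a provision failure.
    return max(1, test_job_retries)


def should_retry(retries_used, test_job_retries, suite_wants_retries):
    """Return True if a failed test still has a retry left."""
    return retries_used < effective_retry_limit(test_job_retries,
                                                suite_wants_retries)
```

Under this sketch, a test that requests no retries gets exactly one more attempt when the suite has retries enabled, while tests that ask for more keep their own higher limit.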
Jun 9 2017
Should be done, pending push-to-prod.
Comment 1 by dgarr...@chromium.org, Jun 8 2017
Summary: veyron_speedy-paladin: Test not retried after provision failure. (was: veyron_speedy-paladin: Unhandled AutoservError: No answer to ping from chromeos4-row4-rack11-host16)