Shard failures due to ADB issues should be retried at the shard layer

Issue description

It's relatively common for Android shards to have ADB issues, causing many of the tests to not run [and the shard to time out after 60 min]. Usually, hundreds of tests will be marked NOT_RUN. These tests are retried in 'retry_with_patch', but 'retry_with_patch' is not designed for this use case. 'retry_with_patch' runs every test 10 times and requires all runs to succeed. If there is a single failure, the build is marked as a failure. Contrast this with 'with_patch', which retries failures up to 3 times, looking for a single success. I think the right solution is to retry the entire failing shard [without even rebuilding], using the same retry logic as 'with_patch'.
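To make the difference between the two policies concrete, here is a minimal sketch in Python. It is not the recipe code: the function names `with_patch_passes`, `retry_with_patch_passes`, and `scripted_test` are hypothetical, and reading "retries failures up to 3 times" as one initial run plus up to 3 retries is an assumption.

```python
from typing import Callable, Iterable


def scripted_test(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Hypothetical stand-in for a test: returns the next scripted outcome per call."""
    it = iter(outcomes)
    return lambda: next(it)


def with_patch_passes(run_test: Callable[[], bool], max_retries: int = 3) -> bool:
    """'with_patch'-style policy (assumed): one run plus up to max_retries retries.

    The test passes as soon as any single attempt succeeds.
    """
    attempts = 1 + max_retries
    return any(run_test() for _ in range(attempts))


def retry_with_patch_passes(run_test: Callable[[], bool], repeats: int = 10) -> bool:
    """'retry_with_patch'-style policy: run `repeats` times, every run must succeed."""
    return all(run_test() for _ in range(repeats))


# A test that fails once (e.g. because the device dropped off ADB) and then passes.
flaky_once = [False] + [True] * 9

print(with_patch_passes(scripted_test(flaky_once)))        # tolerant policy
print(retry_with_patch_passes(scripted_test(flaky_once)))  # strict policy
```

Under these assumptions, a single transient failure is absorbed by the 'with_patch'-style policy but fails the 'retry_with_patch'-style policy, which is why hundreds of NOT_RUN tests are a poor fit for the latter.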
Jan 7
I looked through 10 examples of recent false rejects caused by a failure in 'chrome_public_test_apk'. Of these, two were caused by adb device issues:

Task: https://chromium-swarm.appspot.com/task?id=423e6481d73f1910&refresh=10&show_raw=1
Build: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/157168
"""
E 539.916s run_tests_on_device(06ac5e85003bbdc1) Device never recovered.
"""

Task: https://chromium-swarm.appspot.com/task?id=423417e80347b710&refresh=10&show_raw=1
Build: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/156885
"""
CommandTimeoutError: Timeout
"""
[after only 160s; normally the shard takes ~900s to run]

But they did not have the exact symptoms I described above. The next time I run across an example of the case I described, I'll post back here.
Comment 1 by jbudorick@chromium.org, Jan 7