android CQ jobs timing out due to excessive retries of instrumentation tests w/ "--gtest_repeat" arg
Issue description

See:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/99235
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101359
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101321
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101317
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101311
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101286

Here's what's happening:
1. A single shard in the initial chrome_public_test_apk run times out and doesn't emit test results.
2. We retry the suite w/o patch.
3. When retrying it, we retry *every* test in the suite despite some of the tests passing in the initial run.
4. Since we're retrying it w/o patch, we add "--gtest_retries=10".
5. So we rerun the entire suite 10 times.
6. Since chrome_public_test_apk is a large, long-running suite, we reach the build timeout (i.e., we don't have time to run the whole suite 10x in a tryjob).

An argument could be made that everything here is WAI... but I think #3 can be improved. We shouldn't be retrying tests that passed in the initial run. The fact that we are is a bug IMO.
,
Oct 11
This is a source of repeated infra-failure pages, so is likely hurting more than helping. I'm going to disable the "--gtest_repeat" arg for android-kitkat-arm-rel since this problem seems to be isolated to that bot.
,
Oct 11
Issue 894615 has been merged into this issue.
,
Oct 11
I couldn't figure out a good way of configuring a single bot to be excluded from the retries without making the recipe even more of a mess. So passing to erikchen to either come up w/ one or revert the retries.
,
Oct 12
Thanks, let's revert the 10x retries for now and I'll scope it in the future to test suites that have a relatively smaller number of tests to retry.
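One possible way to scope the repeats to smaller sets of tests is to cap the total number of test runs. The sketch below is only an illustration of that idea: `repeat_count` and both thresholds are hypothetical and do not come from the recipe.

```python
def repeat_count(num_tests_to_retry, max_repeat=10, max_total_runs=1000):
    """Pick a repeat count so that tests * repeats stays within a budget.

    max_repeat and max_total_runs are made-up values for illustration.
    """
    if num_tests_to_retry <= 0:
        return 0
    # Never exceed max_repeat, and always run each test at least once.
    return max(1, min(max_repeat, max_total_runs // num_tests_to_retry))


print(repeat_count(20))    # small retry set: full 10x
print(repeat_count(500))   # larger set: scaled down to 2x
print(repeat_count(5000))  # whole large suite: 1x only
```

Under these assumptions a large suite like chrome_public_test_apk would be rerun once, while a handful of failing tests would still get the full 10x.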
,
Oct 12
,
Oct 12
To revert, we need to revert 2 CLs:
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1271636
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1269016

To disable, we need to land this CL:
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1278367

I've been trying to land the reverts, but the CQ seems to be having issues. Ditto for the CL to disable. If we have to force-merge, I think it's actually safer to force-merge the disable, since I just generated it from ToT and it's less likely to have expectation conflicts/issues.
,
Oct 12
Reverts have landed.
,
Oct 12
Thanks! android-kitkat-arm-rel retrying appears to be functional again https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/102082
,
Oct 12
Issue 894670 has been merged into this issue.
Comment 1 by martiniss@chromium.org, Oct 11