
Issue 894637

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug




android CQ jobs timing out due to excessive retries of instrumentation tests w/ "--gtest_repeat" arg

Project Member Reported by bpastene@chromium.org, Oct 11

Issue description

See:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/99235
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101359
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101321
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101317
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101311
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/101286

Here's what's happening:
1. A single shard in the initial chrome_public_test_apk run times out and doesn't emit test results.
2. We retry the suite w/o patch.
3. When retrying it, we retry *every* test in the suite, even though some of the tests passed in the initial run.
4. Since we're retrying it w/o patch, we add "--gtest_retries=10".
5. So we rerun the entire suite 10 times.
6. Since chrome_public_test_apk is a large, long-running suite, we reach the build timeout (i.e., we don't have time to run the whole suite 10x in a tryjob).

An argument could be made that everything here is WAI... but I think #3 can be improved. We shouldn't be retrying tests that passed in the initial run. The fact that we are is a bug IMO.
 
That is a bug. That's bug 394826. I didn't realize that would cause these issues, but it does make sense that it would. I have a CL in progress to fix this but it isn't close to landing.

Should we revert the retry 10x CL, erik?
This is a source of repeated infra-failure pages, so is likely hurting more than helping.

I'm going to disable the "--gtest_repeat" arg for android-kitkat-arm-rel since this problem seems to be isolated to that bot.
Issue 894615 has been merged into this issue.
Owner: erikc...@chromium.org
I couldn't figure out a good way of configuring a single bot to be excluded from the retries without making the recipe even more of a mess. So passing to erikchen to either come up w/ one or revert the retries.
Thanks, let's revert the 10x retries for now and I'll scope it in the future to test suites that have a relatively smaller number of tests to retry.
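
A rough sketch of what such scoping could look like, assuming a simple cutoff on the number of tests to retry. The threshold value, the function name, and the fallback to a single run are all hypothetical; nothing in this thread specifies them.

```python
MAX_TESTS_FOR_REPEAT = 50  # hypothetical cutoff, not taken from the recipe
REPEAT_COUNT = 10          # the 10x repeat discussed in this issue


def repeat_count_for(num_tests_to_retry):
    """Return how many times to run the without-patch retry."""
    if num_tests_to_retry <= MAX_TESTS_FOR_REPEAT:
        return REPEAT_COUNT
    # Large retry sets run once so the job stays within the build timeout.
    return 1
```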
Status: Started (was: Untriaged)
To revert, we need to revert 2 CLs:

https://chromium-review.googlesource.com/c/chromium/tools/build/+/1271636
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1269016

To disable, we need to land this CL:
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1278367

I've been trying to land the reverts, but the CQ seems to be having issues. Ditto for the CL to disable. If we have to force-merge, I think it's actually safer to force-merge the disable, since I just generated it from ToT and it's less likely to have expectation conflicts/issues.
Status: Fixed (was: Started)
Reverts have landed.
Thanks! Retrying on android-kitkat-arm-rel appears to be functional again: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/102082
Issue 894670 has been merged into this issue.
