'retry with patch' failing to detect successful test runs.
Issue description

Build: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-chromeos-rel/146087
Task: https://chromium-swarm.appspot.com/task?id=415a1309fd84ed10&refresh=10&show_raw=1

The test sometimes passes, and sometimes times out:
"""
[2/2] USS/SingleClientPasswordsSyncTest.CommitWithCustomPassphrase/0 (TIMED OUT)
Retrying 1 test (retry #2)
[3/3] USS/SingleClientPasswordsSyncTest.CommitWithCustomPassphrase/0 (1307 ms)
SUCCESS: all tests passed.
"""
This should have caused the recipe to mark the step as a success. Instead, the step is marked as a failure. Needs investigation.
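As a rough illustration of the behavior the report expects (this is not the actual recipe code), the sketch below treats a step as successful when every test has at least one passing attempt. The function name `step_succeeded` and the status strings "TIMEOUT" and "SUCCESS" are assumptions made for this example.

```python
def step_succeeded(attempts_by_test):
    """Return True if every test has at least one successful attempt.

    attempts_by_test maps a test name to its list of result strings in
    run order, e.g. ["TIMEOUT", "SUCCESS"]. The status strings are
    assumptions for this sketch, not the recipe's real vocabulary.
    """
    return all("SUCCESS" in attempts for attempts in attempts_by_test.values())


# The timed-out-then-passed case from the log above.
results = {
    "USS/SingleClientPasswordsSyncTest.CommitWithCustomPassphrase/0": [
        "TIMEOUT",
        "SUCCESS",
    ],
}
print(step_succeeded(results))
```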
Nov 30
The example in the opening comment was real flakiness introduced by the CL.

In the example in c#1, the failure in 'with patch' was caused by a device issue:
"""
I 505.302s run_tests_on_device(07e4ca75d00b3449) java.lang.AssertionError: Many tests will fail if the screen is not on.
"""

The failure in 'retry with patch' seems to be caused by real test flakiness:
"""
I 146.881s run_tests_on_device(06b4903a3440ec9e) Error in testFling(org.chromium.content.browser.ContentViewScrollingTest):
I 146.881s run_tests_on_device(06b4903a3440ec9e) java.lang.AssertionError: Criteria not met in allotted time.
I 146.881s run_tests_on_device(06b4903a3440ec9e) at org.junit.Assert.fail(Assert.java:88)
I 146.881s run_tests_on_device(06b4903a3440ec9e) at org.junit.Assert.assertTrue(Assert.java:41)
I 146.881s run_tests_on_device(06b4903a3440ec9e) at org.chromium.content_public.browser.test.util.CriteriaHelper.pollInstrumentationThread(CriteriaHelper.java:92)
I 146.882s run_tests_on_device(06b4903a3440ec9e) at org.chromium.content_public.browser.test.util.CriteriaHelper.pollInstrumentationThread(CriteriaHelper.java:107)
I 146.882s run_tests_on_device(06b4903a3440ec9e) at org.chromium.content.browser.ContentViewScrollingTest.waitForScroll(ContentViewScrollingTest.java:90)
I 146.882s run_tests_on_device(06b4903a3440ec9e) at org.chromium.content.browser.ContentViewScrollingTest.testFling(ContentViewScrollingTest.java:203)
...
"""

The CL appears unrelated: https://chromium-review.googlesource.com/c/chromium/src/+/1349530/2
Nov 30
Please note: In the CQ run mentioned in the opening comment, even though the build failed due to real flakiness caused by the CL, the CQ-level retry then caused the next build to succeed.
Jan 4
Summary: If a test flakes in 'retry with patch' [having previously deterministically failed in 'with_patch' and deterministically succeeded in 'without_patch'], there is a high probability that the flakiness is caused by the CL.

The exception [anecdotally, from my observations] is when the test was never run in 'with_patch' [e.g. due to ADB issues]. In that case, flakiness in retry_with_patch causes the whole build to fail, whereas we really wanted something closer to retrying the whole shard and allowing flaky tests to be marked as success. I think the right way to deal with this is to extend Issue 917122 to also apply to timeouts on Android [likely caused by ADB malfunction].

================================================================

I wrote a script that fetches all instances of builds that:
1) Ran 'retry_with_patch'
2) Had flakiness in 'retry_with_patch' [some runs PASS, others TIMEOUT/CRASH/FAIL/...]
3) Had between 1 and 9 flaky tests.

[This script doesn't correctly fetch results for webkit_layout_tests, as those use a different set of result strings]

Here are some partial results:
"""
content_browsertests on Android device Nexus 5 (retry with patch) 8925738755423705568 flaky test count: 3
network_service_interactive_ui_tests (retry with patch) on Windows-10-15063 8925791439818045232 flaky test count: 2
network_service_content_browsertests (retry with patch) on Windows-10-15063 8925937313126258112 flaky test count: 1
browser_tests (retry with patch) 8925857213884101376 flaky test count: 1
browser_tests (retry with patch) 8925909638918282528 flaky test count: 2
browser_tests (retry with patch) 8925937313126258144 flaky test count: 1
browser_tests (retry with patch) 8925937313126258080 flaky test count: 3
browser_tests (retry with patch) 8925912939460364624 flaky test count: 4
browser_tests (retry with patch) 8925908805228630928 flaky test count: 9
browser_tests (retry with patch) 8925912939460364400 flaky test count: 1
...
"""

I then manually audited about 10 of these results. This sampling was non-uniform, so I can't generalize to all results, but for most of the builds I checked, the flakiness was caused by the CL in question.

I'm going to mark this bug as a WontFix.
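For readers who want to reproduce the selection, the three criteria above can be sketched roughly as follows. This is not the script that produced the results; the function names, the input shape (a dict of test name to list of result strings), and the sample test names are all assumptions made for illustration.

```python
PASS = "PASS"


def is_flaky(runs):
    """A test is flaky if some runs PASS and at least one run does not."""
    return PASS in runs and any(r != PASS for r in runs)


def flaky_tests(results_by_test):
    """Return the sorted names of tests with mixed PASS / non-PASS runs."""
    return sorted(name for name, runs in results_by_test.items() if is_flaky(runs))


def matches_criteria(step_name, results_by_test, min_flaky=1, max_flaky=9):
    """Apply the three filters: retry step, flakiness present, 1 to 9 flaky tests."""
    if "retry with patch" not in step_name:
        return False
    return min_flaky <= len(flaky_tests(results_by_test)) <= max_flaky


# Made-up sample data; the test names are hypothetical.
sample = {
    "SuiteA.Flaky": ["TIMEOUT", "PASS"],
    "SuiteA.Stable": ["PASS"],
    "SuiteA.Broken": ["FAIL", "FAIL"],
}
print(flaky_tests(sample))
print(matches_criteria("browser_tests (retry with patch)", sample))
```

As noted above, a filter like this would not handle webkit_layout_tests, which use a different set of result strings.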
Comment 1 by erikc...@chromium.org, Nov 28