
Issue 909845

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jan 4
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




'retry with patch' failing to detect successful test runs.

Project Member Reported by erikc...@chromium.org, Nov 28

Issue description

Build: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-chromeos-rel/146087

Task: https://chromium-swarm.appspot.com/task?id=415a1309fd84ed10&refresh=10&show_raw=1

The test sometimes passes, and sometimes times out:
"""
[2/2] USS/SingleClientPasswordsSyncTest.CommitWithCustomPassphrase/0 (TIMED OUT)
Retrying 1 test (retry #2)
[3/3] USS/SingleClientPasswordsSyncTest.CommitWithCustomPassphrase/0 (1307 ms)
SUCCESS: all tests passed.
"""

This should have caused the recipe to mark the step as a success. Instead, the step is marked as a failure. Needs investigation.
 
Another example, this time from Android:

https://chromium-swarm.appspot.com/task?id=415975d38e4e3610&refresh=10&show_raw=1

From the logs:
"""
C  232.105s Main  org.chromium.content.browser.ContentViewScrollingTest#testFling: 10 SUCCESS, 0 SKIPPED, 1 FAILURE, 0 CRASH, 0 TIMEOUT, 0 UNKNOWN, 0 NOTRUN
"""
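For anyone triaging similar logs, here is a minimal sketch of how the per-test summary line quoted above could be turned into counts and checked for flakiness. The helper names (`parse_result_counts`, `is_flaky`) are hypothetical and not part of any Chromium tool; the line format is assumed from the single log line above, and treating FAILURE/CRASH/TIMEOUT as the "failed" statuses is also an assumption.

```python
import re

# Assumed format, taken from the one log line quoted in this comment:
# "<prefix> <test name>: 10 SUCCESS, 0 SKIPPED, 1 FAILURE, ..."
SUMMARY_RE = re.compile(
    r"(\d+) (SUCCESS|SKIPPED|FAILURE|CRASH|TIMEOUT|UNKNOWN|NOTRUN)\b"
)


def parse_result_counts(line):
    """Return a dict mapping each status string to its integer count."""
    return {status: int(count) for count, status in SUMMARY_RE.findall(line)}


def is_flaky(counts):
    """A test is treated as flaky if it both passed and failed at least once."""
    passed = counts.get("SUCCESS", 0)
    failed = sum(counts.get(s, 0) for s in ("FAILURE", "CRASH", "TIMEOUT"))
    return passed > 0 and failed > 0


line = (
    "C  232.105s Main  org.chromium.content.browser."
    "ContentViewScrollingTest#testFling: 10 SUCCESS, 0 SKIPPED, 1 FAILURE, "
    "0 CRASH, 0 TIMEOUT, 0 UNKNOWN, 0 NOTRUN"
)
counts = parse_result_counts(line)
print(counts, is_flaky(counts))
```

With 10 SUCCESS and 1 FAILURE, the quoted line counts as flaky under this definition.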
The example in the opening comment was real flakiness introduced by the CL.


In the example in c#1, the failure in 'with patch' was caused by a device issue:
"""
I  505.302s run_tests_on_device(07e4ca75d00b3449)    java.lang.AssertionError: Many tests will fail if the screen is not on.
"""

The failure in 'retry with patch' seems to be caused by real test flakiness:
"""
I  146.881s run_tests_on_device(06b4903a3440ec9e)    Error in testFling(org.chromium.content.browser.ContentViewScrollingTest):
I  146.881s run_tests_on_device(06b4903a3440ec9e)    java.lang.AssertionError: Criteria not met in allotted time.
I  146.881s run_tests_on_device(06b4903a3440ec9e)    	at org.junit.Assert.fail(Assert.java:88)
I  146.881s run_tests_on_device(06b4903a3440ec9e)    	at org.junit.Assert.assertTrue(Assert.java:41)
I  146.881s run_tests_on_device(06b4903a3440ec9e)    	at org.chromium.content_public.browser.test.util.CriteriaHelper.pollInstrumentationThread(CriteriaHelper.java:92)
I  146.882s run_tests_on_device(06b4903a3440ec9e)    	at org.chromium.content_public.browser.test.util.CriteriaHelper.pollInstrumentationThread(CriteriaHelper.java:107)
I  146.882s run_tests_on_device(06b4903a3440ec9e)    	at org.chromium.content.browser.ContentViewScrollingTest.waitForScroll(ContentViewScrollingTest.java:90)
I  146.882s run_tests_on_device(06b4903a3440ec9e)    	at org.chromium.content.browser.ContentViewScrollingTest.testFling(ContentViewScrollingTest.java:203)
...
"""

The CL appears unrelated:
https://chromium-review.googlesource.com/c/chromium/src/+/1349530/2
Please note: In the CQ run mentioned in the opening comment, even though the build failed due to real flakiness caused by the CL, the CQ-level retry then caused the next build to succeed.
Cc: dpranke@chromium.org st...@chromium.org
Labels: Infra-Platform-Test
Status: WontFix (was: Assigned)
Summary: If a test flakes in 'retry with patch' [having previously failed deterministically in 'with patch' and succeeded deterministically in 'without patch'], there is a high probability that the flakiness is caused by the CL.

The exception [anecdotally, from my observations] is when the test was never run in 'with patch' [e.g. due to ADB issues]. In that case, flakiness in 'retry with patch' causes the whole build to fail, whereas what we really want is closer to retrying the whole shard and allowing flaky tests to be marked as successes. I think the right way to deal with this is to extend Issue 917122 to also apply to timeouts on Android [likely caused by ADB malfunction].
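To make the reasoning above concrete, here is a rough sketch of the heuristic as a function. It is only an illustration of the argument in this comment, not recipe code: the function name `classify`, the use of a list of status strings per step, and the returned labels are all hypothetical.

```python
PASS = "PASS"


def all_pass(runs):
    return bool(runs) and all(r == PASS for r in runs)


def all_fail(runs):
    return bool(runs) and all(r != PASS for r in runs)


def is_flaky(runs):
    # Flaky: at least one passing run and at least one non-passing run.
    return PASS in runs and any(r != PASS for r in runs)


def classify(with_patch, without_patch, retry_with_patch):
    """Each argument is a list of status strings for one test in that step.

    An empty list means the test never ran in that step (e.g. a device issue).
    """
    if not is_flaky(retry_with_patch):
        return "not flaky in retry"
    if not with_patch:
        # The exception described above: nothing to compare against.
        return "inconclusive: test never ran in 'with patch'"
    if all_fail(with_patch) and all_pass(without_patch):
        return "flakiness likely caused by the CL"
    return "inconclusive"


print(classify(["FAIL", "FAIL"], ["PASS"], ["TIMEOUT", "PASS"]))
print(classify([], ["PASS"], ["FAIL", "PASS"]))
```

The first call corresponds to the common case in the summary; the second corresponds to the ADB-style exception, where the test has no 'with patch' results at all.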

================================================================

I wrote a script that fetches all instances of builds that:

1) Ran 'retry_with_patch'
2) Had flakiness in 'retry_with_patch' [some runs PASS, others TIMEOUT/CRASH/FAIL/...]
3) Had between 1 and 9 flaky tests.

[This script doesn't correctly fetch results for webkit_layout_tests, as those use a different set of result strings]
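For readers without the attachment, the three criteria above can be sketched roughly as follows. This is not the attached script: the input shape (a list of step dicts with `name`, `build_id` and `results`), the function names, and the sample data are all hypothetical, and it assumes a passing run is reported as the string "PASS".

```python
PASS = "PASS"


def flaky_tests(results):
    """results maps test name -> list of per-run status strings.

    A test counts as flaky if some runs PASS and others do not.
    """
    return sorted(
        name
        for name, runs in results.items()
        if PASS in runs and any(r != PASS for r in runs)
    )


def select_steps(steps, min_flaky=1, max_flaky=9):
    """Return (step name, build id, flaky count) for steps meeting all criteria."""
    selected = []
    for step in steps:
        # Criterion 1: the step is a 'retry with patch' step.
        if "(retry with patch)" not in step["name"]:
            continue
        # Criteria 2 and 3: flaky tests exist, and their count is in range.
        count = len(flaky_tests(step["results"]))
        if min_flaky <= count <= max_flaky:
            selected.append((step["name"], step["build_id"], count))
    return selected


# Made-up sample data, for illustration only.
sample_steps = [
    {
        "name": "browser_tests (retry with patch)",
        "build_id": 1001,
        "results": {"A.Test": ["TIMEOUT", "PASS"], "B.Test": ["PASS"]},
    },
    {
        "name": "browser_tests (with patch)",
        "build_id": 1002,
        "results": {"A.Test": ["FAIL", "PASS"]},
    },
    {
        "name": "unit_tests (retry with patch)",
        "build_id": 1003,
        "results": {"C.Test": ["FAIL", "FAIL"]},
    },
]

for name, build_id, count in select_steps(sample_steps):
    print(f"{name} {build_id} flaky test count: {count}")
```

Only the first sample step is selected: the second is not a 'retry with patch' step, and the third fails deterministically rather than flaking.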

Here are some partial results:
"""
content_browsertests on Android device Nexus 5 (retry with patch) 8925738755423705568 flaky test count: 3
network_service_interactive_ui_tests (retry with patch) on Windows-10-15063 8925791439818045232 flaky test count: 2
network_service_content_browsertests (retry with patch) on Windows-10-15063 8925937313126258112 flaky test count: 1
browser_tests (retry with patch) 8925857213884101376 flaky test count: 1
browser_tests (retry with patch) 8925909638918282528 flaky test count: 2
browser_tests (retry with patch) 8925937313126258144 flaky test count: 1
browser_tests (retry with patch) 8925937313126258080 flaky test count: 3
browser_tests (retry with patch) 8925912939460364624 flaky test count: 4
browser_tests (retry with patch) 8925908805228630928 flaky test count: 9
browser_tests (retry with patch) 8925912939460364400 flaky test count: 1
...
"""

I then manually audited about 10 of these results. The sampling was non-uniform, so I can't generalize to all results, but for most of the builds I checked, the flakiness was caused by the CL in question.

I'm going to mark this bug as a WontFix.
Attachment: compute_stats_for_retry_with_patch.py (6.2 KB)
