Issue 888734

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug


Reduce false rejects of win7_chromium_rel_ng CQ to 0.

Project Member Reported by erikc...@chromium.org, Sep 24

Issue description

I've been investigating every case where retrying a failure in win7_chromium_rel_ng results in success. We should drive this towards 0.
 
Owner: erikc...@chromium.org
Status: Assigned (was: Untriaged)
In the period from 9/20 to 9/25, there were 4 instances where a retry of win7_chromium_rel_ng went from failure to success:

(1) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/89131 -- compile error took down the build with failure status. Error went away with full rebuild.

(2) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/89223 -- flaky failure in webkit_layout_tests. Filed crbug.com/888660. The test fails 4 times in the initial run. In 'retry with patch' the test is only run once, and fails once. The test passes in 'without patch'.

(3) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/90846 -- Different webkit_layout_test failure, similar pattern to previous case. Filed crbug.com/888746.

(4) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/91054 -- Different webkit_layout_test failure, similar pattern to previous case. Same test suite as (2) so I reported the same bug.

To reduce false rejects, we need to gracefully handle compile errors, which can be caused by a broken ToT or a temporary bot issue.

We also need to find a way to better recover from failed webkit tests. This includes:
1) Figure out why flaky failing tests tend to fail on retry.
2) When rerunning failing tests in 'retry with patch', run each test more than once.
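
As a rough illustration of why item 2) matters, here is a small Python sketch of the arithmetic. The function name and the 50% failure rate are assumptions made for illustration only, and the model treats attempts as independent, which item 1) suggests may not hold in practice.

```python
def false_reject_probability(failure_rate, initial_attempts, retry_attempts):
    """Chance that a flaky test fails every attempt in both phases.

    Assumes each attempt fails independently with probability failure_rate.
    """
    return failure_rate ** (initial_attempts + retry_attempts)


# Hypothetical flaky test that fails half the time.
FAILURE_RATE = 0.5

# 4 attempts in the initial run, then 1 vs. 4 attempts in 'retry with patch'.
for retry_attempts in (1, 4):
    p = false_reject_probability(FAILURE_RATE, 4, retry_attempts)
    print(f"retry attempts={retry_attempts}: false reject probability={p}")
```

Under these assumptions, running the test 4 times in the retry step instead of once cuts the chance of a false reject from about 3.1% to about 0.4%.
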
It's tough to see what we should do about a compile failure. If we have an incremental failure that goes away if you do a clobber and a rebuild (at the same revision, with a patch), you can't easily tell if the problem is due to the patch or not. I.e., the problem is likely a missing dependency somewhere, but I'm not sure if you can tell whether the missing dependency was introduced by the patch.

Similarly, if you do a retry with patch at a new revision, I'm not sure if that would allow you to conclude that the problem wasn't a missing dependency introduced by your patch; you might just have shifted the timing, or you might've generated the dependency.
Project Member

Comment 3 by bugdroid1@chromium.org, Sep 25

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/3addf88f94ea471756a8dc50230053678759c593

commit 3addf88f94ea471756a8dc50230053678759c593
Author: erikchen <erikchen@chromium.org>
Date: Tue Sep 25 19:20:14 2018

Always retry webkit_layout_test failures 3 times.

By default, webkit_layout_test failures are retried 3 times. But when
'--test-list' is passed, the retry count defaults to 0. To simplify the logic,
this CL makes it so that webkit_layout_test failures are always retried 3 times.

Bug: 888734
Change-Id: I39f3494b87c3703738e95dcd0b29ac67a80622b1
Reviewed-on: https://chromium-review.googlesource.com/1242421
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipe_modules/chromium_tests/tests/steps/blink_test.expected/android.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Win_fail.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Mac_fail.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/webkit_layout_tests_interrupted.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/webkit_layout_tests_unexpected_error.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Linux_64_fail.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Linux_64__dbg__pass.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipe_modules/chromium_tests/tests/steps/blink_test.expected/big.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Linux_64_pass.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Linux_64__dbg__fail.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Linux_64___future_fail.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipe_modules/chromium_tests/tests/steps/blink_test.expected/win.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/minimal_pass_continues.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipe_modules/chromium_tests/steps.py
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipe_modules/chromium_tests/tests/steps/blink_test.expected/unexpected_flakes.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/too_many_failures_for_retcode.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Linux_64___future_pass.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Mac_pass.json
[modify] https://crrev.com/3addf88f94ea471756a8dc50230053678759c593/scripts/slave/recipes/blink_downstream.expected/full_client_v8_fyi_V8_Blink_Win_pass.json

Project Member

Comment 4 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/74093950f272d0f7ca5a5984c6e4f350795314f1

commit 74093950f272d0f7ca5a5984c6e4f350795314f1
Author: erikchen <erikchen@chromium.org>
Date: Wed Sep 26 18:04:04 2018

Always retry webkit_layout_test failures 3 times.

By default, webkit_layout_test failures are retried 3 times. But when
'--test-list' is passed, the retry count defaults to 0. To simplify the logic,
this CL makes it so that webkit_layout_test failures are always retried 3 times.

Bug: 888734
Change-Id: I46d15d71c979e5f12f334f5dcff41637f3f3aef4
Reviewed-on: https://chromium-review.googlesource.com/1244018
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594383}
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/chromium.android.json
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/chromium.fyi.json
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/chromium.mac.json
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/chromium.webkit.json
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/chromium.win.json
[modify] https://crrev.com/74093950f272d0f7ca5a5984c6e4f350795314f1/testing/buildbot/test_suites.pyl

Blocking: 892225
Cc: nednguyen@chromium.org martiniss@chromium.org liaoyuke@chromium.org st...@chromium.org
I audited every flaky failure [11] caused by win7_chromium_rel_ng between 10-03 and 10-04. Notes are in attachment. Here's what we need to fix:

(1) 'retry without patch' runs a failing test up to X times, and checks to see if it ever succeeds. Instead, it needs to rerun a failing test X times, and check to see if it ever fails.

context: ToT contained a flaky test [> 50% flakiness]. Here's a common sequence of events:

* Test fails 4 times 'with patch'.
* Test fails three times. Passes on 4th attempt in 'without patch'.
* Test fails 4 times in 'retry without patch'.

CL is marked as a failure, even though it should be marked as a success. Tracked at issue 892307.
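
A minimal Python sketch of the difference described in (1). This is not the actual recipe code; all function names are hypothetical, and each list holds one boolean per attempt (True means the test passed).

```python
def tot_healthy_current(without_patch_results):
    # Behavior as described in (1): a single pass without the patch is
    # enough to treat the test as healthy at ToT.
    return any(without_patch_results)


def tot_healthy_proposed(without_patch_results):
    # Proposed behavior: any failure without the patch means the test is
    # flaky or broken at ToT.
    return all(without_patch_results)


def patch_is_blamed(with_patch_results, without_patch_results, tot_healthy):
    # The patch is blamed only if the test never passed with the patch
    # and the test is considered healthy at ToT.
    failed_with_patch = not any(with_patch_results)
    return failed_with_patch and tot_healthy(without_patch_results)


# The sequence from the context above: 4 failures with the patch, then
# 3 failures and 1 pass without it.
with_patch = [False, False, False, False]
without_patch = [False, False, False, True]

print(patch_is_blamed(with_patch, without_patch, tot_healthy_current))   # True
print(patch_is_blamed(with_patch, without_patch, tot_healthy_proposed))  # False
```

With the "ever succeeds" check the single pass makes the flaky test look healthy, so the CL is blamed. With the "ever fails" check the three failures without the patch clear the CL.
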

(2) A compile error causes immediate recipe failure. The compile error appears to be an ops-related machine issue, something cygwin/NaCl related. We need to be able to recover gracefully. Tracked at issue 892309.

(3) Flakiness in webkit_layout_tests, tracked at issue 889036.
win7_flaky_audit_10_03_10_04.txt
10.5 KB
Labels: Infra-Platform-Test
Blocking: 915319
