New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 844264 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 852588
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

FindIt reverted a CL and broke the build due to a depending CL landed first

Project Member Reported by thestig@chromium.org, May 18 2018

Issue description

1) r559371 landed on 2018/05/16 17:49 PDT.
2) r559751, which depends on (1), landed 2018/05/17 17:24.
3) FindIt reverted (1) at 2018/05/17 17:47 as r559755.

Now the tree is broken.
 

Comment 1 by st...@chromium.org, May 18 2018

Cc: m...@chromium.org
Labels: -Pri-3 Pri-1
Status: Available (was: Untriaged)
@thestig: many thanks for the bug and fixing the breakage!

A bad timing. The original CL was landed within a 24-hour time window upon when it is identified as the culprit. (Just a 2-minute gap).

Maybe we'd better make it more strict, like a 16-hour time window.
There seems no safe solution here -- even we let the revert go through CQ, there is till a chance of race (e.g. r559751 lands after the revert is in CQ but before the revert lands).
Chan: could you follow up on the auto-revert? Or should Jeff follow up?


@miu: would you mind confirming whether Findit's revert is a false positive?
1) You mentioned that there are other tests failing. What are they?
2) For the output "Xlib:  extension "RANDR" missing on display ":99".", is it a warning or an error? Is "RANDR" required to run the newly added test in r559371? If so, why some iterations passed without the error while the other timed out? You may check all the test runs in this analysis https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVyxQELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCKOAWNocm9taXVtLmxpbnV4L0xpbnV4IFRlc3RzLzY5OTY2L2NvbnRlbnRfYnJvd3NlcnRlc3RzL1FYVnlZVmRwYm1SdmQxWnBaR1Z2UTJGd2RIVnlaVVJsZG1salpVSnliM2R6WlhKVVpYTjBVQzVEWVhCMGRYSmxjME52Ym5SbGJuUkRhR0Z1WjJWekx6ST0MCxITTWFzdGVyRmxha2VBbmFseXNpcxgBDA

Jeff: could you follow up on the analysis finding?

Comment 2 by st...@chromium.org, May 18 2018

Cc: chanli@chromium.org lijeffrey@chromium.org

Comment 3 by st...@chromium.org, May 18 2018

Summary: FindIt reverted a CL and broke the build due to a depending CL landed first (was: FindIt reverted a CL and broke the build)

Comment 4 by m...@chromium.org, May 18 2018

It turns out Findit was correct. The flake was caused by a transient color space issue that occurred in the middle of the test, throwing off some logic. I've fixed and relanded the change: https://chromium-review.googlesource.com/c/chromium/src/+/1066676

Idea: When I originally uploaded both CLs, Gerrit noticed my local branching and made the 2nd one dependent on the first. If Findit had access to that metadata, it might have delegated to the sheriffs for manual handling in this case. FWIW, however, I did have to rebase off ToT before landing the 2nd CL (so the dependency metadata may have been obliterated by the time it landed).

Another idea: Perhaps Findit could do its own simple "test a revert" build. When it's about to auto-revert, it could do a bot build with the CL reverted to make sure this doesn't break the compile step (or any earlier step). With this functionality, it seems that the time window issue could be resolved.

Answer to questions: This was an "example build" provided by Findit: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng/96258 I looked at the output of content_browsertests: https://chromium-swarm.appspot.com/task?id=3d87ed12bf261410&refresh=10&show_raw=1&wide_logs=true From that, I noticed these other tests had timed-out/failed (and had the similar XRANDR error: TouchAccessibilityBrowserTest.TouchExplorationSendsHoverEvents (TIMED OUT), TouchSelectionForCrossProcessFramesTests/TouchSelectionControllerClientAuraSiteIsolationTest.BasicSelectionIsolatedScrollMainframe/1 (FAILED) I'm not sure if "RANDR" was required, but it seemed like it could be symptomatic of a bot config issue. My new browser tests DID depend on having pixel output enabled, since they were screen capture-related. Also, I was quick to judge because I've encountered this issue for other changes in the past (though not in the last 2 years or so).

Comment 5 by st...@chromium.org, May 30 2018

Owner: chanli@chromium.org
Status: Assigned (was: Available)
Many thanks miu@ for the suggestions! We will follow up further to reduce the chance of  similar cases.

chanli@: do you want to take this one? Please feel free to assign it back to me.

Comment 6 by chanli@chromium.org, May 30 2018

Sure. I'll put this in my list.
And somehow I end up being the one who notices and reverts.

Comment 9 by chanli@chromium.org, Jun 13 2018

Mergedinto: 852588
Status: Duplicate (was: Assigned)
Project Member

Comment 10 by bugdroid1@chromium.org, Jun 21 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/fcdc7a43b260f8691ef85f78951d9895e1ece4ad

commit fcdc7a43b260f8691ef85f78951d9895e1ece4ad
Author: Chan <chanli@chromium.org>
Date: Thu Jun 21 02:37:34 2018

[Findit] Don't commit a revert for a culprit of test/flake failure if the culprit author has committed another change.

Noted that this is a naive fix which can potentially stop us committing some reverts that are actually safe to revert. Later a more reliable solution(most likely another try job) would be in place to compensate that.

Bug:  844264 
Change-Id: I02fc4cecbf5d8123acb1dc030a394db92f524b12
Reviewed-on: https://chromium-review.googlesource.com/1096418
Commit-Queue: Chan Li <chanli@chromium.org>
Reviewed-by: Shuotao Gao <stgao@chromium.org>
Reviewed-by: Jeffrey Li <lijeffrey@chromium.org>

[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/compile_failure/compile_failure_analysis.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/flake_failure/culprit_util.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/test_failure/test_culprit_action.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/test_failure/test/test_culprit_action_test.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/test_failure/test_failure_analysis.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/libs/gitiles/git_repository.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/git.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/libs/gitiles/gitiles_repository.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/flake_failure/test/culprit_util_test.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/pipelines/test_failure/revert_and_notify_test_culprit_pipeline.py
[modify] https://crrev.com/fcdc7a43b260f8691ef85f78951d9895e1ece4ad/appengine/findit/services/test/git_test.py

Project Member

Comment 11 by bugdroid1@chromium.org, Jul 20

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/418489634e9132b22bcdb6c9eeedc6d28cd492a1

commit 418489634e9132b22bcdb6c9eeedc6d28cd492a1
Author: Chan <chanli@chromium.org>
Date: Fri Jul 20 23:30:14 2018

[Findit] Don't commit a revert if there is already a change depending on the culprit.

Query Gerrit to get all changes which are authored by the culprit's author, has Code-Review+1 label and modified after the culprit's commit time.
Then checks if in any of the patchsets of these changes they have depended on the culprit. If yes, consider a dependency found and don't commit the revert.

Bug:  844264 
Change-Id: Iab96a577da4e616a15925155f7bfc0767d08c046
Reviewed-on: https://chromium-review.googlesource.com/1125313
Commit-Queue: Chan Li <chanli@chromium.org>
Reviewed-by: Roberto Carrillo <robertocn@chromium.org>

[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/flake_failure/culprit_util.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test_failure/test_culprit_action.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test_failure/test/test_try_job_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/compile_failure/compile_try_job.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/git.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test/build_ahead_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/waterfall/test/suspected_cl_util_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/culprit_action.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/flake_failure/test/culprit_util_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test_failure/test/test_culprit_action_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test/gerrit_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/gerrit.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test/culprit_action_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test/git_test.py
[modify] https://crrev.com/418489634e9132b22bcdb6c9eeedc6d28cd492a1/appengine/findit/services/test_failure/test_try_job.py

Sign in to add a comment