New issue
Advanced search Search tips

Issue 807938 link

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Rebaselining and unskipping a test while a WPT import was ongoing caused the latter to land the wrong baseline

Project Member Reported by raphael....@intel.com, Feb 1 2018

Issue description

I got curious when I saw  bug 807791  and its related commits, as the bug should not have happened with our WPT import infrastructure (i.e. import a test but not rebaseline it).

Looking at the import CL (https://chromium-review.googlesource.com/c/chromium/src/+/895811/1) it's possible to see that none of the *_blink_rel trybots reported a failure in external/wpt/html/dom/interfaces.html, even though the failure is present in each bot's layout-test-results.zip, outdir_json and failing_results.json in Google Storage.

It turns out https://chromium-review.googlesource.com/c/chromium/src/+/834758 (from January 10th) marked external/wpt/html/dom/interfaces.html with NeedsManualRebaseline, so the expectations file hadn't been updated since then.

While the import CL was being processed, https://chromium-review.googlesource.com/c/chromium/src/+/874802 ("Rebaseline WPT tests after harmony-function-tostring",  bug 753073 ) was also being worked on. It removed external/wpt/html/dom/interfaces.html from TestExpectations and rebaselined it without the change from https://chromium-review.googlesource.com/c/chromium/src/+/895811 (which hadn't landed yet).

Robert's harmony-function-tostring CL landed at Jan 31 2018 22:52:11 CET, whereas layout tests in the WPT import CL finished running before that and the CL landed at Jan 31 2018 23:23:51 CET. In other words, we had the right expectations for the test for about 30min, when a new version of html.idl was landed without the right expectation due to the "race condition we" had there.

Is there anything we could do to prevent this from happening again?
 
Thanks for your investigation! I agree with your analysis. It's a nasty race condition that I don't have a good solution off the top of my head. 

The hard part is that two good changes with no conflicts might introduce failures when combined together. The failure mode is actually broader than rebaselining+unskipping. In general, one CL can change a test, while the other CL can change the behavior under test. The new test passes with the old behavior, while the new behavior passes with the old test. Both CLs may pass CQ and land (in a timeline similar to this incident), then the waterfall CI breaks.

I believe this is also why CI is always needed even if CQ (pre-merge check) mirrors CI perfectly.
I guess we can just close this bug then?
Let me put in a little more thought to this when I'm done with other more urgent stuff.

Sign in to add a comment