New issue
Advanced search Search tips
Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment
link

Issue 701234: [WPT import] Import doesn't add flaky expectations for flaky tests

Reported by qyears...@chromium.org, Mar 14 2017 Project Member

Issue description

One of the general recurring problems that the auto-import process has is how to deal with flaky tests.

In theory, a successful import generally involves running the tests once on all of the tryserver.blink bots with the import patch, and then a second time on all of the CQ bots. If it's consistent between those runs, then it gets committed.

But it's still fairly easy for flaky tests to get past this.

Example case: flaky test external/wpt/service-workers/service-worker/clients-matchall-order.https.html was committed in https://chromium-review.googlesource.com/c/452943/.
 

Comment 1 by qyears...@chromium.org, Apr 15 2017

The last few failures of wpt-importer were all affected by tests with non-deterministic tests, e.g. testharness tests where:
 - there is something like a timestamp or generated code in the output
 - the order of PASS/FAIL lines can change depending on timing

Whenever this happens, currently, the importer adds -expected.txt files, then subsequently fails in the CQ.

Proposal to handle most of these types of failures: If a test fails in the CQ with a text mismatch on any CQ bot:
 (1) Add a "failure" expectation for that test for all platforms that failed earlier
 (2) Remove all the -expected.txt files. (This is what I've done for some manual imports.)

Another simpler idea that would handle more cases would be: If any imported test fails on the CQ in any way, add a skip expectation and retry. This would be simple and would allow import to proceed more reliably for non-flaky tests, but it would imply that someone has to go back and triage those skipped tests...

Comment 2 by foolip@chromium.org, Apr 17 2017

Is the simpler idea to skip in W3CImportExpectations, or in TestExpectations? Importing failing tests is very useful in order to later get them passing, so skipping in W3CImportExpectations would be unfortunate.

Would it be at all possible to run new tests with --iterations=10 or --repeat-each=10 and then add expectations for all of the results that were seen?

Comment 3 by qyears...@chromium.org, Apr 17 2017

Blockedon: 679742
I think it would be just as simple to skip in TestExpectations (leaving the file there, but still requiring someone to later unskip the test if they want to run it).


We just had a meeting where I talked about this; to summarize what I said there:

 - Running all new tests multiple times and adding expectations would capture Crash/Pass, Timeout/Pass, Fail/Timeout/Pass, etc. Flakiness, but not flakiness where the test always fails but with different text mismatch.

 - In order to do this now, we would probably want to just rely on failure retries instead of using --iterations or --repeat-each, since running all tests many times would take a long time. Also we still want to run all tests rather than just affected tests, since a wpt import can affect tests indirectly by changing resources (including testharness.js).


I just realized that we're already looking at all of the actual results and expecting all actual results if a test was retried in some cases; so for a test that had TIMEOUT then PASS, a [ Timeout Pass ] expectations will be added: https://chromium.googlesource.com/chromium/src/third_party/+/44801ce5aecc6168018b558fff455f069e77eaec/WebKit/Tools/Scripts/webkitpy/w3c/wpt_expectations_updater.py#204

For now, I believe the behavior for testharness tests with actual result "TEXT PASS" might be to consider them passing and not rebaseline, although in this case it would be best to add a [ Pass Failure ] expectation.

Meanwhile, in order to resolve flakiness where the test fails with different failure messages or different subtest order, we could perhaps continue changing the testharness output to contain only the essential information about which subtests failed: bug 679742.


Next steps:
 - Continue with the proposal in bug 679742 if possible
 - Ensure that if a test has actual results "TEXT PASS" in a try job, a "[ Pass Failure ]" expectation should be added.
 - In the last case, if an import fails CQ, try adding a Skip expectation and retry CQ.

Comment 4 by foolip@chromium.org, Jul 3 2017

Components: Blink>Infra>Ecosystem

Comment 5 by foolip@chromium.org, Jul 3 2017

Components: -Blink>Infra>Predictability

Comment 6 by qyears...@chromium.org, Aug 21 2017

Blockedon: -679742

Comment 7 by foolip@chromium.org, Aug 22 2017

Labels: -Pri-2 Pri-1
Owner: qyears...@chromium.org
Calling this a P1 because it can result in the importer checking in expectations for flaky tests that will fail, even though one could tell from the set of test results in the automatic import that the test was flaky.

Comment 8 by qyears...@chromium.org, Aug 23 2017

Summary: [WPT import] Import doesn't add flaky expectations for flaky tests (was: [WPT auto-import] Import doesn't handle flaky tests very well.)

Comment 9 by qyears...@chromium.org, Aug 23 2017

Status: Assigned (was: Available)

Comment 10 by qyears...@chromium.org, Sep 7 2017

Labels: -Pri-1 Pri-2
Status: Available (was: Assigned)

Comment 11 by qyears...@chromium.org, Sep 7 2017

Cc: qyears...@chromium.org
Owner: ----

Comment 12 by rbyers@chromium.org, Nov 7 2017

Cc: robertma@chromium.org
bug triage ping: robertma, what's your take?  Foolip argued in #7 that this should be Pri-7, WDYT?

Comment 13 by robertma@chromium.org, Nov 7 2017

Labels: -Pri-2 Pri-3
P3 sounds right. There's no short-term plan for this.

I reached out to the FindIt team last time. They have plans to make FindIt to support layout tests. Combined with the FindIt flake analyzer, it might be a valuable tool for us tackling this problem. But it'll be in the next year.

Comment 14 by sheriffbot@chromium.org, Nov 7

Project Member
Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 15 by foolip@chromium.org, Nov 13

Status: Available (was: Untriaged)
robertma@, does this bug still make sense? Should it be blocked on a bug to run affected tests many times to detect flakiness in the first place? And if we have that infrastructure, won't it actually be blocking imports rather than adding flaky expectations?

Sign in to add a comment