[WPT import] Import doesn't add flaky expectations for flaky tests
Issue description

One of the recurring problems with the auto-import process is how to deal with flaky tests. In theory, a successful import involves running the tests once on all of the tryserver.blink bots with the import patch, and then a second time on all of the CQ bots; if the results are consistent between those two runs, the patch is committed. But it is still fairly easy for flaky tests to get past this. Example case: the flaky test external/wpt/service-workers/service-worker/clients-matchall-order.https.html was committed in https://chromium-review.googlesource.com/c/452943/.
Apr 17 2017
Is the simpler idea to skip in W3CImportExpectations, or in TestExpectations? Importing failing tests is very useful in order to later get them passing, so skipping in W3CImportExpectations would be unfortunate. Would it be at all possible to run new tests with --iterations=10 or --repeat-each=10 and then add expectations for all of the results that were seen?
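The repeat-run idea could be sketched roughly like this. This is a hypothetical illustration, not the importer's real code: `run_test` stands in for actually running a web test, and the names are made up for the sketch.

```python
from collections import defaultdict

def collect_results(tests, run_test, repeat_each=10):
    """Run each new test several times, as --repeat-each=10 would,
    and record every distinct result seen per test."""
    seen = defaultdict(set)
    for name in tests:
        for _ in range(repeat_each):
            seen[name].add(run_test(name))
    return seen

def flaky_tests(seen):
    """A test is flaky if more than one distinct result was observed."""
    return {name for name, results in seen.items() if len(results) > 1}
```

Expectations could then be added for every result in each test's set, rather than only for the single result from one run.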
Apr 17 2017
I think it would be just as simple to skip in TestExpectations (leaving the file there, but still requiring someone to later unskip the test if they want to run it). We just had a meeting where I talked about this; to summarize what I said there:

- Running all new tests multiple times and adding expectations would capture Crash/Pass, Timeout/Pass, Fail/Timeout/Pass, etc. flakiness, but not flakiness where the test always fails but with a different text mismatch each time.
- To do this now, we would probably want to rely on failure retries rather than --iterations or --repeat-each, since running all tests many times would take a long time. We also still want to run all tests rather than just affected tests, since a wpt import can affect tests indirectly by changing resources (including testharness.js).

I just realized that in some cases we already look at all of the actual results and expect all of them if a test was retried; so for a test that had TIMEOUT then PASS, a [ Timeout Pass ] expectation will be added: https://chromium.googlesource.com/chromium/src/third_party/+/44801ce5aecc6168018b558fff455f069e77eaec/WebKit/Tools/Scripts/webkitpy/w3c/wpt_expectations_updater.py#204

For now, I believe the behavior for testharness tests with actual result "TEXT PASS" may be to consider them passing and not rebaseline, although in this case it would be best to add a [ Pass Failure ] expectation. Meanwhile, to resolve flakiness where the test fails with different failure messages or a different subtest order, we could continue changing the testharness output to contain only the essential information about which subtests failed: bug 679742.

Next steps:
- Continue with the proposal in bug 679742 if possible.
- Ensure that if a test has actual results "TEXT PASS" in a try job, a [ Pass Failure ] expectation is added.
- As a last resort, if an import fails the CQ, try adding a Skip expectation and retry the CQ.
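The expectation-updating step described above could look roughly like this. This is a simplified sketch, not the actual wpt_expectations_updater.py code; the keyword mapping and the ordering of keywords within the line are illustrative assumptions.

```python
# Map result tokens from the test runner to TestExpectations keywords.
# Simplified sketch; the real updater handles more result types.
RESULT_KEYWORDS = {
    'CRASH': 'Crash',
    'TIMEOUT': 'Timeout',
    'TEXT': 'Failure',   # text mismatch counts as a failure
    'FAIL': 'Failure',
    'PASS': 'Pass',
}

def expectation_line(test_name, actual_results):
    """Given every result seen across retries (e.g. {'TIMEOUT', 'PASS'}),
    build a flaky-expectation line, or return None if the results were
    consistent and no expectation is needed."""
    keywords = []
    for token in ('Crash', 'Timeout', 'Failure', 'Pass'):
        if any(RESULT_KEYWORDS.get(r) == token for r in actual_results):
            keywords.append(token)
    if len(keywords) < 2:
        return None  # only one distinct outcome: not flaky
    return '%s [ %s ]' % (test_name, ' '.join(keywords))
```

With this shape, a retried test with actual results "TIMEOUT PASS" yields a [ Timeout Pass ] line, and "TEXT PASS" yields a flaky failure expectation rather than being treated as passing.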
Aug 22 2017
Calling this a P1 because it can result in the importer checking in expectations for flaky tests that will fail, even though one could tell from the set of test results in the automatic import that the test was flaky.
Nov 7 2017
bug triage ping: robertma, what's your take? Foolip argued in #7 that this should be Pri-7, WDYT?
Nov 7 2017
P3 sounds right. There's no short-term plan for this. I reached out to the FindIt team last time; they have plans to make FindIt support layout tests. Combined with the FindIt flake analyzer, it might be a valuable tool for tackling this problem, but that won't happen until next year.
Nov 7
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Nov 13
robertma@, does this bug still make sense? Should it be blocked on a bug to run affected tests many times to detect flakiness in the first place? And if we have that infrastructure, won't it actually be blocking imports rather than adding flaky expectations?
Comment 1 by qyears...@chromium.org, Apr 15 2017