New issue
Advanced search Search tips

Issue 765273 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

webkit-patch rebaseline-cl removes lines with [ Failure Pass ] or [ Skip ]

Project Member Reported by wangxianzhu@chromium.org, Sep 14 2017

Issue description

Steps to reproduce:

1. Create a local branch
2. git cl patch --force https://chromium-review.googlesource.com/c/chromium/src/+/664242
3. webkit-patch rebaseline-cl

Expected result:
For tests marked Skip or flaky on some platforms, we should rebaseline them on other platforms, and leave the TestExpectations lines as-is.

Actual result:
The script removes lines containing [ Failure Pass ] or [ Skip ], etc. from TestExpectations:

$ git diff LayoutTests/TestExpectations
--- a/third_party/WebKit/LayoutTests/TestExpectations
+++ b/third_party/WebKit/LayoutTests/TestExpectations
@@ -2915,7 +2915,6 @@  crbug.com/736177  [ Mac10.12 ] css2.1/t1202-counter-04-b.html [ Failure Pass ]
  crbug.com/736177  [ Mac10.12 ] css2.1/t1202-counters-04-b.html [ Failure Pass ]
  crbug.com/736177  [ Mac10.12 ] fast/block/basic/001.html [ Failure Pass ]
  crbug.com/736177  [ Mac10.12 ] fast/block/float/intruding-painted-twice.html [ Failure Pass ]
- crbug.com/736177  [ Mac10.12 ] fast/borders/inline-mask-overlay-image-outset.html [ Failure Pass ]
  crbug.com/736177  [ Mac10.12 ] fast/css-generated-content/012.html [ Failure Pass ]
  crbug.com/736177  [ Mac10.12 ] fast/css-generated-content/014.html [ Failure Pass ]
  crbug.com/736177  [ Mac10.12 ] fast/css/acid2-pixel.html [ Failure Pass ]
@@ -3484,7 +3483,6 @@ crbug.com/757165 [ Win ] http/tests/inspector/extensions/extensions-panel.html [
 crbug.com/757165 [ Win ] http/tests/inspector/sources/ui-source-code-metadata.html [ Skip ]
 crbug.com/757165 [ Win ] http/tests/misc/client-hints-accept-meta-preloader.html [ Skip ]
 crbug.com/757165 [ Win ] inspector-protocol/debugger/debugger-evaluate-in-worker-while-pause-in-page.js [ Skip ]
-crbug.com/757165 [ Win ] paint/invalidation/filter-repaint-accelerated-child-with-filter-child.html [ Skip ]
 crbug.com/757165 [ Win ] paint/invalidation/filter-repaint-on-accelerated-layer.html [ Skip ]
 crbug.com/757165 [ Win ] paint/invalidation/scrollbar-damage-and-full-viewport-repaint.html [ Skip ]
 crbug.com/757165 [ Win ] virtual/gpu/fast/canvas/canvas-filter-modified-save-restore.html [ Skip ]


 
Labels: -OS-Linux
Status: Available (was: Untriaged)
So, the proposed behavior is that rebaseline-cl should only rebaseline platforms that don't have flaky or skip expectations, because presumably if there are flaky expectation lines then rebaselining won't resolve the flakiness, and if the test is skipped, there's some reason why we don't want to run it.

Background:
The actual updating of the TestExpectations file is done when AbstractParallelRebaselineCommand.rebaseline is called. This method takes a TestBaselineSet, which represents the combination of platforms/tests to download baselines for.

Resolving this issue might involve changing rebaseline_cl.py:
 * Read in the expectations file
 * For each test to be rebaselined:
   * list the ports with skip/flaky expectations
   * remove those ports from the set of ports to rebaseline that test for
 * Also print log messages
#c1 SGTM.

Just one comment about Skip: perhaps it might be better to assume that the test produces the same new result on the specified platforms as on the fallback platforms. This will reduce extra platform baselines. For example, for paint/invalidation/filter-repaint-accelerated-child-with-filter-child.html, we have only the common baseline, but rebaseline-cl will create new baselines for linux and mac by assuming windows still produces the old result. If we assumed that windows also produce the same new result, we would update the common baseline only. The script can handle this in the same way as it fills in missing results for builders without result.

Comment 3 by foolip@chromium.org, Dec 18 2017

Cc: robertma@chromium.org
robertma@, have you observed this to be a problem for wpt import?
I didn't realize rebaseline-cl would also rebaseline tests marked as flaky... That's a bit counter-intuitive. But I don't remember seeing examples of this. .

As for skipped tests, I'm confused why they'd be rebaselined. IIUC, rebaseline-cl downloads results from the try jobs; and since the tests are skipped, they wouldn't have results in the first place. What do I miss?

I'll double check the code and the repro case.
I'm not sure, but I think the complicating issue here is: what happens when a test is expected flaky or skipped on some platforms, and we want to rebaseline for the other platforms, e.g. marked flaky on mac, skipped on android, and we want to update baselines for linux and win.

In theory, there is logic right now in rebaseline.py to handle rebaselining some platforms but not others: https://cs.chromium.org/chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/rebaseline.py?l=383

In AbstractParallelRebaselineCommand.rebaseline, the platforms to rebaseline are kept inside the TestBaselineSet object that's passed in, and the lines/platforms to update in the TestExpectations file should be tracked in the lines_to_remove variable.

I haven't yet tried to reproduce this so I don't know what the actual behavior is :-/ Looking back on rebaseline.py I see that there were still more refactoring changes that I think should be made, and the updating of the test expectations files could possibly be simplified...
Cc: leon....@intel.com
 Issue 799076  has been merged into this issue.
Cc: -robertma@chromium.org
Components: Blink>Infra>Ecosystem
Labels: -Pri-2 Pri-1
Owner: robertma@chromium.org
Status: Started (was: Available)
Starting to work on this as a P1 as it blocks wpt-import.
I plan to fix the wrong update of expectations first, i.e., stop removing flaky or skipped expectations incorrectly.

As for optimizing the baselines, it'd be a bit more trickier. Though we do have logic for filling in missing results, it currently works on the level of builds, i.e. if a whole build is missing (did not complete), another build is selected to fill in that port. Yet in this case, the skipped platform will have complete build, but the skipped test cannot be found in the build. The fill-missing logic has to be modified to work on individual test level. It probably won't be too hard, but I'd like to separate the tasks: fix first, and then optimize.
Besides, regarding optimization, optimize-baseline might be in the better position. There, we can pretend skipped platforms have some kind of "wildcard baselines" that match everything, and then optimize-baseline can exercise its de-dupe algorithm. This also has the extra benefit of always achieving the optimal result, which cannot be guaranteed with the fill-missing approach (if baselines on other non-skipped ports are not all the same).
https://crrev.com/c/857600 for the fix
Project Member

Comment 11 by bugdroid1@chromium.org, Jan 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/78fefdf15bd63c0ef3c462ce97d2e86f2997d107

commit 78fefdf15bd63c0ef3c462ce97d2e86f2997d107
Author: Robert Ma <robertma@chromium.org>
Date: Wed Jan 10 23:16:16 2018

Stop removing flaky or skip expectations when rebaselining

Previously, `wekbit-patch rebaseline{-*}` would remove an expectation if
no baseline suffixes could be found for a test run, which was intended
to remove failure expectations for passing tests. (Behaviour was
introduced at 04d907f857afaabad4705d8b9c4919950d238070).

However, there were two problems in using the triggering condition:
1. Only failing_results.json was fetched, which does not include entries
   for skipped tests, so skipped tests might be incorrectly considered
   as unexpected passes and the skip expectations would be removed.
2. Flaky expectations were not honored. A passing result of a flaky test
   would cause its expectation to be removed.

Hence, we should 1) fetch full results; 2) rely on the "is_unexpected"
field in the results. In addition, 3) a pre-caution is added to check
the port name of the builder, and only proceed to update expectations if
the builder's port is same as the port we are rebaselining. This is
necessary because rebaseline-cl can fill in missing results of a port
with other builders, and we don't want to update expectations in such
case.

Unit tests are also overhauled and reorganized.

Bug:  765273 
Change-Id: I2631a7244e2c95a33ab0ef2322d24dfc946899f3
Reviewed-on: https://chromium-review.googlesource.com/857600
Commit-Queue: Robert Ma <robertma@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Cr-Commit-Position: refs/heads/master@{#528467}
[modify] https://crrev.com/78fefdf15bd63c0ef3c462ce97d2e86f2997d107/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/rebaseline.py
[modify] https://crrev.com/78fefdf15bd63c0ef3c462ce97d2e86f2997d107/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/rebaseline_cl.py
[modify] https://crrev.com/78fefdf15bd63c0ef3c462ce97d2e86f2997d107/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/rebaseline_test_unittest.py
[modify] https://crrev.com/78fefdf15bd63c0ef3c462ce97d2e86f2997d107/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/rebaseline_unittest.py

Status: Fixed (was: Started)
The CL above should provide correctness.

The optimization issue is tracked in a new issue 801287, which is a desired feature, but not an urgent bug. Closing this bug as FIXED.

Sign in to add a comment