Cc: qyears...@chromium.org Components: Blink>Infra>Ecosystem Owner: ---- Status: Available (was: Untriaged) Summary: In wpt import, rebaseline-cl fails to optimize virtual test baselines (was: web platform test importer generates extra expectation files)
Good point -- in that case it looks like if the virtual test baselines were removed it should still be correct, since the virtual tests should fall back to the non-virtual test baseline (which is the same).
wpt import uses webkit-patch rebaseline-cl to fetch and then deduplicate baselines. The baseline deduplication code is in baseline_optimizer.py. There is code there to deal with virtual test baselines, but maybe it's not working as expected in this case.
https://cs.chromium.org/chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/common/checkout/baseline_optimizer.py
Note, I believe that this should be testable/reproducible by adding a virtual baseline identical to the non-virtual baseline, and then running `./webkit-patch optimize-baselines <test>`.
Also, falken@ just cleaned up a bunch of redundant virtual baselines in https://chromium-review.googlesource.com/c/chromium/src/+/677996.
optimize-baselines only correctly optimizes virtual tests when baselines are present for all virtual platforms, e.g.:
https://chromium-review.googlesource.com/c/chromium/src/+/699700/1 (see revision 2 for what it looks like after optimization)
If, in the example above, we take away any of the platform/*/virtual/mojo-blobs/fast/files/apply-blob-url-to-img-expected.txt, the optimization fails to remove them.
The root cause is because the original design implements a two-stage optimization for virtual tests:
1. Optimize the virtual fallback paths for platforms. i.e. pushing tests up the tree:
virtual generic <-- win <-- linux
^----------- mac
Each arrow represents a "push", which only happens when all the children are the same. In particular, win and mac baselines are pushed to virtual generic (LT/virtual/) only if they are the same.
2. Check if virtual generic can be removed (if it is identical to the non-virtual baseline).
The code is working as designed. However, the algorithm assumes that before optimization, all baselines need to be pushed down to specific platforms first. Otherwise, we run into this exact bug. For example, consider we only have identical baselines in
* fast/files/apply-blob-url-to-img-expected.txt
* platform/linux/virtual/mojo-blobs/fast/files/apply-blob-url-to-img-expected.txt
The second one can be safely removed. All platforms, virtual or non-virtual, will all fallback to the first one.
However, if we run `webkit optimize-baselines`, nothing will happen, because platform/mac and platform/win baselines are missing.
The fix, is to either guarantee the prerequisite of the algorithm (fix the pushdown), or change the algorithm.
Putting down some notes from my preliminary investigation before vacation so that I don't forget.
We do not directly invoke `webkit-patch optimize-baselines`, but rather use `webkit-patch rebaseline{-cl}` which by default also does optimization. Under the hood, rebaseline{-cl} works like this:
1. `webkit-patch copy-existing-baselines-internal` (an internal command of webkit-patch) to copy baselines from more generic dirs (later in the fallback path) to more specific dirs (but never overwrite more specific ones).
2. Download new baselines of the failing platforms to the most specific dirs possible.
3. `webkit-patch optimize-baselines` (see the comment above for details).
Step 1 is vital for both steps 2 and 3:
* Without step 1, step 2 would be incorrect.
Suppose a fallback path lin -> win -> generic. We only have a generic baseline. If win suddenly starts to fail but lin is same as the (old) generic, downloading a baseline for win will intercept the fallback path. We have to copy generic to lin to be correct.
* Without step 1, step 3 would be sub-optimal.
This is described in the comment above.
Unfortunately, I think the current implementation of step 1 is buggy: https://cs.chromium.org/chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/copy_existing_baselines.py
The comment says it only copies the baseline *one level* down, which is insufficient for steps 2 & 3. However, the command actually *might* copy the baseline down multiple levels due to implementation (it copies baselines in
*each* dir down one level with no guarantees on the order of the dirs).
Some corrections from this week's findings so far:
* Comment 5 was wrong. optimize-baselines does not work by pushing baselines up, but removing redundant baselines instead. The core idea is to traverse baselines in the fallback order and for every baseline, remove it if it is the same as the next one in the fallback order. The root (generic directory), however, is an edge case: the script tries to push all immediate children of the root up to the root (only when they are all the same). This algorithm is much more robust than the one I described in comment 5, and does *not* require baselines to be pushed down in advance.
* Comment 6 hence needs some errata:
Step 1 is only required by step 2. And the current implementation of step 1 is correct. Its docstring (pushing "down one level") actually means for every platform that we are about to download new baselines to, find its current baseline *following the search path", and copy that baseline down to its immediate children. So even if win does not have a baseline, the generic one will be found and copied to linux.
The real bug is perhaps in step 3. The algorithm was designed to work without special prerequisites, but the support of virtual suites seems to introduce some intricate bugs.
I'll try to improve the documentation in the code so that future maintainers do not need to walk this maze again...
Indeed, the baseline optimization is unfortunately pretty complicated, and would be likely to confuse someone in the future!
(Side note: I think the Skia test baseline system works by storing all baseline images in a Google Storage bucket and referring to baselines with SHA-1 hashes instead of keeping files checked in to a repository... so there's no problem of storing duplicate images there.)
Comment 1 by qyears...@chromium.org
, Sep 21 2017Components: Blink>Infra>Ecosystem
Owner: ----
Status: Available (was: Untriaged)
Summary: In wpt import, rebaseline-cl fails to optimize virtual test baselines (was: web platform test importer generates extra expectation files)