Switch all LayoutTests to rely on the image baseline first, ignoring text render tree baselines if an image baseline is available, and using the text render tree baseline as a backup if no image baseline exists.
I don't think we actually want this as a long-term thing unless the plan is to get rid of layout trees altogether, right? Presumably there are tests that need to look at both images and some kind of text output?
There are some, but for the vast majority the pixel differences are the important part. For tests *without* pixel output, the text dump is clearly important.
A related idea is that for some tests, the pixel differences may not be the important part, and some of those tests could be improved by converting them to reference tests (bug 704951).
BTW, is R-V-G necessary for this bug?
Given the title says "all LayoutTests" and the change above says that this was done for "all LayoutTests owned by layout-dev team", I think that this hasn't been completed. May be P3.
I just found that blink_tool.py rebaseline-cl rebaselined some *-expected.txt files for a CL (https://chromium-review.googlesource.com/c/chromium/src/+/1239418) that doesn't change layout at all; then I found that the old *-expected.txt files were actually invalid (they didn't match the actual results).
(Note that rebaseline-cl already avoids creating a text baseline for an image-first test that has no existing text baseline.)
I have some questions:
1. What's the exact definition of "image-first"?
- If both image and text baselines exist, should we check both or just check the image?
- If not checking text if the image baseline exists, why does the text baseline exist?
IMHO it's useful to let some tests under image-first directories still have both image and text baselines and get both checked, e.g. for the tests containing testRunner.dumpAsTextWithPixelResults().
2. Is "switching all LayoutTests to rely on image baselines first" still a goal?
IMHO the goal is valid if we allow both image and text baselines and check both when both exist. The current state of affairs, allowing a text baseline but not checking it, doesn't seem good, especially across all layout tests.
I hit extra invalid txt expectations again today, under fast/borders.
I would like to do the following, if no objections:
- Keep LayoutTests/ImageFirstTests because we do want both image and text expectations for some tests. Thus "switching all layout tests to rely on image baselines first" is a non-goal.
- For a test listed in ImageFirstTests, disallow a text baseline if the test has an image baseline. Write a lint rule to enforce it.
- Remove extra text baselines for all image-first tests.
While working on the plan in #c13, I feel there could be other methods that are cleaner and easier to maintain than the current ImageFirst mechanism.
Just thought of the following:
- Abandon the ImageFirst mechanism, and
- Just don't dump the layout tree when running a layout test.
The implementation would be:
1. If a test calls any of the testRunner.dumpAsXXX() functions, it will run in the current way.
2. Otherwise,
2a. if content_shell runs the test with the '--pixel-test' option (which is mostly the case when content_shell is invoked by run_web_tests.py), then content_shell will dump the pixel result, not the layout tree.
2b. if content_shell runs the test without the '--pixel-test' option, content_shell will dump the layout tree.
3. If some test does always want the layout tree, it can call testRunner.dumpAsLayoutTree().
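The decision logic in steps 1-3 above can be sketched as a small function. This is only an illustration of the proposal, not content_shell code; `dumpAsLayoutTree` is the hypothetical new testRunner API named in step 3.

```python
def choose_dump_mode(calls_dump_as_api, pixel_test_flag,
                     calls_dump_as_layout_tree):
    """Sketch of the proposed content_shell output decision.

    calls_dump_as_api:         test called some testRunner.dumpAsXXX()
    pixel_test_flag:           content_shell was run with --pixel-test
    calls_dump_as_layout_tree: test called the (proposed)
                               testRunner.dumpAsLayoutTree()
    """
    if calls_dump_as_layout_tree:
        return 'layout-tree'       # step 3: test explicitly wants the tree
    if calls_dump_as_api:
        return 'current-behavior'  # step 1: dumpAsXXX() keeps today's semantics
    if pixel_test_flag:
        return 'pixels'            # step 2a: pixel result, no layout tree
    return 'layout-tree'           # step 2b: no --pixel-test -> dump the tree
```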
What do you think? Any feedback will be highly appreciated.
Any chance we could get rid of --pixel-test and just have testRunner.dumpAsImage(), alongside (or replacing) testRunner.dumpAsTextWithPixelResults()?
Whether a test produces pixel results should be decided by the test itself, not by the way it is run. (The parsing logic for --pixel-test is super hacky as well.)
Re #c15: I think --pixel-test is useful for debugging. I often run content_shell --run-web-tests from the command line or gdb. If a test always dumps pixels, the console fills with garbage, and sometimes the console enters a weird mode if the pixel output happens to contain particular escape sequences. However, we could make it a normal parameter of content_shell instead of the tricky "'--pixel-test" postfix in the test name, or have content_shell output binary data only when standard output is not a console.
Any opinion about suppressing/ignoring text output of default pixel tests (i.e. the current ImageFirst feature or alternatives)?
Making --pixel-test a normal flag would be an improvement.
I didn't know about ImageFirstTests until I saw this bug. I'm in favor of reducing the number of "modes" and their complexity, so the proposal to remove ImageFirstTests and make pixel tests care only about pixels and not text by default SGTM.
Now I totally agree we should remove --pixel-test from content_shell and --pixel-tests from run_web_tests.py. Not writing binary data to the console seems a workable way to avoid garbage when running content_shell from the command line.
However, let's focus on text output of pixel tests in this bug :)
CC Ned: Emil mentioned this (having both image and text outputs while most of the time the text output, the layout tree dump, isn't really interesting) as a pain point during the meeting earlier. Xianzhu is tackling the problem using the approach in #c14, which will drastically simplify the situation.
Thanks for doing this, Xianzhu!
After a bunch of CLs, I'm marking this as fixed.
A useful follow-up might be to report some unnecessary baselines (e.g. -expected.png for reftest, all-pass testharness baselines) as failures instead of just warning messages. To achieve that we need proper presentation of them in results.html. Not reporting them as failures for now because results.html will present them in a very confusing way (i.e. showing IMAGE failure without any difference between expected and actual results for -expected.png for reftests).
atotic@ I haven't thought of a good solution yet. We might need a new failure type to inform results.html about the extra baseline failures to present them properly, but I'm reluctant to do that. Would like to keep it as-is until we get a good solution.
Would the test be treated as an actual failure, or as a warning?
If failure, we'd need a new failure type.
If warning, we can add a boolean to the result (excess_baselines), and I can present it with a filter (Unexpected baselines). Excess baselines could be printed to stderr.
They are warnings only for now. I think it's a good idea to add a boolean field, or a field whose value is the full path to the extra baseline file.
Alternatively or additionally, we can write the message field of TestFailure (https://cs.chromium.org/chromium/src/third_party/blink/tools/blinkpy/web_tests/models/test_failures.py?rcl=bf516b2241f388dbde339c6e3a3ea452552b1523&l=107) into the json result and present it in results.html. This way might be the easiest to implement, and seems flexible enough.
The ability to filter the extra baseline failures separately doesn't look very necessary. We can already remove the extra baselines with 'run_web_tests.py --reset-results' and 'blink_tool.py rebaseline-cl'.
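The idea in #c24 of writing the TestFailure message into the JSON results could look roughly like this. This is a hedged sketch: the field name 'failure_messages' and the helper are assumptions, not the real full_results.json schema or blinkpy code.

```python
def add_failure_messages(result_entry, failures):
    """Copy each TestFailure's message into the per-test entry of the
    JSON results so results.html could display it.

    Hypothetical sketch: 'failure_messages' is an assumed field name;
    'failures' is any sequence of objects with a message() method, as
    TestFailure has in blinkpy's test_failures.py.
    """
    result_entry['failure_messages'] = [f.message() for f in failures]
    return result_entry
```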
Comment 1 by dpranke@chromium.org, Mar 22 2017