Enable threaded compositing by default in Web Platform Tests |
|||||||||||||||||
Issue descriptionRight now, layout tests run without threaded compositing. This is one of the main ways in which layout tests differ from Chrome, and it's a source of confusion about why layout tests behave differently from opening a page directly in Chrome. There will always be some tests that should be run without threaded compositing, but it would be better if they had to opt out of threaded compositing, instead of threaded compositing being opt-in.
,
Oct 5 2017
I recall enne had some thoughts about this
,
Oct 5 2017
Sorry, I get asked this a lot, and so have a strong and extreme opinionated position. I think we should work towards making no tests use threaded compositing. I think it's a bug that threaded compositing causes different behavior in Blink. We should add/support "compositor thread" animations even when the compositor is single threaded. https://groups.google.com/a/chromium.org/d/msg/blink-dev/ZW2wDejtRdo/bNIfUiCNDwAJ
,
Oct 6 2017
"I think it's a bug that threaded compositing causes different behavior in Blink." I'm not completely convinced this is a bug, but let's suppose for a moment that it is. How do you propose fixing this bug? Do you think supporting "compositor thread" animations in the renderer main thread will address this completely? My intuition is that we'd still see significant differences in behavior, especially around the kind of complex timing issues that make layout tests tricky to write correctly (we see a lot of this in Layout Tests which synthesize input). Unless this bug is 100% fixed, it's better to test what we ship than some approximation of it. I'm not convinced it's worth the effort to fix this 100%.
,
Oct 6 2017
Sorry, missed the link to the blink-dev discussion. Thanks for the context. To be clear, I don't think we should try to migrate all existing tests. I would support having an approach better than virtual test suites for switching individual directories to use the threaded compositor. At some point, I think we should change the default for new directories, and over time, migrate additional things to use the threaded compositor. Here's another justification for doing this work. - Web Platform Tests must pass in the browser. - To ensure Web Platform Tests pass in the browser (and aren't flaky etc), we need to run Web Platform Tests with the threaded compositor. - (Controversial Point) Layout Tests and Web Platform Tests should be as similar as possible. I believe Layout Tests and Web Platform Tests should be as similar as possible for a few reasons: - Converting a Layout Test into a Web Platform Test should be trivial. This is common when we're iterating on something within Chromium behind a flag using layout tests, and then spec and ship the feature. - Most engineers writing Layout Tests should also be writing Web Platform Tests, and having different behavior is confusing.
,
Oct 6 2017
Why do web platform tests need a threaded compositor? Re: timing issues. My experience is that the threaded compositor introduces far more timing issues than the carefully controlled commits from the single threaded mode that most things are in. I'm not sure I understand why input changes anything here. Do you have some test examples that would help me understand?
,
Oct 11 2017
Web platform tests eventually (in theory) should pass in all browsers (Edge, Firefox, Chrome), and be runnable by end users, who will be using Chrome proper. Details here: https://github.com/w3c/web-platform-tests/blob/master/README.md. You're right that the threaded compositor introduces a bunch of timing issues that aren't present without it. The concern is that these timing "issues" occur in the real world, and not using the threaded compositor may paper over real issues. Here's an example of a test which follow a common pattern for dealing with this kind of timing issue. https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/fast/events/wheel/mouse-wheel-scroll-latching.html We synthesize some input, and then want to wait until we're done responding to the input. This is hard with the threaded compositor, but (I think) not too hard without it. However, scrolling is implemented completely differently on the threaded compositor, so just testing main thread scrolling would leave us with huge gaps in test coverage. Many of these tests just wait until we reach the desired end state, so all failure results in the test timing out.
,
Oct 11 2017
Re: input. I think what I said in that link above still applies. I'm not sure that layout tests are the best place to test input, especially compositor thread scrolling. Input doesn't get delivered to Blink first, so it sure looks a bit unreal to my eyes already. Compositor thread scrolling seems more like something that could be handled at the sim test level and less like something that you'd test cross-browser at the wpt level. Also, the single threaded compositor doesn't support scrolling on the compositor thread, but there's no reason it can't, especially if that makes testing easier. Re: timing issues. I think there's only so many orderings events can happen in. Being able to control this order lets you test all the permutations that you care about. Having the test flakily fall into a particular ordering isn't a robust testing strategy. Re: wpt. Those are reasonable arguments. I'd say any test should probably pass in both modes, but I could understand wanting to run only one configuration.
,
Oct 11 2017
FWIW it's the WPT case that I care most about. I'd restate #7 even more strongly: WPT is primarily measured via a system that can run only real Chrome (https://wpt.fyi) and is required to be replicatable by anyone using real Chrome (ideally without any special command-line flags). So to prevent regression it's essential that we're testing in (at least) the same configuration as that in our blink infra (though content_shell instead of chrome might be OK if we don't see much difference in practice). Note that the ecosystem infrastructure team (foolip/robertma) has a long-term goal of eventually deprecating LayoutTests. That is, all blink tests are either unit tests or interop tests (end-to-end tests).
,
Oct 12 2017
Re #8: when we're synthesizing input, we do (in many cases) send it through the browser process, so the input pipeline is pretty close to the real pipeline. I agree that in many cases, these input driven tests should be SIM tests. However, moving forward, we are going to want web platform tests which test scrolling. The input approach is still being sorted out here, but long term we will have some approach for synthesizing input at the browser level in web platform tests. These web platform tests should run with the threaded compositor. Perhaps this bug should instead be to enable threaded compositing by default in web platform tests?
,
Oct 25 2017
I've re-titled the bug to focus only on Web Platform Tests. enne@, does this seem reasonable to you?
,
Oct 25 2017
I continue to think that the real bug here is still that scrolling and animations are not supported in single threaded mode and so that there is a functional difference between these modes. I can understand pragmatically wanting to have wpt tests run with the threaded compositor given that.
,
Oct 26 2017
Rick, is this something it would make sense to have predictability folks own?
,
Oct 26 2017
Perhaps. This is an example of something that could test results to be different on wpt.fyi (using real chrome) versus chromium CI. I'm not sure we know how bad of a problem that is today in practice. Robert / Philip, any thoughts?
,
Nov 21 2017
Surprisingly, I don't yet know of any concrete cases where a test fails in Chrome but passes in content_shell or vice versa for any "interesting" reason, are there such differences here? FWIW, I agree with comment #5 that "Layout Tests and Web Platform Tests should be as similar as possible " as in LayoutTests and LayoutTests/external/wpt/. So the original title "Enable threaded compositing by default in Layout Tests" made more sense to me. Given that we're running a real Chrome on wpt.fyi, what are the bad things that might happen here if we do nothing on this bug?
,
Nov 22 2017
When we run into issues, we force this on for specific folders. https://cs.chromium.org/search/?q=file:virtualtestsuites+enable-threaded-compositing&sq=package:chromium&type=cs I suspect that we'll start seeing more issues with coverage once we start checking in web platform tests that deal with input. I anticipate cases where we write tests which are non-flaky in Chrome without threaded compositing, but that are flaky with threaded compositing, resulting in flake on wpt.fyi.
,
Dec 15 2017
tdresser@, I'm not sure how to prioritize this, marking as available and P2 so that it doesn't just fall off the radar, but not assigning anyone. What would happen if we just invert the default as you suggest, should we expect the runs to be slower? Flakier? Or faster and more robust? If it's the latter then I think you can just try doing it.
,
Dec 18 2017
I'll do a test run and see!
,
Jan 8 2018
I haven't done much investigation into flakiness yet, but there are some consistently reproducible changes. A few flaky timeouts seem a bit more frequent with compositing than without, but its hard to say without a more rigorous investigation. Consistently time out with compositing: * external/wpt/fetch/api/redirect/redirect-location.html failed unexpectedly (test timed out) * external/wpt/pointerevents/pointerevent_setpointercapture_relatedtarget-manual.html failed unexpectedly (test timed out) * external/wpt/pointerevents/extension/pointerevent_coalesced_events_attributes-manual.html failed unexpectedly (test timed out) * external/wpt/pointerevents/pointerlock/pointerevent_pointerlock_supercedes_capture-manual.html failed unexpectedly (test timed out) * external/wpt/pointerevents/pointerlock/pointerevent_pointerlock_after_pointercapture-manual.html failed unexpectedly (test timed out) * external/wpt/pointerevents/pointerevent_capture_mouse-manual.html failed unexpectedly (test timed out) Fail with compositing: * external/wpt/preload/modulepreload.html failed unexpectedly (asserts failed) * external/wpt/requestidlecallback/callback-removed-frame.html (asserts failed) * external/wpt/web-animations...el/timelines/timelines.html failed unexpectedly (asserts failed) Pass with compositing, fail without: * external/wpt/requestidlecallback/basic.html * external/wpt/requestidlecallback/callback-exception.html * external/wpt/requestidlecallback/callback-idle-periods.html * external/wpt/requestidlecallback/callback-iframe.html * external/wpt/requestidlecallback/callback-invoked.html * external/wpt/requestidlecallback/callback-timeout-when-busy.html * external/wpt/requestidlecallback/callback-timeout.html * external/wpt/requestidlecallback/callback-xhr-sync.html * external/wpt/requestidlecallback/cancel-invoked.html * external/wpt/requestidlecallback/idlharness.html I think this is evidence that there's a big enough difference between these two modes that our test coverage suffers from testing without threaded compositing. It might would be worth fixing the tests which fail when compositing is enabled, and then (for a short period of time to get flakiness data) run a virtual test suite for all (or a subset) of wpt with threaded compositing enabled. We should at least fix the tests, as they'll fail in Chrome proper. If it makes sense to others, I'll file bugs for the tests which are failing with threaded compositing enabled, and block this bug on them.
,
Jan 8 2018
Regarding the failures, a lot of them are manual input tests, automated by our in-house automation scripts (in external/wpt_automation), which use Chrome internal testing APIs to synthesize inputs (similar to those in fast/events).
,
Jan 10 2018
Parking on myself for now to drive investigation. +nzolghadr@ regarding pointer event tests.
,
Feb 1 2018
Navid, any chance you'll be able to dig into the pointer event test failures? Or perhaps Lan?
,
Feb 1 2018
Lan's not coming back for another quarter. I'll take care of this. Do we have a separate bug for the pointerevent tests? I didn't seem to find any options to turn this on for the layout tests. Do you happen to have the flag handy?
,
Feb 1 2018
There's a flag to pass additional flags to the test runner. Something like --additional-driver-flags=--enable-threaded-compositing is what you want. There's currently no separate bug for the pointer event tests - feel free to file one.
,
Apr 3 2018
Just checking for an update since this is marked P2.
,
Apr 11 2018
,
May 9 2018
+flackr@ We're trying to figure out what the priority of this is. It's hard to know what teams care most about consistency in web platform tests with respect to compositing. Would animations potentially be able to contribute a bit here?
,
May 16 2018
Update from pointer events thread: "I ran all pointerevent tests listed in this bug one by one with --repeat-each=10 and --additional-driver-flags=enable-threaded-compositing and for all of them the first one failed and the rest passed. To be accurate we have changed some of the stuff since this bug was filed for the pointerevents. Those could have fixed this issue. So I don't think at this point there is any specific issues with pointerevents and threaded compositing." Sounds like there's a clear next step here, based on some sort of infrastructure issue. I don't have bandwidth here, and it isn't super relevant to my team's work at this point. flackr@ or foolip@, and idea on who we could get to dig in here? Leaving assigned to me for now, as I don't want this dropped.
,
Jun 1 2018
tdresser@, do you have a CL where you tried to add the flag? I'd like to run it for all layout tests and see if there are regressions, as a first step.
,
Jun 1 2018
I just ran locally with the flag on. The flag is here: https://cs.chromium.org/chromium/src/content/public/common/content_switches.cc?rcl=5275ba4e1b4ff957327d7e4fd01391cfc32c9ddc&l=437 I had a hard time figuring out how to enable the flag for all layout tests.
,
Jun 1 2018
,
Jun 1 2018
Great, thanks!
,
Jun 1 2018
There are rather a lot of regressions when enabling that flag, so if we're going to do it someone would have to look through that and rebaseline, or fix any bugs exposed. On whether we should enable this, is it a fair assessment that it will increase flakiness, as enne@ suspects? Are there are other uses of non-threaded compositing, or would that code path simply be deleted?
,
Jun 4 2018
If it increases flakiness, that implies that (for web platform tests at least) these tests are already flaky when run in Chrome proper, which feels like a bug we should fix. Leaving this for layouttests overall seems non-ideal, but acceptable to me. Regarding getting rid of the code path completely, I'm not certain. flackr@, any thoughts?
,
Jun 7 2018
Yes, if tests are flaky in Chrome then that'll show up in wpt.fyi, and we don't want that. But is there any value in testing both modes?
,
Jun 7 2018
Rob: Ping re #34. There's no value in testing both modes, but disabling threaded compositing might be useful for debugging or similar.
,
Jun 7 2018
I don't think we'd need to keep the non threaded code path. It's not well maintained, and I don't believe it has other uses. Debugging multi-threaded applications (especially in a single process) is common practice already; I believe non-threaded compositing generally hasn't worked reliably (ironically despite layout testing) anyways.
,
Jun 11 2018
The technical work here, then, would be to take https://chromium-review.googlesource.com/c/chromium/src/+/1082312 and inspect all the failures, adding new baselines using third_party/blink/tools/blink_tool.py rebaseline-cl. Notably, there are new timeouts and failures for testharness.js tests that could plausibly be related, so it requires attention and care.
,
Jun 11 2018
Based on https://bugs.chromium.org/p/chromium/issues/detail?id=770028#c28, it sounds like there are some failures that are infrastructure related.
,
Aug 8
,
Aug 8
I don't have bandwidth for this, but it sounds like Ned might?
,
Aug 17
,
Sep 11
,
Sep 11
,
Sep 11
Any other blocker for this issue? When I was looking at the tests that were failing issue 831653 there was one issue that I guess only first test was failing and the rest of tests were passing. Even the same test would pass in the subsequent runs. So something wasn't done initializing the first time or something like that.
,
Sep 13
,
Oct 10
,
Oct 10
,
Dec 5
Over to Philip for triage. |
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by foolip@chromium.org
, Sep 29 2017