scroll cnn.com while loading is really choppy |
|||||||||||
Issue descriptionLoad cnn.com in Chrome on N6P. While it is loading, try to scroll the page. Expected: smooth scrolling Observed: scroll stalls frequently I grabbed a trace, see attached, and you can see one touch start was delayed for over 1s. In another time, I saw multiple delay over 300ms. Note: it seems the site has a top level touch handler, which prevents us to go directly to the fast path scrolling. But I don't see stall on iPhone. Wondering why. The trace was from Chrome stable. I see the similar stall in Chrome Dev. ⛆ |
|
|
,
Mar 29 2016
I see a touchStart handler on my Moto X (although not for a spoofed "Nexus 6P" in desktop devtools, presumably it uses some other detection mechanism than useragent to decide to add it). I'd have to set up Safari devtools to figure out whether it serves a touch handler to iPhone. The new "passive event listener" feature is now enabled in ToT and should be widely available in M51 (see http://crbug.com/489802 ). Dave, is there a good way to hack cnn.com to force it into a passive event listener mode and observe the difference? Maybe we could make a before-and-after trace and video for CNN, and put in on a public blog post while announcing this feature to web developers.
,
Mar 29 2016
Another site with document-wide touch event handlers is mobile.nytimes.com. That one is even worse because it also has a touchMove handler, not just touchStart. I think nytimes added it a few months ago, it's been bothering me recently in my personal use.
,
Mar 29 2016
Alex, if you managed to set up Safari devtools, can you check whether they have the long event handling like what we saw here?
,
Mar 29 2016
If we are seeing site behavior differences between iOS and Android, we should do our best to determine why the developers wanted different behavior between the two OS's. Dave, if you need someone to dig into the Safari devtools, I can take a look.
,
Mar 29 2016
The trace was too big to open so I truncated it to the first 50%. Looks like the memory-infra category was enabled, which adds a lot of overhead to tracing, so we shouldn't draw too many conclusions from it. The thing that stood out from the truncated part of the trace was that ResourceDispatcher::OnRequestComplete is invoking multiple onload-type callbacks. If we split them into individual tasks we might be able to process input sooner. However on this page each callback takes 300+ ms.
,
Mar 29 2016
I took another trace on a 6P without memory dumps and saw one jank. In that case the input was stuck behind a long running XHR handler plus an expensive BeginMainFrame (which was scheduled before the input arrived and currently cannot be reordered).
,
Mar 29 2016
I examined cnn.com and mobile.nytimes.com on my iPod Touch in Safari Inspector (using "getEventListeners(document)" in the Console), and the same event handlers are served to them as to us. But it's indeed difficult to repro scroll jank while the page is loading. Perhaps Safari recently started applying some kind of new touch listener intervention? I'm thinking now that passive event listeners could also be deployed as an intervention. We could change document-element touch listeners to passive by default, and make developers explicitly specify "{ passive: false }" on them if they truly want that. That would give short-term performance win and create a "pit of success" for the longer term. By restricting to the document element, we could hopefully keep the breakage to a manageable level that we can outreach our way out of in one or two beta cycles. Rick, Tim, Dave; what do you think?
,
Mar 29 2016
Alex, did you have a chance to look at how things are scheduled? I wonder if we could somehow look at the timing of things in Safari vs. Chrome, we could find some interesting insight.
,
Mar 29 2016
Not yet, I'll take a further look. Reassigning to myself to track the investigation.
,
Mar 30 2016
Re: #9, I took a look at the Safari timeline. There is fairly rich information on touch, scrolling, and timers in "Timeline" -> "Javascript and Events". I didn't learn anything useful from it myself, but maybe Sami would see clues there. (Safari doesn't support saving recordings, so he'd have to take one locally.) I also tried some other experiments today. The first is a synthetic JS example with a 2-second busy wait on each touchMove: http://jsfiddle.net/vk421b7u/23/. In that jsfiddle, our 200ms touch timeout intervention is doing its job in Chrome and it allows me to scroll with only a modest jank. On the other hand, I can see that iOS Safari is *not* shipping such a timeout intervention currently, as that jsfiddle causes Safari to entirely hang. So whatever helps Safari on cnn/nytimes, it's not the timeout. (On the other hand, I find it surprising and curious that our touch timeout does not help with these real-world sites, where we observe >1sec janks. Is there some unexpected interaction causing our timeout intervention not to get the win we expect?) The second thing I tried is a Chrome build hacked to make all touch events handlers passive by always returning TouchEventPassive in EventHandlerRegistry::eventTypeToClass(). This build is visibly very responsive during page load, arguably even better than iOS Safari. This proves that the problem indeed has to do with the touch event handlers in the first place, and secondly that the intervention I proposed in #8 should deliver the win. (Since it seems our existing timeout intervention is *not* delivering, perhaps we could replace one intervention with the other?)
,
Mar 30 2016
I suspect that the intervention you're proposing would break too much content. Lan is doing some research here on how we can make a subset of event listeners passive. We're currently testing a couple heuristics offline on some deep reports using tracemapper. Ideally there will be some heuristic that gets rid of most of the scroll blocking time without causing double handling issues too frequently.
,
Mar 31 2016
We're definitely planning an intervention here, but it's tricky and so we must be careful. I've filed issue 599609 to track our current thinking (more details coming soon). In the interim, is there a particular library where the document-wide touch handlers are coming from on CNN? Something we can advocate for opting into passive touch listeners ( issue 489802 - https://github.com/WICG/EventListenerOptions/blob/gh-pages/explainer.md)? We're not quite ready to start that outreach in force but expect to be very soon.
,
Mar 31 2016
Note that our touch timeout is currently 1000ms on mobile optimized sites (200ms broke too many sites). We'll eliminate the timeout completely along with issue 599609 (it's really an intervention that has the worst of both worlds - developers can't reason about it, it breaks stuff, AND you still get janks). So the jank here in Chrome is due (at least in part) to a sync XHR? Maybe mobile safari doesn't support sync XHR or something? Or maybe they're somehow just blocking for a lot less time than we are?
,
Mar 31 2016
+1 to rbyers keeping the eye on the core of the problem. Before we brainstorm solutions, let's figure out where the jank is coming from.
,
Apr 2 2016
Hook up web inspector with Mobile Safari, within 5 seconds, there are over 6 times JS event taking over 100ms, including 2 over 500ms. And I do see the stall when touch scroll just as Chrome on Android. So their JS is not that fast. Also they seem have UI fast path unless inspector is enabled.
,
Apr 2 2016
Right, I also noticed enabling Inspector slows down iOS Safari a lot. But I don't know what the nature of a "fast path" disabled by Inspector might be. I observed they fully block on touch events in my jsfiddle example, even with Inspector disabled, so it's not that. My assumption was that the cost of log dumping causes Safari to slow down with Inspector enabled.
,
Apr 2 2016
But 500ms JS events probably still take a long time even without logging. As a user, I do not observe the stall when inspector is not recording, just like when we treat all touch events as passive on your local build.
,
Apr 2 2016
Two ideas: 1) Perhaps Safari can't use the full FTL jit when debugging/symbols are enabled? I couldn't find any discussion of perf differences with devtools open (eg. here https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/). But this should be possible to test by instrumenting the CNN code (break on load, inject some script which does measurements - eg. calculates time between requestAnimationFrame callbacks) and then close the inspector. 2) There are some known race conditions in Safari's touch ACK handling (see https://bugs.webkit.org/show_bug.cgi?id=144688), so it's possible that Safari is not blocking on the touch handler in some cases. I explore their behavior in some detail at one point and couldn't make any sense of it. Jank seemed to increase the chance of scrolling not blocking, but it was very inconsistent. Talking to Ben it sounds like there is just some subtle race in how the messages are handled which is more likely when the thread is busy. I wasn't able to get any more details than that. Regardless, the Safari team sees passive event listeners ( issue 489802 ) as the solution to this sort of problem, saying that touch listeners continue to be a problem for scroll performance for them.
,
Apr 2 2016
I had the theory that maybe the jsfiddle environment changed something, so I uploaded my artificial example to an independent page with a meta viewport tag at http://bit.ly/1X71nZN . But it didn't change anything to iOS Safari, which still blocks for the full 2s. On Chrome, it changes our behavior to block for 1s, as expected from comment #14.
,
Apr 2 2016
I've been able to reproduce some apparently "timeout" behavior using this test app: http://rbyers.github.io/touch-timeout.html. In particular by adding jank to the thread (but not directly inside the event handlers), but it's very unreliable with that that persists across reload but not across tab close and re-open. Seems consistent with some race condition bugs as Ben indicated. If it is something like this, it should also be fairly easy to test - just inject a touchstart handle that always cancels the events into cnn on load. If scrolling is ever possible, it indicates scrolling didn't wait for the handler to return.
,
Apr 3 2016
Here's a handy folder to share traces and collaborate: https://drive.google.com/open?id=0B0w8IQeoE99FZ1JZOENDcnV1MDA
,
Apr 3 2016
Before we start looking at passive event listeners as the hammer to all the things that look like nails, let's study the traces a bit more. Here'a quick annotated trace I put together: https://docs.google.com/a/google.com/presentation/d/1wL5eH6p3RKli7T4W8YGNLyyaqWBQ3rs8Z5qhpiZ61lY/edit?usp=sharing There are two clear bugs in this trace: 1) The network request completions tasks are processed while the finger is on glass. I think we should investigate just not doing those. They are racy already, and it should be totally web-compatible to just delay processing those tasks until after the user lifts the finger. 2) There's a bunch of freaky OnResize handlers that just don't make sense in Android. You can see them in the annotated trace, but really, they're all over. We need to weed those out. I guess what I am trying to advocate is let's take a look at how the whole stack behaves, rather than hacking up a quick/dirty fix.
,
Apr 3 2016
,
Apr 3 2016
,
Apr 4 2016
Thanks for the trace Dimitri. Agreed that it might help to look at the whole picture here. About the two things you pointed out:
1. The XHR handler runs while there is a compositor driven fling animation in progress, so it shouldn't be causing user-visible jank. One way to tell is that the cyan scroll_offset_y graph increases smoothly during this period. However the handler does cause a 300ms delay to stopping the fling and starting another one, which can definitely feel bad.
The scheduler has a policy to block expensive tasks like these, but it's currently only behind a 10% experiment. It can be forced on with --enable-features=SchedulerExpensiveTaskBlocking ("TaskBlocked" trace events will show where it's kicking in).
I took a quick trace on 6P with the policy enabled and the scheduler prevented 12 expensive tasks from messing things up. (Attaching trace here since I don't have write permission to the drive folder.)
2. I'm guessing the resizes are related to omnibox hiding. Maybe we need to do some throttling or better scheduling for them too.
,
Apr 4 2016
\o/
,
Apr 4 2016
,
Apr 6 2016
I spent a bit of time today comparing Chrome Android to Safari iOS behavior on this site. A couple observations (not sure anything here is really critical though): 1. Safari appears to be delaying the first paint longer than we are (when devtools is not connected), so by the time you can scroll much more of the load has occurred. Reload looks really nice actually - none of the flashing we see when reloading in Chrome. 2. Safari seems to be preloading from the URL bar. Eg. if I type in cnn.com and wait before navigating, navigation pops the page into view fully loaded. Chrome doesn't (I thought we used to do this?). This can make it easy to think the load in Safari is faster. 3. With devtools connected I can see there's still a fair amount of long-running script (2 tasks >500ms). 4. Safari has quite a few errors about blocking cross script communication but Chrome doesn't. Are our CORS policies different? Is it possible that a lot of ads/tracker work just isn't running in Safari as a result but is running in Chrome? I'm out of time for now, but I can dig into some of these areas some more if people thing any of them are promising.
,
Apr 6 2016
Thanks for the investigation. #1, Chrome is focusing on first paint speed. So this difference may be intentional. #2, see https://bugs.chromium.org/p/chromium/issues/detail?id=416684 #3, can we figure out how they avoid the touch pause during these long JS?
,
Apr 6 2016
#3: Maybe they're not. In dev tools I only saw a couple instances of long running JS (>2 500ms), and I do feel the occasional scroll jank that seems consistent with that. But when I repro without devtools attached I don't see the jank. I thought maybe there was some side effect of dev tools slowing things down so I build this little JS-based page-load responsiveness profiler: http://rbyers.github.io/pageload.html#cnn.com The crazy thing is that under this, things are TERRIBLE - many many >2s frames, scrolling feels absolutely awful etc. Maybe they're just down-prioritizing iframes, but testing a few other sites I don't see obvious issues. cnn.com seems to have some problems being hosted in an iframe, so maybe this method is invalid for it. Also note that on my older iPad, cnn.com feels pretty terrible too (even outside devtools). Very weird. The only other idea I have is injected some scripts via a http proxy. That's more work...
,
Apr 12 2016
,
Apr 12 2016
Video of CNN on a nexus 5 with all the touch listeners forced to be passive: https://www.youtube.com/watch?v=NPM6172J22g
,
Apr 12 2016
Wow, this is pretty dramatic. I am sold.
,
Apr 12 2016
+1 That looks significantly better. The real challenge will be making sure we don't break things that need the touch listeners to function correctly.
,
Apr 14 2016
Re: comment 31, it looks like the iframe doesn't set the width correctly, causing the iframe test to look at what is effectively the desktop site.
,
Apr 18 2016
More investigation here: https://docs.google.com/document/d/1P_b0ZXm4SLcsjMt8F96w8TKywAbUddEhsPAT_6s_VaA/edit Mobile Firefox feels similarly choppy, I'll test mobile Edge when I have access to an appropriate phone.
,
Apr 19 2016
Just to give an update here, I've experimented with the Blink scheduler's expensive task blocking policy on this page and after fixing some use case detection issues it was able to remove all janks in this page (expect for a possible first one when an unpredicted gesture comes in). We'll land those fixes against bug 600202.
,
Apr 19 2016
YUUUGE!!
,
Apr 19 2016
Forgot to mention that we agreed with Tim that we still most likely want both interventions (passive touch listeners and task blocking) because there are cases that the latter fundamentally can't improve (e.g., touch handlers themselves being expensive). Once we have data from both we can make a more informed choice about when to deploy the interventions.
,
Apr 21 2016
Edge on a Lumia 950XL doesn't display anything for a good couple seconds after we've rendered everything onscreen for a Nexus 5X. Once it actually gets around to display content, it feels pretty smooth though.
,
Apr 22 2016
Yes, another part of this is that we might be overemphasizing the time to first paint metric and Edge's behavior to hold showing anything until it's closer to ready has its virtues. I remember seeing a user perception comparison between Chrome and Edge on Windows that Edge was perceived as faster than Chrome, even though it was strictly slower both to first paint and final load, purely because Edge just transitioned from nothing to full load without spamming the screen with visual noise in the meantime. I can't find it now though.
,
Apr 22 2016
Alex, see this (internal) study on that behavior: https://docs.google.com/document/d/1LqmNbU_m_94_wEjp-su0bWUNJYeiANkHjqPk9QvKc2w/edit#heading=h.pqcfmdyxb2ri
,
Apr 29 2016
I've updated the doc with some new findings. It looks like Safari has an intervention which makes repeated fling gestures not block, similar to what we're discussing here: https://bugs.chromium.org/p/chromium/issues/detail?id=595327 A description and sample page can be found here: https://docs.google.com/document/d/1P_b0ZXm4SLcsjMt8F96w8TKywAbUddEhsPAT_6s_VaA/edit#bookmark=id.o9hmksngf2ag
,
Jun 9 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5104c0de46b3193240e75394897efb4320e9a41c commit 5104c0de46b3193240e75394897efb4320e9a41c Author: skyostil <skyostil@chromium.org> Date: Thu Jun 09 17:11:17 2016 scheduler: Enable expensive task blocking by default BUG= 598248 , 574343 ,581039 Review-Url: https://codereview.chromium.org/2025863002 Cr-Commit-Position: refs/heads/master@{#398924} [modify] https://crrev.com/5104c0de46b3193240e75394897efb4320e9a41c/components/scheduler/renderer/renderer_scheduler.cc
,
Jun 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5104c0de46b3193240e75394897efb4320e9a41c commit 5104c0de46b3193240e75394897efb4320e9a41c Author: skyostil <skyostil@chromium.org> Date: Thu Jun 09 17:11:17 2016 scheduler: Enable expensive task blocking by default BUG= 598248 , 574343 ,581039 Review-Url: https://codereview.chromium.org/2025863002 Cr-Commit-Position: refs/heads/master@{#398924} [modify] https://crrev.com/5104c0de46b3193240e75394897efb4320e9a41c/components/scheduler/renderer/renderer_scheduler.cc
,
Jan 24 2017
I can't repro any significant responsiveness issue on cnn.com while loading anymore on my Galaxy S7/M56 (either the passive listeners helped, or a combination of things), so closing.
,
Jan 25 2017
Actually closing :) |
||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by klo...@chromium.org
, Mar 28 2016