New issue
Advanced search Search tips

Issue 849108 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug-Regression

Blocking:
issue 463348
issue 812893



Sign in to add a comment

UseModernMediaControls makes Wikipedia's first paint slower on Chrome 67

Reported by phedens...@wikimedia.org, Jun 3

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36

Steps to reproduce the problem:
Wikipedia seems to render a little but slower on 67 than on 66, I've seen this on multiple synthetic environments and waiting on getting more users on 67 to also see it RUM metrics.

1. Access for example https://en.wikipedia.org/wiki/Barack_Obama (I can see this on many pages) with Chrome 67 
2. Do the same with Chrome 66 and collect visual metrics and compare them.

In our lab environment we can see that First Visual Change, Speed Index and first paint (collected directly from Chrome) is affected. I'm pretty sure this is a Chrome issue, because I can switch between just the browser version and see the metrics change. I've seen this now both using Browsertime, sitespeed.io and WebPageTest. Since I can see it in different tools, changing the browser back and forth, I'm pretty sure the change is isolated to the Chrome version.

I've documented the difference at Github https://github.com/sitespeedio/sitespeed.io/issues/2069 and on https://phabricator.wikimedia.org/T196242

For WebPageTest we can see the diff is 100 ms, in our isolated environment (using Chrome and WebPageReplay) the diff is 33 ms on most pages (that's the lowest granularity we have). Checking the differences just using first paint (collected from Chrome) the diff is ca 50 ms. 

I also run tests on a couple of other sites but I could only see Wikipedia affected. But maybe something changed so this is expected, let me know if that is the case!

What is the expected behavior?
The render time shouldn't differ between browser versions.

What went wrong?
The latest Chrome 67 renders a little bit slower than Chrome 66.

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 66.0.3359.139  Channel: n/a
OS Version: OS X 10.13.4
Flash Version: 

At the moment we have so little traffic from users on Chrome 67 so we cannot see a diff in the RUM data, you can look yourself:
https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-browser?refresh=5m&orgId=1&var-metric=firstPaint&var-browserFamily=Chrome&var-browserVersion=66&var-browserVersion=67&var-m=median
 
All the tests where we see the regression are using Docker (Ubuntu based) or Ubuntu directly. I haven't tested on other OS.
We also keep raw logs from Chrome. This is when we test https://en.wikipedia.org/wiki/Facebook with 67:
http://webpagereplay-wikimedia.s3-website-us-east-1.amazonaws.com/?prefix=enwiki/desktop/chrome/100/Facebook/2018-06-03-15-41/

And then with 66:
http://webpagereplay-wikimedia.s3-website-us-east-1.amazonaws.com/?prefix=enwiki/desktop/chrome/100/Facebook/2018-06-03-14-24/

When I look through the only big difference I could see is that in 67 time spent in UpdateLayoutTree is larger. It seems to spend 100 ms more there in 67 than on 66. But I could have missed something, it's quite much raw data :)
Labels: Needs-Triage-M66
I've look into it some more and I don't see the same pattern on all URLs (some are a little faster too). There's a summary here: https://phabricator.wikimedia.org/T196242#4252705

The change I could see in the events is that more time is spent for all URLs on UploadLayerTree (100 ms max) and a little more on BlinkGCMarking (30 ms). Else everything looks the same.
Cc: chrishtr@chromium.org
Components: -Blink>Paint Blink>Compositing
Owner: wangxianzhu@chromium.org
Status: Assigned (was: Unconfirmed)
This seems likely due to the Slimming Paint V1.75 change. Do we have any insight on whether this is a known regression offset by other progression?
Labels: -OS-Mac OS-Linux
Thanks phedenskog@wikimedia.org for the report.

Just a question about the summary in https://phabricator.wikimedia.org/T196242#4252705: In the last 3 graphs, what do the three numbers (e.g. "UpdateLayoutTree  192ms 296ms 291ms" in the last graph) in each row mean? Are they CPU duration numbers before and after the time?

I looked into the traces, and put the numbers into a spreadsheet (https://docs.google.com/spreadsheets/d/1GmLaVp6-bn_YP4Bxf5G69dZDq_vV6sVPzoeH9WRUDMo/edit?usp=sharing), summarized as below:

			66	67	
Layout			362	470	+29.8%
UpdateLayoutTree	278	382	+37.3%
UpdateLayerTree		138	103	-25.7%

Based on the result, M67 actually has progression for UpdateLayerTree (which is expected because SPv175 reduces PrePaint work). The increase of Layout and UpdateLayoutTree is unexpected.

I also noticed that we seem to schedule animation and layout differently. On M67 we schedule more animation and layout. See the attached screenshots. I suspect the changes of Layout and UpdateLayoutTree are caused by some reason other than SPv175. I'm looking into whether we can bisect this.

m66.png
31.9 KB View Download
m67.png
34.7 KB View Download
Ah sorry thsi is a little bit unclear:

> In the last 3 graphs, what do the three numbers (e.g. "UpdateLayoutTree  192ms 296ms 291ms" in the last graph) in each row mean? Are they CPU duration numbers before and after the time?

Yep the blue vertical line is when I did the change to 67 (from 66). We run Chrome with -*,devtools.timeline and we summarize time spent per event type.

Can I interpret the following line
  UpdateLayoutTree  192ms 296ms 291ms
as:
 192ms for 66
 296ms when changed to 67
 291ms for 67
?

The numbers of UpdateLayerTree look at the same scale as my statistics (see #c7), while UpdateLayerTree looks different. I see progression of UpdateLayerTree. Are we using the different sets of traces? My method is: open a trace, select all the captured events, and look at the summary of slices. I did this for all of the 11x2 traces for m66 and m67 in #c2.
Cc: wangxianzhu@chromium.org
Components: -Blink>Compositing Blink>Scheduling Blink>Layout
Labels: -Needs-Triage-M66 RegressedIn-67 FoundIn-67
Owner: szager@chromium.org
Now I believe this is a scheduling issue. The increased Layout and UpdateLayoutTree times are both because of an extra event happens after an Animation, respectively.

(ULT = UpdateLayoutTree)
M66:
  ULT-138ms  Layout-273ms
   (~same)

M67 (new events marked '(+)'):
             Animation(+)   Animation(+)
  ULT-129ms  Layout-288ms   Layout-57ms(+)
   (~same)   ULT-77ms(+)

Based on the above behaviors, I bisected to
You are probably looking for a change made after 542547 (known good), but no later than 542562 (first known bad).
CHANGELOG URL:
  https://chromium.googlesource.com/chromium/src/+log/c06159611876d8b4bb1ea997e5647f8ba888ea60..89f128f6486e5efe7e8f44ebf8cddcdfad627cd3

My test method during bisect is:
1. Open https://en.wikipedia.org/wiki/Facebook (listed in the bisect.py command line);
2. Open DevTools
3. Switch to Performance panel, and click Record
4. Reload the page and wait for completion
5. Stop recording in DevTools
6. Expand "Interactions" and "Animation" below "Interactions"
7. Observe if there are animation events (purple) the blue dashed vertical line

Suepecting https://chromium.googlesource.com/chromium/src/+/f8bf3f03dd8f4b81cb36d21c8af07844de4e9ecb. Could it cause previously throttled frame not throttled?
Owner: wangxianzhu@chromium.org
The test page has no iframes in it, so it can't be a throttling issue.

The test page also doesn't use IntersectionObserver, so it seems quite unlikely that my change is the cause.

Punting back to wangxianzhu.
Owner: szager@chromium.org
Sorry for punting back, but I locally bisected, and the result still pointed to:[f8bf3f03dd8f4b81cb36d21c8af07844de4e9ecb] [IntersectionObserver] Don't snap bounding rects.
Just want say this is really important for us and it would super if you have time to look at it before 67 has fully rolled out.

I've checked our metrics and 67 is slowly getting more user  (25% of 66+67 right now at least for the users where we collect timing metrics) but it's hard to say if the first paint is affected (it looks that way but I want more users to be sure).
Cc: -wangxianzhu@chromium.org szager@chromium.org
Owner: wangxianzhu@chromium.org
szager@ told me offline that code changed in the CL didn't execute for the case. I'll investigate more to figure out the root case of the extra animations.
Blocking: 812893
Cc: hbengali@chromium.org mlamouri@chromium.org dalecur...@chromium.org
Components: -Blink>Scheduling -Blink>Layout Blink>Media>Controls
Labels: -Type-Bug -Pri-2 Pri-1 Type-Bug-Regression
Owner: steimel@chromium.org
Summary: UseModernMediaControls makes Wikipedia's first paint slower on Chrome 67 (was: Wikipedia first paint is slower on Chrome 67 than 66)
I bisected locally again, and the result still pointed to szager@'s CL. However, just on that revision the changed code is not executed when loading wikipedia. Then I tried from the revision before szager@'s CL, and add one CL each time, then the extra animation appeared after crrev.com/fdfc5ae4154a67e0539b9d0a77b3c5e2da38f40b. So I guess my bisects were influenced by the field trial.

I tried --disable-features=UseModernMediaControls on crrev.com/fdfc5ae4154a67e0539b9d0a77b3c5e2da38f40b, and the extra animations didn't appear. Also tried to enable and disable feature on ToT, and the extra animations appear only with the feature enabled.

 
Cc: beccahughes@chromium.org
+beccahughes who looked at a controls performance.

What's surprising here is that none of the pages I tried (Barack Obama, Facebook and Sweden) have an audio or video element in the DOM. I wonder if there is something created but not added to the tree. If that's the case, on top of my head, only the loading of the CSS would be different.
Pinging this again, is there something we can do to help or can we add more info? This is really important for us. Thanks
I had a quick look and I could see audio elements on all three of those pages quickly flash with our new media controls and then disappear.
To echo what beccahughes said, all of these pages do have media controls.  You'll find them at the bottom of the page, below the citations.

Other wikis that use a slightly different layout have a much more pronounced version of this.  For example, if you look at https://nl.wikipedia.org/wiki/Amsterdam you'll see the same behavior for the audio player on that page, which is on the top right-hand corner of the page just below the search bar.  Whether this is the culprit for the ~100ms increase in firstPaint times after Chrome 67 isn't clear, but it would seem to be worth addressing in its own right.


Blocking: 463348
I've been going through our RUM metrics from when users switched from Chrome 66 to 67 and I think I can see slower first paint with 67. Attaching three images: Our user change from 66 -> 67, median and p75 first paint during that period.
chrome67-66.png
168 KB View Download
chrome66-67.png
179 KB View Download
Screen Shot 2018-09-28 at 11.00.02 AM.png
399 KB View Download

Sign in to add a comment