PageLoad.PaintTiming.NavigationToFirstContentfulPaint regressed in Canary
Reported by dakut...@codeanimu.net, Aug 15 2016
Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2824.0 Safari/537.36

Example URL:

Steps to reproduce the problem:
1. Keep Chrome open several hours (4+).
2. Visit any content-heavy page.

What is the expected behavior?
Page loads quickly.

What went wrong?
Content-heavy pages take nearly twice as long to load as normal, and the browser slows to a crawl while this is happening. The infobar on these pages also tends to take a long time to load extensions (although there doesn't seem to be any specific extension that is hanging; it seems to be random).

A few examples of slow-loading pages (after a few hours):
* http://www.mmo-champion.com/content/
* https://www.reddit.com/
* https://www.twitch.tv (Any stream)
* https://www.youtube.com (Any video)

I have two timeline files I can provide if need be (I would upload them here but can't due to the 10MB limit). Both are for the same page: one after a browser reload and another a few hours after.

Does it occur on multiple sites: Yes
Is it a problem with a plugin? N/A
Did this work before? Yes. Mid July is the earliest I can remember it working fine.
Does this work in other browsers? Yes
Chrome version: 54.0.2824.0   Channel: n/a
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 22.0 r0

I have noticed that the pages that tend to slow down are ones that use some form of plugin or extension. As I noted above, though, there doesn't seem to be any specific plugin/extension causing this. I should also mention that pages that aren't too content-heavy (like the Google front page) show no noticeable difference in load time.
Aug 15 2016
Can't seem to compress the file down to 10MB, so I had to upload it elsewhere: http://codeanimu.net/random/archive/uploads/trace_Mon_Aug_15_2016_23.46.13.json.gz

Also had an error appear, but the file seems to have saved alright:

RangeError: Inflated gzip data too long to fit into a string (306663794).
    at Function.GzipImporter.inflateGzipData_ (chrome://tracing/tracing.js:4681:204)
    at GzipImporter.extractSubtraces (chrome://tracing/tracing.js:4685:276)
    at Import.createImports (chrome://tracing/tracing.js:1471:112)
    at Task.run (chrome://tracing/tracing.js:2060:13)
    at runAnother (chrome://tracing/tracing.js:2072:136)
    at runTask (chrome://tracing/tracing.js:2035:57)
    at processIdleWork (chrome://tracing/tracing.js:2041:116)
    at window.requestIdleCallback.timeout (chrome://tracing/tracing.js:2029:81)
Aug 17 2016
Same problem here; it's really frustrating. After 1-2 hours of use Chrome becomes extremely laggy: even the loading spinner on the tabs stutters while animating. Loading a heavy page slows the whole browser down, as if it were all running as one process (lol, like Firefox).
Aug 17 2016
Issue 638699 (which seems like it's probably the same problem) has a good/bad trace of the same issue (which I encountered on https://codereview.chromium.org). Killing the slow renderer process temporarily fixes the issue, as does restarting Chrome. Doesn't look like a memory issue.
Aug 17 2016
Yep, it's not a memory issue at all.
Aug 17 2016
This regression is also showing up on our performance histograms. At the 50th percentile, PageLoad.PaintTiming.NavigationToFirstContentfulPaint increased by about 12%. From the performance data, the regression is almost certainly between 54.0.2823.0 and 54.0.2824.0, which gives us about 230 CLs to consider: https://chromium.googlesource.com/chromium/src/+log/54.0.2823.0..54.0.2824.0?pretty=fuller&n=10000
Aug 17 2016
Elevating priority. Can reconsider the release block label if need be, but seems like a major regression.
Aug 17 2016
Here's a trace with more options enabled: https://drive.google.com/a/google.com/file/d/0ByZXkrqJh4uQWEZTNWtqNkRud2c/view?usp=sharing (Shared via my corp account, so have to use a google.com account to access)
Aug 18 2016
I'm seeing this issue on multiple computers with the same Chrome version. It is quite frustrating. During development I'm seeing page load times go from ~1 sec to ~14 sec after about 10 mins of development/refreshing/testing/etc. It seems to be network related: the network graph shows a much higher transfer time than normal (the blue bar; the others are fine). It also seems to be tab specific. Loading the exact same page in another tab reduces the load time to ~1 sec again (only for the new tab; the old tab remains at ~14 sec per refresh).
Aug 18 2016
I've looked through Chrome's network traces and they look fine. The network stack has no notion of a tab or a process, so the problem at least seems to be closer to the renderer than that.
Aug 18 2016
I can confirm it happens on Chromium build 410522 (so it's not Chrome-only and, I believe, not caused by a field trial), so I'm going to try to bisect it on Chromium builds. Unfortunately, it can take 30+ minutes to repro (unclear if it repros when the browser is not being used), so it takes even longer than that to determine that the issue is most likely not present in a particular version. I can also confirm it's per-process, as tyscorp's comment indicates: two tabs of the same site using the same process will both show the issue, but a tab in another process displaying the same site will work fine at the same time. Marking this as assigned to me, just to indicate I'm actively investigating. If anyone who actually works on stuff in the renderer process wants to take it from me, feel free.
Aug 18 2016
Please update the issue with the ongoing results of the bisect. We may not need a full bisect to find suspicious changes. Also fixing up labels so this might catch the eye of someone more knowledgeable.
Aug 18 2016
Here are some screenshots of the network tab:

The issue occurring: https://i.imgur.com/OWlhp5j.png
The same page opened in a new tab: https://i.imgur.com/QZIEsx2.png

Navigating to another page and then pressing the back button seems to fix the issue for the slow tab, but refreshing will not.
Aug 18 2016
Will do; thanks for asking, as I wouldn't have done so otherwise. 410423 looks good to me. I'm not certain of that, which will be the case for all of my "looks good" updates, but I'm moving on, or will after lunch, at least.
Aug 18 2016
A wild guess based on a suspicious name: "Don't use PostTask queueing between compositor and main thread." https://codereview.chromium.org/2162143002 FWIW, the bug is pretty quick to appear on https://stackoverflow.com pages.
Aug 18 2016
I agree. +dtapuska any insights here? Could your change have caused a scheduling problem?
Aug 18 2016
+skyostil@ I wouldn't anticipate any scheduling changes due to my patch; we are still sending events the same way, via a PostTask.
Aug 18 2016
woxxom: Stack Overflow works much better for reproducing the issue. I can confirm that 410495 is definitely bad. Haven't been able to repro with 410474 yet, but still trying.
Aug 18 2016
#18: yeah, that should be the case. A trace with disabled-by-default-renderer.scheduler turned on might be useful for seeing what's going on.
Aug 18 2016
Looks like 410484 is good (used it and a buggy version side by side, hammering Stack Overflow; it didn't show any issues well after the bug surfaced on the other). Accidentally did a cross-process navigation, so my borked tab is no more. I'll provide a trace with that option enabled once I manage to repro again. I think my earlier trace at https://drive.google.com/a/google.com/file/d/0ByZXkrqJh4uQWEZTNWtqNkRud2c/view?usp=sharing may already have had it enabled, but I'm not sure.
Aug 18 2016
Is the issue still reproducible on 54.0.2825.0 or later? My change had a use-after-free which was fixed the next morning; see issue 635870.
Aug 18 2016
Yes, it's still present on Canary.
Aug 18 2016
I'm having trouble reproing on Canary now (despite being able to do it 15 minutes ago without too much trouble), enough that I'm not at all sure that 410484 is good. It turns out prerender can also mess up a repro attempt, which is something I also wasn't on the lookout for.
Aug 18 2016
...And the instant I finally give up on it, to review code until it happens again, it happens again. Here's a trace from Canary 54.0.2832.0 with all 3 resource scheduler options enabled.
Aug 18 2016
[skyostil / dtapuska]: Think the CL can be reverted cleanly? I'm thinking I use 410484 for the rest of the day, and if I can't repro with it, we revert and see if the Canary perf numbers recover. It generally takes less than an hour to repro when reviewing code, and that's ~3 hours of using it.
Aug 18 2016
skyostil@: I did see in the trace that pending_input_event_count_ is 1, and I reproduced pending_input_event_count_ getting out of sync when the mousemove handler on a page is janky. I've put together a quick patch to fix this in https://codereview.chromium.org/2259983002. My hope is that we can get this into the next Canary and see if it is better.
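To make the failure mode concrete, here is a small self-contained model of the counter desync. This is hypothetical illustration code, not Chromium's actual implementation: the class and method names (RendererScheduler, MainThreadEventQueue, and so on) are invented, and the notify_coalesced flag stands in for the behavioral difference between the buggy and fixed code paths. The idea is that the scheduler increments a pending count once per enqueued event, but a queue that coalesces consecutive mousemoves into one entry and acknowledges only once per dispatched entry leaves the count stuck above zero:

```python
class RendererScheduler:
    """Tracks how many input events are believed to be pending."""

    def __init__(self):
        self.pending_input_event_count = 0

    def did_queue_input_event(self):
        self.pending_input_event_count += 1

    def did_handle_input_event(self):
        self.pending_input_event_count -= 1


class MainThreadEventQueue:
    """Queues input events, coalescing consecutive mousemoves."""

    def __init__(self, scheduler, notify_coalesced):
        self.scheduler = scheduler
        self.notify_coalesced = notify_coalesced
        self.queue = []  # entries are (event_type, coalesced_count)

    def enqueue(self, event_type):
        # The scheduler is told about every event as it arrives...
        self.scheduler.did_queue_input_event()
        # ...but consecutive mousemoves are folded into the tail entry.
        if self.queue and self.queue[-1][0] == event_type == "mousemove":
            ty, n = self.queue[-1]
            self.queue[-1] = (ty, n + 1)
        else:
            self.queue.append((event_type, 1))

    def dispatch_one(self):
        ty, n = self.queue.pop(0)
        if self.notify_coalesced:
            # Fixed behavior: acknowledge every event folded into this entry.
            for _ in range(n):
                self.scheduler.did_handle_input_event()
        else:
            # Buggy behavior: acknowledge only once per dispatched entry,
            # so n - 1 events are never accounted for.
            self.scheduler.did_handle_input_event()
```

With three mousemoves queued while the main thread is busy (as with the janky handler above), the buggy mode dispatches one coalesced entry but leaves the pending count at 2 forever, so the scheduler keeps believing input is pending; the fixed mode brings it back to 0.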
Aug 18 2016
Awesome, passing the buck (bug?)
Aug 18 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/73d217001f42b22ce119e5269e74e1f6923d80ee

commit 73d217001f42b22ce119e5269e74e1f6923d80ee
Author: dtapuska <dtapuska@chromium.org>
Date: Thu Aug 18 23:18:34 2016

Pending input event count can get thrown off by coalesced events.

When the main thread event queue coalesces multiple events, the count inside the render scheduler can be thrown off. Ensure that for every coalesced event we let the render scheduler know it was handled.

BUG= 637640
Review-Url: https://codereview.chromium.org/2259983002
Cr-Commit-Position: refs/heads/master@{#412968}

[modify] https://crrev.com/73d217001f42b22ce119e5269e74e1f6923d80ee/content/renderer/input/main_thread_event_queue.cc
Aug 19 2016
Give the method in issue 637523 a whirl? You're playing with the probabilistic chance of click-dragging by accident, while that one reproduces every time.
Aug 20 2016
Appears to be fixed in 54.0.2832.2 dev...
Aug 20 2016
Curious... That doesn't have the above fix, I believe. Instead it has the fix to issue 638868 .
Aug 20 2016
@mmenke, indeed, it just doesn't happen on stackoverflow.com anymore for me. Now I see it on https://support.google.com/chrome_webstore/contact/developer_support/?hl=en and https://developer.chrome.com/webstore/faq#troubleshooting
Aug 20 2016
Looks like 54.0.2834.0 is now making the rounds. Please try it, and tell us if the issue is still present.
Aug 21 2016
Issue 639648 has been merged into this issue.
Aug 22 2016
The PageLoad.PaintTiming.NavigationToFirstContentfulPaint graph has normalized based on the new versions with the fix. Marking issue as fixed.
Aug 22 2016
woxxom: Thanks for your help investigating this!
Aug 22 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/dad74f04e448eab5ecbdbd06e087d5780bdfa065

commit dad74f04e448eab5ecbdbd06e087d5780bdfa065
Author: dtapuska <dtapuska@chromium.org>
Date: Mon Aug 22 14:55:02 2016

Add a unit test expectation for calling the render scheduler on coalesced events.

Change the API around a little so that we can mock out the callback to ensure that we are calling in the correct scenarios.

BUG= 637640
Review-Url: https://codereview.chromium.org/2259823007
Cr-Commit-Position: refs/heads/master@{#413442}

[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/input_event_filter.cc
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/input_event_filter.h
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/main_thread_event_queue.cc
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/main_thread_event_queue.h
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/main_thread_event_queue_unittest.cc
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/third_party/WebKit/public/blink_headers.gypi
[add] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/third_party/WebKit/public/platform/scheduler/test/DEPS
[add] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/third_party/WebKit/public/platform/scheduler/test/mock_renderer_scheduler.h
Aug 22 2016
+kouhei, it would be great to write a regression test for PCv2 for this one. If it seems doable let's open up a separate bug.
Aug 22 2016
Issue 639927 has been merged into this issue.
Aug 23 2016
Verified the fix on Windows 7 for Google Chrome Dev version 54.0.2837.0. Screen recording is attached. TE-Verified labels are attached.
Aug 23 2016
>c39 Sadly, testing long-running cases is hard :/ I don't see a naive way to implement it on top of PCv2 + the current speed infra. We historically had a Chrome Endure test, which ran Chrome for hours and monitored memory consumption. However, the CL range of each run was huge, and the long running time made auto-bisect infeasible, so we gave up on the approach.
Aug 23 2016
I'm not sure the description is fully accurate. It seems like some of the later comments were able to repro quickly. dtapuska@: is there a way to repro this edge case easily? If PCv2 supports adding custom HTML, we may be able to simulate this problem (though I'm not sure if that approach aligns with PCv2 goals).
Aug 23 2016
The quick repro for this case is to add something like this to the document:

document.addEventListener("mousemove", function() {
  var start = performance.now();
  var now = null;
  do {
    now = performance.now();
  } while (now - start < 100);
});

And then press the mouse button down and drag.
Aug 23 2016
I think this is out of scope for PCv2, as PCv2 measures loading-time key metrics (ttfmp/tti, etc.). If we are to add this test, we should add it as a separate telemetry test for tough_scheduling_cases; it seems completely doable as a custom story + TBMv2 responsivenessMetric.
Aug 25 2016
Issue 637818 has been merged into this issue.
Aug 25 2016
Issue 638700 has been merged into this issue.
Aug 28 2016
Still noticing this occurring in 54.0.2837.0 dev-m (64-bit). The recent update did seem to make it not occur as quickly, though. The slowdown used to occur after 2-3+ hours; now it's about 6-8+.
Aug 28 2016
Yep, still happening here too. I tried Chrome stable for comparison and got no slowdowns whatsoever. You notice it most when loading heavy websites: the whole browser slows down, which is not the case on the stable builds.
Aug 28 2016
Thanks for the update. 54.0.2837.0 is at position 413618, and the fix was 413442, so the fix should indeed be on dev. Re-opening the bug. dtapuska: Are you sure the issue was completely fixed?
Aug 29 2016
mmenke@: the PageLoad.PaintTiming.NavigationToFirstContentfulPaint regression is definitely fixed, as you can clearly see on the UMA timeline graphs. I think we should consider this bug solely about that regression, and I've adjusted the title appropriately. The trace in #25 was what was used to discover the fix for this issue. dakutree@: something else must be going on. Can you please file a new issue with the repro steps and provide a trace at that time? (cc mmenke@ and myself and we will look at the traces.)
Aug 29 2016
dakutree, yahyoh1991: End users can't CC people when filing a new bug, so whichever one of you files the bug, please just post the new bug ID here (That'll also let anyone following this bug who's still running into issues find the new one).
Aug 29 2016
And please, one of you, do file the new bug: it looks to me like there may be another regression in this histogram which started before the issue dtapuska fixed, so there may well be some other, earlier regression in a related area of the code that you're running into. We're not just dismissing the fact that you're running into a similar-looking / possibly related bug.
Aug 29 2016
Filed a new bug @ https://bugs.chromium.org/p/chromium/issues/detail?id=641958. Will try and provide a trace whenever the issue pops up again.
Comment 1 by cbiesin...@chromium.org, Aug 15 2016