New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 637640 link

Starred by 21 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug



Sign in to add a comment

PageLoad.PaintTiming.NavigationToFirstContentfulPaint regressed in Canary

Reported by dakut...@codeanimu.net, Aug 15 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2824.0 Safari/537.36

Example URL:

Steps to reproduce the problem:
1. Keep Chrome open several hours (4+).
2. Visit any content-heavy page.

What is the expected behavior?
Page loads quickly.

What went wrong?
Content-heavy pages take nearly twice as long to load as normal, browser slows to a crawl while this happening.
The info bar on these pages also tend to take a long time to load extensions (although there doesn't seem to be any specific extension that is hanging, seems to be random).

A few examples of slow loading pages (after a few hours):
* http://www.mmo-champion.com/content/
* https://www.reddit.com/
* https://www.twitch.tv (Any stream)
* https://www.youtube.com (Any video)

I have two timeline files I can provide if need be (I would upload them here but can't due to the 10MB limit). Both for the same page, one after a browser reload and another a few hours after.

Does it occur on multiple sites: Yes

Is it a problem with a plugin? N/A 

Did this work before? Yes Mid July is the earliest I can remember it working fine.

Does this work in other browsers? Yes 

Chrome version: 54.0.2824.0  Channel: n/a
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 22.0 r0

I have noticed that the pages that tend to slow down are ones that use some form of plugin or extension. As I noted above though, there doesn't seem to be any specific plugin/extension causing this.

I should also mention that pages that aren't too content-heavy have no noticeable difference in load time (Like the Google front page).
 
Labels: Needs-Feedback
Could you attach a trace as per https://www.chromium.org/developers/how-tos/submitting-a-performance-bug ? Thanks!
Can't seem to compress the file down to 10MB, so I had to upload it elsewhere: http://codeanimu.net/random/archive/uploads/trace_Mon_Aug_15_2016_23.46.13.json.gz

Also had an error appear, but the file seems to have saved alright:
RangeError: Inflated gzip data too long to fit into a string (306663794).
    at Function.GzipImporter.inflateGzipData_ (chrome://tracing/tracing.js:4681:204)
    at GzipImporter.extractSubtraces (chrome://tracing/tracing.js:4685:276)
    at Import.createImports (chrome://tracing/tracing.js:1471:112)
    at Task.run (chrome://tracing/tracing.js:2060:13)
    at runAnother (chrome://tracing/tracing.js:2072:136)
    at runTask (chrome://tracing/tracing.js:2035:57)
    at processIdleWork (chrome://tracing/tracing.js:2041:116)
    at window.requestIdleCallback.timeout (chrome://tracing/tracing.js:2029:81)
same problem here its really frustrating after like 1 - 2 hours of use its become extremely laggy...even the circle indicator on the tabs lag while animating.
and loading heavy page makes the whole chrome slow as hell like if its all running as a one process ( lol like firefox ) 
   

Comment 4 by mmenke@chromium.org, Aug 17 2016

 Issue 638699  (Which seems like it's probably the same problem) has a good/bad trace of the same issue (Which I encountered on https://codereview.chromium.org).  Killing the slow renderer process temporarily fixes the issue, as does restarting Chrome.

Doesn't look like a memory issue.
yep its not memory issue at all 

Comment 6 by mmenke@chromium.org, Aug 17 2016

Looking at our histogram, looks like this is showing up on our performance histograms.  At the 50th percentile, PageLoad.PaintTiming.NavigationToFirstContentfulPaint increased by about 12%.

From performance data, looks like the regression is almost definitely between 54.0.2823.0 and 54.0.2824.0, which gives us about 230 CLs to consider:  https://chromium.googlesource.com/chromium/src/+log/54.0.2823.0..54.0.2824.0?pretty=fuller&n=10000

Comment 7 by mmenke@chromium.org, Aug 17 2016

Cc: mmenke@chromium.org csharrison@chromium.org
 Issue 638699  has been merged into this issue.

Comment 8 by mmenke@chromium.org, Aug 17 2016

Labels: -Pri-2 ReleaseBlock-Stable M-54 Pri-1
Elevating priority.  Can reconsider the release block label if need be, but seems like a major regression.

Comment 9 by mmenke@chromium.org, Aug 17 2016

Here's a trace with more options enabled:  https://drive.google.com/a/google.com/file/d/0ByZXkrqJh4uQWEZTNWtqNkRud2c/view?usp=sharing (Shared via my corp account, so have to use a google.com account to access)

Comment 10 by tysc...@gmail.com, Aug 18 2016

I'm seeing this issue on multiple computers with the same Chrome version. It is quite frustrating.

During development I'm seeing page load times go from ~1 sec to ~14 sec after about 10 mins of development/refreshing/testing/etc. It seems to be network related, as the network graph shows a much higher transfer time (the blue bar, the other are fine) than normal.

It seems to be tab specific. Loading the exact same page in another tab reduces the load times down to ~1 sec again. (only for the new tab, the old tab remains at ~14 sec per refresh)
I've looked through Chrome's network traces, and they look fine, and the network stack has no notion of a tab, or a process, so at least it seems to be closer to the renderer than that.
Cc: -mmenke@chromium.org
Labels: -Needs-Feedback -Arch-x86_64
Owner: mmenke@chromium.org
Status: Assigned (was: Unconfirmed)
I can confirm it happens on chromium build 410522 (So no chrome only, not caused by a field trial, I believe), so I'm going to try and bisect it on chromium builds.  Unfortunately, it can take 30+ minutes to repro (Unclear if it repros if browser is not being used), so longer than that to determine the issue most likely is not present on a particular version.

Also can confirm it's per-process, as tyscorp's comment indicates:  Two tabs of the same site using the same process will both show the issue, but a tab in another process will work fine at the same time, displaying the same site.

Marking this as assigned to me, just to indicate I'm actively investigating.  If anyone who actually works on stuff in the renderer process wants to take it from me, feel free.
Components: -Blink Blink>Scheduling Blink>HTML>Parser Blink>Loader
Please update the issue with the ongoing results of the bisect. We may not need a full bisect to find suspicious changes.

Also fixing up labels so this might catch the eye of someone more knowledgeable.

Comment 14 by tysc...@gmail.com, Aug 18 2016

Here are some screenshots of the network tab, with:

The issue occurring: https://i.imgur.com/OWlhp5j.png
The same page opened in a new tab: https://i.imgur.com/QZIEsx2.png

Navigating to another page and then pressing the back button seems to fix the issue for the slow tab, but refreshing will not.
Will do, thanks for asking me - I wouldn't have done so, otherwise.  410423 looks good to me.  I'm not certain of that, which will be the case of all my "looks good" updates, but I'm moving on, or will after lunch, at least.

Comment 16 by woxxom@gmail.com, Aug 18 2016

A wild guess based on a suspicious name:

"Don't use PostTask queueing between compositor and main thread." 
https://codereview.chromium.org/2162143002

FWIW, the bug is pretty quick to appear on https://stackoverflow.com pages.
Cc: dtapu...@chromium.org
I agree. +dtapuska any insights here? Could your change have caused a scheduling problem?
Cc: skyos...@chromium.org
+skyostil@  I wouldn't anticipate any scheduling changes do to my patch we are still sending events still the same via a PostTask.
woxxom:  Stackoverflow works much better in reproducing the issue.  I can confirm that 410495 is definitely bad.  Haven't been able to repro with 410474 yet, but still trying.
#18: yeah, that should be the case. A trace with disabled-by-default-renderer.scheduler turned on might be useful for seeing what's going on.
Looks like 410484 is good (Used it and a buggy version side by side, hammering Stack Overflow, it didn't show any issues well after the bug surfaced on the other).

Accidentally did a cross-process navigation, so my borked tab is no more.  I'll provide a trace with that enabled once I manage to repo again.  I think my earlier trace at https://drive.google.com/a/google.com/file/d/0ByZXkrqJh4uQWEZTNWtqNkRud2c/view?usp=sharing may already have had it enabled, but I'm not sure.
Is the issue reproducible in non 54.0.2825.0 or later?

My change had a use after free which was fixed the next morning see issue 635870.
Yes, it's still present on Canary.
I'm having trouble reproing on Canary, now (Despite being able to do it 15 minutes ago without too much trouble) - enough that I'm not at all sure that 410484 is good.  Turns out prerender can also mess up a repro attempts, which is something I also wasn't on the lookout for.
...And the instant I finally give up on it, to review code until it happens again, it happens again.  Here's a trace from Canary 54.0.2832.0 with all 3 resource scheduler options enabled.
trace_painfully_slow.json.gz
9.2 MB Download
[skyostil / dtapuska]:  Think the CL can be reverted cleanly?  I'm thinking I use 410484 for the rest of the day, and if I can't repro with it, we revert, and see if the Canary per numbers recover.  It generally takes less than an hour to repro when reviewing code, and that's ~3 hours of using it.
skyostil@ I did see in the trace that then pending_input_event_count_ is 1. 

And I did reproduce the pending_input_event_count_ getting out of skew when the mousemove handler on a page is janky. I've put together a quick patch to fix this in https://codereview.chromium.org/2259983002

My hope is that we can get this into the next canary and see if it is better.


Cc: mmenke@chromium.org
Owner: dtapu...@chromium.org
Awesome, passing the buck (bug?)
Project Member

Comment 29 by bugdroid1@chromium.org, Aug 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/73d217001f42b22ce119e5269e74e1f6923d80ee

commit 73d217001f42b22ce119e5269e74e1f6923d80ee
Author: dtapuska <dtapuska@chromium.org>
Date: Thu Aug 18 23:18:34 2016

Pending input event count can get thrown off by coalesced events.

When the main thread event queue coalesces multiple events the count inside
the render scheduler can be throw off. Ensure that for every coalesced
event we let the render scheduler know it was handled.

BUG= 637640 

Review-Url: https://codereview.chromium.org/2259983002
Cr-Commit-Position: refs/heads/master@{#412968}

[modify] https://crrev.com/73d217001f42b22ce119e5269e74e1f6923d80ee/content/renderer/input/main_thread_event_queue.cc

Comment 30 by ranz...@gmail.com, Aug 19 2016

Give the method in  issue 637523  a whirl? You're playing with the probabilistic chance of click-dragging by accident, while that one reproduces every time.

Comment 31 by woxxom@gmail.com, Aug 20 2016

Appears to be fixed in 54.0.2832.2 dev...
Curious...  That doesn't have the above fix, I believe.  Instead it has the fix to  issue 638868 .
Looks like 54.0.2834.0 is now making the rounds.  Please try it, and tell us if the issue is still present.

Comment 35 by horo@chromium.org, Aug 21 2016

Issue 639648 has been merged into this issue.
Labels: Hotlist-Fixit
Status: Fixed (was: Assigned)
The PageLoad.PaintTiming.NavigationToFirstContentfulPaint graph has normalized based on the new versions with the fix. Marking issue as fixed.
woxxom:  Thanks for your help investigating this!
Project Member

Comment 38 by bugdroid1@chromium.org, Aug 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dad74f04e448eab5ecbdbd06e087d5780bdfa065

commit dad74f04e448eab5ecbdbd06e087d5780bdfa065
Author: dtapuska <dtapuska@chromium.org>
Date: Mon Aug 22 14:55:02 2016

Add a unit test expection for calling the render scheduler on coalesced events.

Change the API around a little so that we can mock out the callback to
ensure that we are calling in the correct scenarios.

BUG= 637640 

Review-Url: https://codereview.chromium.org/2259823007
Cr-Commit-Position: refs/heads/master@{#413442}

[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/input_event_filter.cc
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/input_event_filter.h
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/main_thread_event_queue.cc
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/main_thread_event_queue.h
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/content/renderer/input/main_thread_event_queue_unittest.cc
[modify] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/third_party/WebKit/public/blink_headers.gypi
[add] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/third_party/WebKit/public/platform/scheduler/test/DEPS
[add] https://crrev.com/dad74f04e448eab5ecbdbd06e087d5780bdfa065/third_party/WebKit/public/platform/scheduler/test/mock_renderer_scheduler.h

Cc: kouhei@chromium.org
+kouhei, it would be great to write a regression test for PCv2 for this one. If it seems doable let's open up a separate bug.
Issue 639927 has been merged into this issue.
Cc: rnimmagadda@chromium.org
Labels: TE-Verified-M54 TE-Verified-54.0.2837.0
Verified the fix on Windows 7 for Google Chrome Dev Version - 54.0.2837.0 

Screen-recording is attached.

TE-Verified labels are attached.
637640.mp4
1.6 MB View Download
>c39
Sadly, testing long running cases are hard :/ I don't see a naive way to implement it on top of PCv2+current speed infra.

We historically had a Chrome Endure test, which run Chrome for hours and monitored memory consumption. However, CL range of each run was huge and long running time made it infeasible for auto-bisect, so we gave up on the approach.
I'm not sure the description is fully accurate it. It seems like some of the later comments were able to repro quickly. dtapuska@ is there a way to repro this edge case easily?

If PCv2 supports adding custom html we may be able to simulate this problem (though I'm not sure if that approach aligns with PCv2 goals).
The quick repro for this case is to add to do something like this to the document:

document.addEventListener("mousemove", function() { var start = performance.now(); var now = null; do { now = performance.now(); } while(now - start < 100); });

And then press the mouse button down and drag.
I think this is out of scope of PCv2, as PCv2 measures loading-time key metrics (ttfmp/tti, etc).
If we are to add this test, we should add this as a separate telemetry test for tough_scheduling_cases, it seems completely do-able as a custom story + TBMv2 responsivenessMetric.
Issue 637818 has been merged into this issue.
 Issue 638700  has been merged into this issue.
Still noticing this occuring in 54.0.2837.0 dev-m (64-bit).
The recent update did seem to cause it to not occur as quick anymore though. Slowdown used to occur after 2-3+ hours, now it's about 6-8+.
Yep still happening here too. 
I tried chrome stable for compression and got no slow downs whatsoever. 

You can really notice it the most when loading heavy websites the whole browser slow down which not the case on the stable builds. 
Status: Assigned (was: Fixed)
Thanks for the update. 54.0.2837.0 is at position 413618, and the fix was 413442, so the fix should indeed be on dev.

Re-opening the bug.

dtapuska:  Are you sure the issue was completely fixed?
Status: Fixed (was: Assigned)
Summary: PageLoad.PaintTiming.NavigationToFirstContentfulPaint regressed in Canary (was: Chrome becoming extremely sluggish after being open for a few hours)
mmenke@ the PageLoad.PaintTiming.NavigationToFirstContentfulPaint is definitely fixed as you can clearly see on the UMA timeline graphs. I think we should consider this bug solely related to that. I've adjusted the title appropriately. The trace in #25 was what was used to discover the fix for this issue.

dakutree@ something else must be going on. Can you please file a new issue with the repro steps and provide a trace at that time? (cc mmenke@ and myself and we will look at the traces).
dakutree, yahyoh1991:  End users can't CC people when filing a new bug, so whichever one of you files the bug, please just post the new bug ID here (That'll also let anyone following this bug who's still running into issues find the new one).
And please one of you do file the new bug - it looks to me like there may be another regression in this histogram, which started before the issue dtapuska fixed, so there may well be some other, earlier regression in a related area of the code that you guys are running into.  We're not just dismissing the fact you guys are running into a similar-looking / possibly related bug.
Filed a new bug @ https://bugs.chromium.org/p/chromium/issues/detail?id=641958.
Will try and provide a trace whenever the issue pops up again.
Labels: Hotlist-Input-Dev

Sign in to add a comment