Issue metadata
Sign in to add a comment
|
2.3% regression in smoothness.key_mobile_sites_smooth at 393679:393717 |
||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
May 25 2016
Exception has intrupted last bisect. I kicked another one with narrower range (393678 : 393698) based on partial result from last bisect. ===== BISECT JOB RESULTS ===== Status: started ===== TESTED REVISIONS ===== Revision Mean Std Dev N Good? chromium@393678 16.8928 0.0151531 5 good chromium@393698 17.2264 0.0607864 5 bad chromium@393717 17.2387 0.019965 5 bad Bisect job ran on: android_nexus7_perf_bisect Bug ID: 613193 Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --also-run-disabled-tests smoothness.key_mobile_sites_smooth Test Metric: frame_times/frame_times Relative Change: 2.05% Score: 0 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus7_perf_bisect/builds/2983 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9012219037553688112 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=5273161961046016 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
May 25 2016
=== Auto-CCing suspected CL author mdjones@chromium.org === Hi mdjones@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL be related. ===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : StaticTabSceneLayer owns ToolbarLayer Author : mdjones Commit description: Since StaticLayout is the only layout that needs the toolbar layout, its ownership has been moved to StaticLayout's SceneLayer. This puts the ToolbarLayer in the content tree for SceneOverlays. BUG=584340 Review-Url: https://codereview.chromium.org/1706293005 Cr-Commit-Position: refs/heads/master@{#393696} Commit : 94e6d2cbf7bf2691014401bc11be703dc45f43b0 Date : Sat May 14 00:10:58 2016 ===== TESTED REVISIONS ===== Revision Mean Std Dev N Good? chromium@393678 16.9687 0.0291235 5 good chromium@393688 16.9529 0.024522 5 good chromium@393693 16.9796 0.0248545 5 good chromium@393695 16.9691 0.0251036 4 good chromium@393696 17.3024 0.0888717 5 bad <-- chromium@393698 17.3271 0.0342943 5 bad Bisect job ran on: android_nexus7_perf_bisect Bug ID: 613193 Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --also-run-disabled-tests smoothness.key_mobile_sites_smooth Test Metric: frame_times/frame_times Relative Change: 2.11% Score: 99.8 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus7_perf_bisect/builds/2987 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9011681186022374512 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=5281443261972480 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Jun 1 2016
Moving this nonessential bug to the next milestone. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 23 2016
friendly ping! mdjones@ any idea why your CL has caused this regression?
,
Jun 23 2016
My change didn't affect the behavior of the toolbar, it only moved some of the code around. I'll double check to make sure something bad isn't happening.
,
Jul 1 2016
The bisect looks pretty clean, and the regression hasn't recovered yet. Did you find anything?
,
Jul 6 2016
This code changed again in https://codereview.chromium.org/1988023010/. The only possible cause I can possibly see for this regression is the cost of rebuilding the UI layer tree. The toolbar layer gets updated at the same frequency as it did before, but is included in the tree a slightly different way. Unfortunately the reason for these changes was to facilitate other features and organize code; I will continue to look into this issue.
,
Jul 9 2016
This issue has been moved once and is lower than Pri-1. Removing the milestone. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 15 2016
Ping mdjones@! Any update on this?
,
Jul 15 2016
Unfortunately I haven't found anything that would cause this performance regression.
,
Jul 22 2016
Ryan, what should we do about this regression? The bisect is pretty conclusive, but Matthew hasn't found a way to fix this.
,
Aug 5 2016
pinging rschoen@.
,
Aug 18 2016
Perf sheriff ping: reminder to follow up on possible performance issues
,
Aug 18 2016
We received reports from Amazon about jank regressions on tablet devices. There is a 5X increase in jank counts across key mobile sites, affecting Nexus 9 and Nexus7v2. This seems conclusive to be a tablet UI related regression. Jank report https://chromeperf.appspot.com/report?sid=c494d7d9e2bac141e3a8d64132b7b9835506f94ed41c2576a1ca75da6fc7a968&start_rev=391254&end_rev=396559&rev=393735
,
Aug 18 2016
A large regressor is www.boingboing.net, going from ~5 to ~83 janks. https://chromeperf.appspot.com/report?sid=33e718ebe26d65d675844d3b3e1bab28ade73e9d55cb488c828421227e8bd1f8&start_rev=391254&end_rev=396559&rev=393735
,
Aug 18 2016
52 is already out the door and this seems difficult to revert, but let's aim for a fix in 53.
,
Aug 18 2016
Attached the telemetry results for Chrome V52 vs Chrome V51 on a Amazon Fire Device(1 GB) which shows regression across smoothness metrics with jank_count being highly impacted. We did a side by side comparison on two Nexus 7 tablet devices running Android 5.1.1 with M51 and M52 and could see the jankyness across many sites(Please refer to URLs in the attached html file).
,
Aug 18 2016
By manual comparison of a drag scroll on boingboing.net on N9 between 51.0.2704.61, 52.0.2740.0 and 54.0.2830.0, and subjectively the jank level looks the same to me. Adding a ton of tabs to the top bar doesn't visibly change the jank level. On the bot, the before/after trace o also has the exact same count of DelegatedFrame swaps although other counts like GLRenderer swaps vary. Not sure how to interpret this. The patch definitely changed some behavior but it's not clear how. Maybe it only affects the version of boingboing.net that is stored in the pageset, for example?
,
Aug 18 2016
Thanks. I tried amazon.com and reddit.com (the two URLs in #18 that telemetry reports as regressed) on Nexus 9 and couldn't observe any jankiness either. Could you name a specific URL and ideally provide a video of the jankiness so I can confirm we're seeing the same thing?
,
Aug 18 2016
See my request above.
,
Aug 18 2016
Files are large to attach Please find videos on google drive(link sharing enabled) 52.0.2743.98 - https://drive.google.com/open?id=0B4TfWF1uhSYudERWYVVkQTV6bjA 51.0.2704.81 - https://drive.google.com/open?id=0B4TfWF1uhSYuUDZQODhRbGc3VEE Notice the jankyness on 52 compared to 51. If you can open both at the same time and play side by side that would be great.
,
Aug 18 2016
The test was performed on two Nexus 7 Tablets running Android 5.1.1
,
Aug 18 2016
The difference between the videos isn't obvious at a glance and there would be a lot of noise from the fact that the page was still loading while you were scrolling (the telemetry test is post-load to avoid this noise). So I still don't see real evidence that there was a perceptible regression in M52. It seems your report purely comes from the same telemetry metrics we use internally, not from anything user-observed. We should try to better understand why the jank_count changed so much. I'm removing ReleaseBlock label for now.
,
Aug 19 2016
I think I can reproduce a visible regression on www.cnn.com from M51 to M52 on Nexus 7. At first glance it seems to be worst while page loading throbber is animating. Attached traces.
,
Aug 19 2016
Thanks. Yes, agreed that the jank is caused by the loading spinner. The pages in the testset that show a regression are ones where the spinner keeps running a long time. I made an example page that keeps the spinner running forever by busy-waiting the main thread: http://jsbin.com/yaremadoxo/ . The regression in 52 is easy to see on it, so bringing back ReleaseBlock label.
,
Aug 22 2016
This is confusing because the loading spinner is not part of the toolbar class that changed/moved. The only thing that would have changed with regard to that is its position in the layer tree and I can't imagine that would cause such a regression.
,
Aug 24 2016
#28: Just checked and your change may have been wrongly blamed; I'm seeing the janks before r393696.
,
Aug 24 2016
Oh noes, I bisected the main regression to when BeginFrameSource was turned on for UI. 19c108580b99c469ed192066310e3162bf8c196e is the first bad commit commit 19c108580b99c469ed192066310e3162bf8c196e Author: enne <enne@chromium.org> Date: Wed Apr 13 20:35:32 2016 -0700 Hook up ui::Compositor to Display's BeginFrameSource This hooks up the first SurfaceFactoryClient to the real BeginFrameSource owned by the OnScreenDisplayClient. This is done by adding an OutputSurfaceClient::SetBeginFrameSource method. As the SurfaceDisplayOutputSurface is also a SurfaceFactoryClient, it can hand the real begin frame source directly to ui::Compositor's scheduler. This allows the removal of some of the browser vsync plumbing, but not all of it. Once the renderer compositors have been hooked up to use this path as well, then this and all of the VSyncObserver code can be ripped out. The BeginFrameSource is created by the BrowserCompositorOutputSurface which updates it based on vsync information that it receives. This BeginFrameSource is passed to the Display (via OutputSurfaceClient), which informs the SurfaceManager that the compositor using that Display should be driven by that BeginFrameSource. The SurfaceManager then informs the SurfaceDisplayOutputSurface about the BeginFrameSource, which passes it into the single thread proxy's cc::Scheduler for that ui::Compositor's instance. Plumbing! The path from SurfaceManager to SurfaceDisplayOutputSurface was added in https://codereview.chromium.org/1673783004, but the rest of the plumbing is new to this patch. CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel Review URL: https://codereview.chromium.org/1821863002 Cr-Commit-Position: refs/heads/master@{#387228}
,
Aug 24 2016
Updated copy of aelias@' test: http://jsbin.com/suluteh (original seems expired)
,
Aug 24 2016
,
Aug 24 2016
http://jsbin.com/yaremadoxo/ still works for me. Note that it seems to tickle a cache bug. At some point it started refusing to load and I had to Settings -> Apps -> Chrome -> Clear Data to get it to show again.
,
Aug 24 2016
If this needs to block M53, it needs to be fixed and merged back to branch 2785 by next Tuesday @ 5PM.
,
Aug 25 2016
RB-Stable == Pri-1 by definition (Pri-1 implies need for current release).
,
Aug 25 2016
Re: cnn traces. I looked through them a bit but didn't see anything particularly suspicious. vmiura: was there something there you saw? I built ToT and the forever-loading jsbin page scrolled just fine. I could not see any jank. Do other people see this on ToT or has this been fixed between 52 and 54? I'm trying to build m52 on Android at the moment and will see if I can see anything different there.
,
Aug 25 2016
My ToT test was on an N7.
,
Aug 25 2016
I tried 54.0.2838.0 and it is janky compared to M51.
,
Aug 25 2016
Ok, I got a 51 build and I see the difference. 54 and 52 are similarly janky relative to 51.
,
Aug 25 2016
#37: I couldn't tell much from the cnn traces. I attached a trace of the test page on M54. One thing I've noticed looking odd laterly is in traces if you turn on vsync in the trace "View Options", the vysnc interval seems irregular. On M54 it looks regular while scrolling; on M52 it looked irregular all the time. This may be nothing, but thought I'd mention it. In the attached trace I think the long "Scheduler:pending_swaps" on the Renderer process align with the janks. The DisplayScheduler::OnBeginFrameDeadline is getting scheduled very early in the frame; not waiting for the renderer to produce a frame. Looking at trace time 1,962.342 ms - Renderer submits a frame - DisplayCompositor swaps 0.7ms later - Browser submits a frame 10m later - DisplayCompositor swaps after next Vsync without waiting for next Renderer frame.
,
Aug 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ab408ac5e81ddf37772a9410cec2b7f747973a2d commit ab408ac5e81ddf37772a9410cec2b7f747973a2d Author: enne <enne@chromium.org> Date: Fri Aug 26 18:52:32 2016 Fix jank on Android due to bad scheduler deadline https://codereview.chromium.org/1821863002 changed the display scheduler on Android to use the CompositorImpl ExternalBeginFrameSource directly instead of a synthetic BeginFrameSource that had its vsync parameters set from WindowAndroidCompositor::OnVSync. The frame time given to the display scheduler is always in the past, which made the past logic always give an immediate deadline, causing the display scheduler to run immediately. Fix this by providing a correct deadline in the future. This fixes bad jank where the display scheduler would never give any other compositors time to give frames. So, if the browser compositor was producing frames (such as a loading spinner) then the renderer frames would be extremely janky as they would often miss the immediate (incorrect) deadline. BUG= 613193 R=sievers@chromium.org Review-Url: https://codereview.chromium.org/2275253003 Cr-Commit-Position: refs/heads/master@{#414765} [modify] https://crrev.com/ab408ac5e81ddf37772a9410cec2b7f747973a2d/content/browser/renderer_host/compositor_impl_android.cc
,
Aug 26 2016
This needs to bake on master a bit, but adding merge requests so they don't get forgotten.
,
Aug 26 2016
Your change meets the bar and is auto-approved for M53 (branch: 2785)
,
Aug 26 2016
Your change meets the bar and is auto-approved for M54 (branch: 2840)
,
Aug 26 2016
Your change meets the bar and is auto-approved for M53 (branch: 2785)
,
Aug 27 2016
Great, looks like a full recovery to numbers before mdjones@ patch. mdjones@: It's interesting that your patch tickled this bug; it may have just changed the UI's timing in some way. The loading spinner seems to animate on a timer that requests frames, which may have changed it's offset relative to the display deadline. brianderson@: Does the display deadline being |start of choreographer frame time| + |vsync interval| look OK? Also, when the UI requests a frame for the loading spinner, it seems like it can run right away rather than wait for the next begin frame. Any potential issues with that?
,
Aug 29 2016
I don't think the UI knows that it is requesting a frame just for the loading spinner. I think it should wait for a begin frame, because if there's input that comes in, submitting a frame early will add latency to handling that input.
,
Aug 29 2016
I also chatted with sievers about whether we should use vsync interval directly or whether we should make it some fraction of the vsync interval. He wasn't sure. I decided to just go with the full vsync interval for now, and if we get a regression then we can use that to tune the number appropriately. I'd rather do that then to just pick a number arbitrarily.
,
Aug 29 2016
> I don't think the UI knows that it is requesting a frame just for the loading > spinner. I think it should wait for a begin frame, because if there's input > that comes in, submitting a frame early will add latency to handling that > input. Sorry I think I phrased it unclearly above. What I meant was, the UI update seems to be running right away instead of waiting for a begin frame. Shouldn't we make it wait?
,
Aug 29 2016
Are you talking about the MISSED begin frame?
,
Aug 29 2016
> Does the display deadline being |start of choreographer frame time| + |vsync interval| look OK? That looks reasonable. @dcastagna has a patch up to use a newish Android API that gives us SurfaceFlinger's deadline that we'll eventually want to use. > Also, when the UI requests a frame for the loading spinner, it seems like it can run right away rather than wait for the next begin frame. Any potential issues with that? There are couple small issues, but nothing major. * If UI raster is unfortunately aligned, our UI single-buffering memory optimization may block the final composite. * DisplayScheduler currently waits for 60fps Renderers even if the UI updates quickly. In this example, the Renderer is slightly less than 60fps, which would cause the DisplayScheduler to give up on the Renderer early sometimes. Now that we have unified begin frame, we can improve that heuristic to instead wait for "Renderers that are requesting BeginFrames", which should let us sneak in a few more frames.
,
Aug 29 2016
51: Previously I think it would have been MISSED but with the fix I think it's less likely it would be missed. Here's what I'm thinking specifically: - It looks to me like the UI update & commit run right after we get a SingleThreadProxy::SetNeedsAnimate() (called via some timer). - The above may happen close to the DisplayScheduler::OnBeginFrameDeadline due time. - UI update and commit takes time, say ~2ms. The update runs on the Browser's main thread, which may delay the DisplayScheduler past it's deadline. I'm wondering if we shouldn't delay the UI update to the next BMF in this case? Perhaps we do... just confirming.
,
Aug 29 2016
I'm confused. If you're looking at a trace, you should be able to see whether it was MISSED or not. Why do you say it's less likely? If there hasn't been an update in a few frames, the browser compositor would stop observing begin frames. A SetNeedsAnimate call would call AddObserver on the begin frame source, which would cause a MISSED begin frame to resend the last begin frame immediately. Arguably there's been no input on this frame, or it would have already been sent and a frame produced. Agreed that this can happen close to a deadline and could delay the DisplayScheduler. I think we could do some more thinking about how MISSED begin frames work. Both the Android external begin frame source and the previous cc::SyntheticBeginFrameSource that was driving the ui::Compositor both send missed begin frames, although the synthetic one has a slightly different condition for it. It'd be nice to be more consistent, and maybe we should be more careful about when we send them, although that seems like a different issue than this regression.
,
Aug 30 2016
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/48355185f9590a6284ba40d101fb785c3b8ecdd2 commit 48355185f9590a6284ba40d101fb785c3b8ecdd2 Author: Alex Mineer <amineer@chromium.org> Date: Tue Aug 30 18:54:16 2016 Fix jank on Android due to bad scheduler deadline https://codereview.chromium.org/1821863002 changed the display scheduler on Android to use the CompositorImpl ExternalBeginFrameSource directly instead of a synthetic BeginFrameSource that had its vsync parameters set from WindowAndroidCompositor::OnVSync. The frame time given to the display scheduler is always in the past, which made the past logic always give an immediate deadline, causing the display scheduler to run immediately. Fix this by providing a correct deadline in the future. This fixes bad jank where the display scheduler would never give any other compositors time to give frames. So, if the browser compositor was producing frames (such as a loading spinner) then the renderer frames would be extremely janky as they would often miss the immediate (incorrect) deadline. BUG= 613193 R=sievers@chromium.org (cherry picked from commit ab408ac5e81ddf37772a9410cec2b7f747973a2d) Review-Url: https://codereview.chromium.org/2275253003 Cr-Original-Commit-Position: refs/heads/master@{#414765} Cr-Commit-Position: refs/branch-heads/2785@{#791} Cr-Branched-From: 68623971be0cfc492a2cb0427d7f478e7b214c24-refs/heads/master@{#403382} [modify] https://crrev.com/48355185f9590a6284ba40d101fb785c3b8ecdd2/content/browser/renderer_host/compositor_impl_android.cc
,
Aug 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1194983de5b5fb8ec58bbe47e3574bc95048fc50 commit 1194983de5b5fb8ec58bbe47e3574bc95048fc50 Author: Adrienne Walker <enne@chromium.org> Date: Tue Aug 30 20:12:23 2016 Fix jank on Android due to bad scheduler deadline https://codereview.chromium.org/1821863002 changed the display scheduler on Android to use the CompositorImpl ExternalBeginFrameSource directly instead of a synthetic BeginFrameSource that had its vsync parameters set from WindowAndroidCompositor::OnVSync. The frame time given to the display scheduler is always in the past, which made the past logic always give an immediate deadline, causing the display scheduler to run immediately. Fix this by providing a correct deadline in the future. This fixes bad jank where the display scheduler would never give any other compositors time to give frames. So, if the browser compositor was producing frames (such as a loading spinner) then the renderer frames would be extremely janky as they would often miss the immediate (incorrect) deadline. BUG= 613193 R=sievers@chromium.org Review-Url: https://codereview.chromium.org/2275253003 Cr-Commit-Position: refs/heads/master@{#414765} (cherry picked from commit ab408ac5e81ddf37772a9410cec2b7f747973a2d) Review URL: https://codereview.chromium.org/2292313002 . Cr-Commit-Position: refs/branch-heads/2840@{#53} Cr-Branched-From: 1ae106dbab4bddd85132d5b75c670794311f4c57-refs/heads/master@{#414607} [modify] https://crrev.com/1194983de5b5fb8ec58bbe47e3574bc95048fc50/content/browser/renderer_host/compositor_impl_android.cc
,
Aug 30 2016
,
Oct 27 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1194983de5b5fb8ec58bbe47e3574bc95048fc50 commit 1194983de5b5fb8ec58bbe47e3574bc95048fc50 Author: Adrienne Walker <enne@chromium.org> Date: Tue Aug 30 20:12:23 2016 Fix jank on Android due to bad scheduler deadline https://codereview.chromium.org/1821863002 changed the display scheduler on Android to use the CompositorImpl ExternalBeginFrameSource directly instead of a synthetic BeginFrameSource that had its vsync parameters set from WindowAndroidCompositor::OnVSync. The frame time given to the display scheduler is always in the past, which made the past logic always give an immediate deadline, causing the display scheduler to run immediately. Fix this by providing a correct deadline in the future. This fixes bad jank where the display scheduler would never give any other compositors time to give frames. So, if the browser compositor was producing frames (such as a loading spinner) then the renderer frames would be extremely janky as they would often miss the immediate (incorrect) deadline. BUG= 613193 R=sievers@chromium.org Review-Url: https://codereview.chromium.org/2275253003 Cr-Commit-Position: refs/heads/master@{#414765} (cherry picked from commit ab408ac5e81ddf37772a9410cec2b7f747973a2d) Review URL: https://codereview.chromium.org/2292313002 . Cr-Commit-Position: refs/branch-heads/2840@{#53} Cr-Branched-From: 1ae106dbab4bddd85132d5b75c670794311f4c57-refs/heads/master@{#414607} [modify] https://crrev.com/1194983de5b5fb8ec58bbe47e3574bc95048fc50/content/browser/renderer_host/compositor_impl_android.cc |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by majidvp@chromium.org
, May 19 2016