Issue metadata
Sign in to add a comment
|
1.6%-234.4% regression in rendering.mobile at 587172:587302 |
||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Aug 30
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/101acd99640000
,
Aug 30
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/101acd99640000 Android OOP-D: Enable OOP-D for Android Waterfall by ericrk@chromium.org https://chromium.googlesource.com/chromium/src/+/86da5de5d854b46ef4e85bbef645a1c0866ed13b 62.97 → 199.6 (+136.6) Understanding performance regressions: http://g.co/ChromePerformanceRegressions
,
Sep 10
Hi All, I'm looking into the regressions seen here: https://bugs.chromium.org/p/chromium/issues/detail?id=879207 The largest regression first gesture input latency regression on nyc_gov_scroll_2018. Here's a before/after trace: before: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/nyc_gov_scroll_2018_2018-08-29_06-34-33_37905.html after: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/nyc_gov_scroll_2018_2018-08-30_05-21-12_29671.html Looking at these traces, we see some interesting behavior: Before OOP-D, the first input event is fast, then we get backed up on the second. See before_oop_d.png After OOP-D we immediately get backed up. See after_oop_d.png So in both cases, we get really backed up at the beginning of the scroll. However, in the non-OOP-D case we get one frame in, *then* get backed up 9 gestures deep, while in the OOP-D case we immediately get backed up 12 gestures deep. Not sure how meaningful of a real-world regression this is, as we get pretty backed up in both cases. However, this isn't really the most interesting part. The weird part is that, the entire time the scroll gesture is going on, we seem to be pumping frames - the display is actually drawing / swapping. Additionally, no thread seems to be backed up. See after_frame_production.png. I'm not really sure how we can pump so many frames without processing these input events' associated latency info... unless we somehow queue up a lot of frames right before the input event starts... but I don't see evidence of this. Am taking a pinpoint with additional trace categories to try to narrow down the root cause: https://pinpoint-dot-chromeperf.appspot.com/job/14c1371d640000
,
Sep 10
I'm not an input event expert but what happens if events get coalesced?
,
Sep 11
,
Sep 11
When we coalesce two events we report one less data point. The new latency we report is also the result of coalescing the latency infos. This will be the difference from the time that the new coalesced event is handled (depends on the checkpoint this has different meaning) and the earliest event creation time.
,
Sep 11
And when do we decide to coalesce? One possible explanation here is with OOP-D, the timing shifts such that we coalesce the first event into subsequent events whereas we don't without OOP-D and that reports as a regression. If that is the case, is this really a regression?
,
Sep 11
We always coalesce if the pipeline is busy. For example I believe in OOPIF if the previous event hasn't gone through (like finding the target frames and whatnot) there is coalescing/queuing there. Obviously only the coalescable events get coalesced and the rest of them get queued. https://cs.chromium.org/chromium/src/ui/events/blink/blink_event_util.cc?type=cs&sq=package:chromium&g=0&l=510 Another example is in the renderer when main thread is busy (or for rAF aligning) and compositor coalesces events in the queue until main thread picks them up. In general coalescing does make the metrics worse (if it is caused by the rest of the pipeline busy on the previous task) and it is kind of regression. If you look at the worst case scenarios (like high percentiles) we are essentially handling the first event in that coalesced pack later compared to when things weren't busy and we could avoid coalescing.
,
Sep 11
The InputLatency::GestureScrollUpdate traces have the full set of LatencyInfo components and associated times. The events have different DISPLAY_COMPOSITOR_RECEIVED_FRAME_COMPONENT, so that means they weren't coalesced, since they arrived in different CompositorFrames. All the INPUT_EVENT_GPU_SWAP_BUFFER_COMPONENT and INPUT_EVENT_LATENCY_FRAME_SWAP_COMPONENT times I looked at were the same. This seems like either a surface sync issue or bug in display compositor code. For surface sync if the the newer renderer CompositorFrames aren't being used during aggregation I'd expect to see this. Alternatively there could be a bug with what LatencyInfo objects are being recorded in traces after a particular swap ack?
,
Sep 11
Hmm, I originally dismissed this being a surface sync issue because I turned on surface sync separately a few days prior to turning on OOP-D but with that said, it's possible a slight timing perturbation could put us in a not-so-great state timing wise. With surface sync we will keep pumping frames from the renderer even if they're not embedded by the browser. We do not throttle frame production child-side. Perhaps that's the issue?
,
Sep 12
,
Sep 13
,
Sep 13
,
Sep 18
,
Sep 18
Issue 883756 has been merged into this issue.
,
Sep 19
Ok, figured out the issue here... could use some Viz expertise to understand the best solution (or to help with a fix): Basically, this is the combination of two things: First, when Viz receives a frame, it checks if that frame causes damage. If not, it immediately acks it. This has odd interaction when the renderer frame arrives before the browser frame that embeds it. In the case we're seeing here, the renderer frame arrives before the browser frame, detects no damage, and immediately acks. This means that the renderer can get an arbitrary number of frames ahead of the browser. Eventually, a browser frame comes in for each renderer frame and all are drawn (none skipped), but in traces I've looked at we've had the renderer 3 frames ahead of the browser. Second, if we get a new renderer frame for a given FrameSinkId, we coalesce the latency infos from the old frame into the new one. I assume this is under the assumption that the old frame will never be drawn. (this code I believe: https://cs.chromium.org/chromium/src/components/viz/service/surfaces/surface.cc?rcl=65181316897fd37d04cfcb07245159597556f39a&l=418) When these two things combine, we end up in a situation where we keep coalescing latency info from frame A into frame B, before A is drawn. A is then drawn, but has no latency info. It seems like this could be fixed in two ways: 1) Assume that the renderer *can* get ahead of the browser in frame submission, and delay coalescing latency infos *until* a frame is actually drawn, rather than at activation time. 2) Prevent the renderer from getting ahead of the browser by avoiding ACKing the renderer frame when a browser frame for it may be pending (this seems tricky, not sure how to handle this best). WDYT?
,
Sep 19
Over to fsamuel@ for a fix - will split out any non input-latency regressions into a new bug. My feeling is that the user impact is low - this is mainly a reporting issue, so a fix in 71 seems fine.
,
Sep 20
I believe I have a fix in progress. I'm running a pinpoint job to be sure: https://chromium-review.googlesource.com/c/chromium/src/+/1236931
,
Sep 21
+piman@: FYI: My solution is 2. The renderer can get ahead of the browser in frame submission: child-initiated synchronization events such as top bar controls scrolling. I am throttling the child on the parent for child-initiated synchronizations.
,
Oct 2
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/12ab5d2ce40000
,
Oct 2
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/16923b72e40000
,
Oct 2
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/165cb828e40000
,
Oct 3
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1654c530e40000
,
Oct 3
I've attached some local telemetry results with an upcoming patch to throttle child frames. It makes a substantial improvement.
,
Oct 3
This will be addressed in M71
,
Oct 3
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/12ab5d2ce40000 The swarming task expired. The bots are likely overloaded, dead, or misconfigured.
,
Oct 3
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/16923b72e40000 The swarming task expired. The bots are likely overloaded, dead, or misconfigured.
,
Oct 3
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/165cb828e40000 The swarming task expired. The bots are likely overloaded, dead, or misconfigured.
,
Oct 3
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f96f0ea3b55efe11726a463a08e34cee8de57686 commit f96f0ea3b55efe11726a463a08e34cee8de57686 Author: Fady Samuel <fsamuel@chromium.org> Date: Wed Oct 03 18:49:09 2018 Surface Synchronization: Throttle child synchronization events This CL enables throttling of child synchronization events. In a child- initiated synchronization, the child's frame will likely activate before the dependent parent's CompositorFrame arrives. Upon activation, the child will see that the new surface does not cause any damage to the display (because it's not embedded by any parent yet) and calls the draw callback immediately, thereby sending back a DidReceiveCompositorFrameAck to the client immediately. This, in turn, results in the client sending another CompositorFrame to Viz, and another and so on, letting the child get further and further ahead of the parent. This results in a lot of wasted work because the child's surface will not be shown on screen until the parent embeds it and so multiple child initiated synchronization events are pointless. This CL throttles DidReceiveCompositorFrameAck of the child on child-initiated synchronization events until the parent references the child. Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel Change-Id: I62416d87f8d871b32f4caad565cff515a9c55445 Bug: 672962 , 879207 Reviewed-on: https://chromium-review.googlesource.com/1236931 Commit-Queue: Fady Samuel <fsamuel@chromium.org> Reviewed-by: Saman Sami <samans@chromium.org> Cr-Commit-Position: refs/heads/master@{#596300} [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/frame_sinks/compositor_frame_sink_support.cc [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/frame_sinks/compositor_frame_sink_support.h [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/frame_sinks/compositor_frame_sink_support_unittest.cc [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/frame_sinks/surface_synchronization_unittest.cc [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/surfaces/surface.cc [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/surfaces/surface.h [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/surfaces/surface_dependency_tracker.cc [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/surfaces/surface_dependency_tracker.h [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/surfaces/surface_manager.cc [modify] https://crrev.com/f96f0ea3b55efe11726a463a08e34cee8de57686/components/viz/service/surfaces/surface_manager.h
,
Oct 4
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/1654c530e40000 The swarming task expired. The bots are likely overloaded, dead, or misconfigured.
,
Oct 4
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6f72fd2c91cfbe156601a2e74acdb873187d0d36 commit 6f72fd2c91cfbe156601a2e74acdb873187d0d36 Author: Fady Samuel <fsamuel@chromium.org> Date: Thu Oct 04 18:00:14 2018 Surface Synchronization: Allow out-of-order activation With child-initiated synchronization throttling, it's possible for frames from a given client to activate out-of-order. For example, a child-initiated synchronization that is blocked on the parent may be superceded by a parent-initiated synchronization. That parent-inititaed surface will activate first, before the child- initiated one, causing surfaces to activate out of order. That's okay! This CL fixes some CHECKs and marks older surfaces at activation time as candidates for garbage collection. Bug: 672962 , 879207 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel Change-Id: I19c033c6400f467b29658c706556a99c5eaf0791 Reviewed-on: https://chromium-review.googlesource.com/c/1260019 Reviewed-by: Saman Sami <samans@chromium.org> Commit-Queue: Fady Samuel <fsamuel@chromium.org> Cr-Commit-Position: refs/heads/master@{#596750} [modify] https://crrev.com/6f72fd2c91cfbe156601a2e74acdb873187d0d36/components/viz/service/frame_sinks/compositor_frame_sink_support.cc [modify] https://crrev.com/6f72fd2c91cfbe156601a2e74acdb873187d0d36/components/viz/service/frame_sinks/surface_synchronization_unittest.cc
,
Nov 15
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Aug 30