Frame metadata latency info has 150k entries, causing Surface::TakeLatencyInfo to spin CPU at 100% |
|||||||||||||||||||||||||||||||||
Issue description
Version 68.0.3397.0 (Official Build) canary (32-bit)
Siggi is noticing his Browser spinning at 100% CPU. This has been happening the last couple of days. The trace shows that browser main thread is spending all its time in Surface::ActivateFrame. We didn't have a good mechanism to sample the process, so Siggi attached a debugger, paused the process, checked the stack of the main thread, and then resumed the process. He repeated this several times and always saw the same code on the stack, so this suggests that we're spinning in the same function.
00 ntdll_770f0000!RtlFreeHeap
01 chrome!base::allocator::WinHeapFree
02 chrome!`anonymous namespace'::DefaultWinHeapFreeImpl
03 chrome!heap_profiling::`anonymous namespace'::HookFree
04 chrome!ShimFree
05 chrome!free
06 chrome!std::_Deallocate
07 chrome!std::allocator<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent> >::deallocate
08 chrome!std::vector<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::allocator<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent> > >::_Tidy
09 chrome!std::vector<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::allocator<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent> > >::~vector
0a chrome!base::internal::flat_tree<std::pair<ui::LatencyComponentType,long long>,std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,base::internal::GetKeyFromValuePairFirst<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::less<void> >::Impl::~Impl
0b chrome!base::internal::flat_tree<std::pair<ui::LatencyComponentType,long long>,std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,base::internal::GetKeyFromValuePairFirst<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::less<void> >::~flat_tree
0c chrome!ui::LatencyInfo::~LatencyInfo
0d chrome!std::_Default_allocator_traits<std::allocator<ui::LatencyInfo> >::destroy
0e chrome!std::_Destroy_range1
0f chrome!std::_Destroy_range
10 chrome!std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >::_Destroy
11 chrome!std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >::_Change_array
12 chrome!std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >::emplace_back<const ui::LatencyInfo &>
13 chrome!std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >::push_back
14 chrome!std::back_insert_iterator<std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> > >::operator=
15 chrome!std::_Copy_unchecked1
16 chrome!std::_Copy_unchecked
17 chrome!std::_Copy_no_deprecate
18 chrome!std::copy
19 chrome!viz::Surface::TakeLatencyInfoFromFrame
1a chrome!viz::Surface::TakeLatencyInfo
1b chrome!viz::Surface::ActivateFrame
1c chrome!viz::Surface::QueueFrame
1d chrome!viz::CompositorFrameSinkSupport::MaybeSubmitCompositorFrame
1e chrome!viz::CompositorFrameSinkSupport::SubmitCompositorFrame
1f chrome!content::RenderWidgetHostViewChildFrame::SubmitCompositorFrame
20 chrome!content::RenderWidgetHostImpl::SubmitCompositorFrame
21 chrome!viz::mojom::CompositorFrameSinkStubDispatch::Accept
22 chrome!viz::mojom::CompositorFrameSinkStub<mojo::RawPtrImplRefTraits<viz::mojom::CompositorFrameSink> >::Accept
23 chrome!mojo::internal::MultiplexRouter::ProcessIncomingMessage
24 chrome!mojo::internal::MultiplexRouter::Accept
25 chrome!mojo::Connector::ReadSingleMessage
26 chrome!mojo::Connector::ReadAllAvailableMessages
27 chrome!mojo::Connector::OnHandleReadyInternal
28 chrome!base::internal::FunctorTraits<void (base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>::*)(const SkBitmap &) __attribute__((thiscall)),void>::Invoke
29 chrome!base::internal::InvokeHelper<0,void>::MakeItSo
2a chrome!base::internal::Invoker<base::internal::BindState<void (base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>::*)(const SkBitmap &) __attribute__((thiscall)),std::unique_ptr<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>,std::default_delete<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &> > > >,void (const SkBitmap &)>::RunImpl
2b chrome!base::internal::Invoker<base::internal::BindState<void (base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>::*)(const SkBitmap &) __attribute__((thiscall)),std::unique_ptr<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>,std::default_delete<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &> > > >,void (const SkBitmap &)>::Run
2c chrome!base::RepeatingCallback<void (unsigned int)>::Run
2d chrome!mojo::SimpleWatcher::DiscardReadyState
2e chrome!base::internal::FunctorTraits<void (*)(const InlineLoginHandlerImpl::FinishCompleteLoginParams &, Profile *, Profile::CreateStatus),void>::Invoke
2f chrome!base::internal::InvokeHelper<0,void>::MakeItSo
30 chrome!base::internal::Invoker<base::internal::BindState<void (*)(const InlineLoginHandlerImpl::FinishCompleteLoginParams &, Profile *, Profile::CreateStatus),InlineLoginHandlerImpl::FinishCompleteLoginParams>,void (Profile *, Profile::CreateStatus)>::RunImpl
31 chrome!base::internal::Invoker<base::internal::BindState<void (*)(const InlineLoginHandlerImpl::FinishCompleteLoginParams &, Profile *, Profile::CreateStatus),InlineLoginHandlerImpl::FinishCompleteLoginParams>,void (Profile *, Profile::CreateStatus)>::Run
32 chrome!base::RepeatingCallback<void (unsigned int, const mojo::HandleSignalsState &)>::Run
33 chrome!mojo::SimpleWatcher::OnHandleReady
34 chrome!base::internal::FunctorTraits<void (mojo::SimpleWatcher::*)(int, unsigned int, const mojo::HandleSignalsState &) __attribute__((thiscall)),void>::Invoke
35 chrome!base::internal::InvokeHelper<1,void>::MakeItSo
36 chrome!base::internal::Invoker<base::internal::BindState<void (mojo::SimpleWatcher::*)(int, unsigned int, const mojo::HandleSignalsState &) __attribute__((thiscall)),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int,mojo::HandleSignalsState>,void ()>::RunImpl
37 chrome!base::internal::Invoker<base::internal::BindState<void (mojo::SimpleWatcher::*)(int, unsigned int, const mojo::HandleSignalsState &) __attribute__((thiscall)),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int,mojo::HandleSignalsState>,void ()>::Run
38 chrome!base::OnceCallback<void ()>::Run
39 chrome!base::debug::TaskAnnotator::RunTask
3a chrome!base::internal::IncomingTaskQueue::RunTask
3b chrome!base::MessageLoop::RunTask
3c chrome!base::MessageLoop::DeferOrRunPendingTask
3d chrome!base::MessageLoop::DoWork
3e chrome!base::MessagePumpForUI::DoRunLoop
3f chrome!base::MessagePumpWin::Run
40 chrome!base::MessageLoop::Run
41 chrome!base::RunLoop::Run
42 chrome!ChromeBrowserMainParts::MainMessageLoopRun
43 chrome!content::BrowserMainLoop::RunMainMessageLoopParts
44 chrome!content::BrowserMainRunnerImpl::Run
45 chrome!content::BrowserMain
46 chrome!content::RunNamedProcessTypeMain
47 chrome!content::ContentMainRunnerImpl::Run
48 chrome!service_manager::Main
49 chrome!content::ContentMain
4a chrome!ChromeMain
4b chrome_exe!MainDllLoader::Launch
4c chrome_exe!wWinMain
4d chrome_exe!invoke_main
4e chrome_exe!__scrt_common_main_seh
4f KERNEL32!BaseThreadInitThunk
50 ntdll_770f0000!__RtlUserThreadStart
51 ntdll_770f0000!_RtlUserThreadStart
Aside: We are profiling heap allocations on siggi's machine [as evidenced by the top of the stack], but this uses sampling [so only profiles a very small number of allocations] and has low overhead. I've confirmed via Task Manager that the Profiling Service is happily chugging along without any issues.
My guess is that somehow, the latency info vector is getting a very large number of entires, causing:
"""
std::copy(frame->metadata.latency_info.begin(),
frame->metadata.latency_info.end(),
std::back_inserter(*latency_info));
"""
to take ~150ms a pop. Luckily, we had heap profiling enabled. So I was able to confirm that the vector has ~150k entries.
,
Apr 18 2018
In this case, it's the renderer compositor submitting 150k LatencyInfos it sounds like, as content::RenderWidgetHostImpl::SubmitCompositorFrame is in that stack.
,
Apr 18 2018
Actually, erikchen:
std::copy(frame->metadata.latency_info.begin(),
frame->metadata.latency_info.end(),
std::back_inserter(*latency_info));
Which vector was 150k here. The |*latency_info|, or the |frame->metadata.latency_info|
,
Apr 18 2018
#3 is on point... In particular I'm not quite sure I follow how accumulated LatencyInfos are ever cleared - Surface::ActivateFrame seems to accumulate from the previous frame to the new frame.
,
Apr 18 2018
I've attached the symbolized heap dump - perhaps you can discern more from it than me. I can't tell whether it's the old or new vector.
,
Apr 18 2018
#4 Once the surface gets aggregated for display the LatencyInfos get copied into the display CompositorFrame where they're processed next. https://cs.chromium.org/chromium/src/components/viz/service/display/surface_aggregator.cc?l=1238&rcl=4b87d1beadcd90546ea8e9590bc64561f06c1534
,
Apr 18 2018
A surface could be offscreen and never aggregated with OOPIF, right, but keep submitting frames?
,
Apr 18 2018
I added a log for the |latency_info| size in Surface::TakeLatencyInfo() and ran with --site-per-process. One of them is steadily climbing, so it appears that's the issue.
,
Apr 18 2018
,
Apr 19 2018
Beyond knowing where LatencyInfo copying happens I don't know what the right fix is. I would say just drop the anything more than N LatencyInfos, but that will probably break telemetry. sadrul you have a better understanding, can you take a look?
,
Apr 19 2018
So I think the root cause of the problem is uncovering why a surface could be offscreen (never contributing to aggregation), and yet submitting CompositorFrames? Any ideas Kyle? I don't know if I'll get to this today but maybe tomorrow if you're stumped. I do think we should add a limit on LatencyInfo too though.
,
Apr 19 2018
This appears to be something that grows over time - my browser runs normally after a restart. After ~6 hours it gets to noticeable slowdown, and after a couple of days it's a total dog.
,
Apr 19 2018
Adding some site isolation folks just so they're also aware of the problem.
,
Apr 19 2018
How does it work if there is an OOPIF just outside the viewport. We still rasterize and submit a CompositorFrame for the OOPIF, along with the embedder including a SurfaceDrawQuad, to avoid checkerboarding on scroll right? At some point, SurfaceAggregator would drop the OOPIF SurfaceDrawQuad from the display CompositorFrame because it's not necessary. Could that cause this? Alternatively an OOPIF or extension could be submitting CompositorFrames when they're not supposed to.
,
Apr 19 2018
I kind of wonder about this block of code: https://cs.chromium.org/chromium/src/components/viz/service/surfaces/surface_manager.cc?type=cs&sq=package:chromium&l=551 We run the draw callback even if the damage is not visible. Should we be doing that? Maybe we can defer the DidReceiveCompositorFrameAck callback until the surface becomes reachable from the root? We'll stick tick BeginFrames but we won't accept any more CompositorFrames UNTIL it becomes reachable. It's not clear to me why what we're doing now is or isn't the right thing to do here.
,
Apr 19 2018
*still tick
,
Apr 19 2018
If we block a surface just outside the viewport it will be stale when it scrolls into view, as kyle was pointing out
,
Apr 19 2018
Well, we won't checkerboard...we'll have a frame but yes, it'll be stale. The cool thing about DidReceiveCompositorFrameAck, is the new frame is ready in the renderer, right? We've already done most of the work, so we'll likely have just one stale frame. I wonder if that's good enough? Maybe surface sync can help here by blocking the parent frame for a BeginFrame? I guess in the short term we should just limit the number of entries of latency info or change how latency info propagation works. Longer term, it would be nice to see if we can avoid pumping out frames if we don't need them.
,
Apr 19 2018
I think adding a limit to how many LatencyInfo we accumulate should be OK, and not break telemetry tests. But I agree with fsamuel@ that it is still very important to understand why it keeps accumulating the way it does currently. Presumably, if the oopif is offscreen, then it wouldn't be receiving any input events (nor screenshot requests). So I don't understand what is adding the LatencyInfo instances to the oopif's compositor-frame.
,
Apr 19 2018
+brianderson@ who might have ideas/thoughts.
,
Apr 20 2018
I'm seeing this, too. I happen to have a browser window in the state described in issue 832538 . Is it possible that fixing that (by reverting r548851) will magically resolve this?
,
Apr 23 2018
I have a similar experience with comment 12, although I don't know if it's the same issue. https://bugs.chromium.org/p/chromium/issues/detail?id=823439 Just to make sure, does anyone get 100% CPU usage on the "Browser" task when looking at the task manager? Does ending/killing the "GPU Process" task somehow magically reset the situation (instead of having to restart the entire browser)?
,
Apr 25 2018
re comment #19: it looks like we do always include a LatencyInfo to a frame even if there was no copy-request/input-event involved in that frame [1]. [1] https://cs.chromium.org/chromium/src/cc/trees/proxy_main.cc?sq=package:chromium&dr=Ss&l=306
,
Apr 25 2018
Issue 827788 looks similar judging by the recently added trace.
,
Apr 25 2018
CL to make sure we don't accumulate more than a limited (100) number of LatencyInfos: https://chromium-review.googlesource.com/c/chromium/src/+/1027432 I would still like to understand the behaviour better though. I was unable to repro with local test cases. kylechar@ what test page were you using when you saw LatencyInfo numbers increasing?
,
Apr 25 2018
$ ./chrome --site-per-process www.slashdot.org I can reproduce pretty consistently if I just do nothing and let the page load.
,
Apr 25 2018
Slashdot: the ultimate site isolation test. It’s never let me down.
,
Apr 26 2018
Comments 25-27: Glad you've got a lead on a repro and potential fix (at least for the short term)! Looks like it might be simple enough to merge to M67 as well? This would be great to resolve if possible. (Also, agreed on understanding the root problem if we can.)
,
Apr 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/292bb1c376b1361a30f16cc0ebe63685551ec005 commit 292bb1c376b1361a30f16cc0ebe63685551ec005 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Fri Apr 27 05:44:21 2018 viz: Limit the amount of LatencyInfo we accumulate. It looks like some clients (OOPIFs, mostly like) are submitting compositor frames to viz, but they are not drawn on screen. As a result, the display compositor keeps accumulating the LatencyInfo associated with the frames. This can grow unbounded. Impose a limit of at most 100 such LatencyInfos being accumulated for a single client. BUG= 834421 Change-Id: I144db30ef2264ea4e9e1b952fe7b1cbcc53ebc46 Reviewed-on: https://chromium-review.googlesource.com/1027432 Reviewed-by: Brian Anderson <brianderson@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#554321} [modify] https://crrev.com/292bb1c376b1361a30f16cc0ebe63685551ec005/components/viz/service/surfaces/surface.cc
,
Apr 27 2018
Thanks! If you're able to confirm that it avoids the 100% CPU usage in Monday's Canary (and confirm that it's a safe fix to merge), would you be able to request merge to M67? The cutoff for the next beta build is Monday at 4 pm Pacific. Much appreciated-- this could be a big performance improvement for users, depending on how often the problem was happening in practice.
,
Apr 30 2018
The NextAction date has arrived: 2018-04-30
,
Apr 30 2018
Tested this issue on Windows 10 using chrome-68.0.3415.0 & reported version-68.0.3397.0 as per C#0 & C#26. Steps followed: -------------- 1. Launched chrome with --site-per-process www.slashdot.org through command line 2. Navigated to dev tools 3. network-> start recording & observed the info 4. Right click on time ->Waterfal & select latencey But unable to find the exact issue . Please find the attached screencast for reference & let us know if we miss any steps to verify this issue from our end. Thanks..!
,
Apr 30 2018
sadrul: Can you verify if the fix is working as intended, and request merge to M67 if so? The deadline for this week's build is today at 4 pm Pacific. I tried to verify myself, but I wasn't seeing the 100% CPU usage on Windows Dev Channel (68.0.3409.2) without the fix when visiting https://slashdot.org/ with --site-per-process. Maybe you can post more specific repro steps? Thanks!
,
Apr 30 2018
For repro steps I did the following: 1. Add logging in Surface::TakeLatencyInfo() to print the size of the vector after "taking" the LatencyInfos. 2. Run Chrome with --site-per-process. 3. Load a website that has lots of ads in OOPIFs. 4. See if that std::vector<LatencyInfo> continues to grow after the page is fully loaded. Don't do anything if in that case. Otherwise restart the browser and try again. I had to experiment with different browser window sizes, different websites and different initial positions on the website to reproduce initially. I saw the std::vector<LatencyInfo> size hit 3000 after a couple of minutes in one case on slashdot and was able to reproduce similar results a few more times. It would be a while (hours?) before that started to have a big performance impact though. Trying again the next day I got something similar after 50 tries but accidentally scrolled. Maybe it's dependent on what ads are being loaded?
,
Apr 30 2018
Thanks. I've left it running for most of the morning and don't see a difference between Dev and Canary so far. I don't suppose there's something I could check in chrome://histograms or a trace which would be roughly equivalent to your log statements? If not, we may just want to request merge after ensuring there aren't any regressions. It seems the fix is understood and simple, and could have a huge benefit to long-running sessions in M67 when Site Isolation is enabled.
,
Apr 30 2018
,
Apr 30 2018
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 30 2018
At least according to the sampling-profiler output [1], it does look like there is a drop in time-spent in copying ui::LatencyInfos during ActivateFrame(). [1] https://uma.googleplex.com/p/chrome/callstacks/?sid=eb7e27dc9617472082c51fa86adaabea
,
Apr 30 2018
Approving merge to M67 branch 3396 based on comment #35 and per offline chat with creis@. Pls merge ASAP. Thank you.
,
Apr 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4ceb626ece56609d84dab12a6a65deaa2a325031 commit 4ceb626ece56609d84dab12a6a65deaa2a325031 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Mon Apr 30 22:31:33 2018 viz: Limit the amount of LatencyInfo we accumulate. It looks like some clients (OOPIFs, mostly like) are submitting compositor frames to viz, but they are not drawn on screen. As a result, the display compositor keeps accumulating the LatencyInfo associated with the frames. This can grow unbounded. Impose a limit of at most 100 such LatencyInfos being accumulated for a single client. BUG= 834421 Change-Id: I144db30ef2264ea4e9e1b952fe7b1cbcc53ebc46 Reviewed-on: https://chromium-review.googlesource.com/1027432 Reviewed-by: Brian Anderson <brianderson@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#554321}(cherry picked from commit 292bb1c376b1361a30f16cc0ebe63685551ec005) Reviewed-on: https://chromium-review.googlesource.com/1036483 Reviewed-by: Charlie Reis <creis@chromium.org> Cr-Commit-Position: refs/branch-heads/3396@{#398} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/4ceb626ece56609d84dab12a6a65deaa2a325031/components/viz/service/surfaces/surface.cc
,
Apr 30 2018
,
May 1 2018
Is this supposed to be fixed in 68.0.3415.0? After a day's uptime, my browser is still chewing up a full core.
,
May 1 2018
Is this still caused by LatencyInfo? I’m happy to dig into the root cause if you have repro steps.
,
May 1 2018
Yeah, looks like this should have landed in time for 68.0.3412.0. See https://storage.googleapis.com/chromium-find-releases-static/292.html#292bb1c376b1361a30f16cc0ebe63685551ec005, but whatever was spinning earlier this morning has let up now that I have a debugger attached.
,
May 1 2018
Once you have reliable repro steps, please share and we'll continue investigation! :-)
,
May 1 2018
chrome:tracing may also include useful info (especially if it's still related to ui::LatencyInfo)
,
May 4 2018
siggi@: Can you post a trace? We just got a similar report on Linux in issue 823439 which shows that Surface::ActivateFrame is still a problem in 67.0.3396.30. Sadrul: Would you be able to look again to see if there's another reason these are taking up all of the CPU? It seems potentially important to fix in M67 if possible. Thanks!
,
May 8 2018
I don't have a repro, except to note that this seemed to be accumulating over time. Restarting the browser would clear it, then a few hours later it'd start to get annoying.
,
May 8 2018
Picking this up for further investigation.
,
May 9 2018
This is happening again for me on: Version 68.0.3425.1 (Official Build) canary-dcheck (32-bit). This is after running that build for about 3 hours, reading mail, watching YouTube, etc. Note that this is an official release build we ship with dchecks_always_on=true, so it's conceivable that this is some form of debug-only behavior. As this is a shipping build, I'd still argue that this should be fixed. Here's a canonical stack trace: chrome.dll!`anonymous namespace'::DefaultWinHeapFreeImpl+0xe chrome.dll!heap_profiling::`anonymous namespace'::HookFree+0x2e chrome.dll!free+0x11 chrome.dll!std::vector<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::allocator<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent> > >::_Tidy+0x49 chrome.dll!ui::LatencyInfo::~LatencyInfo+0xe chrome.dll!std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >::emplace_back<const ui::LatencyInfo &>+0xf7 chrome.dll!viz::Surface::TakeLatencyInfoFromFrame+0x48 chrome.dll!viz::Surface::ActivateFrame+0x17a chrome.dll!viz::Surface::QueueFrame+0x5e4 chrome.dll!viz::CompositorFrameSinkSupport::MaybeSubmitCompositorFrame+0xa6a chrome.dll!viz::CompositorFrameSinkSupport::SubmitCompositorFrame+0x59 chrome.dll!content::RenderWidgetHostViewChildFrame::SubmitCompositorFrame+0xf2 chrome.dll!content::RenderWidgetHostImpl::SubmitCompositorFrame+0x309 chrome.dll!viz::mojom::CompositorFrameSinkStubDispatch::Accept+0x578 chrome.dll!viz::mojom::CompositorFrameSinkStub<mojo::RawPtrImplRefTraits<viz::mojom::CompositorFrameSink> >::Accept+0x13 chrome.dll!mojo::InterfaceEndpointClient::HandleValidatedMessage+0x221 chrome.dll!mojo::FilterChain::Accept+0x87 chrome.dll!mojo::InterfaceEndpointClient::HandleIncomingMessage+0x6e chrome.dll!mojo::internal::MultiplexRouter::ProcessIncomingMessage+0x2b9 chrome.dll!mojo::internal::MultiplexRouter::Accept+0x151 chrome.dll!mojo::FilterChain::Accept+0x87 chrome.dll!mojo::Connector::ReadSingleMessage+0x154 chrome.dll!mojo::Connector::ReadAllAvailableMessages+0x57 chrome.dll!mojo::Connector::OnHandleReadyInternal+0x82 chrome.dll!base::internal::Invoker<base::internal::BindState<void (base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>::*)(const SkBitmap &) __attribute__((thiscall)),std::unique_ptr<base::internal::AdaptCallbackForRepeatingHelper<const SkBi chrome.dll!mojo::SimpleWatcher::DiscardReadyState+0x18 chrome.dll!base::internal::Invoker<base::internal::BindState<void (*)(const InlineLoginHandlerImpl::FinishCompleteLoginParams &, Profile *, Profile::CreateStatus),InlineLoginHandlerImpl::FinishCompleteLoginParams>,void (Profile *, Profile::CreateStatus)>: chrome.dll!mojo::SimpleWatcher::OnHandleReady+0xe4 chrome.dll!base::internal::Invoker<base::internal::BindState<void (mojo::SimpleWatcher::*)(int, unsigned int, const mojo::HandleSignalsState &) __attribute__((thiscall)),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int,mojo::HandleSignalsState>,void () chrome.dll!base::debug::TaskAnnotator::RunTask+0x138 chrome.dll!base::internal::IncomingTaskQueue::RunTask+0x6d chrome.dll!base::MessageLoop::RunTask+0x20f chrome.dll!base::MessageLoop::DeferOrRunPendingTask+0xa1 chrome.dll!base::MessageLoop::DoWork+0x20a chrome.dll!base::MessagePumpForUI::DoRunLoop+0x78 chrome.dll!base::MessagePumpWin::Run+0x6e chrome.dll!base::MessageLoop::Run+0x78 chrome.dll!base::RunLoop::Run+0xd4 chrome.dll!ChromeBrowserMainParts::MainMessageLoopRun+0xcf chrome.dll!content::BrowserMainLoop::RunMainMessageLoopParts+0x3b chrome.dll!content::BrowserMainRunnerImpl::Run+0x96 chrome.dll!content::BrowserMain+0x9d chrome.dll!content::RunNamedProcessTypeMain+0xee chrome.dll!content::ContentMainRunnerImpl::Run+0x11f chrome.dll!service_manager::Main+0x2a9 chrome.dll!content::ContentMain+0x33 chrome.dll!ChromeMain+0x11e chrome.exe!MainDllLoader::Launch+0x230 chrome.exe!wWinMain+0x54b chrome.exe!__scrt_common_main_seh+0xf6 KERNEL32.DLL!@BaseThreadInitThunk@12+0x24 ntdll.dll!__RtlUserThreadStart+0x2f ntdll.dll!__RtlUserThreadStart@8+0x1b
,
May 9 2018
Re #50: That build should be after sadrul@'s patch to limit the # of elements in |latency_info|, so are we ending up with 1000s of LatencyComponent entries in the |latency_components_| flat_map<> in a single LatencyInfo?
,
May 9 2018
Looks like I have a winner with 146k+ LatencyInfos in a vector:
0:000> dt /r latency_info
Local var @ 0x4f3d214 Type std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >*
0x04f3d4f8
+0x000 _Mypair : std::_Compressed_pair<std::allocator<ui::LatencyInfo>,std::_Vector_val<std::_Simple_types<ui::LatencyInfo> >,1>
+0x000 _Myval2 : std::_Vector_val<std::_Simple_types<ui::LatencyInfo> >
+0x000 _Myfirst : 0x3c234040 ui::LatencyInfo
+0x004 _Mylast : 0x3cb21d40 ui::LatencyInfo
+0x008 _Myend : 0x3cedc5c0 ui::LatencyInfo
0:000> ? (0x3cb21d40 -0x3c234040 )/40
Evaluate expression: 146292 = 00023b74
,
May 9 2018
Here's the stack trace where I got this, and the info above is from frame 16, TakeLatencyInfoFromFrame: 00 chrome!base::PlatformThread::CurrentRef 01 chrome!base::Lock::CheckUnheldAndMark 02 chrome!base::Lock::Acquire 03 chrome!base::AutoLock::AutoLock 04 chrome!heap_profiling::`anonymous namespace'::SendBuffer::Send 05 chrome!heap_profiling::`anonymous namespace'::DoSend 06 chrome!heap_profiling::AllocatorShimLogFree 07 chrome!heap_profiling::`anonymous namespace'::HookFree 08 chrome!ShimFree 09 chrome!free 0a chrome!std::_Deallocate 0b chrome!std::allocator<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent> >::deallocate 0c chrome!std::vector<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::allocator<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent> > >::_Tidy 0d chrome!std::vector<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::allocator<std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent> > >::~vector 0e chrome!base::internal::flat_tree<std::pair<ui::LatencyComponentType,long long>,std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,base::internal::GetKeyFromValuePairFirst<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::less<void> >::Impl::~Impl 0f chrome!base::internal::flat_tree<std::pair<ui::LatencyComponentType,long long>,std::pair<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,base::internal::GetKeyFromValuePairFirst<std::pair<ui::LatencyComponentType,long long>,ui::LatencyInfo::LatencyComponent>,std::less<void> >::~flat_tree 10 chrome!ui::LatencyInfo::~LatencyInfo 11 chrome!std::_Default_allocator_traits<std::allocator<ui::LatencyInfo> >::destroy 12 chrome!std::_Destroy_range1 13 chrome!std::_Destroy_range 14 chrome!std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >::_Destroy 15 chrome!std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> >::clear 16 chrome!viz::Surface::TakeLatencyInfoFromFrame 17 chrome!viz::Surface::TakeLatencyInfo 18 chrome!viz::Surface::ActivateFrame 19 chrome!viz::Surface::QueueFrame 1a chrome!viz::CompositorFrameSinkSupport::MaybeSubmitCompositorFrame 1b chrome!viz::CompositorFrameSinkSupport::SubmitCompositorFrame 1c chrome!content::RenderWidgetHostViewChildFrame::SubmitCompositorFrame 1d chrome!content::RenderWidgetHostImpl::SubmitCompositorFrame 1e chrome!viz::mojom::CompositorFrameSinkStubDispatch::Accept 1f chrome!viz::mojom::CompositorFrameSinkStub<mojo::RawPtrImplRefTraits<viz::mojom::CompositorFrameSink> >::Accept 20 chrome!mojo::InterfaceEndpointClient::HandleValidatedMessage 21 chrome!mojo::FilterChain::Accept 22 chrome!mojo::InterfaceEndpointClient::HandleIncomingMessage 23 chrome!mojo::internal::MultiplexRouter::ProcessIncomingMessage 24 chrome!mojo::internal::MultiplexRouter::Accept 25 chrome!mojo::FilterChain::Accept 26 chrome!mojo::Connector::ReadSingleMessage 27 chrome!mojo::Connector::ReadAllAvailableMessages 28 chrome!mojo::Connector::OnHandleReadyInternal 29 chrome!base::internal::FunctorTraits<void (base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>::*)(const SkBitmap &) __attribute__((thiscall)),void>::Invoke 2a chrome!base::internal::InvokeHelper<0,void>::MakeItSo 2b chrome!base::internal::Invoker<base::internal::BindState<void (base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>::*)(const SkBitmap &) __attribute__((thiscall)),std::unique_ptr<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>,std::default_delete<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &> > > >,void (const SkBitmap &)>::RunImpl 2c chrome!base::internal::Invoker<base::internal::BindState<void (base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>::*)(const SkBitmap &) __attribute__((thiscall)),std::unique_ptr<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &>,std::default_delete<base::internal::AdaptCallbackForRepeatingHelper<const SkBitmap &> > > >,void (const SkBitmap &)>::Run 2d chrome!base::RepeatingCallback<void (unsigned int)>::Run 2e chrome!mojo::SimpleWatcher::DiscardReadyState 2f chrome!base::internal::FunctorTraits<void (*)(const InlineLoginHandlerImpl::FinishCompleteLoginParams &, Profile *, Profile::CreateStatus),void>::Invoke 30 chrome!base::internal::InvokeHelper<0,void>::MakeItSo 31 chrome!base::internal::Invoker<base::internal::BindState<void (*)(const InlineLoginHandlerImpl::FinishCompleteLoginParams &, Profile *, Profile::CreateStatus),InlineLoginHandlerImpl::FinishCompleteLoginParams>,void (Profile *, Profile::CreateStatus)>::RunImpl 32 chrome!base::internal::Invoker<base::internal::BindState<void (*)(const InlineLoginHandlerImpl::FinishCompleteLoginParams &, Profile *, Profile::CreateStatus),InlineLoginHandlerImpl::FinishCompleteLoginParams>,void (Profile *, Profile::CreateStatus)>::Run 33 chrome!base::RepeatingCallback<void (unsigned int, const mojo::HandleSignalsState &)>::Run 34 chrome!mojo::SimpleWatcher::OnHandleReady 35 chrome!base::internal::FunctorTraits<void (mojo::SimpleWatcher::*)(int, unsigned int, const mojo::HandleSignalsState &) __attribute__((thiscall)),void>::Invoke 36 chrome!base::internal::InvokeHelper<1,void>::MakeItSo 37 chrome!base::internal::Invoker<base::internal::BindState<void (mojo::SimpleWatcher::*)(int, unsigned int, const mojo::HandleSignalsState &) __attribute__((thiscall)),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int,mojo::HandleSignalsState>,void ()>::RunImpl 38 chrome!base::internal::Invoker<base::internal::BindState<void (mojo::SimpleWatcher::*)(int, unsigned int, const mojo::HandleSignalsState &) __attribute__((thiscall)),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int,mojo::HandleSignalsState>,void ()>::Run 39 chrome!base::OnceCallback<void ()>::Run 3a chrome!base::debug::TaskAnnotator::RunTask 3b chrome!base::internal::IncomingTaskQueue::RunTask 3c chrome!base::MessageLoop::RunTask 3d chrome!base::MessageLoop::DeferOrRunPendingTask 3e chrome!base::MessageLoop::DoWork 3f chrome!base::MessagePumpForUI::DoRunLoop 40 chrome!base::MessagePumpWin::Run 41 chrome!base::MessageLoop::Run 42 chrome!base::RunLoop::Run 43 chrome!ChromeBrowserMainParts::MainMessageLoopRun 44 chrome!content::BrowserMainLoop::RunMainMessageLoopParts 45 chrome!content::BrowserMainRunnerImpl::Run 46 chrome!content::BrowserMain 47 chrome!content::RunNamedProcessTypeMain 48 chrome!content::ContentMainRunnerImpl::Run 49 chrome!service_manager::Main 4a chrome!content::ContentMain 4b chrome!ChromeMain 4c chrome_exe!MainDllLoader::Launch 4d chrome_exe!wWinMain 4e chrome_exe!invoke_main 4f chrome_exe!__scrt_common_main_seh 50 KERNEL32!BaseThreadInitThunk 51 ntdll!__RtlUserThreadStart 52 ntdll!_RtlUserThreadStart
,
May 9 2018
Passing back to sadrul@ as his CL doesn't fix the issue.
,
May 9 2018
Tentatively marking as Proj-SiteIsolation-LaunchBlocking, because while the bug seems rare, when it does happen the effects are rather severe.
,
May 9 2018
And now this just cleared up, as I was trying to debug it. I have a full process dump of the bad state, in case that helps.
,
May 9 2018
Well, I seem to have captured whatever this is about in the dump. Witness:
0:000> .frame
18 04f3d460 67b85f54 chrome!viz::Surface::ActivateFrame+0x17a [C:\b\c\b\win_asan\src\components\viz\service\surfaces\surface.cc @ 316]
0:000> dt frame_data frame.metadata.latency_info...
Local var @ 0x4f3d468 Type viz::Surface::FrameData
+0x000 frame :
+0x000 metadata :
+0x090 latency_info :
+0x000 _Mypair :
+0x000 _Myval2 :
+0x000 _Myfirst : 0x3c234040 ui::LatencyInfo
+0x004 _Mylast : 0x3cb21d40 ui::LatencyInfo
+0x008 _Myend : 0x3cedc5c0 ui::LatencyInfo
+0x118 frame_index : 0x11dbc
LMK if you want it - this is a full memory dump of Chrome running in my personal profile, so I won't share it widely.
,
May 9 2018
Taking a look inside the sampling profiler averages for viz::Surface::TakeLatencyInfoFromFrame in 68.0.3420.0 win canary: of the profiles we captured executing within the function, the top 15% are spending >85% of their time executing it. So we're definitely seeing this behavior in the wild. The effect is there on Mac too, but smaller: the top 15% are spending >37% of their time executing the function.
,
May 9 2018
OK. It looks like the accumulation of the LatencyInfo instances are happening on the client side. So the deserialization is also taking a fair amount of time. I will change the code to validate things on the client-side so it doesn't accumulate unbounded (and drop incoming messages in the service-side if it includes too many instances). [1] https://uma.googleplex.com/p/chrome/callstacks/?sid=4b6a2738a05fd22d29b0f213c0b0646e
,
May 10 2018
Hm, we already discard the latency-infos if a client sends too many: https://cs.chromium.org/chromium/src/components/viz/service/frame_sinks/compositor_frame_sink_support.cc?type=cs&sq=package:chromium&l=292
,
May 10 2018
Put up https://chromium-review.googlesource.com/c/chromium/src/+/1053793 with a speculative fix, and some debug dumps to help understand this better.
,
May 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/92e0446435896425805d05af8a91072263223d68 commit 92e0446435896425805d05af8a91072263223d68 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Fri May 11 20:05:26 2018 latency-info: Add a speculative fix. Add a speculative fix in SurfaceAggregator::Aggregate() to make sure it doesn't accumulate too many LatencyInfo objects. BUG= 834421 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Change-Id: Ifd41d901050a954279b642cab972ca6ec6d560e2 Reviewed-on: https://chromium-review.googlesource.com/1053793 Reviewed-by: Charlie Reis <creis@chromium.org> Reviewed-by: Fady Samuel <fsamuel@chromium.org> Commit-Queue: Fady Samuel <fsamuel@chromium.org> Cr-Commit-Position: refs/heads/master@{#557992} [modify] https://crrev.com/92e0446435896425805d05af8a91072263223d68/components/viz/service/display/surface_aggregator.cc
,
May 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/603b405c64ee17cd1db7d001a23e6dc88e39a77b commit 603b405c64ee17cd1db7d001a23e6dc88e39a77b Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Fri May 11 20:09:19 2018 latency-info: Add a couple of investigative dumps. Add a couple of investigative dumps to understand better where and how a large number of LatencyInfo objects are accumulating. BUG= 834421 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Change-Id: Ieea6e47c4a1acf5c89384048e19b742e3e21e5a8 Reviewed-on: https://chromium-review.googlesource.com/1054153 Reviewed-by: Fady Samuel <fsamuel@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#557994} [modify] https://crrev.com/603b405c64ee17cd1db7d001a23e6dc88e39a77b/cc/trees/layer_tree_host_impl.cc [modify] https://crrev.com/603b405c64ee17cd1db7d001a23e6dc88e39a77b/components/viz/service/display/surface_aggregator.cc
,
May 11 2018
,
May 14 2018
The NextAction date has arrived: 2018-05-14
,
May 14 2018
siggi@: Any chance you can try out 68.0.3428.0 or later to see if it helps today? Any signals you can offer will help as we decide about merging r557992 to M67. Thanks!
,
May 14 2018
I run the days Canary every day, today is 3430. I've never been able to find repro steps. This seems to happen fairly randomly, and seems to clear up after a while, so I don't know how I could verify a fix one way or the other.
,
May 14 2018
Sure. Just let us know if you happen to see the problem today on 3430. Thanks!
,
May 14 2018
According to sampling profiler data the issue is still present in 68.0.3428.0 canary, but appears to be somewhat decreased in severity on Windows. Here's percentage of execution time by vingtile in viz::Surface::TakeLatencyInfoFromFrame, for profiles that saw at least one stack sample in this function. 68.0.3420.0: Win 0,0,0,0,0,0,0,0,0,0,0,0,0,1,9,12,39,86,94,97,99 Mac 0,0,0,0,0,0,0,1,1,2,2,2,2,4,5,5,38,39,39,44,44 68.0.3428.0: Win 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,25,59,87,100 Mac 0,0,0,0,0,0,0,0,0,1,2,4,4,7,11,11,23,33,36,40,69 On Windows we went from the top 15% of clients seeing >86% of execution time in this function to the top 5%. On Mac things look very slightly better, although the worst case is worse.
,
May 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f9a2abbeaf4d221540e5888dc4fbd8239675698a commit f9a2abbeaf4d221540e5888dc4fbd8239675698a Author: Sadrul Chowdhury <sadrul@chromium.org> Date: Mon May 14 22:22:11 2018 Revert "latency-info: Add a couple of investigative dumps." This reverts commit 603b405c64ee17cd1db7d001a23e6dc88e39a77b. Reason for revert: We have collected the dumps. So not needed anymore. Original change's description: > latency-info: Add a couple of investigative dumps. > > Add a couple of investigative dumps to understand better where and how > a large number of LatencyInfo objects are accumulating. > > BUG= 834421 > > Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel > Change-Id: Ieea6e47c4a1acf5c89384048e19b742e3e21e5a8 > Reviewed-on: https://chromium-review.googlesource.com/1054153 > Reviewed-by: Fady Samuel <fsamuel@chromium.org> > Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> > Cr-Commit-Position: refs/heads/master@{#557994} TBR=sadrul@chromium.org,fsamuel@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 834421 , 842505 Change-Id: I1acbb8d522cb12606071af82d40b96b2b4626d27 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Reviewed-on: https://chromium-review.googlesource.com/1057362 Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#558491} [modify] https://crrev.com/f9a2abbeaf4d221540e5888dc4fbd8239675698a/cc/trees/layer_tree_host_impl.cc [modify] https://crrev.com/f9a2abbeaf4d221540e5888dc4fbd8239675698a/components/viz/service/display/surface_aggregator.cc
,
May 15 2018
Requesting merge of crrev.com/557992 to M67 (the dumps that were reverted show that the fix should help with some cases).
,
May 15 2018
Approving merge to M67 branch 3396 for crrev.com/557992 based on comment #72.
,
May 15 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f04f2e48b7db8730f04d75e750437ef0028b8df0 commit f04f2e48b7db8730f04d75e750437ef0028b8df0 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Tue May 15 01:02:12 2018 latency-info: Add a speculative fix. Add a speculative fix in SurfaceAggregator::Aggregate() to make sure it doesn't accumulate too many LatencyInfo objects. BUG= 834421 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Change-Id: Ifd41d901050a954279b642cab972ca6ec6d560e2 Reviewed-on: https://chromium-review.googlesource.com/1053793 Reviewed-by: Charlie Reis <creis@chromium.org> Reviewed-by: Fady Samuel <fsamuel@chromium.org> Commit-Queue: Fady Samuel <fsamuel@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#557992}(cherry picked from commit 92e0446435896425805d05af8a91072263223d68) Reviewed-on: https://chromium-review.googlesource.com/1058288 Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/branch-heads/3396@{#601} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/f04f2e48b7db8730f04d75e750437ef0028b8df0/components/viz/service/display/surface_aggregator.cc
,
May 15 2018
Thanks for the merge, Sadrul! Sorry to keep asking, but given comment 70, can you take another look? :) Sounds like there's still more ways for this to happen. (Sounds like issue 823439 might not be solved either.) Ideally we'd get another CL landed this week with the chance to merge to M67.
,
May 16 2018
Looks like this is happening again on my Version 68.0.3432.0 (Official Build) canary (64-bit). Hard to say, though as it looks like the symbols are harfed for that version :(.
,
May 17 2018
Sadrul, can you post an update when you get a chance? It sounds like you might have had a lead on the remaining issues, which is great. If we can get it landed today or tomorrow, it can bake over the weekend and might get merged to M67. Thanks!
,
May 17 2018
Thanks to sadrul@'s patch I now have a reliable repro: 1. Visit slashdot.org 2. Keep resizing the browser. 3. LatencyInfo grows without bound. I'll have a fix shortly.
,
May 17 2018
Fantastic! Thanks Fady!
,
May 17 2018
I have a fix. Works fine locally. Just writing a unit test. Will be ready for review in minutes.
,
May 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d1564e4226387f17c349e4d8a20cbd3ff8a08dc5 commit d1564e4226387f17c349e4d8a20cbd3ff8a08dc5 Author: Fady Samuel <fsamuel@chromium.org> Date: Thu May 17 20:57:05 2018 Surface synchronization: Prevent LatencyInfo from accumulating across surfaces The LatencyInfo vector in CompositorFrameMetadata can accumulate without bound when allocating new surfaces. In order to track latency, we carry LatencyInfo objects across surfaces. The new surface grabs the LatencyInfo objects from the previous surface. We used to check if the new Surface has too many LatencyInfos and if not, grab all LatencyInfos from the previous Surface and we don't check again. The previous Surface might have a large number of LatencyInfo objects accumulated, and will accumulate further with every new surface allocated for a given FrameSink. The fix is to verify the LatencyInfo vector AFTER taking the items from the previous surface. Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel Change-Id: I6814fd3657b7b0a9e9d275013e661b176095e740 Bug: 834421 , 672962 Reviewed-on: https://chromium-review.googlesource.com/1064810 Commit-Queue: Fady Samuel <fsamuel@chromium.org> Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#559668} [modify] https://crrev.com/d1564e4226387f17c349e4d8a20cbd3ff8a08dc5/components/viz/service/frame_sinks/surface_synchronization_unittest.cc [modify] https://crrev.com/d1564e4226387f17c349e4d8a20cbd3ff8a08dc5/components/viz/service/surfaces/surface.cc [modify] https://crrev.com/d1564e4226387f17c349e4d8a20cbd3ff8a08dc5/components/viz/test/compositor_frame_helpers.cc [modify] https://crrev.com/d1564e4226387f17c349e4d8a20cbd3ff8a08dc5/components/viz/test/compositor_frame_helpers.h
,
May 17 2018
,
May 17 2018
Thank you fsamuel@. Pls update the bug with canary result tomorrow for CL listed at #81.
,
May 17 2018
This bug requires manual review: We are only 11 days from stable. Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 18 2018
The NextAction date has arrived: 2018-05-18
,
May 18 2018
The latest canary build incorporates this change: 68.0.3434.0. Everything seems find and #81 is very low risk (very small change), so I'm requesting merge approval to M67. This is a site isolation launch blocker! Thanks!
,
May 18 2018
Approving merge to M67 branch 3396 based on comment #86. Pls merge on Monday morning after weekend canary coverage if change continue to look good in canary. Thank you.
,
May 19 2018
wittman@: If you get a chance, can you check the sampling profiler data as you did in comment 70? Would be great to help confirm if the issues is resolved after r559668 (68.0.3434.0).
,
May 21 2018
The NextAction date has arrived: 2018-05-21
,
May 21 2018
This has stuck around for a few days so I'm going to merge it back to M67.
,
May 21 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/014cb0fc75c9c26af2c1645ae49d13c8e52b7c44 commit 014cb0fc75c9c26af2c1645ae49d13c8e52b7c44 Author: Fady Samuel <fsamuel@chromium.org> Date: Mon May 21 15:51:13 2018 Surface synchronization: Prevent LatencyInfo from accumulating across surfaces The LatencyInfo vector in CompositorFrameMetadata can accumulate without bound when allocating new surfaces. In order to track latency, we carry LatencyInfo objects across surfaces. The new surface grabs the LatencyInfo objects from the previous surface. We used to check if the new Surface has too many LatencyInfos and if not, grab all LatencyInfos from the previous Surface and we don't check again. The previous Surface might have a large number of LatencyInfo objects accumulated, and will accumulate further with every new surface allocated for a given FrameSink. The fix is to verify the LatencyInfo vector AFTER taking the items from the previous surface. Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel Change-Id: I6814fd3657b7b0a9e9d275013e661b176095e740 Bug: 834421 , 672962 Reviewed-on: https://chromium-review.googlesource.com/1064810 Commit-Queue: Fady Samuel <fsamuel@chromium.org> Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#559668}(cherry picked from commit d1564e4226387f17c349e4d8a20cbd3ff8a08dc5) Reviewed-on: https://chromium-review.googlesource.com/1067056 Reviewed-by: Fady Samuel <fsamuel@chromium.org> Cr-Commit-Position: refs/branch-heads/3396@{#659} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/014cb0fc75c9c26af2c1645ae49d13c8e52b7c44/components/viz/service/frame_sinks/surface_synchronization_unittest.cc [modify] https://crrev.com/014cb0fc75c9c26af2c1645ae49d13c8e52b7c44/components/viz/service/surfaces/surface.cc [modify] https://crrev.com/014cb0fc75c9c26af2c1645ae49d13c8e52b7c44/components/viz/test/compositor_frame_helpers.cc [modify] https://crrev.com/014cb0fc75c9c26af2c1645ae49d13c8e52b7c44/components/viz/test/compositor_frame_helpers.h
,
May 21 2018
The sampling profiler data looks good in 68.0.3434.0 (per the same analysis as in #70): Win 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2 Mac 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1 Per-client execution percentages under ~4% are not necessarily demonstrative of actual execution time, so we can reasonably conclude from this that we're now spending negligible execution time in this function for all users.
,
May 21 2018
Comment 92: Thanks! That's awesome to hear. And thanks for merging it, Fady! |
|||||||||||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||||||||||
Comment 1 by danakj@chromium.org
, Apr 18 2018