
Issue 888580


Issue metadata

Status: Assigned
Merged: issue 125248
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Potential jank caused by BrowserGpuChannelHostFactory::EstablishRequest::Wait

Project Member Reported by etienneb@chromium.org, Sep 24

Issue description


Reports:
  0dd81906f5266cbe  (7 seconds jank)
  ffe6b15d69f7407a  (6.5 seconds)
  59edf789b46ac9ae  (6.3 seconds)

This issue seems to point to:

https://cs.chromium.org/chromium/src/content/browser/gpu/browser_gpu_channel_host_factory.cc?l=218

    // We're blocking the UI thread, which is generally undesirable.
    // In this case we need to wait for this before we can show any UI
    // /anyway/, so it won't cause additional jank.
    // TODO(piman): Make this asynchronous (http://crbug.com/125248).

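The pattern the TODO asks for, replacing a blocking wait with a completion callback, can be sketched in standalone C++. This is illustrative only: `BlockingRequest` and `AsyncRequest` are hypothetical stand-ins, not Chromium's actual `base::WaitableEvent` or `EstablishRequest` APIs.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <utility>

// Blocking style: the calling (UI) thread stalls until the work is
// signaled complete -- this is the wait the reports above measure.
class BlockingRequest {
 public:
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_one();
  }
  // Blocks the caller, like base::WaitableEvent::Wait().
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return done_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Asynchronous style: the caller registers a callback and returns
// immediately, so the UI thread stays responsive while the channel
// is being established.
class AsyncRequest {
 public:
  void OnComplete(std::function<void()> callback) {
    std::unique_lock<std::mutex> lock(mu_);
    if (done_) {
      lock.unlock();
      callback();  // Work already finished: run the callback now.
      return;
    }
    callback_ = std::move(callback);
  }
  void Signal() {
    std::function<void()> cb;
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
      cb = std::move(callback_);
    }
    if (cb) cb();  // Run outside the lock.
  }

 private:
  std::mutex mu_;
  bool done_ = false;
  std::function<void()> callback_;
};
```

In the asynchronous version the message loop keeps pumping tasks between `OnComplete()` and `Signal()`, so a slow GPU-channel establishment no longer shows up as UI-thread jank.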

Stack trace:
ntdll.dll
kernelbase.dll
base::WaitableEvent::Wait()
gpu::GpuChannelHost::Send(IPC::Message *)
gpu::CommandBufferProxyImpl::Initialize(HWND__ *,gpu::CommandBufferProxyImpl *,gpu::SchedulingPriority,gpu::ContextCreationAttribs const &,GURL const &)
ws::ContextProviderCommandBuffer::BindToCurrentThread()
content::VizProcessTransportFactory::TryCreateContextsForGpuCompositing(scoped_refptr<gpu::GpuChannelHost>)
content::VizProcessTransportFactory::OnEstablishedGpuChannel(base::WeakPtr<ui::Compositor>,scoped_refptr<gpu::GpuChannelHost>)
<aliased> base::internal::FunctorTraits<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>),void>::Invoke<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>),base::WeakPtr<content::VizProcessTransportFactory>,base::WeakPtr<ui::Compositor>,scoped_refptr<gpu::GpuChannelHost> >
base::internal::Invoker<base::internal::BindState<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>),base::WeakPtr<content::VizProcessTransportFactory>,base::WeakPtr<ui::Compositor> >,void (scoped_refptr<gpu::GpuChannelHost>)>::RunOnce
base::OnceCallback<void (scoped_refptr<gpu::GpuChannelHost>)>::Run
content::BrowserGpuChannelHostFactory::EstablishRequest::RunCallbacksOnMain()
base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *)
base::MessageLoop::RunTask(base::PendingTask *)
base::MessageLoop::DoWork()
base::MessagePumpForUI::DoRunLoop()
base::MessagePumpWin::Run(base::MessagePump::Delegate *)
base::RunLoop::Run()
ChromeBrowserMainParts::MainMessageLoopRun(int *)
content::BrowserMainLoop::RunMainMessageLoopParts()
content::BrowserMainRunnerImpl::Run()
content::BrowserMain(content::MainFunctionParams const &)
content::RunBrowserProcessMain(content::MainFunctionParams const &,content::ContentMainDelegate *)
content::ContentMainRunnerImpl::Run(bool)
service_manager::Main(service_manager::MainParams const &)
content::ContentMain(content::ContentMainParams const &)
ChromeMain
MainDllLoader::Launch(HINSTANCE__ *,base::TimeTicks)
wWinMain
__scrt_common_main_seh
kernel32.dll
ntdll.dll
Attachments: gpu1.png (31.6 KB), gpu2.png (25.6 KB), gpu3.png (24.0 KB)
Cc: ssid@chromium.org
Labels: Hotlist-Jank Hotlist-SamplingProfilerInField Performance-Browser
Owner: piman@chromium.org
+piman, we observe a considerable number of UI-thread janks caused by this wait.
This is related to issue 125248; I will post about these traces on that bug as well.
Mergedinto: 125248
Status: Duplicate (was: Untriaged)
With OOP-D I believe all 
Not sure what I was saying here.
Anyway, the traces aren't super actionable without the GPU process.
In any case, this is already a known problem with some potential solutions; see the other bug.
The GPU process was removed, but we can add it back to detect these cases.
Would that help with fixing this issue?
Having the GPU process trace would help scope the problem better and inform possible mitigations.
I'm adding more trace information to get precise information on this issue.
The GPU process was part of the trace; we removed it since we were only targeting the browser process. I'm adding a GPU scenario to the slow-reports.

I'm gonna use this bug to keep track of the instrumentation progress.

So far, I can't tell what the exact cause of these janks is, but we can certainly get precise traces.

Give me a few days to put the pieces together.

Comment 9 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/18f09917aed51d0c9ecb85d7ef9c1b2d7c83915f

commit 18f09917aed51d0c9ecb85d7ef9c1b2d7c83915f
Author: Etienne Bergeron <etienneb@chromium.org>
Date: Thu Sep 27 20:40:42 2018

Activate stack sampling profiler on the GPU scenario

This CL is adding the stack sampling tracing category to the GPU scenario.
Slow-reports will now get the stack sampling on the GPU main thread when
the user is running the "BENCHMARK_GPU" scenario.

R=piman@chromium.org, ssid@chromium.org, oysteine@chromium.org

Bug: 888580
Change-Id: Ib6d6e6c98a77a11a1ee24d6223e6358115467a84
Reviewed-on: https://chromium-review.googlesource.com/1249909
Reviewed-by: oysteine <oysteine@chromium.org>
Reviewed-by: Siddhartha S <ssid@chromium.org>
Commit-Queue: Etienne Bergeron <etienneb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594846}
[modify] https://crrev.com/18f09917aed51d0c9ecb85d7ef9c1b2d7c83915f/content/browser/tracing/background_tracing_manager_impl.cc


Comment 10 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2e85ee456a21d963310db32b0cfab2be7e386f41

commit 2e85ee456a21d963310db32b0cfab2be7e386f41
Author: Etienne Bergeron <etienneb@chromium.org>
Date: Thu Sep 27 21:03:32 2018

Add GPU events to slow-reports whitelist

This CL is adding the GPU events to the whitelist.

We do not accept any args yet, since some of them may contains PII and
are not yet useful to investigate GPU issues. We will update the
|kGPUAllowedArgs| array when needed. For now, let assume no args.

R=oysteine@chromium.org

Bug: 888580
Change-Id: Ia23ba1c0e2ed6a49328416678062074ffab472a6
Reviewed-on: https://chromium-review.googlesource.com/1249914
Reviewed-by: oysteine <oysteine@chromium.org>
Commit-Queue: Etienne Bergeron <etienneb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594856}
[modify] https://crrev.com/2e85ee456a21d963310db32b0cfab2be7e386f41/chrome/common/trace_event_args_whitelist.cc


Comment 11 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eba91421a7bcde71f4641032bd5e03947029a676

commit eba91421a7bcde71f4641032bd5e03947029a676
Author: Etienne Bergeron <etienneb@chromium.org>
Date: Thu Sep 27 21:26:20 2018

Activate GPU main thread stack sampling through slow-reports

This CL is adding the support for the stack sampling profiler to the GPU
main thread. The |tracing_sampler_profiler| is an adaptor that forward
stack sample as trace events. These events are collected by slow-reports.

This code is only running when the trace category "cpu_sampler" is
activated and should not affect performance otherwise.

R=piman@chromium.org, ssid@chromium.org

Bug: 888580
Change-Id: Ie703d965c1d158581da6fac1ac0f6ea2157ed6f1
Reviewed-on: https://chromium-review.googlesource.com/1249534
Reviewed-by: Siddhartha S <ssid@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Etienne Bergeron <etienneb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594872}
[modify] https://crrev.com/eba91421a7bcde71f4641032bd5e03947029a676/content/gpu/gpu_main.cc


Comment 12 by bugdroid1@chromium.org, Sep 28

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5837f94069d0348232b7639d2ab17148ebcc7fd2

commit 5837f94069d0348232b7639d2ab17148ebcc7fd2
Author: Etienne Bergeron <etienneb@chromium.org>
Date: Fri Sep 28 20:47:40 2018

Add IPC category to GPU scenario

Follow-up on this CL:
  https://chromium-review.googlesource.com/c/chromium/src/+/1249909

R=piman@chromium.org, oysteine@chromium.org

Bug: 888580
Change-Id: Ia8a6404b7a750f102d224f173abf59c48c5e3b10
Reviewed-on: https://chromium-review.googlesource.com/1251765
Reviewed-by: Antoine Labour <piman@chromium.org>
Reviewed-by: oysteine <oysteine@chromium.org>
Commit-Queue: Etienne Bergeron <etienneb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#595205}
[modify] https://crrev.com/5837f94069d0348232b7639d2ab17148ebcc7fd2/content/browser/tracing/background_tracing_manager_impl.cc

Slow-Reports with GpuChannelHost::Send above 10 seconds (blocking UI thread):

  a7ea44a2cbd807ed
  9eea78f425079f66
  9b7c6ae45f46adb1
  e68058b5e985b3ef

The following report, e68058b5e985b3ef (see attachment gpu2), seems to show a stalled task on GpuChannelMsg_CreateCommandBuffer.

The GPU sampling profiler will give us more insight into these cases.

Attachments: gpu.png (25.1 KB), gpu2.png (78.3 KB)
Looking at e68058b5e985b3ef, virtually all the threads, even ones that have nothing to do with the GPU (see e.g. the network process), get descheduled. So it looks like a system-level problem.
They do not seem to be descheduled to me.
As we can see in the attachment, the sampling profiler thread keeps running,
which means threads are still being scheduled at a regular interval.

The GPU thread is stalled in GpuChannelMsg_CreateCommandBuffer,
which explains why some renderers are also stalled in the Send task.
Attachments: g1.png (31.9 KB), g2.png (13.7 KB), g3.png (17.2 KB)
From a0257587928ddf21: the stack profiler keeps running for the whole duration at a regular interval.
Attachment: gpu_bug.png (10.1 KB)
From e7df5b780a613a70: there are three instances of GPU::Send, and they are janky.

Between these calls, the UI thread is responsive, the IO thread is also running tasks, and the sampling profiler is running at regular intervals.
Attachment: gpu_bug.png (30.0 KB)
We started to receive stack frames for the GPU process.
From this example: 21ce006bc3dac12b

We can observe that the main thread is blocked on a call to the GPU process.
The GPU process is waiting for a condition to occur; the stack frames show that it is stalled.

Two renderers seem to be stalled trying to send a message to the GPU.

And a renderer initiated that call.
Attachments: gpu_example1.png (152 KB), gpu_example2.png (38.3 KB)
From this report: 92971bbfc018670f

The browser process is stalled, waiting for the GPU process.
The GPU process is stalled waiting on a renderer.

We can see that they became unblocked when the renderer completed.
Attachments: gpu_example1.png (10.8 KB), gpu_example2.png (72.5 KB), gpu_example3.png (22.4 KB)
Cc: ccameron@chromium.org vmi...@chromium.org
Labels: -Pri-3 Pri-1
Status: Assigned (was: Duplicate)
Great investigation Etienne!

I'm un-duping this bug, as I suspect that there's priority inversion [or worse] happening. More details:

1) Browser main thread is blocked on GpuChannelHost::Send [~4 seconds].
2) GPU main thread is processing a flush from a renderer [~4 seconds].

2a) For the first couple hundred ms, the GPU main thread is busy compiling shaders:
"""
 off:0x379700 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0x5522 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0x125755 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0x123fc7 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0x123f34 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xf3066 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xa55c3 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xc540f - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 gpu::gles2::GLES2DecoderPassthroughImpl::DoCompileShader(unsigned int)
 gpu::gles2::GLES2DecoderPassthroughImpl::DoCommandsImpl<0>(unsigned int,void const volatile *,int,int *)
 gpu::gles2::GLES2DecoderPassthroughImpl::DoCommands(unsigned int,void const volatile *,int,int *)
 gpu::CommandBufferService::Flush(int,gpu::AsyncAPIInterface *)
 gpu::CommandBufferStub::OnAsyncFlush(int,unsigned int)
 IPC::MessageT<GpuCommandBufferMsg_AsyncFlush_Meta,std::tuple<int,unsigned int>,void>::Dispatch<gpu::CommandBufferStub,gpu::CommandBufferStub,void,void (gpu::CommandBufferStub::*)(int, unsigned int)>
 gpu::CommandBufferStub::OnMessageReceived(IPC::Message const &)
 gpu::GpuChannel::HandleMessageHelper(IPC::Message const &)
 gpu::GpuChannel::HandleMessage(IPC::Message const &)
 gpu::Scheduler::RunNextTask()
 base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *)
"""

2b) For almost 3.5s, the GPU main thread is asleep, waiting for a program to link.
"""
 NtWaitForAlertByThreadId
 RtlSleepConditionVariableSRW
 SleepConditionVariableSRW
 off:0x347880 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0x348424 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0x348550 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xbecd3 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xbd956 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xc45c5 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xc56df - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xa7e94 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0xa771f - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 off:0x3e8b0 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1]
 gpu::gles2::GLES2DecoderPassthroughImpl::DoLinkProgram(unsigned int)
 gpu::gles2::GLES2DecoderPassthroughImpl::DoCommandsImpl<0>(unsigned int,void const volatile *,int,int *)
 gpu::gles2::GLES2DecoderPassthroughImpl::DoCommands(unsigned int,void const volatile *,int,int *)
 gpu::CommandBufferService::Flush(int,gpu::AsyncAPIInterface *)
 gpu::CommandBufferStub::OnAsyncFlush(int,unsigned int)
 IPC::MessageT<GpuCommandBufferMsg_AsyncFlush_Meta,std::tuple<int,unsigned int>,void>::Dispatch<gpu::CommandBufferStub,gpu::CommandBufferStub,void,void (gpu::CommandBufferStub::*)(int, unsigned int)>
 gpu::CommandBufferStub::OnMessageReceived(IPC::Message const &)
 gpu::GpuChannel::HandleMessageHelper(IPC::Message const &)
"""

There are two problems here:
* The renderer has flushed 4s worth of synchronous work, which blocks the browser process main thread.
* The GPU main thread sleeps for ~3.5s, even though this is blocking the browser process main thread. If there's another thread/process that is supposed to be doing the program linking, it has probably been deprioritized.
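The wait chain described in 1) and 2) can be sketched as standalone C++ (hypothetical function names; `std::async` stands in for the cross-process waits). The point is that the browser's jank equals the whole chain's latency: if the link worker is descheduled for seconds, the UI thread blocks for seconds, however idle the browser itself is.

```cpp
#include <future>

// The (possibly deprioritized) worker doing the program link.
int LinkWorker() { return 42; }

// GPU main thread: kicks off the link, then blocks on it
// (the ~3.5 s sleep in the sampled stacks above).
int GpuMainThread() {
  std::future<int> link = std::async(std::launch::async, LinkWorker);
  return link.get();
}

// Browser main thread: blocks on the GPU process via
// GpuChannelHost::Send -- the jank the reports measure happens here.
int BrowserMainThread() {
  std::future<int> gpu = std::async(std::launch::async, GpuMainThread);
  return gpu.get();
}
```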
Attachment: trace-92971bbfc018670f.json.gz (1.7 MB)
Cc: piman@chromium.org
Labels: -Pri-1 Pri-2
Owner: geoffl...@chromium.org
I assume this is ANGLE. @geofflang, does anything stand out?
It's unfortunate that we don't have symbols for ANGLE here; is it possible for us to get them for libGLESv2.dll and libEGL.dll in the future?

Both glLinkProgram and glCompileShader have become asynchronous in ANGLE over the last month or so, so the GPU process should no longer block on shader compilation and program linking. We may see some good improvements to these cases soon.

The sleep we're seeing in the program linking is because ANGLE used to spawn two threads (one for the vertex shader, one for the fragment shader) and wait on them to finish. We found that this roughly doubled our link speed, but we probably had some gaps in our testing for heavy-load situations.

Otherwise, both of these functions are considered fairly expensive, and it's not unreasonable for a program link to take a few seconds if the system is under heavy load. Both Chrome and ANGLE cache the results, so long compiles/links should only be happening after a browser update; do we know if this is the case?
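The two-thread link pattern described here, kicking off per-stage work and then blocking until both finish, can be sketched as standalone C++. `CompileVertexShader` and `CompileFragmentShader` are hypothetical stand-ins, not ANGLE's actual code; under heavy load the blocking `get()` calls are where the GPU main thread goes to sleep.

```cpp
#include <future>
#include <string>

// Hypothetical stand-ins for the per-stage compile work.
std::string CompileVertexShader(const std::string& src) { return "vs:" + src; }
std::string CompileFragmentShader(const std::string& src) { return "fs:" + src; }

// Parallel link: one task per stage, then block on both. This roughly
// doubles link throughput when the workers run promptly, but if they
// are descheduled the caller stalls, like AsyncWaitableEvent::wait().
std::string LinkProgram(const std::string& vs_src, const std::string& fs_src) {
  std::future<std::string> vs =
      std::async(std::launch::async, CompileVertexShader, vs_src);
  std::future<std::string> fs =
      std::async(std::launch::async, CompileFragmentShader, fs_src);
  return vs.get() + "+" + fs.get();  // Blocking waits on both workers.
}
```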
I'm providing a better-symbolized version of 21ce006bc3dac12b.

Now, we can observe that the GPU main thread is stalled on this:

 NtWaitForKeyedEvent
 RtlSleepConditionVariableSRW
 SleepConditionVariableSRW
 _crtSleepConditionVariableSRW
 Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface *)
 do_wait
 std::_Associated_state<int>::_Wait()
 angle::AsyncWaitableEvent::wait()
 rx::ProgramD3D::GraphicsProgramLinkEvent::wait()
 gl::Program::resolveLinkImpl()
 gl::Program::isLinked()
 gl::ValidateGetUniformLocation(gl::Context *,unsigned int,char const *)
 gl::GetUniformLocation(unsigned int,char const *)

Attachment: trace-21ce006bc3dac12b.json.gz (2.6 MB)
A better-symbolized version of 92971bbfc018670f.

The GPU main thread is stalled here:

 NtWaitForAlertByThreadId
 RtlSleepConditionVariableSRW
 SleepConditionVariableSRW
 _crtSleepConditionVariableSRW
 Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface *)
 do_wait
 std::_Associated_state<int>::_Wait()
 angle::AsyncWaitableEvent::wait()
 rx::ProgramD3D::GraphicsProgramLinkEvent::wait()
 gl::Program::resolveLinkImpl()
 gl::Program::isLinked()
 gl::ValidateUseProgram(gl::Context *,unsigned int)
 gl::UseProgram(unsigned int)
 gpu::gles2::GLES2DecoderPassthroughImpl::DoUseProgram(unsigned int)
 gpu::gles2::GLES2DecoderPassthroughImpl::DoCommandsImpl<0>(unsigned int,void const volatile *,int,int *)
 gpu::gles2::GLES2DecoderPassthroughImpl::DoCommands(unsigned int,void const volatile *,int,int *)
Attachment: trace-92971bbfc018670f.json.gz (1.7 MB)
Awesome, thanks for getting the symbols.

Now that the linking is asynchronous, these other calls (glGetUniformLocation and glUseProgram) are forcing the link to resolve. I believe we could do a better job of letting the program link for as long as possible and only forcing the resolve just before the program is used. This would require optimization on both the client and service side of the command decoder, and probably in some of the client GL code too. I'm not sure who could own that.
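The "let it link as long as possible, resolve just before use" idea could be sketched like this (illustrative names, not ANGLE's actual gl::Program): start the link asynchronously and block on the result only the first time it is genuinely required.

```cpp
#include <future>
#include <string>
#include <utility>

// A program object that links in the background. Validation-only
// paths (e.g. a GetUniformLocation-style lookup) could defer calling
// Resolve() instead of forcing the resolve eagerly.
class Program {
 public:
  void Link(std::string src) {
    link_result_ = std::async(std::launch::async, [src = std::move(src)] {
      return "linked:" + src;  // Stand-in for the expensive link work.
    });
  }

  // Forces the resolve; blocks only on the first call after Link().
  const std::string& Resolve() {
    if (!resolved_ && link_result_.valid()) {
      binary_ = link_result_.get();  // The only blocking wait.
      resolved_ = true;
    }
    return binary_;
  }

 private:
  std::future<std::string> link_result_;
  std::string binary_;
  bool resolved_ = false;
};
```

Subsequent calls to `Resolve()` return the cached result without touching the future, so only the first真 use pays the remaining link latency.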

Etienne, do you have an idea if the overall situation has improved recently?
With OOP-D and OOP-R, we should be able to
- have all GLRenderer instances use the same GL context and shader cache
- have all raster workers use the same GrContext

That way, while we do suffer an initial link, we would only suffer that once. On macOS we do a bunch of acrobatics to avoid initializing new contexts ... see comments in:
https://bugs.chromium.org/p/chromium/issues/detail?id=712268#c27

If we do get that sort of GL context and shader pool sharing in place then we could finally delete the compositor reuse workarounds.
Cc: kylec...@chromium.org
