Issue metadata
Sign in to add a comment
|
Potential jank caused by BrowserGpuChannelHostFactory::EstablishRequest::Wait |
||||||||||||||||||||
Issue descriptionReports: 0dd81906f5266cbe (7 seconds jank) ffe6b15d69f7407a (6.5 seconds) 59edf789b46ac9ae (6.3 seconds) This issue seems to point to: https://cs.chromium.org/chromium/src/content/browser/gpu/browser_gpu_channel_host_factory.cc?l=218 // We're blocking the UI thread, which is generally undesirable. // In this case we need to wait for this before we can show any UI // /anyway/, so it won't cause additional jank. // TODO(piman): Make this asynchronous (http://crbug.com/125248). Stackframe: ntdll.dll kernelbase.dll base::WaitableEvent::Wait() gpu::GpuChannelHost::Send(IPC::Message *) gpu::CommandBufferProxyImpl::Initialize(HWND__ *,gpu::CommandBufferProxyImpl *,gpu::SchedulingPriority,gpu::ContextCreationAttribs const &,GURL const &) ws::ContextProviderCommandBuffer::BindToCurrentThread() content::VizProcessTransportFactory::TryCreateContextsForGpuCompositing(scoped_refptr<gpu::GpuChannelHost>) content::VizProcessTransportFactory::OnEstablishedGpuChannel(base::WeakPtr<ui::Compositor>,scoped_refptr<gpu::GpuChannelHost>) <aliased> base::internal::FunctorTraits<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>),void>::Invoke<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>),base::WeakPtr<content::VizProcessTransportFactory>,base::WeakPtr<ui::Compositor>,scoped_refptr<gpu::GpuChannelHost> > base::internal::Invoker<base::internal::BindState<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>),base::WeakPtr<content::VizProcessTransportFactory>,base::WeakPtr<ui::Compositor> >,void (scoped_refptr<gpu::GpuChannelHost>)>::RunOnce base::OnceCallback<void (scoped_refptr<gpu::GpuChannelHost>)>::Run content::BrowserGpuChannelHostFactory::EstablishRequest::RunCallbacksOnMain() base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) base::MessageLoop::RunTask(base::PendingTask *) base::MessageLoop::DoWork() base::MessagePumpForUI::DoRunLoop() base::MessagePumpWin::Run(base::MessagePump::Delegate *) base::RunLoop::Run() ChromeBrowserMainParts::MainMessageLoopRun(int *) content::BrowserMainLoop::RunMainMessageLoopParts() content::BrowserMainRunnerImpl::Run() content::BrowserMain(content::MainFunctionParams const &) content::RunBrowserProcessMain(content::MainFunctionParams const &,content::ContentMainDelegate *) content::ContentMainRunnerImpl::Run(bool) service_manager::Main(service_manager::MainParams const &) content::ContentMain(content::ContentMainParams const &) ChromeMain MainDllLoader::Launch(HINSTANCE__ *,base::TimeTicks) wWinMain __scrt_common_main_seh kernel32.dll ntdll.dll
,
Sep 26
,
Sep 26
+piman we observe considerable number of UI thread janks caused by this wait. This is related to issue 125248. I will post on that bug about these traces as well.
,
Sep 26
,
Sep 26
Not sure what I was saying here. Anyway, the traces aren't super actionable without the GPU process. In any case, this is already a known problem, with some potential solutions, see other bug.
,
Sep 26
The GPU process was remove, but we can add it back to detect these cases. Would it help fixing this issue?
,
Sep 26
Having the GPU process trace would help scope the problem better and inform possible mitigations
,
Sep 27
I'm adding more trace information to get precise information on this issue. The GPU process was part of the thread, we removed it since we were only targeting browser process. I'm adding a GPU scenario to the slow-reports. I'm gonna use this bug to keep track of the instrumentation progress. So far, I can't tell what is the exact cause of these janks, but for sure we can get precise traces. Give me few days to put piece together.
,
Sep 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/18f09917aed51d0c9ecb85d7ef9c1b2d7c83915f commit 18f09917aed51d0c9ecb85d7ef9c1b2d7c83915f Author: Etienne Bergeron <etienneb@chromium.org> Date: Thu Sep 27 20:40:42 2018 Activate stack sampling profiler on the GPU scenario This CL is adding the stack sampling tracing category to the GPU scenario. Slow-reports will now get the stack sampling on the GPU main thread when the user is running the "BENCHMARK_GPU" scenario. R=piman@chromium.org, ssid@chromium.org, oysteine@chromium.org Bug: 888580 Change-Id: Ib6d6e6c98a77a11a1ee24d6223e6358115467a84 Reviewed-on: https://chromium-review.googlesource.com/1249909 Reviewed-by: oysteine <oysteine@chromium.org> Reviewed-by: Siddhartha S <ssid@chromium.org> Commit-Queue: Etienne Bergeron <etienneb@chromium.org> Cr-Commit-Position: refs/heads/master@{#594846} [modify] https://crrev.com/18f09917aed51d0c9ecb85d7ef9c1b2d7c83915f/content/browser/tracing/background_tracing_manager_impl.cc
,
Sep 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2e85ee456a21d963310db32b0cfab2be7e386f41 commit 2e85ee456a21d963310db32b0cfab2be7e386f41 Author: Etienne Bergeron <etienneb@chromium.org> Date: Thu Sep 27 21:03:32 2018 Add GPU events to slow-reports whitelist This CL is adding the GPU events to the whitelist. We do not accept any args yet, since some of them may contains PII and are not yet useful to investigate GPU issues. We will update the |kGPUAllowedArgs| array when needed. For now, let assume no args. R=oysteine@chromium.org Bug: 888580 Change-Id: Ia23ba1c0e2ed6a49328416678062074ffab472a6 Reviewed-on: https://chromium-review.googlesource.com/1249914 Reviewed-by: oysteine <oysteine@chromium.org> Commit-Queue: Etienne Bergeron <etienneb@chromium.org> Cr-Commit-Position: refs/heads/master@{#594856} [modify] https://crrev.com/2e85ee456a21d963310db32b0cfab2be7e386f41/chrome/common/trace_event_args_whitelist.cc
,
Sep 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/eba91421a7bcde71f4641032bd5e03947029a676 commit eba91421a7bcde71f4641032bd5e03947029a676 Author: Etienne Bergeron <etienneb@chromium.org> Date: Thu Sep 27 21:26:20 2018 Activate GPU main thread stack sampling through slow-reports This CL is adding the support for the stack sampling profiler to the GPU main thread. The |tracing_sampler_profiler| is an adaptor that forward stack sample as trace events. These events are collected by slow-reports. This code is only running when the trace category "cpu_sampler" is activated and should not affect performance otherwise. R=piman@chromium.org, ssid@chromium.org Bug: 888580 Change-Id: Ie703d965c1d158581da6fac1ac0f6ea2157ed6f1 Reviewed-on: https://chromium-review.googlesource.com/1249534 Reviewed-by: Siddhartha S <ssid@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Commit-Queue: Etienne Bergeron <etienneb@chromium.org> Cr-Commit-Position: refs/heads/master@{#594872} [modify] https://crrev.com/eba91421a7bcde71f4641032bd5e03947029a676/content/gpu/gpu_main.cc
,
Sep 28
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5837f94069d0348232b7639d2ab17148ebcc7fd2 commit 5837f94069d0348232b7639d2ab17148ebcc7fd2 Author: Etienne Bergeron <etienneb@chromium.org> Date: Fri Sep 28 20:47:40 2018 Add IPC category to GPU scenario Follow-up on this CL: https://chromium-review.googlesource.com/c/chromium/src/+/1249909 R=piman@chromium.org, oysteine@chromium.org Bug: 888580 Change-Id: Ia8a6404b7a750f102d224f173abf59c48c5e3b10 Reviewed-on: https://chromium-review.googlesource.com/1251765 Reviewed-by: Antoine Labour <piman@chromium.org> Reviewed-by: oysteine <oysteine@chromium.org> Commit-Queue: Etienne Bergeron <etienneb@chromium.org> Cr-Commit-Position: refs/heads/master@{#595205} [modify] https://crrev.com/5837f94069d0348232b7639d2ab17148ebcc7fd2/content/browser/tracing/background_tracing_manager_impl.cc
,
Oct 5
Slow-Reports with GpuChannelHost::Send above 10 seconds (blocking UI thread): a7ea44a2cbd807ed 9eea78f425079f66 9b7c6ae45f46adb1 e68058b5e985b3ef The following report [e68058b5e985b3ef](see attachment gpu2) seems to shows a stalled task on GpuChannelMsg_CreateCommandBuffer. The GPU sampling profiler will guide us into these cases.
,
Oct 5
Looking at e68058b5e985b3ef, virtually all the threads, even ones that have nothing to do with GPU (see e.g. network process), get descheduled. So it looks like a system-level problem.
,
Oct 5
They do not seems to be descheduled to me. As we can see in the attachment, the sampling profiler thread keep running. That means, thread are still scheduled at a regular interval. The GPU thread is stalled in GpuChannelMsg_CreateCommandBuffer. Which is explaining why some renderers are also stalled in the Send task.
,
Oct 5
From a0257587928ddf21. The stack profiler keep running the whole duration at regular interval.
,
Oct 5
From e7df5b780a613a70. There are 3 instances of the GPU::Send. They are janky. Between these calls, the UI thread is responsive. The IO Thread is also running tasks. The Sampling profiler is running at regular intervals.
,
Oct 9
We started to receive stackframe for the GPU process. From this example: 21ce006bc3dac12b We can observed that the main thread is blocked on a call to the GPU process. The GPU process is waiting for a condition to occurs. The stackframe shows that it is stalled. Two renderers seem to be stalled to send a message to the GPU. And a renderer initiated that call.
,
Oct 9
From this report: 92971bbfc018670f The browser process is stalled, waiting for the GPU process. The GPU process is stalled waiting on a renderer. We can see that they got unblocked at the renderer completion.
,
Oct 9
Great investigation Etienne! I'm un-duping this bug, as I suspect that there's priority inversion [or worse] happening. More details. 1) Browser main thread is blocked on GpuChannelHost::Send [~4 seconds]. 2) GPU main thread is processing a flush from a renderer [~4 seconds]. 2a) For the first couple hundred ms, the GPU main thread is busy compiling shaders: """ off:0x379700 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0x5522 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0x125755 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0x123fc7 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0x123f34 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xf3066 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xa55c3 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xc540f - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] gpu::gles2::GLES2DecoderPassthroughImpl::DoCompileShader(unsigned int) gpu::gles2::GLES2DecoderPassthroughImpl::DoCommandsImpl<0>(unsigned int,void const volatile *,int,int *) gpu::gles2::GLES2DecoderPassthroughImpl::DoCommands(unsigned int,void const volatile *,int,int *) gpu::CommandBufferService::Flush(int,gpu::AsyncAPIInterface *) gpu::CommandBufferStub::OnAsyncFlush(int,unsigned int) IPC::MessageT<GpuCommandBufferMsg_AsyncFlush_Meta,std::tuple<int,unsigned int>,void>::Dispatch<gpu::CommandBufferStub,gpu::CommandBufferStub,void,void (gpu::CommandBufferStub::*)(int, unsigned int)> gpu::CommandBufferStub::OnMessageReceived(IPC::Message const &) gpu::GpuChannel::HandleMessageHelper(IPC::Message const &) gpu::GpuChannel::HandleMessage(IPC::Message const &) gpu::Scheduler::RunNextTask() base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) """ 2b) For almost 3.5s, the GPU main thread is asleep, waiting for program to link. """ NtWaitForAlertByThreadId RtlSleepConditionVariableSRW SleepConditionVariableSRW off:0x347880 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0x348424 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0x348550 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xbecd3 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xbd956 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xc45c5 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xc56df - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xa7e94 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0xa771f - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] off:0x3e8b0 - libGLESv2.dll.pdb [E09638AC3CF84D834C4C44205044422E1] gpu::gles2::GLES2DecoderPassthroughImpl::DoLinkProgram(unsigned int) gpu::gles2::GLES2DecoderPassthroughImpl::DoCommandsImpl<0>(unsigned int,void const volatile *,int,int *) gpu::gles2::GLES2DecoderPassthroughImpl::DoCommands(unsigned int,void const volatile *,int,int *) gpu::CommandBufferService::Flush(int,gpu::AsyncAPIInterface *) gpu::CommandBufferStub::OnAsyncFlush(int,unsigned int) IPC::MessageT<GpuCommandBufferMsg_AsyncFlush_Meta,std::tuple<int,unsigned int>,void>::Dispatch<gpu::CommandBufferStub,gpu::CommandBufferStub,void,void (gpu::CommandBufferStub::*)(int, unsigned int)> gpu::CommandBufferStub::OnMessageReceived(IPC::Message const &) gpu::GpuChannel::HandleMessageHelper(IPC::Message const &) """ There are two problems here: * Renderer has flushed 4s worth of synchronous work, which blocks the browser process main thread. * GPU main thread sleeps for ~3.5s, even though this is blocking the the browser process main thread. If there's another thread/process that is supposed to be doing the program linking, it's probably been deprioritized.
,
Oct 10
I assume this is ANGLE. @geofflang, anything stands out?
,
Oct 10
It's unfortunate that we don't have symbols into ANGLE here, is it possible for us to get them in libGLESv2.dll and libEGL.dll in the future? Both glLinkProgram and glCompileShader have become asynchronous in ANGLE over the last month or so, the GPU process should no longer block on shader compilation and program linking. We may see some good improvements to these cases soon. The sleep we're seeing in the program linking is because ANGLE used to spawn 2 threads (for vertex and fragment shader) and wait on them to finish, we found that this did roughly double our link speed but probably had some gaps in our testing for heavy load situations. Otherwise, both of these functions are considered fairly expensive and it's not unreasonable for a program linking to take a few seconds if the system is under heavy load. Both Chrome and ANGLE both cache the results of these so long compiles/links should only be happening after a browser update, do we know if this is the case?
,
Oct 10
I'm providing a better symbolized version of 21ce006bc3dac12b. Now, we can observe that the GPU main thread is stalled on this: "NtWaitForKeyedEvent RtlSleepConditionVariableSRW SleepConditionVariableSRW _crtSleepConditionVariableSRW Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface *) do_wait std::_Associated_state<int>::_Wait() angle::AsyncWaitableEvent::wait() rx::ProgramD3D::GraphicsProgramLinkEvent::wait() gl::Program::resolveLinkImpl() gl::Program::isLinked() gl::ValidateGetUniformLocation(gl::Context *,unsigned int,char const *) gl::GetUniformLocation(unsigned int,char const *)
,
Oct 10
A better symbolized version of 92971bbfc018670f The GPU main thread is stalled here: NtWaitForAlertByThreadId RtlSleepConditionVariableSRW SleepConditionVariableSRW _crtSleepConditionVariableSRW Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface *) do_wait std::_Associated_state<int>::_Wait() angle::AsyncWaitableEvent::wait() rx::ProgramD3D::GraphicsProgramLinkEvent::wait() gl::Program::resolveLinkImpl() gl::Program::isLinked() gl::ValidateUseProgram(gl::Context *,unsigned int) gl::UseProgram(unsigned int) gpu::gles2::GLES2DecoderPassthroughImpl::DoUseProgram(unsigned int) gpu::gles2::GLES2DecoderPassthroughImpl::DoCommandsImpl<0>(unsigned int,void const volatile *,int,int *) gpu::gles2::GLES2DecoderPassthroughImpl::DoCommands(unsigned int,void const volatile *,int,int *)
,
Oct 10
Awesome, thanks for getting the symbols. Now that the linking is asynchronous, these other calls (glGetUniformLocation and glUseProgram) are forcing the resolve of the linking. I believe we could be doing a better job of letting the program link for as long as possible and only force the resolve just before using it. This would require optimization in both the client and service side of the command decoder and probably in some of the client GL code too. I'm not sure who could own that. Etienne, do you have an idea if the overall situation has improved recently?
,
Oct 10
With OOP-D and OOP-R, we should be able to - have all GLRenderer instances use the same GL context and shader cache - have all raster workers use the same CrContext That way, while we do suffer an initial link, we would only suffer that once. On macOS we do a bunch of acrobatics to avoid initializing new contexts ... see comments in: https://bugs.chromium.org/p/chromium/issues/detail?id=712268#c27 If we do get that sort of GL context and shader pool sharing in place then we could finally delete the compositor reuse workarounds.
,
Oct 19
|
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by etienneb@chromium.org
, Sep 25