GPU process hang in OnCreateVideoDecoder, on NVIDIA, in PowerGetActiveScheme
Issue description

While using Chrome on my Chrome development workstation (lots of memory, lots of CPUs, fast disk, etc.) I hit a hang. The browser UI was unresponsive for about ten seconds. This gave me enough time to start ETW tracing, so I have a trace of Chrome's behavior in the seconds before the GPU process watchdog thread terminated the GPU process.

Chrome Version: 64.0.3282.167
Operating System: Windows NT 10.0.15063
URL (if applicable) where crash occurred: http://cpac.conservative.org/cpac-2018-sponsors/
Can you reproduce this crash? No

****DO NOT CHANGE BELOW THIS LINE****
Crash ID: crash/d7ba109797a431c3

The crash dump was discarded, so I have attached it to this bug.

The hang occurred because thread 0 (thread ID 0x7140) did not respond. The call stack from the crash dump and from the ETW trace looks like this:

00 ntdll!NtAlpcSendWaitReceivePort
01 RPCRT4!LRPC_BASE_CCALL::DoSendReceive
02 RPCRT4!NdrpClientCall3
03 RPCRT4!NdrClientCall3
04 powrprof!PowerGetActiveScheme
05 nvwgf2umx!NVAPI_Thunk+plus_huge_offset
06 nvwgf2umx!NVAPI_Thunk+plus_huge_offset
07 nvwgf2umx!NVAPI_Thunk+plus_huge_offset
08 nvwgf2umx!NVAPI_Thunk+plus_huge_offset
09 nvwgf2umx!NVAPI_Thunk+plus_huge_offset
0a nvwgf2umx!NVAPI_Thunk+plus_huge_offset
0b nvwgf2umx!NVDEV_Thunk+plus_huge_offset
0c nvwgf2umx!OpenAdapter12+plus_huge_offset
0d nvwgf2umx!OpenAdapter12+plus_huge_offset
0e d3d11!dxrt11::Direct3DDevice::Release
0f d3d11!CDevice::QIVideoDevice
10 d3d11!ATL::AtlInternalQueryInterface
11 d3d11!CLayeredObject<CDevice>::QueryInterface
12 d3d11!ATL::AtlInternalQueryInterface
13 d3d11!CLayeredObject<NDXGI::CDevice>::QueryInterface
14 d3d11!ATL::AtlInternalQueryInterface
15 chrome_child!Microsoft::WRL::ComPtr<ID3D11Device>::CopyTo
16 chrome_child!media::DXVAVideoDecodeAccelerator::CreateDX11DevManager
17 chrome_child!media::DXVAVideoDecodeAccelerator::InitDecoder
18 chrome_child!media::DXVAVideoDecodeAccelerator::Initialize
19 chrome_child!media::GpuVideoDecodeAcceleratorFactory::CreateVDA
1a chrome_child!media::GpuVideoDecodeAccelerator::Initialize
1b chrome_child!media::MediaGpuChannel::OnCreateVideoDecoder
1c chrome_child!media::MediaGpuChannelDispatchHelper::OnCreateVideoDecoder
1d chrome_child!base::DispatchToMethodImpl
1e chrome_child!base::DispatchToMethod
1f chrome_child!IPC::MessageT<...>::DispatchDelayReply<...>
20 chrome_child!media::MediaGpuChannel::OnMessageReceived
21 chrome_child!gpu::GpuChannel::HandleMessageHelper
22 chrome_child!gpu::GpuChannel::HandleMessage
23 chrome_child!base::OnceCallback<void ()>::Run
24 chrome_child!gpu::Scheduler::RunNextTask
25 chrome_child!base::OnceCallback<void ()>::Run
26 chrome_child!base::debug::TaskAnnotator::RunTask
27 chrome_child!base::MessageLoop::RunTask
28 chrome_child!base::MessageLoop::DeferOrRunPendingTask
29 chrome_child!base::MessageLoop::DoWork
2a chrome_child!base::MessagePumpDefault::Run
2b chrome_child!base::RunLoop::Run
2c chrome_child!content::GpuMain
2d chrome_child!content::RunNamedProcessTypeMain
2e chrome_child!content::ContentMainRunnerImpl::Run
2f chrome_child!service_manager::Main
30 chrome_child!content::ContentMain
31 chrome_child!ChromeMain
32 chrome!MainDllLoader::Launch
33 chrome!wWinMain
34 chrome!invoke_main
35 chrome!__scrt_common_main_seh
36 KERNEL32!BaseThreadInitThunk
37 ntdll!RtlUserThreadStart

The call stack on the crash service is not as good because the nvwgf2umx binaries are not available. This can be fixed in the future by using NVIDIA's symbol server, which allows retrieving binaries (but not symbols) for better stack walking.

The crash dump by itself is not very helpful, but together with the ETW trace it gives some additional clues. In particular, the trace shows that during the 2.25 s before the GPU process was terminated (when the hang was detected) the GPU process consumed just 20 ms of CPU time. Since tracing was started just 5.1 s before process termination it is possible that this is not accurate, but it looks correct. In fact, the trace shows no CPU usage from the GPU process for the first 2.7 s of the trace, so the process seems to have been extremely idle. Thread 28,992 (0x7140) is the one that hung, and the trace shows it as not running for 4.389 s, apparently stuck on the stack above.

So, the hang appears to be due to PowerGetActiveScheme not returning.

Observations:
1) This might be a random one-off.
2) NVIDIA and Microsoft are the only companies with any hope of investigating more deeply. I can't even speculate about why PowerGetActiveScheme is being called or why it might not return.
3) Because stack walking across nvwgf2umx fails, we can't tell how often we hang inside media::MediaGpuChannel::OnCreateVideoDecoder.
4) We can tell how often we hang inside powrprof!PowerGetActiveScheme, and it appears to be a small percentage of GPU hangs (less than 0.1% based on my quick analysis).
5) Note that OnCreateVideoDecoder is being called on the process's main thread.

ETW trace available on demand, name is "2018-02-23_11-26-29 Chrome hung for 5-10 s, caught the end.etl"
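For readers unfamiliar with how a watchdog notices this kind of hang, the following is a minimal, portable sketch of the general idea: one thread waits, with a timeout, for another thread to finish a task. It is not Chrome's GPU watchdog implementation; the function name `FinishesBeforeTimeout` and the timeout values are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical illustration only; this is not Chrome's GPU watchdog.
// Runs `task` on a worker thread (standing in for the GPU main thread) and
// reports whether it finished before `timeout` elapsed.
bool FinishesBeforeTimeout(const std::function<void()>& task,
                           std::chrono::milliseconds timeout) {
  assert(task && "task must be callable");

  std::mutex mutex;
  std::condition_variable done_cv;
  bool done = false;

  std::thread watched([&] {
    task();  // In the reported hang, the equivalent call never returned.
    {
      std::lock_guard<std::mutex> lock(mutex);
      done = true;
    }
    done_cv.notify_one();
  });

  bool finished_in_time = false;
  {
    std::unique_lock<std::mutex> lock(mutex);
    // Returns false if the timeout expires while `done` is still false.
    finished_in_time = done_cv.wait_for(lock, timeout, [&] { return done; });
  }

  // This sketch joins so that it stays well-defined; it therefore only works
  // for tasks that eventually return. A blocked thread cannot be cancelled
  // safely, which is why a real watchdog terminates the whole process.
  watched.join();
  return finished_in_time;
}
```

Calling `FinishesBeforeTimeout` with a task that sleeps longer than the timeout returns `false`, which corresponds to the point at which a watchdog would declare a hang. Note that the watched thread uses almost no CPU while blocked, consistent with the near-zero CPU time seen in the ETW trace.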
Feb 24 2018
Unfortunately, it's definitely legal and even required for us to create the video decoder on the main thread here. liberato@ is working on reducing the amount of work we do on the main thread, but I think the creation itself still must happen on the main thread. Frank?
Feb 24 2018
Moving the decoder to another thread might be possible in some cases, but not all. It depends on how well texture sharing works across D3D11 contexts. On my 420's video card (don't remember what it is), for example, sharing doesn't seem to work when the video decoder is involved. So, we're stuck using the ANGLE context, which means that we're stuck on the main thread. I've only got D3D 11.0, so the NTHANDLE texture sharing stuff isn't supported. Hopefully it works better with 11.1. Or, maybe, I'm just doing it wrong. :)
Feb 28 2018
Based on the crash ID provided in C#0, this issue appears to be similar to #633031, hence merging into it. Please feel free to undup if it is not similar. Thanks!
Comment 1 by brucedaw...@chromium.org, Feb 24 2018
Labels: -Restrict-View-EditIssue -User-Submitted Pri-3 Type-Bug