Win Intel bots suffer from OUT_OF_MEMORY issue running webgl2_conformance_tests

Issue description

[3080:2736:0526/083719:ERROR:gles2_cmd_decoder.cc(10010)] [.Offscreen-For-WebGL-04B8CC28]GL ERROR :GL_OUT_OF_MEMORY : glReadPixels: <- error from previous GL command
[3080:2736:0526/083719:ERROR:gles2_cmd_decoder.cc(10113)] [.Offscreen-For-WebGL-04B8CC28]GL ERROR :GL_OUT_OF_MEMORY : glReadPixels:
[3080:2736:0526/083719:ERROR:gles2_cmd_decoder.cc(3823)] GLES2DecoderImpl: Context lost during MakeCurrent.
[3080:2736:0526/083719:ERROR:gpu_command_buffer_stub.cc(384)] Context lost because MakeCurrent failed.
[3080:2736:0526/083719:ERROR:gpu_channel_manager.cc(222)] Exiting GPU process because some drivers cannot recover from problems.
[3080:2736:0526/083719:ERROR:gpu_channel_manager.cc(222)] Exiting GPU process because some drivers cannot recover from problems.
[1652:2716:0526/083719:FATAL:gpu_process_host.cc(893)] Check failed: false.
May 26 2016
For the first crash, I think the other error codes need to be mapped to one of the DomainGuilt enum values, and the default: case should then be removed so that this doesn't happen again. For the second crash, most of the GLES2DecoderImpl teardown code handles the context being lost on MakeCurrent; it looks like CommandsCompletedQuery/GLFence* also need to be aware of it.
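The second suggestion can be illustrated with a minimal sketch of context-loss-aware teardown. Everything here is a hypothetical stand-in, not the Chromium code: FakeGL replaces the real GL entry points, and Fence, Query, Invalidate() and the have_context parameter are names chosen for illustration only. The idea is that the owner tells the fence when the context is gone, so the fence destructor skips GL calls that would otherwise run without a current context.

```cpp
#include <memory>

// Hypothetical stand-in for the GL bindings. It records whether a call
// arrived while a context was current or not.
struct FakeGL {
  bool context_current = true;
  int deleted_syncs = 0;   // cleanup calls made with a current context
  int illegal_calls = 0;   // cleanup calls made without a current context

  void DeleteSync() {
    if (context_current) {
      ++deleted_syncs;
    } else {
      ++illegal_calls;
    }
  }
};

// Hypothetical fence wrapper. It only touches GL in its destructor while it
// is still marked valid.
class Fence {
 public:
  explicit Fence(FakeGL* gl) : gl_(gl) {}
  ~Fence() {
    if (valid_) {
      gl_->DeleteSync();
    }
  }

  // Called by the owner when the context is lost: the driver-side object is
  // gone, so GL cleanup must be skipped.
  void Invalidate() { valid_ = false; }

 private:
  FakeGL* gl_;
  bool valid_ = true;
};

// Hypothetical query that owns a fence and is told at teardown whether a
// context is still available.
class Query {
 public:
  explicit Query(FakeGL* gl) : fence_(std::make_unique<Fence>(gl)) {}

  void Destroy(bool have_context) {
    if (!have_context && fence_) {
      fence_->Invalidate();
    }
    fence_.reset();
  }

 private:
  std::unique_ptr<Fence> fence_;
};
```

With this shape, calling Destroy(false) after a failed MakeCurrent releases the C++ object without issuing any GL call, while Destroy(true) still performs the normal cleanup.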
May 26 2016
To add more context: most of the failures seem to happen in the WebGL 2.0 conformance tests. Here is one failure: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28New%20Intel%29/builds/735/steps/webgl2_conformance_tests/logs/stdio Full stdout attached, but the relevant parts (the out-of-memory error and subsequent crash stack) are already above.
May 26 2016
Antoine, thanks for pointing out the cause of the NOTREACHED(). I'll fix that code and see about the CommandsCompletedQuery. The cause of the OUT_OF_MEMORY error is mysterious. It looks like that one test somewhat reliably provokes it. The failure may be legitimate -- maybe this test is allocating, and not deleting, a lot of GPU resources. They'd only be cleaned up automatically when navigating away from the page.
May 26 2016
Jun 4 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c27c10b334e930d5569aafdf8c7d167f034708c5

commit c27c10b334e930d5569aafdf8c7d167f034708c5
Author: kbr <kbr@chromium.org>
Date: Sat Jun 04 05:05:05 2016

Handle all context lost reasons when determining guilt of domain.

This avoids a NOTREACHED() in the previous code. Verified that addition of new constants without updating the switch statement will cause a compile failure.

BUG= 615113
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2029293002
Cr-Commit-Position: refs/heads/master@{#397906}

[modify] https://crrev.com/c27c10b334e930d5569aafdf8c7d167f034708c5/content/browser/gpu/gpu_process_host.cc
[modify] https://crrev.com/c27c10b334e930d5569aafdf8c7d167f034708c5/gpu/command_buffer/common/constants.h
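The approach the commit message describes, a switch that covers every context lost reason and has no default: case, can be sketched roughly as follows. This is not the code from the commit. The enums are hypothetical stand-ins for gpu::error::ContextLostReason and GpuDataManagerImpl::DomainGuilt, the extra enumerators (kOutOfMemory, kMakeCurrentFailed) are assumed from the errors in the log above, and the mapping of those reasons to "unknown guilt" is likewise an assumption. The sketch needs C++17 for std::optional.

```cpp
#include <optional>

// Hypothetical stand-in for gpu::error::ContextLostReason. The last two
// enumerators are assumptions based on the logged errors.
enum class ContextLostReason {
  kGuilty,
  kInnocent,
  kUnknown,
  kOutOfMemory,
  kMakeCurrentFailed,
};

// Hypothetical stand-in for GpuDataManagerImpl::DomainGuilt.
enum class DomainGuilt { kKnown, kUnknown };

// Returns the guilt to record for the domain, or std::nullopt when the
// domain should not be blocked at all.
std::optional<DomainGuilt> GuiltForReason(ContextLostReason reason) {
  // No default: case. With -Wswitch (Clang and GCC) a newly added enumerator
  // that is not handled here produces a warning, which becomes a compile
  // failure when warnings are treated as errors.
  switch (reason) {
    case ContextLostReason::kGuilty:
      return DomainGuilt::kKnown;
    case ContextLostReason::kUnknown:
    case ContextLostReason::kOutOfMemory:        // assumed mapping
    case ContextLostReason::kMakeCurrentFailed:  // assumed mapping
      return DomainGuilt::kUnknown;
    case ContextLostReason::kInnocent:
      return std::nullopt;
  }
  // Only reachable if 'reason' holds a value outside the enum.
  return std::nullopt;
}
```

Dropping default: trades a runtime NOTREACHED() for a compile-time check, so a reason added later cannot silently fall through to a crash path.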
Jun 7 2016
Jun 7 2016
At this point the OUT_OF_MEMORY errors don't seem to be happening any more, but the Win Intel bot still doesn't run the WebGL 2.0 conformance tests reliably. Closing this issue as fixed, but blocking Issue 617449 on this one.
Comment 1 by zmo@chromium.org, May 26 2016

    void GpuProcessHost::OnDidLoseContext(bool offscreen,
                                          gpu::error::ContextLostReason reason,
                                          const GURL& url) {
      // TODO(kbr): would be nice to see the "offscreen" flag too.
      TRACE_EVENT2("gpu", "GpuProcessHost::OnDidLoseContext",
                   "reason", reason,
                   "url", url.possibly_invalid_spec());

      if (!offscreen || url.is_empty()) {
        // Assume that the loss of the compositor's or accelerated canvas'
        // context is a serious event and blame the loss on all live
        // offscreen contexts. This more robustly handles situations where
        // the GPU process may not actually detect the context loss in the
        // offscreen context.
        BlockLiveOffscreenContexts();
        return;
      }

      GpuDataManagerImpl::DomainGuilt guilt;
      switch (reason) {
        case gpu::error::kGuilty:
          guilt = GpuDataManagerImpl::DOMAIN_GUILT_KNOWN;
          break;
        case gpu::error::kUnknown:
          guilt = GpuDataManagerImpl::DOMAIN_GUILT_UNKNOWN;
          break;
        case gpu::error::kInnocent:
          return;
        default:
          NOTREACHED();
          return;
      }

      GpuDataManagerImpl::GetInstance()->BlockDomainFrom3DAPIs(url, guilt);
    }

The above NOTREACHED() is where the crash is.

Then we see later the following crash:

[3080:2736:0526/083719:FATAL:gl_bindings_autogen_gl.cc(12516)] Check failed: false. Trying to call glIsSync() without current GL context
Backtrace:
    base::debug::StackTrace::StackTrace [0x687C4B07+23]
    logging::LogMessage::~LogMessage [0x6878F681+49]
    gfx::NoContextGLApi::glIsSyncFn [0x6A868794+84]
    gfx::GLFenceARB::~GLFenceARB [0x6B3A6CF0+48]
    gfx::GLFenceARB::`scalar deleting destructor' [0x6B3A6D6B+11]
    gpu::gles2::CommandsCompletedQuery::`scalar deleting destructor' [0x6B366659+25]
    std::_Hash<std::_Umap_traits<unsigned int,scoped_refptr<gpu::gles2::QueryManager::Query>,std::_Uhash_compare<unsigned int,base_hash::hash<unsigned int>,std::equal_to<unsigned int> >,std::allocator<std::pair<unsigned int const ,scoped_refptr<gpu::gles2::Qu [0x6B368CD8+168]
    gpu::gles2::QueryManager::Destroy [0x6B366FEF+143]
    gpu::gles2::GLES2DecoderImpl::Destroy [0x6B33308E+1774]
    gpu::GpuCommandBufferStub::Destroy [0x6A8A86BD+973]
    gpu::GpuCommandBufferStub::~GpuCommandBufferStub [0x6A8A7CE6+22]
    base::ScopedPtrHashMap<int,std::unique_ptr<media::MediaChannel,std::default_delete<media::MediaChannel> > >::clear [0x6B1E2200+32]
    base::ScopedPtrHashMap<int,std::unique_ptr<media::MediaChannel,std::default_delete<media::MediaChannel> > >::clear [0x6B1E2200+32]
    content::GpuChildThread::~GpuChildThread [0x6A5D7D4E+350]
    content::ChildProcess::~ChildProcess [0x69D649EE+206]
    content::GpuMain [0x6A5D644F+2607]
    content::RunNamedProcessTypeMain [0x6876FCD4+260]
    content::ContentMainRunnerImpl::Run [0x6876FB81+321]
    content::ContentMain [0x6876CCA3+35]
    ChromeMain [0x686C69F6+118]
    MainDllLoader::Launch [0x00F7841C+812]
    wWinMain [0x00F77787+567]
    __scrt_common_main_seh [0x00FEB465+253] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
    BaseThreadInitThunk [0x75EF338A+18]
    RtlInitializeExceptionChain [0x77C69A02+99]
    RtlInitializeExceptionChain [0x77C699D5+54]

But it seems to me we are already in a bad state because of the previous check failure.