
Issue 615113

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocking:
issue 617449




Win Intel bots suffer from OUT_OF_MEMORY issue running webgl2_conformance_tests

Project Member Reported by zmo@chromium.org, May 26 2016

Issue description

[3080:2736:0526/083719:ERROR:gles2_cmd_decoder.cc(10010)] [.Offscreen-For-WebGL-04B8CC28]GL ERROR :GL_OUT_OF_MEMORY : glReadPixels: <- error from previous GL command
[3080:2736:0526/083719:ERROR:gles2_cmd_decoder.cc(10113)] [.Offscreen-For-WebGL-04B8CC28]GL ERROR :GL_OUT_OF_MEMORY : glReadPixels: 
[3080:2736:0526/083719:ERROR:gles2_cmd_decoder.cc(3823)]   GLES2DecoderImpl: Context lost during MakeCurrent.
[3080:2736:0526/083719:ERROR:gpu_command_buffer_stub.cc(384)] Context lost because MakeCurrent failed.
[3080:2736:0526/083719:ERROR:gpu_channel_manager.cc(222)] Exiting GPU process because some drivers cannot recover from problems.
[3080:2736:0526/083719:ERROR:gpu_channel_manager.cc(222)] Exiting GPU process because some drivers cannot recover from problems.
[1652:2716:0526/083719:FATAL:gpu_process_host.cc(893)] Check failed: false. 
 

Comment 1 by zmo@chromium.org, May 26 2016

void GpuProcessHost::OnDidLoseContext(bool offscreen,
                                      gpu::error::ContextLostReason reason,
                                      const GURL& url) {
  // TODO(kbr): would be nice to see the "offscreen" flag too.
  TRACE_EVENT2("gpu", "GpuProcessHost::OnDidLoseContext",
               "reason", reason,
               "url",
               url.possibly_invalid_spec());

  if (!offscreen || url.is_empty()) {
    // Assume that the loss of the compositor's or accelerated canvas'
    // context is a serious event and blame the loss on all live
    // offscreen contexts. This more robustly handles situations where
    // the GPU process may not actually detect the context loss in the
    // offscreen context.
    BlockLiveOffscreenContexts();
    return;
  }

  GpuDataManagerImpl::DomainGuilt guilt;
  switch (reason) {
    case gpu::error::kGuilty:
      guilt = GpuDataManagerImpl::DOMAIN_GUILT_KNOWN;
      break;
    case gpu::error::kUnknown:
      guilt = GpuDataManagerImpl::DOMAIN_GUILT_UNKNOWN;
      break;
    case gpu::error::kInnocent:
      return;
    default:
      NOTREACHED();
      return;
  }

  GpuDataManagerImpl::GetInstance()->BlockDomainFrom3DAPIs(url, guilt);
}

The above NOTREACHED() is where the crash is.

Then we see later the following crash:

[3080:2736:0526/083719:FATAL:gl_bindings_autogen_gl.cc(12516)] Check failed: false. Trying to call glIsSync() without current GL context
Backtrace:
	base::debug::StackTrace::StackTrace [0x687C4B07+23]
	logging::LogMessage::~LogMessage [0x6878F681+49]
	gfx::NoContextGLApi::glIsSyncFn [0x6A868794+84]
	gfx::GLFenceARB::~GLFenceARB [0x6B3A6CF0+48]
	gfx::GLFenceARB::`scalar deleting destructor' [0x6B3A6D6B+11]
	gpu::gles2::CommandsCompletedQuery::`scalar deleting destructor' [0x6B366659+25]
	std::_Hash<std::_Umap_traits<unsigned int,scoped_refptr<gpu::gles2::QueryManager::Query>,std::_Uhash_compare<unsigned int,base_hash::hash<unsigned int>,std::equal_to<unsigned int> >,std::allocator<std::pair<unsigned int const ,scoped_refptr<gpu::gles2::Qu [0x6B368CD8+168]
	gpu::gles2::QueryManager::Destroy [0x6B366FEF+143]
	gpu::gles2::GLES2DecoderImpl::Destroy [0x6B33308E+1774]
	gpu::GpuCommandBufferStub::Destroy [0x6A8A86BD+973]
	gpu::GpuCommandBufferStub::~GpuCommandBufferStub [0x6A8A7CE6+22]
	base::ScopedPtrHashMap<int,std::unique_ptr<media::MediaChannel,std::default_delete<media::MediaChannel> > >::clear [0x6B1E2200+32]
	base::ScopedPtrHashMap<int,std::unique_ptr<media::MediaChannel,std::default_delete<media::MediaChannel> > >::clear [0x6B1E2200+32]
	content::GpuChildThread::~GpuChildThread [0x6A5D7D4E+350]
	content::ChildProcess::~ChildProcess [0x69D649EE+206]
	content::GpuMain [0x6A5D644F+2607]
	content::RunNamedProcessTypeMain [0x6876FCD4+260]
	content::ContentMainRunnerImpl::Run [0x6876FB81+321]
	content::ContentMain [0x6876CCA3+35]
	ChromeMain [0x686C69F6+118]
	MainDllLoader::Launch [0x00F7841C+812]
	wWinMain [0x00F77787+567]
	__scrt_common_main_seh [0x00FEB465+253] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
	BaseThreadInitThunk [0x75EF338A+18]
	RtlInitializeExceptionChain [0x77C69A02+99]
	RtlInitializeExceptionChain [0x77C699D5+54]

But it seems to me we are already in a bad state because of the previous check failure.

Comment 2 by piman@chromium.org, May 26 2016

For the first crash, I think the other error codes need to be mapped to one of the DomainGuilt enum values, and the default: case should then be removed so that this doesn't happen again.

For the second crash, most of the GLES2DecoderImpl teardown code handles the context being lost on MakeCurrent. It looks like CommandsCompletedQuery/GLFence* also needs to be aware of it?
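
As an illustration of the second point, here is a minimal, self-contained sketch of a fence wrapper whose destructor skips GL calls once its owner knows the context is lost. Everything in it is hypothetical: FakeGL, Fence, Invalidate() and DestroyFence() are stand-ins invented for this example and are not the actual gfx::GLFenceARB or QueryManager interfaces.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the GL bindings. It asserts when called without
// a current context, mimicking the "without current GL context" check in
// the trace above.
struct FakeGL {
  bool context_current = true;
  int deleted_syncs = 0;
  void DeleteSync() {
    assert(context_current);
    ++deleted_syncs;
  }
};

// Hypothetical fence wrapper (not the real GLFenceARB).
class Fence {
 public:
  explicit Fence(FakeGL* gl) : gl_(gl) {}
  // Called by the owner once the context is known to be lost.
  void Invalidate() { valid_ = false; }
  ~Fence() {
    // Only touch GL while the fence still believes the context is usable.
    if (valid_)
      gl_->DeleteSync();
  }

 private:
  FakeGL* gl_;
  bool valid_ = true;
};

// Teardown helper that is told whether a context is available.
void DestroyFence(std::unique_ptr<Fence> fence, bool have_context) {
  if (!have_context)
    fence->Invalidate();
  // The fence is destroyed when this function returns.
}
```

With this shape, a teardown path that failed MakeCurrent would pass have_context = false and the destructor would not issue any GL call.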

Comment 3 by kbr@chromium.org, May 26 2016

To add more context: most of the failures seem to happen in the WebGL 2.0 conformance tests. Here is one failure:

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28New%20Intel%29/builds/735/steps/webgl2_conformance_tests/logs/stdio

Full stdout attached, but the relevant parts (the out-of-memory error and subsequent crash stack) are already above.


Attachment: stdout.txt (3.6 MB)

Comment 4 by kbr@chromium.org, May 26 2016

Owner: kbr@chromium.org
Status: Assigned (was: Available)
Antoine, thanks for pointing out the cause of the NOTREACHED(). I'll fix that code and see about the CommandsCompletedQuery.

The cause of the OUT_OF_MEMORY error is mysterious. It looks like that one test provokes it somewhat reliably. The failure may be legitimate -- maybe this test is allocating a lot of GPU resources and not deleting them. They'd only be cleaned up automatically when navigating away from the page.


Comment 5 by kbr@chromium.org, May 26 2016

Summary: Win Intel bots suffer from OUT_OF_MEMORY issue running webgl2_conformance_tests (was: Win Intel bots suffer from OUT_OF_MEMORY issue)

Comment 6 by bugdroid1@chromium.org, Jun 4 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c27c10b334e930d5569aafdf8c7d167f034708c5

commit c27c10b334e930d5569aafdf8c7d167f034708c5
Author: kbr <kbr@chromium.org>
Date: Sat Jun 04 05:05:05 2016

Handle all context lost reasons when determining guilt of domain.

This avoids a NOTREACHED() in the previous code. Verified that addition
of new constants without updating the switch statement will cause a
compile failure.

BUG= 615113 
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2029293002
Cr-Commit-Position: refs/heads/master@{#397906}

[modify] https://crrev.com/c27c10b334e930d5569aafdf8c7d167f034708c5/content/browser/gpu/gpu_process_host.cc
[modify] https://crrev.com/c27c10b334e930d5569aafdf8c7d167f034708c5/gpu/command_buffer/common/constants.h
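
The commit message above says that adding a new constant without updating the switch statement causes a compile failure. A minimal sketch of how that can work is shown here, assuming the build treats the compiler's unhandled-enumerator warning (for example -Wswitch) as an error. The enums are hypothetical stand-ins for gpu::error::ContextLostReason and GpuDataManagerImpl::DomainGuilt, and the extra enumerators and their mapping to "unknown" guilt are assumptions made for illustration, not the contents of the actual change.

```cpp
// Hypothetical stand-ins; kOutOfMemory and kMakeCurrentFailed are
// illustrative extra reasons, not taken from constants.h.
enum class ContextLostReason {
  kGuilty,
  kInnocent,
  kUnknown,
  kOutOfMemory,
  kMakeCurrentFailed,
};

enum class DomainGuilt { kKnown, kUnknown };

// Returns true and sets *guilt when the domain should be blamed,
// or returns false when the context loss was not the domain's fault.
bool GuiltForReason(ContextLostReason reason, DomainGuilt* guilt) {
  // No default: case. Every enumerator is listed, so the compiler can warn
  // when a new one is added but not handled here.
  switch (reason) {
    case ContextLostReason::kGuilty:
      *guilt = DomainGuilt::kKnown;
      return true;
    case ContextLostReason::kUnknown:
    case ContextLostReason::kOutOfMemory:
    case ContextLostReason::kMakeCurrentFailed:
      *guilt = DomainGuilt::kUnknown;
      return true;
    case ContextLostReason::kInnocent:
      return false;
  }
  // Only reachable if 'reason' holds a value outside the enum.
  *guilt = DomainGuilt::kUnknown;
  return true;
}
```

Adding an enumerator to ContextLostReason without adding a matching case would then produce the warning at this switch, which is the behavior the commit message describes.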

Comment 7 by kbr@chromium.org, Jun 7 2016

Blocking: 617449

Comment 8 by kbr@chromium.org, Jun 7 2016

Status: Fixed (was: Assigned)
At this point the OUT_OF_MEMORY errors don't seem to be happening any more, but the Win Intel bot still doesn't run the WebGL 2.0 conformance tests reliably. Closing this issue as fixed, but blocking issue 617449 on this one.
