
Issue 619106 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Email to this user bounced
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocking:
issue 616483




testDumpMemorySuccess flaky. GPU issue?

Project Member Reported by piman@chromium.org, Jun 10 2016

Issue description

From: https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20%28dbg%29%281%29/builds/49551/steps/telemetry_unittests%20on%20Windows-7-SP1/logs/stdio

[263/1023] telemetry.internal.backends.chrome_inspector.tracing_backend_unittest.TracingBackendTest.testDumpMemorySuccess failed unexpectedly 5.4570s:
  Successfully shut down browser cooperatively
  Chrome build location for win_AMD64 not found. Browser will be run without Flash.
  Requested remote debugging port: 0
  Chrome log file will be saved in e:\b\swarm_slave\work\isolated\isolated_tmpfk5aoy\tmpxtgwen\chrome.log
  Starting Chrome ['../../out\\Debug\\chrome.exe', '--no-sandbox', '--enable-memory-benchmarking', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--no-proxy-server', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--enable-logging', '--v=1', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--window-size=1280,1024', '--user-data-dir=e:\\b\\swarm_slave\\work\\isolated\\isolated_tmpfk5aoy\\tmpfjbqj9', 'about:blank']

[snip]

  [5576:4916:0610/103620:FATAL:gpu_info_collector.cc(104)] Check failed: gl::GetGLImplementation() != gl::kGLImplementationNone (0 vs. 0)
  Backtrace:
  	base::debug::StackTrace::StackTrace [0x10064957+23]
  	logging::LogMessage::~LogMessage [0x100B36BB+59]
  	gpu::gles2::BufferManager::MarkContextLost [0x0B3CFF68+2890514]
  	gpu::gles2::BufferManager::MarkContextLost [0x0B3D3458+2904066]
  	content::GpuChildThread::OnCollectGraphicsInfo [0x1078F178+216]
  	??$DispatchToMethodImpl@PAVGpuChildThread@content@@P812@AEXXZ$$V$$Z$S@base@@YAXABQAVGpuChildThread@content@@P812@AEXXZABV?$tuple@$$V@std@@U?$IndexSequence@$S@0@@Z [0x10784C20+32]
  	??$DispatchToMethod@PAVGpuChildThread@content@@P812@AEXXZ$$V@base@@YAXABQAVGpuChildThread@content@@P812@AEXXZABV?$tuple@$$V@std@@@Z [0x107847DC+44]
  	??$DispatchToMethod@VGpuChildThread@content@@P812@AEXXZXV?$tuple@$$V@std@@@IPC@@YAXPAVGpuChildThread@content@@P812@AEXXZPAXABV?$tuple@$$V@std@@@Z [0x107849B6+38]
  	??$Dispatch@VGpuChildThread@content@@V12@XP812@AEXXZ@?$MessageT@UGpuMsg_CollectGraphicsInfo_Meta@@V?$tuple@$$V@std@@X@IPC@@SA_NPBVMessage@1@PAVGpuChildThread@content@@1PAXP834@AEXXZ@Z [0x10783B63+227]
  	content::GpuChildThread::OnControlMessageReceived [0x1078F78B+491]
  	content::ChildThreadImpl::OnMessageReceived [0x108A336B+1259]
  	content::GpuChildThread::OnMessageReceived [0x10791088+24]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B900DD3+168836]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B8FB7C3+146804]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B8FB530+146145]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B901688+171065]
  	base::Callback<void __cdecl(void),1>::Run [0x1003C32E+30]
  	base::debug::TaskAnnotator::RunTask [0x1006DF34+324]
  	base::MessageLoop::RunTask [0x100DA9A0+640]
  	base::MessageLoop::DeferOrRunPendingTask [0x100D887D+45]
  	base::MessageLoop::DoWork [0x100D8E64+196]
  	base::MessagePumpForGpu::DoRunLoop [0x100E1F42+98]
  	base::MessagePumpWin::Run [0x100E34DB+123]
  	base::MessageLoop::RunHandler [0x100DA6E1+193]
  	base::RunLoop::Run [0x10180834+52]
  	base::MessageLoop::Run [0x100DA5DC+188]
  	content::GpuMain [0x1079C3C3+2691]
  	content::RunNamedProcessTypeMain [0x131B4B67+135]
  	content::ContentMainRunnerImpl::Run [0x131B4A28+488]
  	content::ContentMain [0x131B2A14+100]
  	ChromeMain [0x04EF5622+114]
  	MainDllLoader::Launch [0x0043F5C4+916]
  	wWinMain [0x0043B42D+653]
  	invoke_main [0x006D3DDE+30] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:118)
  	__scrt_common_main_seh [0x006D3C2A+346] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
  	__scrt_common_main [0x006D3ABD+13] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:300)
  	wWinMainCRTStartup [0x006D3DF8+8] (f:\dd\vctools\crt\vcstartup\src\startup\exe_wwinmain.cpp:17)
  	BaseThreadInitThunk [0x7701337A+18]
  	RtlInitializeExceptionChain [0x777D9882+99]
  	RtlInitializeExceptionChain [0x777D9855+54]

[snip]

  [1372:2900:0610/103621:VERBOSE1:tracing_controller_impl.cc(1005)] Memory-infra dump failed because of NACK from child 5576
  [1372:1908:0610/103621:VERBOSE1:node_controller.cc(445)] Dropped peer E106F77245BBC37.9191E085EF5E21B6
  [1372:1908:0610/103621:VERBOSE1:node.cc(410)] Observing lost connection from node 13C75C8D216A6809.B0EF3EC4F1FC7A8B to node E106F77245BBC37.9191E085EF5E21B6
  
  ========== END BROWSER LOG ==========
  Traceback (most recent call last):
    File "e:\b\swarm_slave\work\isolated\isolated_runekl8te\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend_unittest.py", line 25, in WrappedTest
      test(self)
    File "e:\b\swarm_slave\work\isolated\isolated_runekl8te\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend_unittest.py", line 92, in testDumpMemorySuccess
      self.assertIsNotNone(dump_id)
  AssertionError: unexpectedly None



This looks like a race: OnCollectGraphicsInfo appears to run after the GL context failed to initialize or was lost, which leaves us in a bad state (kGLImplementationNone?) and trips the assertion.
 

Comment 1 by kbr@chromium.org, Jun 10 2016

Blocking: 616483
Good catch. Issue 616483 was tracking flakiness in this test; not sure whether the above stack trace was gathered before.

Comment 2 by kbr@chromium.org, Jun 10 2016

Cc: j.iso...@samsung.com
j.isorce@ has been working on the GPU info collection code recently and may be able to postulate a cause of the crash.

Comment 3 by zmo@chromium.org, Jun 10 2016

GPU device 0: VENDOR = 0x15ad, DEVICE = 0x405

It is using VMware software renderer.
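For anyone triaging similar logs, here is a minimal sketch of how the "GPU device" line above can be decoded. The function name `parse_gpu_line` and the `KNOWN_VENDORS` table are hypothetical helpers written for this note, not part of Chromium or telemetry; the only fact relied on from the thread is that vendor 0x15ad is VMware.

```python
import re

# Illustrative lookup table (an assumption of this sketch, not a Chromium list).
# 0x15ad is the PCI vendor ID for VMware; the others are common GPU vendors.
KNOWN_VENDORS = {
    0x15AD: "VMware",
    0x10DE: "NVIDIA",
    0x8086: "Intel",
    0x1002: "AMD",
}

# Matches lines shaped like: "GPU device 0: VENDOR = 0x15ad, DEVICE = 0x405"
GPU_LINE = re.compile(
    r"GPU device (\d+): VENDOR = (0x[0-9a-fA-F]+), DEVICE = (0x[0-9a-fA-F]+)"
)


def parse_gpu_line(line):
    """Return (index, vendor_id, device_id, vendor_name), or None if no match."""
    match = GPU_LINE.search(line)
    if match is None:
        return None
    index = int(match.group(1))
    vendor_id = int(match.group(2), 16)
    device_id = int(match.group(3), 16)
    return index, vendor_id, device_id, KNOWN_VENDORS.get(vendor_id, "unknown")


if __name__ == "__main__":
    print(parse_gpu_line("GPU device 0: VENDOR = 0x15ad, DEVICE = 0x405"))
```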

The assertion is from CollectGraphicsInfoGL, which should not be reached on Windows at all.

This is weird.
Cc: piman@chromium.org primiano@chromium.org zmo@chromium.org petrcermak@chromium.org perezju@chromium.org
 Issue 616483  has been merged into this issue.

Comment 5 by kbr@chromium.org, Jun 10 2016

Looking higher in the log from the last failure reported in Issue 616483:
https://chromium-swarm.appspot.com/user/task/2f5431fbd1518e10

  [4908:4748:0610/104011:ERROR:angle_platform_impl.cc(33)] ANGLE Display::initialize error 4: Renderer does not support PS 3.0.aborting!
  [4908:4748:0610/104011:ERROR:gl_surface_egl.cc(598)] eglInitialize D3D9 failed with error EGL_NOT_INITIALIZED
  [4908:4748:0610/104011:ERROR:gl_initializer_win.cc(28)] GLSurfaceEGL::InitializeOneOff failed.
  [4908:4748:0610/104011:VERBOSE1:gpu_main.cc(345)] gl::init::InitializeGLOneOff failed
...
  [4908:4748:0610/104011:ERROR:gpu_child_thread.cc(376)] Exiting GPU process due to errors during initialization

  [4908:4748:0610/104011:FATAL:gpu_info_collector.cc(104)] Check failed: gl::GetGLImplementation() != gl::kGLImplementationNone (0 vs. 0)
  Backtrace:
  	base::debug::StackTrace::StackTrace [0x10064957+23]
  	logging::LogMessage::~LogMessage [0x100B36BB+59]
  	gpu::gles2::BufferManager::MarkContextLost [0x0B19FF68+2890514]
  	gpu::gles2::BufferManager::MarkContextLost [0x0B1A3458+2904066]
  	content::GpuChildThread::OnCollectGraphicsInfo [0x1078F178+216]
  	??$DispatchToMethodImpl@PAVGpuChildThread@content@@P812@AEXXZ$$V$$Z$S@base@@YAXABQAVGpuChildThread@content@@P812@AEXXZABV?$tuple@$$V@std@@U?$IndexSequence@$S@0@@Z [0x10784C20+32]
  	??$DispatchToMethod@PAVGpuChildThread@content@@P812@AEXXZ$$V@base@@YAXABQAVGpuChildThread@content@@P812@AEXXZABV?$tuple@$$V@std@@@Z [0x107847DC+44]
  	??$DispatchToMethod@VGpuChildThread@content@@P812@AEXXZXV?$tuple@$$V@std@@@IPC@@YAXPAVGpuChildThread@content@@P812@AEXXZPAXABV?$tuple@$$V@std@@@Z [0x107849B6+38]
  	??$Dispatch@VGpuChildThread@content@@V12@XP812@AEXXZ@?$MessageT@UGpuMsg_CollectGraphicsInfo_Meta@@V?$tuple@$$V@std@@X@IPC@@SA_NPBVMessage@1@PAVGpuChildThread@content@@1PAXP834@AEXXZ@Z [0x10783B63+227]
  	content::GpuChildThread::OnControlMessageReceived [0x1078F78B+491]
  	content::ChildThreadImpl::OnMessageReceived [0x108A336B+1259]
  	content::GpuChildThread::OnMessageReceived [0x10791088+24]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B840DD3+168836]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B83B7C3+146804]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B83B530+146145]
  	IPC::MessageAttachmentSet::ReplacePlaceholderWithAttachment [0x0B841688+171065]
  	base::Callback<void __cdecl(void),1>::Run [0x1003C32E+30]
  	base::debug::TaskAnnotator::RunTask [0x1006DF34+324]
  	base::MessageLoop::RunTask [0x100DA9A0+640]
  	base::MessageLoop::DeferOrRunPendingTask [0x100D887D+45]
  	base::MessageLoop::DoWork [0x100D8E64+196]
  	base::MessagePumpForGpu::DoRunLoop [0x100E1F42+98]
  	base::MessagePumpWin::Run [0x100E34DB+123]
  	base::MessageLoop::RunHandler [0x100DA6E1+193]
  	base::RunLoop::Run [0x10180834+52]
  	base::MessageLoop::Run [0x100DA5DC+188]
  	content::GpuMain [0x1079C3C3+2691]
  	content::RunNamedProcessTypeMain [0x131B4B67+135]
  	content::ContentMainRunnerImpl::Run [0x131B4A28+488]
  	content::ContentMain [0x131B2A14+100]
  	ChromeMain [0x04CC5622+114]
  	MainDllLoader::Launch [0x0043F5C4+916]
  	wWinMain [0x0043B42D+653]
  	invoke_main [0x006D3DDE+30] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:118)
  	__scrt_common_main_seh [0x006D3C2A+346] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
  	__scrt_common_main [0x006D3ABD+13] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:300)
  	wWinMainCRTStartup [0x006D3DF8+8] (f:\dd\vctools\crt\vcstartup\src\startup\exe_wwinmain.cpp:17)
  	BaseThreadInitThunk [0x7701337A+18]
  	RtlInitializeExceptionChain [0x777D9882+99]
  	RtlInitializeExceptionChain [0x777D9855+54]

Are fallbacks taking effect that are causing unexpected code paths to be taken?
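The pattern in the log above (a process announces it is exiting because initialization failed, then the same PID hits a FATAL) can be searched for mechanically. The following is a rough sketch, assuming the `[pid:tid:MMDD/HHMMSS:LEVEL:file(line)] message` layout seen in these excerpts; `parse_log_line` and `fatal_after_failed_init` are hypothetical names invented here.

```python
import re

# Assumed layout, taken from the excerpts in this thread:
#   [pid:tid:MMDD/HHMMSS:LEVEL:file(line)] message
LOG_LINE = re.compile(
    r"\[(\d+):(\d+):(\d{4})/(\d{6}):([A-Z0-9]+):([^\]\(]+)\((\d+)\)\]\s*(.*)"
)

EXIT_MARKER = "Exiting GPU process due to errors during initialization"


def parse_log_line(line):
    """Split one Chrome log line into a dict, or return None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    pid, _tid, _date, _time, level, source, lineno, message = match.groups()
    return {
        "pid": int(pid),
        "level": level,
        "file": source,
        "line": int(lineno),
        "message": message,
    }


def fatal_after_failed_init(lines):
    """Return PIDs that logged FATAL after announcing an init-failure exit."""
    exiting = set()
    hits = []
    for raw in lines:
        record = parse_log_line(raw)
        if record is None:
            continue
        if EXIT_MARKER in record["message"]:
            exiting.add(record["pid"])
        elif record["level"] == "FATAL" and record["pid"] in exiting:
            hits.append(record["pid"])
    return hits
```

Fed the two lines from this comment (the ERROR from gpu_child_thread.cc and the FATAL from gpu_info_collector.cc), the helper reports PID 4908.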

Comment 6 by zmo@chromium.org, Jun 10 2016

Cc: jmad...@chromium.org
Jmadill: I remember we have a list of fallbacks that you moved from ANGLE to chromium. Can you shed some light on this?
In addition to the previous comment, some random ideas:

1: Should we bump the driver_version field of entry 68 in gpu/config/software_rendering_list_json.cc? That would not explain why it started failing only now, though. Also, since these tests run in a virtual machine, do they really need to start the GPU process?

2: Is it possible that the bot or the VM has been touched, for example by upgrading the client driver?

3: Comment #21 from 2015 reports a similar problem on win_os at https://bugs.chromium.org/p/chromium/issues/detail?id=514274 (though the ticket itself is against Linux). So the problem seems to disappear and reappear.


 If the GPU process fails to initialize the GPU then it'll clear the GL bindings and GpuChildThread::OnInitialize will do base::MessageLoop::current()->QuitWhenIdle(). I think there's a race condition there where if other messages (e.g. collect graphics info) are sent quickly then it'll process them before it becomes idle.

Maybe it should do _exit(0); immediately in that case, so it won't try to handle other IPC messages. Another option is to get rid of the dead on arrival state and die immediately after creating a GPU context fails. That should work now that the browser process can detect when processes die before creating an IPC channel.
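To make the ordering problem concrete, here is a toy model of the race described above, written in Python rather than Chromium's C++. It is only a sketch: `run_gpu_startup` is a hypothetical function, the message names are plain strings, and the real message loop is far more involved. It assumes both messages are already queued when the loop starts, which is the "sent quickly" case.

```python
from collections import deque


def run_gpu_startup(exit_immediately):
    """Toy single-threaded message loop; returns the messages that got handled.

    Models a GPU process whose GL initialization failed. With
    exit_immediately=False it imitates "quit when idle": the quit only takes
    effect once the queue is empty, so already-queued messages still run.
    """
    queue = deque(["OnInitialize", "OnCollectGraphicsInfo"])
    handled = []
    while queue:
        message = queue.popleft()
        handled.append(message)
        if message == "OnInitialize" and exit_immediately:
            # Imitates exiting on the spot: pending messages are dropped.
            break
        # Otherwise the loop keeps draining the queue; "quit when idle"
        # is only honored after the last pending message has been handled.
    return handled


if __name__ == "__main__":
    print(run_gpu_startup(exit_immediately=False))
    print(run_gpu_startup(exit_immediately=True))
```

In the quit-when-idle mode the collect-graphics-info message is still handled after the failed initialization, which is the window the crash falls into; in the exit-immediately mode it never runs.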
jbauman, I think you are exactly right. I did a quick fix here: https://codereview.chromium.org/2061953002 . In the long term, though, I think what you suggested would be better.

Comment 10 by kbr@chromium.org, Jun 13 2016

Owner: j.iso...@samsung.com
Status: Started (was: Untriaged)
Project Member

Comment 11 by bugdroid1@chromium.org, Jun 14 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e02f94b59fa148e354db57a0f5d38f6bb644aa55

commit e02f94b59fa148e354db57a0f5d38f6bb644aa55
Author: j.isorce <j.isorce@samsung.com>
Date: Tue Jun 14 08:31:03 2016

Do not call CollectContextGraphicsInfo if GL has failed to initialize

If the gpu process is not yet launch when calling
GpuDataManagerImplPrivate::RequestCompleteGpuInfoIfNeeded()
it will cause to start the gpu process and send the message
GpuMsg_CollectGraphicsInfo right away.
(possible causes: browser_bridge.js, SystemInfoHandler::GetInfo,
GPUFeatureChecker)

On gpu side this will cause to call GpuMain,
GpuChildThread's constructor, OnInitialize and
OnCollectGraphicsInfo sequentially.

If dead_on_arrival_ is true then GpuChildThread::OnInitialize
calls base::MessageLoop::current()->QuitWhenIdle() which might
let handle the pending GpuMsg_CollectGraphicsInfo message.

BUG= 619106 

R=jbauman@chromium.org, kbr@chromium.org, piman@chromium.org, zmo@chromium.org

Review-Url: https://codereview.chromium.org/2061953002
Cr-Commit-Position: refs/heads/master@{#399669}

[modify] https://crrev.com/e02f94b59fa148e354db57a0f5d38f6bb644aa55/content/gpu/gpu_child_thread.cc
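The idea in the commit title can be illustrated with a small guard. This is a Python sketch under stated assumptions, not the actual change to gpu_child_thread.cc (which is not reproduced in this thread): `ToyGpuChildThread` and its methods are hypothetical, and the `assert` merely stands in for the CHECK that fired at gpu_info_collector.cc(104).

```python
class ToyGpuChildThread:
    """Hypothetical stand-in for the GPU child thread's message handlers."""

    def __init__(self, gl_initialized):
        self.gl_initialized = gl_initialized
        # Assumption of this sketch: a failed GL init marks the process
        # dead on arrival.
        self.dead_on_arrival = not gl_initialized
        self.collected = False

    def collect_context_graphics_info(self):
        # Stand-in for the CHECK that requires a GL implementation.
        assert self.gl_initialized, "GL implementation is None"
        self.collected = True

    def on_collect_graphics_info(self):
        """Handle the collect request; return True if info was collected."""
        if self.dead_on_arrival:
            # Guard: skip collection when GL never initialized, so a message
            # that arrives before the loop quits cannot trip the check.
            return False
        self.collect_context_graphics_info()
        return True


if __name__ == "__main__":
    print(ToyGpuChildThread(gl_initialized=False).on_collect_graphics_info())
    print(ToyGpuChildThread(gl_initialized=True).on_collect_graphics_info())
```

The guard leaves the quit-when-idle window open but makes the late message harmless, which matches the "quick fix" framing in the thread; removing the dead-on-arrival state altogether would be the longer-term alternative mentioned there.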

Project Member

Comment 12 by bugdroid1@chromium.org, Jun 15 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e02f94b59fa148e354db57a0f5d38f6bb644aa55


Can we mark this issue as fixed? Just adding a last note about what jbauman suggested in comment #8: "Another option is to get rid of the dead on arrival state and die immediately after creating a GPU context fails. That should work now that the browser process can detect when processes die before creating an IPC channel."

Comment 14 by kbr@chromium.org, Jun 28 2016

Status: Fixed (was: Started)
Thanks for working on this, Julien. chromium-try-flakes isn't reporting any recent flakes in this test:
http://chromium-try-flakes.appspot.com/search?q=telemetry_unittests%20(with%20patch)

Closing as fixed. Thanks.
