New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 842782 link

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
OOO until 2019-01-24
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocking:
issue 796386



Sign in to add a comment

Flaky "Gathering system GPU info" timeout during GPU tests on Windows

Project Member Reported by ynovikov@chromium.org, May 14 2018

Issue description

For example, in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Exp%20Release%20%28Intel%20HD%20630%29/54 this happened during webgl2_conformance_tests, webgl_conformance_d3d9_passthrough_tests and webgl_conformance_d3d9_tests.

https://chromium-swarm.appspot.com/task?id=3d77aad3bb276d10&refresh=10&show_raw=1
https://chromium-swarm.appspot.com/task?id=3d77d70017fd2f10&refresh=10&show_raw=1
https://chromium-swarm.appspot.com/task?id=3d77e1fb60967f10&refresh=10&show_raw=1

[4320:2528:0514/072153.706:FATAL:system_info_handler.cc(191)] Check failed: false. Gathering system GPU info took more than 5 seconds.

 

Comment 1 by zmo@chromium.org, May 14 2018

Owner: cwallez@chromium.org
Status: Assigned (was: Untriaged)
cwallez, I suspect it's due to we switched to use ANGLE info collection code on Windows.

Comment 2 by zmo@chromium.org, May 14 2018

Cc: yang...@intel.com yunchao...@intel.com
Owner: ----
Status: Available (was: Assigned)
Sorry, just checked back from the beginning. Even the first build has this issue.

I guess it's something with the new driver that caused info collection takes longer
Just saw it on another bot.
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20Release%20%28AMD%29/1231

Can this affect Chrome start up on user machines? Maybe this is higher priority?
Cc: -cwallez@chromium.org
Labels: -Pri-2 Pri-1
Owner: cwallez@chromium.org
Summary: Flaky "Gathering system GPU info" timeout during WebGL tests on Windows (was: Flaky "Gathering system GPU info" timeout during WebGL tests on Win10 FYI Exp Release (Intel HD 630) GPU.FYI bot)
Also here
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20Release%20(NVIDIA)/997

Corentin, PTAL.
Summary: Flaky "Gathering system GPU info" timeout during GPU tests on Windows (was: Flaky "Gathering system GPU info" timeout during WebGL tests on Windows)
I tried loading one of the minidumps to know where the code is stuck in the GPU process but even with the Chromium symbol server, Visual studio is not able to show any useful stack trace. zmo@ do you know if there is a way to get that information from the minidump or the logs?

Comment 7 by kbr@chromium.org, May 18 2018

Cc: u...@chromium.org sunn...@chromium.org
sunnyps@ or ulan@ may be able to help you. I'm not sure whether it will help if you download the isolate which had the binaries which produced that minidump. See ./tools/swarming_client/isolateserver.py help download and use the hash for the isolated inputs. For example, for this failure:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20Release%20(NVIDIA)/997

and this Swarming task:

https://chromium-swarm.appspot.com/task?id=3d89a16b58cd8710&refresh=10&show_raw=1

the isolated input is:
59618d4d8841c30ad35a47f01efaafd89ed81a56

Comment 8 by zmo@chromium.org, May 18 2018

I suggest after the next Thursday branch cut, add a bunch of temporary logs and land them so we can see where the bots get stuck.

Comment 9 by zmo@chromium.org, May 18 2018

Cc: nednguyen@chromium.org charliea@chromium.org sullivan@chromium.org mythria@chromium.org lukasza@chromium.org perezju@chromium.org
 Issue 842180  has been merged into this issue.
Components: Internals>Sandbox>SiteIsolation
Adding Internals>Sandbox>SiteIsolation component because r552589 was blamed as the culprit both in  https://crbug.com/842180#c5  and in https://crbug.com/839470#c6

Comment 11 by u...@chromium.org, May 23 2018

The log in https://chromium-swarm.appspot.com/task?id=3d89a16b58cd8710&refresh=10&show_raw=1 contains a stack trace that seems to be from GPU process.

Found Minidump: True
Stack Trace:
********************************************************************************
	Last event: 11d8.ae8: Access violation - code c0000005 (first/second chance not available)
	  debugger time: Thu May 17 18:15:57.276 2018 (UTC - 7:00)
	ChildEBP RetAddr  Args to Child              
	WARNING: Stack unwind information not available. Following frames may be wrong.
	0287e890 6a97fa75 000003b1 00000000 075a38a8 nvwgf2um!OpenAdapter12+0x1860da
	0287e8a8 6a97fa75 00000016 0084c400 0085c88c nvwgf2um!OpenAdapter12+0x186125
	0287e8c0 6a97d5ae 00000018 0287e998 0287e91c nvwgf2um!OpenAdapter12+0x186125
	0287e8d8 6a97b234 0085c88c 00000001 0287e8f4 nvwgf2um!OpenAdapter12+0x183c5e
	0287e8fc 6a96cb06 075a1600 05f01d98 0287e91c nvwgf2um!OpenAdapter12+0x1818e4
	0287e97c 6a96c609 0287e998 0761b8d8 007cf6e0 nvwgf2um!OpenAdapter12+0x1731b6
	0287e9b8 6a96c1c6 0761b8d8 05f01d98 007cf6e0 nvwgf2um!OpenAdapter12+0x172cb9
	0287ea30 6a903c37 0609ea90 05f01d98 0609a730 nvwgf2um!OpenAdapter12+0x172876
	0287ea64 6a8d506f 00000001 00830000 008069f0 nvwgf2um!OpenAdapter12+0x10a2e7
	0287eb00 6a8d5878 40001440 00000000 00000000 nvwgf2um!OpenAdapter12+0xdb71f
	0287f778 6a8d5be1 00000006 00827880 00000000 nvwgf2um!OpenAdapter12+0xdbf28
	0287f7d0 6b09c321 00806080 076a1658 00806080 nvwgf2um!OpenAdapter12+0xdc291
	0287f82c 6a919a99 00806080 076a1658 04d793e8 nvwgf2um!NVAPI_Thunk+0x25c4f1
	0287f850 6a9199bd 0287f868 0080582c 00000000 nvwgf2um!OpenAdapter12+0x120149
	0287f874 6aed6a95 04d8b710 0287f8b8 6b24385e nvwgf2um!OpenAdapter12+0x12006d
	0287f880 6b24385e 00805920 54ec7e1c 00000000 nvwgf2um!NVAPI_Thunk+0x96c65
	0287f8b8 6b243986 00000000 0287f8d0 7502336a nvwgf2um!NVAPI_Thunk+0x403a2e
	*** WARNING: Unable to verify checksum for kernel32.dll
	*** ERROR: Symbol file could not be found.  Defaulted to export symbols for kernel32.dll - 
	0287f8c4 7502336a 04d793e8 0287f910 771d98f2 nvwgf2um!NVAPI_Thunk+0x403b56
	0287f8d0 771d98f2 04d793e8 752b90f9 00000000 kernel32!BaseThreadInitThunk+0x12
	0287f910 771d98c5 6b24390a 04d793e8 ffffffff ntdll!RtlInitializeExceptionChain+0x63
	0287f928 00000000 6b24390a 04d793e8 00000000 ntdll!RtlInitializeExceptionChain+0x36
	
	   0  Id: 11d8.12cc Suspend: 0 Teb: fffdd000 Unfrozen
	ChildEBP RetAddr  Args to Child              
	WARNING: Stack unwind information not available. Following frames may be wrong.
	002fd514 75021194 00000288 000003e8 00000000 ntdll!NtWaitForSingleObject+0x15
	002fd52c 75021148 00000288 000003e8 00000000 kernel32!WaitForSingleObjectEx+0x43
	002fd540 6a919d55 00000288 000003e8 0000009a kernel32!WaitForSingleObject+0x12
	002fda70 6a91acca 00000007 0000009a 00000000 nvwgf2um!OpenAdapter12+0x120405
	002fda8c 6a91b06b 0000009a 00000061 00000000 nvwgf2um!OpenAdapter12+0x12137a
	002fdac4 6a91d1fd 00000000 00000000 002fdaf0 nvwgf2um!OpenAdapter12+0x12171b
	002fdaf8 6a935c70 00000000 00000000 04dec238 nvwgf2um!OpenAdapter12+0x1238ad
	*** WARNING: Unable to verify checksum for d3d11.dll
	*** ERROR: Symbol file could not be found.  Defaulted to export symbols for d3d11.dll - 
	002fdb28 6cc9dead 008055c4 56444a4c 04deb9c8 nvwgf2um!OpenAdapter12+0x13c320
	*** WARNING: Unable to verify checksum for dxgi.dll
	*** ERROR: Symbol file could not be found.  Defaulted to export symbols for dxgi.dll - 
	002fdbac 6e2d3b21 007fb420 04dec014 00000000 d3d11!D3D11CreateDeviceAndSwapChain+0x2c2c7
	002fe090 6e2d1db1 00000002 00000000 00000000 dxgi!DXGID3D10RegisterLayers+0x6439
	*** WARNING: Unable to verify checksum for libglesv2.dll
	No .natvis files found at c:\b\s\w\ir\out\Release\cdb\Visualizers.
	No .natvis files found at C:\Users\chrome-bot\AppData\Local\Dbg\Visualizers.
	002fe0e8 6bdc76e2 04deb9c8 00000000 00000000 dxgi!DXGID3D10RegisterLayers+0x46c9
	002fe1d8 6bdc6081 00000001 00000005 00000005 libglesv2!rx::SwapChain11::present+0xa8
	002fe200 6bd55a78 04dee2a8 00000005 00000005 libglesv2!rx::SwapChain11::swapRect+0x63
	002fe22c 6bd55d20 002fe2a0 04dee2a8 00000005 libglesv2!rx::SurfaceD3D::swapRect+0x60
	002fe250 6bcafa0e 002fe2a0 04dee2a8 00000005 libglesv2!rx::SurfaceD3D::postSubBuffer+0x1c
	002fe278 6bc783e8 002fe2a0 04dee2a8 00000005 libglesv2!egl::Surface::postSubBuffer+0x2a
	*** WARNING: Unable to verify checksum for chrome_child.dll
	002fe4b0 6051e6ac 007e1ed8 04de9370 00000005 libglesv2!egl::PostSubBufferNV+0x176
	002fe5a4 608422d1 00000005 00000051 00000402 chrome_child!gl::NativeViewGLSurfaceEGL::PostSubBuffer+0xba
	002fe60c 6079c470 00000005 00000051 00000402 chrome_child!gpu::PassThroughImageTransportSurface::PostSubBuffer+0xa9
	002fe650 607906b4 00000030 00000000 00000005 chrome_child!gpu::gles2::GLES2DecoderPassthroughImpl::DoPostSubBufferCHROMIUM+0xac
	002fe68c 6075a3fc 00000000 06116258 0268e4ac chrome_child!gpu::gles2::GLES2DecoderPassthroughImpl::HandlePostSubBufferCHROMIUM+0x44
	002fe6b8 6075a01b 00000014 061160a0 00000076 chrome_child!gpu::gles2::GLES2DecoderPassthroughImpl::DoCommandsImpl<0>+0xc0
	002fe6d0 6181691f 00000014 061160a0 00000076 chrome_child!gpu::gles2::GLES2DecoderPassthroughImpl::DoCommands+0x21
	002fe800 6182607d 0000589e 0268e4a8 002fe860 chrome_child!gpu::CommandBufferService::Flush+0x163
	002fe924 61825eaf 0000589e 000000d7 00000001 chrome_child!gpu::CommandBufferStub::OnAsyncFlush+0x12f
	002fe980 61824dfc 028f13b8 0267bfe8 0267bfe8 chrome_child!IPC::MessageT<GpuCommandBufferMsg_AsyncFlush_Meta,std::tuple<int,unsigned int,bool>,void>::Dispatch<gpu::CommandBufferStub,gpu::CommandBufferStub,void,void (__thiscall gpu::CommandBufferStub::*)(int,unsigned int,bool)>+0x6f
	002fea50 61823713 028f13b8 002fea70 02906610 chrome_child!gpu::CommandBufferStub::OnMessageReceived+0x22c
	002fea64 618222a8 028f13b8 0267bfe8 02679c70 chrome_child!gpu::GpuChannel::HandleMessageHelper+0x2d
	002feb54 61823f1d 028f13b8 61822194 002feb7c chrome_child!gpu::GpuChannel::HandleMessage+0x114
	002feb70 5e735d2b 028f1398 028f1398 5649e309 chrome_child!base::internal::Invoker<base::internal::BindState<void (__thiscall gpu::GpuChannel::*)(IPC::Message const &),base::WeakPtr<gpu::GpuChannel>,IPC::Message>,void __cdecl(void)>::Run+0x39
	002feb88 6181b1a8 002febbc 002febbc 02669b58 chrome_child!base::OnceCallback<void __cdecl(void)>::Run+0x2b
	002feccc 5faede94 026bfe38 5faa4567 000012cc chrome_child!gpu::Scheduler::RunNextTask+0x332
	002fede0 5fb1b449 62862f8b 002ff0c0 002feeb8 chrome_child!base::debug::TaskAnnotator::RunTask+0x134
	002feeb4 5fad1817 002ff0c0 028ef728 63025e2c chrome_child!base::internal::IncomingTaskQueue::RunTask+0x69
	002fefdc 5fad1b6d 002ff0c0 5faa4157 0266aad0 chrome_child!base::MessageLoop::RunTask+0x207
	002ff0b8 5fad1d9a 00000000 63026090 63025e2c chrome_child!base::MessageLoop::DeferOrRunPendingTask+0x9d
	002ff218 5fb1da7c 028d29b0 028d29a8 00001201 chrome_child!base::MessageLoop::DoWork+0x1fa
	002ff23c 5fad1334 028ef728 002ff3f8 5fa8c71e chrome_child!base::MessagePumpDefault::Run+0x9c
	002ff30c 5fa97cdc 00000001 002ff3f8 62858c78 chrome_child!base::MessageLoop::Run+0x74
	002ff3e0 612c2c0a 028d2501 002ff3e8 028ff160 chrome_child!base::RunLoop::Run+0xcc
	002ff5dc 5fa60b94 002ff6cc 00000000 00000000 chrome_child!content::GpuMain+0x38e
	002ff6ac 5fa61125 002ff6e0 002ff6cc 002ff92c chrome_child!content::RunNamedProcessTypeMain+0x10c
	002ff7a8 5fa65ae5 028d4280 00000003 002ff8c8 chrome_child!content::ContentMainRunnerImpl::Run+0x117
	002ff8b8 5fa60a5f 002ff8c4 002ff8c8 6284f750 chrome_child!service_manager::Main+0x2a5
	002ff8f4 5e73111e 002ff918 002ff900 002ff8fc chrome_child!content::ContentMain+0x33
	*** WARNING: Unable to verify checksum for chrome.exe
	002ff968 00375384 00370000 002ff9b0 0c97b89f chrome_child!ChromeMain+0x11e
	002ff9f4 00371543 00370000 0c97b89f 00000002 chrome!MainDllLoader::Launch+0x230
	002ffb6c 00455df8 00370000 00000000 00763016 chrome!wWinMain+0x543

Comment 12 by kbr@chromium.org, May 26 2018

Yikes. Thanks Ulan for tracking that down.

I'm mystified because the original timeouts were reported on Win/Intel but the crash above is coming from NVIDIA's driver. This timeout of GPU info gathering on Win/Intel:
https://chromium-swarm.appspot.com/task?id=3d77aad3bb276d10&refresh=10&show_raw=1

doesn't show evidence of a GPU process crash.

Maybe the bots are slow and we should have a longer timeout, especially for the first cold start of the browser.

Comment 13 by zmo@chromium.org, May 29 2018

I am also confused by this stack in #11.

I don't understand how it is possible that we already have a functioning command buffer but the info collection is still pending and triggering a timeout.
Owner: kbr@chromium.org
I think the stack in #11 from the swarming task in #7 is a different problem where GPU process crashes in nvidia driver. #12 has a link to another task where GPU info collection timed out, and there's no GPU process crash there.

GPU info collection times out flakily, but we retry browser process creation, and it seems to succeed eventually.

The actual test failure is:

[2/136] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_glsl3_short_circuiting_in_loop_condition failed unexpectedly

...

  Traceback (most recent call last):
    File "c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\testing\serially_executed_browser_test_case.py", line 206, in <lambda>
      return lambda self: based_method(self, *args)
    File "c:\b\s\w\ir\content\test\gpu\gpu_tests\gpu_integration_test.py", line 179, in _RunGpuTest
      self.RunActualGpuTest(url, *args)
    File "c:\b\s\w\ir\content\test\gpu\gpu_tests\webgl_conformance_integration_test.py", line 188, in RunActualGpuTest
      getattr(self, test_name)(test_path, *args[1:])
    File "c:\b\s\w\ir\content\test\gpu\gpu_tests\webgl_conformance_integration_test.py", line 202, in _RunConformanceTest
      self._CheckTestCompletion()
    File "c:\b\s\w\ir\content\test\gpu\gpu_tests\webgl_conformance_integration_test.py", line 198, in _CheckTestCompletion
      self.fail(self._WebGLTestMessages(self.tab))
  AssertionError: should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0
  FAIL should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0
  should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0
  FAIL should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0
  should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0
  FAIL should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0
  should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0
  FAIL should be green
  at (0, 0) expected: 0,255,0,255 was 0,255,0,0

Comment 15 by kbr@chromium.org, May 29 2018

Blocking: 796386
Status: WontFix (was: Available)
Sorry about the unreliability of this code.

I think the timeouts in system_info_handler.cc are likely caused by slow bots and slow cold startup of the browser.

Per https://cs.chromium.org/chromium/src/content/browser/devtools/protocol/system_info_handler.cc?q=system_info_handler.cc&sq=package:chromium&dr&l=184 we still can't yet assert that DevTools' system info collection will succeed because  Issue 796386  hasn't been fixed yet.

Basically, this failure is non-fatal and is being worked around by retrying browser startup in the GPU test harness. For this reason I'm closing it as WontFix.

Sign in to add a comment