Issue 617847

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug

Blocking:
issue 525259
issue 608923

context_lost_tests failed with FATAL:gpu_raster_buffer_provider.cc(192)] Check failed: sync_token.HasData()

Project Member Reported by kbr@chromium.org, Jun 7 2016

Issue description

https://build.chromium.org/p/chromium.gpu/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/9301

GpuCrash.GPUProcessCrashesExactlyOnce failed with:

[42461:21763:0606/201918:FATAL:gpu_raster_buffer_provider.cc(192)] Check failed: sync_token.HasData(). 
0   Chromium Framework                  0x00000001058edb73 _ZN4base5debug10StackTraceC1Ev + 19
1   Chromium Framework                  0x000000010590c357 _ZN7logging10LogMessageD2Ev + 71
2   Chromium Framework                  0x0000000106839a6b _ZN2cc23GpuRasterBufferProvider22PlaybackOnWorkerThreadEPNS_16ResourceProvider17ScopedWriteLockGLERKN3gpu9SyncTokenEbPKNS_12RasterSourceERKN3gfx4RectESE_yfRKNS8_16PlaybackSettingsE + 235
3   Chromium Framework                  0x000000010683988c _ZN2cc23GpuRasterBufferProvider16RasterBufferImpl8PlaybackEPKNS_12RasterSourceERKN3gfx4RectES8_yfRKNS2_16PlaybackSettingsE + 108
4   Chromium Framework                  0x0000000106889e1a _ZN2cc12_GLOBAL__N_114RasterTaskImpl17RunOnWorkerThreadEv + 426
5   Chromium Framework                  0x000000010a91f469 _ZN7content21CategorizedWorkerPool33RunTaskInCategoryWithLockAcquiredEN2cc12TaskCategoryE + 137
6   Chromium Framework                  0x000000010a91e53c _ZN7content21CategorizedWorkerPool3RunERKNSt3__16vectorIN2cc12TaskCategoryENS1_9allocatorIS4_EEEEPN4base17ConditionVariableE + 156
7   Chromium Framework                  0x00000001059638dd _ZN4base12SimpleThread10ThreadMainEv + 125
8   Chromium Framework                  0x000000010595f578 _ZN4base12_GLOBAL__N_110ThreadFuncEPv + 104
9   libsystem_pthread.dylib             0x00007fff8d20905a _pthread_body + 131
10  libsystem_pthread.dylib             0x00007fff8d208fd7 _pthread_body + 0
11  libsystem_pthread.dylib             0x00007fff8d2063ed thread_start + 13

This is GPU rasterization, correct? The code needs to be made more robust to lost contexts.

Unclear how often this is happening at this point. I only saw one instance in 200 runs on this machine.


Comment 1 by kbr@chromium.org, Jun 7 2016

Labels: -OS-Mac OS-All
Another failure:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Debug%20%28New%20Intel%29/builds/511

Happening across platforms.

Comment 2 by kbr@chromium.org, Jun 7 2016

To clarify: the crash in #1 is slightly different:

[4052:1676:0607/090944:FATAL:one_copy_raster_buffer_provider.cc(243)] Check failed: sync_token.HasData(). 

Full stdout attached.

stdout.txt (104 KB)

Comment 3 by kbr@chromium.org, Jun 7 2016

Labels: -Pri-2 Pri-1
After discussion with ericrk@, raising to P1 because it's showing up often enough on the waterfalls. Not sure whether this is affecting the CQ, but it's likely.

Comment 4 by kbr@chromium.org, Jun 7 2016

Blocking: 525259
Cc: sunn...@chromium.org geoffl...@chromium.org
Status: WontFix (was: Assigned)
After discussion with ericrk@ it looks like https://codereview.chromium.org/1951193002/ was the cause of these crashes. This was already reverted in https://codereview.chromium.org/2046033002/ . The build above, https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Debug%20%28New%20Intel%29/builds/511 , did not contain the revert, but the next job, https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Debug%20%28New%20Intel%29/builds/512 , does.

Closing as WontFix. Let's be careful to make sure these sorts of flaky failures are sorted out before re-landing.

Comment 5 by kbr@chromium.org, Jun 8 2016

Blocking: 608923
Cc: piman@chromium.org ericrk@chromium.org
Owner: sunn...@chromium.org
Status: Started (was: WontFix)
Re-opening this so that I can keep track of it. I haven't been able to reproduce this on my macbook despite hundreds of runs of the test. I've stared at the code for too long and can't see how this bug could ever happen.

+piman@

Comment 7 by piman@chromium.org, Jun 9 2016

I suspect GenUnverifiedSyncTokenCHROMIUM fails because IsFenceSyncFlushed fails when the channel is lost (like most lost-context issues, this is fundamentally racy, hence the flakiness). As a result, *RasterBufferProvider::OrderingBarrier doesn't generate a valid token, and the assert triggers on the worker thread when we try to wait for it.

Note that the condition doesn't cause actual problems in prod (the channel is lost anyway), so it may just be a matter of fine-tuning checks.

My suggestion is to abort tasks if the token is invalid in OrderingBarrier (after DCHECK'ing that the context is actually lost). There is no point in running the worker threads if the GPU process is gone, so we might as well abort early and go through recovery as normal.
Status: Fixed (was: Started)
Relanded the CL (https://codereview.chromium.org/1951193002/) - we prevent this DCHECK from triggering by never scheduling tasks if we detect that the context is lost while generating the sync token (RasterBufferProvider::OrderingBarrier).
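
For readers following along, here is a minimal, self-contained sketch of the idea described above: if the ordering barrier cannot produce a sync token because the context is lost, no raster tasks are scheduled, so the worker-thread check never sees an empty token. This is not the code from the CL. SyncToken, FakeContext, FakeRasterBufferProvider and ScheduleRasterTasks are hypothetical stand-ins invented for illustration, and the worker thread is modeled as a plain loop.

```cpp
#include <cassert>

// Hypothetical stand-in for a sync token: it only has data if
// generation succeeded.
struct SyncToken {
  bool has_data;
  bool HasData() const { return has_data; }
};

// Hypothetical stand-in for the GL context. `lost` models a lost channel.
struct FakeContext {
  bool lost;
  // Models the failure mode from the comment above: no usable token
  // comes back once the channel is lost.
  SyncToken GenUnverifiedSyncToken() const { return SyncToken{!lost}; }
};

// Hypothetical stand-in for a raster buffer provider.
struct FakeRasterBufferProvider {
  FakeContext* context;
  SyncToken token;

  // Returns false when no valid token could be generated, so the caller
  // can skip scheduling instead of tripping the worker-thread check.
  bool OrderingBarrier() {
    token = context->GenUnverifiedSyncToken();
    if (!token.HasData()) {
      assert(context->lost);  // An empty token is only expected on context loss.
      return false;
    }
    return true;
  }

  // Models the worker-thread check that fired in the crash reports.
  void PlaybackOnWorkerThread() const { assert(token.HasData()); }
};

// Returns the number of tasks that were run. Scheduling is skipped
// entirely when the barrier reports a lost context.
int ScheduleRasterTasks(FakeRasterBufferProvider& provider, int task_count) {
  if (!provider.OrderingBarrier())
    return 0;  // Abort early; normal lost-context recovery takes over.
  for (int i = 0; i < task_count; ++i)
    provider.PlaybackOnWorkerThread();
  return task_count;
}
```

With a healthy context every task runs; with a lost context the function returns 0 and the check in PlaybackOnWorkerThread is never reached.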