New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 671791 link

Starred by 3 users

Issue metadata

Status: Fixed
Merged: issue 671942
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug-Regression

Blocked on:
issue 666061

Blocking:
issue 596622
issue 619264
issue 673531



Sign in to add a comment

WebGL2 conformance test misc_uninitialized_test_2 is failing/flaky

Project Member Reported by senorblanco@chromium.org, Dec 6 2016

Issue description

Seems to be a null ptr deref. Here's the shard log, in case it's helpful.
nvidia-linux-shard-log.txt
53.9 KB View Download
Cc: kbr@chromium.org
On Mac, it seems to be asserting in ScriptWrappableVisitorVerifier during GC; maybe this is a dupe of the flake that Ken just fixed?

WebglConformance_conformance2_misc_uninitialized_test_2 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... [21757:775:1206/133045.159228:FATAL:ScriptWrappableVisitorVerifier.h(54)] Check failed: false. 
0 Chromium Framework 0x000000010b4158d3 _ZN4base5debug10StackTraceC1Ev + 19
1 Chromium Framework 0x000000010b439897 _ZN7logging10LogMessageD2Ev + 71
2 Chromium Framework 0x000000010ef47098 _ZNK5blink30ScriptWrappableVisitorVerifier18pushToMarkingDequeEPFvPKNS_14WrapperVisitorEPKvEPFPNS_16HeapObjectHeaderES5_ES5_ + 168
3 Chromium Framework 0x00000001106ea626 _ZNK5blink16WebGLFramebuffer13traceWrappersEPKNS_14WrapperVisitorE + 342
4 Chromium Framework 0x000000010ef453a5 _ZN5blink22ScriptWrappableVisitor13TraceEpilogueEv + 373
5 Chromium Framework 0x0000000109ac0de1 _ZN2v88internal20MarkCompactCollector15MarkLiveObjectsEv + 3473
6 Chromium Framework 0x0000000109abff97 _ZN2v88internal20MarkCompactCollector14CollectGarbageEv + 23
7 Chromium Framework 0x0000000109a964a3 _ZN2v88internal4Heap11MarkCompactEv + 259
8 Chromium Framework 0x0000000109a94907 _ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE + 1095
9 Chromium Framework 0x0000000109a93a93 _ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE + 1187
10 Chromium Framework 0x0000000109a925c5 _ZN2v88internal4Heap15HandleGCRequestEv + 277
11 Chromium Framework 0x0000000109a3baee _ZN2v88internal10StackGuard16HandleInterruptsEv + 110
12 Chromium Framework 0x0000000109d71f48 _ZN2v88internal17Runtime_InterruptEiPPNS0_6ObjectEPNS0_7IsolateE + 280
13 ??? 0x00003f05e558426e 0x0 + 69294555153006

Comment 3 by kbr@chromium.org, Dec 7 2016

Mergedinto: 671942
Status: Duplicate (was: Untriaged)
I just found this assertion failure separately in  Issue 671942 . I'll duplicate this into the other bug.

Cc: zmo@chromium.org ccameron@chromium.org senorblanco@chromium.org yunchao...@intel.com yang...@intel.com
 Issue 672105  has been merged into this issue.

Comment 5 by kbr@chromium.org, Dec 7 2016

Cc: mlippautz@chromium.org
Components: Blink>JavaScript>GC
Labels: -Type-Bug -Pri-3 OS-All Pri-1 Type-Bug-Regression
Owner: kbr@chromium.org
Status: Assigned (was: Duplicate)
Apologies, this isn't the same issue. This is a bug in the management of WebGLFramebuffer's attachments, rather than the context.

Taking this for a minute, but may have to hand this to mlippautz@ as I can't see an obvious bug.

Comment 6 by kbr@chromium.org, Dec 7 2016

Blocking: 596622

Comment 7 by kbr@chromium.org, Dec 8 2016

Blockedon: 666061
It's related to the fix for  Issue 666061 , though not a direct consequence. Linking the two bugs regardless.

Comment 8 by kbr@chromium.org, Dec 8 2016

Cc: sigbjo...@opera.com haraken@chromium.org
Status: Started (was: Assigned)
There are two changes in the queue for this bug right now:

Add WebGLContextObject::traceWrappers, and trace its context.
https://codereview.chromium.org/2561733002/

Use strong references from WebGLContextGroup to WebGLRenderingContextBase.
https://codereview.chromium.org/2558853003


I don't know whether either of these is the actual bug. The assertion failure seems to be happening while tracing from the WebGLFramebuffer to one of its WebGLAttachment objects, not from the WebGLFramebuffer to its WebGLRenderingContextBase (because I forgot that wrapper tracing -- fixed in https://codereview.chromium.org/2561733002/ ).

Mo and I looked through the code and can't see a place where a write barrier would be being missed from either WebGLFramebuffer to one of the WebGLAttachment objects, or the WebGLAttachment objects to their contained WebGLRenderbuffer or WebGLTexture.

I attempted to reproduce this with a Release build with dcheck_always_on=true and multiple runs of:

./content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=2.0.1 --test-filter=WebglConformance_conformance2_misc --extra-browser-args="--js-flags=--expose-gc"

with no luck so far.

If anyone can help please feel free to take this bug from me. Eliminating this source of flakiness is urgent.

Thanks.

Project Member

Comment 9 by bugdroid1@chromium.org, Dec 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4

commit aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4
Author: kbr <kbr@chromium.org>
Date: Thu Dec 08 05:00:31 2016

Use strong references from WebGLContextGroup to WebGLRenderingContextBase.

Making the Oilpan references weak but the V8 wrapper references strong
is causing confusion at least, and bugs at worst.

Not clear this will fix the referenced bug, but should do no harm.

BUG= 671791 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2558853003
Cr-Commit-Position: refs/heads/master@{#437183}

[modify] https://crrev.com/aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4/third_party/WebKit/Source/modules/webgl/WebGLContextGroup.cpp
[modify] https://crrev.com/aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4/third_party/WebKit/Source/modules/webgl/WebGLContextGroup.h

Comment 10 by kbr@chromium.org, Dec 8 2016

https://codereview.chromium.org/2558853003 definitely doesn't fix the problem. This try job https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/5572 for it failed that test.

https://codereview.chromium.org/2561733002/ won't fix it, either. The try job https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/5575 failed the test too:

WebglConformance_conformance2_misc_uninitialized_test_2 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... [27599:779:1207/185112.007029:FATAL:ScriptWrappableVisitorVerifier.h(54)] Check failed: false. 
0 Chromium Framework 0x0000000104912ad3 _ZN4base5debug10StackTraceC1Ev + 19
1 Chromium Framework 0x0000000104936a97 _ZN7logging10LogMessageD2Ev + 71
2 Chromium Framework 0x0000000108447038 _ZNK5blink30ScriptWrappableVisitorVerifier18pushToMarkingDequeEPFvPKNS_14WrapperVisitorEPKvEPFPNS_16HeapObjectHeaderES5_ES5_ + 168
3 Chromium Framework 0x0000000109be7926 _ZNK5blink16WebGLFramebuffer13traceWrappersEPKNS_14WrapperVisitorE + 342
4 Chromium Framework 0x0000000108445375 _ZN5blink22ScriptWrappableVisitor13TraceEpilogueEv + 373
5 Chromium Framework 0x0000000102fb3cc1 _ZN2v88internal20MarkCompactCollector15MarkLiveObjectsEv + 3473
6 Chromium Framework 0x0000000102fb2e77 _ZN2v88internal20MarkCompactCollector14CollectGarbageEv + 23
7 Chromium Framework 0x0000000102f89543 _ZN2v88internal4Heap11MarkCompactEv + 259
8 Chromium Framework 0x0000000102f879a7 _ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE + 1095
9 Chromium Framework 0x0000000102f86b26 _ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE + 1206
10 Chromium Framework 0x0000000102f85645 _ZN2v88internal4Heap15HandleGCRequestEv + 277
11 Chromium Framework 0x0000000102f2e9de _ZN2v88internal10StackGuard16HandleInterruptsEv + 110
12 Chromium Framework 0x0000000103266718 _ZN2v88internal17Runtime_InterruptEiPPNS0_6ObjectEPNS0_7IsolateE + 280
13 ??? 0x00000e67f590426e 0x0 + 15839664292462


(It was also affected by  http://crbug.com/672367  which caused a couple of other spurious test failures.)

I've re-CQ'd https://codereview.chromium.org/2561733002/ but don't know what is going on. Michael, can you take this over during your work day tomorrow and see what you can find? The bug might be reproducible on Linux Release builds with dcheck_always_on=true, but I haven't had any luck reproducing it. I think we might need to suppress this test failure while this is being investigated.

Comment 11 by kbr@chromium.org, Dec 8 2016

Owner: mlippautz@chromium.org
Project Member

Comment 12 by bugdroid1@chromium.org, Dec 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/758c15753b35b4ca7dd4d37a9c718ab2176504a4

commit 758c15753b35b4ca7dd4d37a9c718ab2176504a4
Author: kbr <kbr@chromium.org>
Date: Thu Dec 08 06:53:59 2016

Mark conformance2/misc/uninitialized-test-2.html flaky.

It's failing an assertion in the new wrapper tracing code, which is
being investigated.

Remove the old Linux/AMD failure suppression for this test. Checking
the bots, it's not reliably failing there, so it must be flaky at
worst.

BUG= 671791 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=zmo@chromium.org
NOTRY=true

Review-Url: https://codereview.chromium.org/2557053003
Cr-Commit-Position: refs/heads/master@{#437200}

[modify] https://crrev.com/758c15753b35b4ca7dd4d37a9c718ab2176504a4/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Project Member

Comment 13 by bugdroid1@chromium.org, Dec 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/069f876903bc008aae66345a1e5d6686ff7a05b7

commit 069f876903bc008aae66345a1e5d6686ff7a05b7
Author: kbr <kbr@chromium.org>
Date: Thu Dec 08 08:05:39 2016

Add WebGLContextObject::traceWrappers, and trace its context.

It's not clear to me whether this could be the cause of  Issue 671791 ,
but I can't find any other errors via code inspection.

BUG= 671791 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
NOTRY=true

Review-Url: https://codereview.chromium.org/2561733002
Cr-Commit-Position: refs/heads/master@{#437208}

[modify] https://crrev.com/069f876903bc008aae66345a1e5d6686ff7a05b7/third_party/WebKit/Source/modules/webgl/WebGLContextObject.cpp
[modify] https://crrev.com/069f876903bc008aae66345a1e5d6686ff7a05b7/third_party/WebKit/Source/modules/webgl/WebGLContextObject.h

Comment 14 by kbr@chromium.org, Dec 8 2016

Blocking: 619264
I am sometimes able to reproduce in a single run on my mac using release build (with dchecks): content/test/gpu/run_gpu_integration_test.py webgl_conformance --show-stdout --browser=release --extra-browser-args="--js-flags=\"--trace-gc --max_semi_space_size=1 --trace-incremental-marking\" " --webgl-conformance-version=2.0.0 --test-filter=WebglConformance_conformance2_misc_uninitialized_test_2

By now I figured out that we crash in the verifier when we
- trace a WebGLFrameBuffer which
- points to a WebGLAttachment that is white, violating the marking invariant.

Continuing now to figure out which attachment we are dealing with and why it is still white.
The attachment is a WebGLTextureAttachment. The scenario is:
(1) We create the attachment.
(2) The containing WebGLFrameBuffer is still white, so the barrier doesn't fire. This is fine
(3) When we reach the WebGLFrameBuffer for tracing we don't find the attachment in the attachment map, so we don't trace it.
(4) The verifier finds the WebGLFrameBuffer and the attachment and complains that the attachment is still white while the buffer is black.

The problem is that we don't find the attachment in step (3). Investigating further...
Cc: hlopko@chromium.org
The bug is the following (repeating myself to describe the full picture):

(1) We create the attachment.
(2) The containing WebGLFrameBuffer is still white, so the barrier doesn't fire. This is fine
(3) HeapHashMap::add needs to expand its buffer, effectively scheduling an oilpan GC, *and* reporting external memory pressure to V8.
(4) If we are unlucky and hit a limit in the external memory reporting API we will do an incremental marking step (or even a full GC) at this point. 
(5) The step marks the WebGLFrameBuffer, which doesn't yet contain the attachment
(6) We ::add the attachment, but never mark it again, because the partent has been marked
(7) The verifier rightfully complains that we missed marking the attachment.

The bug is somewhere around (4) as doing a step in V8 now requires a consistent state in the oilpan heap.
For completeness, this is the problematic call: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/heap/ThreadState.cpp?sq=package:chromium&dr=C&rcl=1481185182&l=748

Need to think a bit about how to tackle this one.

Comment 19 Deleted

Cc: -mlippautz@chromium.org jochen@chromium.org
One more note: As far as I see, the same problem already applies with object grouping but it is less likely because there we would require a full GC for the same scenario.
Nice detective work!

I think we need to add a GCForbiddenScope to a place where Oilpan is in an inconsistent state and prevent GCs (of V8 and Oilpan) from happening.

For example, we already have a GCForbiddenScope in HashTable::swap because we need to guarantee the atomicity of the swap operation.

https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/LinkedHashSet.h?q=gcforbiddenscope+file:%5Esrc/third_party/WebKit/Source/wtf/&sq=package:chromium&l=873&dr=C


Labels: Hotlist-PixelWrangler
Looks like on the Linux Release (AMD R5 230) bot, WebglConformance_conformance2_misc_uninitialized_test_2 fails consistently after Ken's patch (https://codereview.chromium.org/2557053003).

First failing build: https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28AMD%20R5%20230%29/builds/316
#23: The test should start passing again since I had to turn off the flag for the upcoming dev release.

As for the bug: The fundamental problem here, even with object grouping, is that we can end up in a full V8 GC where there is still an object on the stack and not yet referenced from the heap. 

#21: Adding GCForbiddenScopes would work but I think there would be too many places for this to be feasible, e.g. all expands of vectors or hash tables. I think the only way to properly solve this is not call any V8 API that can do a GC from such inconsistent states. In this case move the call to ReportMemoryToV8 to some point where it's safe to do a GC (which doesn't have to be a safepoint though).

(For the interested reader: If we would only have to deal with incremental marking steps, and not the finalizing GC, the problem is that the write barrier is executed before the object is linked into the graph. This could be solved by doing a second write barrier in the copy/move constructor. In any case this will not solve the problem of a full GC happening.)
If this is limited to backing store re-allocations triggering a GC at the wrong time,  we could track that on the Oilpan side as the source of the GC and instead of performing scheduleGCIfNeeded(), schedule a task to run it.
Thanks for the input!

The API function behind all this is AdjustAmountOfExternalMemory. It has to be able to make progress (we've had leaks in the past) so we cannot remove the capability of doing a V8 GC, effectively triggering wrapper tracing, in there.

So this implies that *all* re-allocations of backing stores would need to request a task instead of doing a (conservative) GC at *all* times. While this would work, it seems like a hammer approach that significantly restricts Oilpan here. 

Do I miss something?
No hammer involved?

Conservative Oilpan GCs triggered by backing store out-of-line allocations could go ahead as per usual, but the scheduling of follow up GCs (=> reporting memory use to v8) will have to wait until back at the event loop and the task is scheduled.
Thanks for the explanation. Yes, afaics this way would work.
An artificial constraint to impose on the Oilpan GC though, so finding a better solution would be preferable.
Does this mean that we should not call any V8 API in any constructor and destructor, because rehashing can call any constructor and destructor?

If rehashing is the only problem, would it be an option to insert a GCForbiddenScope to the rehashing? V8 should take the GCForbiddenScope into account before triggering incremental marking and full GCs.



Comment 31 by kbr@chromium.org, Dec 9 2016

mlippautz@: great diagnosis! Looking forward to a general fix.

geofflang@: improving the suppression in https://codereview.chromium.org/2566603002/ .

Project Member

Comment 32 by bugdroid1@chromium.org, Dec 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e535009c983cfe65f82ac47b624c37ab81bc98e0

commit e535009c983cfe65f82ac47b624c37ab81bc98e0
Author: kbr <kbr@chromium.org>
Date: Fri Dec 09 05:37:47 2016

Improve precision of uninitialized-test-2.html suppression.

The last change caused a reliable failure on Linux AMD.

BUG= 671791 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=zmo@chromium.org,geofflang@chromium.org
NOTRY=true

Review-Url: https://codereview.chromium.org/2566603002
Cr-Commit-Position: refs/heads/master@{#437479}

[modify] https://crrev.com/e535009c983cfe65f82ac47b624c37ab81bc98e0/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Status: Fixed (was: Started)
Moved this to email to get some more feedback. The issue here (WebGL test flaky) should be solved. 

I will obviously only reland the feature after solving the stack issue.

Comment 34 by kbr@chromium.org, Dec 10 2016

Thanks Michael. https://codereview.chromium.org/2558043009/ is removing the flaky suppression.

Project Member

Comment 35 by bugdroid1@chromium.org, Dec 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/64967f4900a2bd54bd219deae185c793f7431117

commit 64967f4900a2bd54bd219deae185c793f7431117
Author: kbr <kbr@chromium.org>
Date: Mon Dec 12 21:20:09 2016

Remove flaky suppression of uninitialized-test-2.html.

Wrapper tracing's been disabled, and the underlying bug will be fixed
before it's enabled again.

BUG= 671791 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel
NOTRY=true

TBR=zmo@chromium.org

Review-Url: https://codereview.chromium.org/2558043009
Cr-Commit-Position: refs/heads/master@{#437913}

[modify] https://crrev.com/64967f4900a2bd54bd219deae185c793f7431117/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Comment 36 by kbr@chromium.org, Dec 12 2016

Blocking: 673531

Sign in to add a comment