Issue metadata
Sign in to add a comment
|
WebGL2 conformance test misc_uninitialized_test_2 is failing/flaky |
||||||||||||||||||||||||
Issue descriptionmisc_uninitialized_test_2 is failing on Linux Release (NVidia), and more frequently on Mac trybots: https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/45586 https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel
,
Dec 6 2016
On Mac, it seems to be asserting in ScriptWrappableVisitorVerifier during GC; maybe this is a dupe of the flake that Ken just fixed? WebglConformance_conformance2_misc_uninitialized_test_2 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... [21757:775:1206/133045.159228:FATAL:ScriptWrappableVisitorVerifier.h(54)] Check failed: false. 0 Chromium Framework 0x000000010b4158d3 _ZN4base5debug10StackTraceC1Ev + 19 1 Chromium Framework 0x000000010b439897 _ZN7logging10LogMessageD2Ev + 71 2 Chromium Framework 0x000000010ef47098 _ZNK5blink30ScriptWrappableVisitorVerifier18pushToMarkingDequeEPFvPKNS_14WrapperVisitorEPKvEPFPNS_16HeapObjectHeaderES5_ES5_ + 168 3 Chromium Framework 0x00000001106ea626 _ZNK5blink16WebGLFramebuffer13traceWrappersEPKNS_14WrapperVisitorE + 342 4 Chromium Framework 0x000000010ef453a5 _ZN5blink22ScriptWrappableVisitor13TraceEpilogueEv + 373 5 Chromium Framework 0x0000000109ac0de1 _ZN2v88internal20MarkCompactCollector15MarkLiveObjectsEv + 3473 6 Chromium Framework 0x0000000109abff97 _ZN2v88internal20MarkCompactCollector14CollectGarbageEv + 23 7 Chromium Framework 0x0000000109a964a3 _ZN2v88internal4Heap11MarkCompactEv + 259 8 Chromium Framework 0x0000000109a94907 _ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE + 1095 9 Chromium Framework 0x0000000109a93a93 _ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE + 1187 10 Chromium Framework 0x0000000109a925c5 _ZN2v88internal4Heap15HandleGCRequestEv + 277 11 Chromium Framework 0x0000000109a3baee _ZN2v88internal10StackGuard16HandleInterruptsEv + 110 12 Chromium Framework 0x0000000109d71f48 _ZN2v88internal17Runtime_InterruptEiPPNS0_6ObjectEPNS0_7IsolateE + 280 13 ??? 0x00003f05e558426e 0x0 + 69294555153006
,
Dec 7 2016
I just found this assertion failure separately in Issue 671942 . I'll duplicate this into the other bug.
,
Dec 7 2016
Issue 672105 has been merged into this issue.
,
Dec 7 2016
Apologies, this isn't the same issue. This is a bug in the management of WebGLFramebuffer's attachments, rather than the context. Taking this for a minute, but may have to hand this to mlippautz@ as I can't see an obvious bug.
,
Dec 7 2016
,
Dec 8 2016
It's related to the fix for Issue 666061 , though not a direct consequence. Linking the two bugs regardless.
,
Dec 8 2016
There are two changes in the queue for this bug right now: Add WebGLContextObject::traceWrappers, and trace its context. https://codereview.chromium.org/2561733002/ Use strong references from WebGLContextGroup to WebGLRenderingContextBase. https://codereview.chromium.org/2558853003 I don't know whether either of these is the actual bug. The assertion failure seems to be happening while tracing from the WebGLFramebuffer to one of its WebGLAttachment objects, not from the WebGLFramebuffer to its WebGLRenderingContextBase (because I forgot that wrapper tracing -- fixed in https://codereview.chromium.org/2561733002/ ). Mo and I looked through the code and can't see a place where a write barrier would be being missed from either WebGLFramebuffer to one of the WebGLAttachment objects, or the WebGLAttachment objects to their contained WebGLRenderbuffer or WebGLTexture. I attempted to reproduce this with a Release build with dcheck_always_on=true and multiple runs of: ./content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=2.0.1 --test-filter=WebglConformance_conformance2_misc --extra-browser-args="--js-flags=--expose-gc" with no luck so far. If anyone can help please feel free to take this bug from me. Eliminating this source of flakiness is urgent. Thanks.
,
Dec 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4 commit aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4 Author: kbr <kbr@chromium.org> Date: Thu Dec 08 05:00:31 2016 Use strong references from WebGLContextGroup to WebGLRenderingContextBase. Making the Oilpan references weak but the V8 wrapper references strong is causing confusion at least, and bugs at worst. Not clear this will fix the referenced bug, but should do no harm. BUG= 671791 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2558853003 Cr-Commit-Position: refs/heads/master@{#437183} [modify] https://crrev.com/aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4/third_party/WebKit/Source/modules/webgl/WebGLContextGroup.cpp [modify] https://crrev.com/aa5cbb1a6f3500536feec0c97acdd72e1c56dbb4/third_party/WebKit/Source/modules/webgl/WebGLContextGroup.h
,
Dec 8 2016
https://codereview.chromium.org/2558853003 definitely doesn't fix the problem. This try job https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/5572 for it failed that test. https://codereview.chromium.org/2561733002/ won't fix it, either. The try job https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/5575 failed the test too: WebglConformance_conformance2_misc_uninitialized_test_2 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... [27599:779:1207/185112.007029:FATAL:ScriptWrappableVisitorVerifier.h(54)] Check failed: false. 0 Chromium Framework 0x0000000104912ad3 _ZN4base5debug10StackTraceC1Ev + 19 1 Chromium Framework 0x0000000104936a97 _ZN7logging10LogMessageD2Ev + 71 2 Chromium Framework 0x0000000108447038 _ZNK5blink30ScriptWrappableVisitorVerifier18pushToMarkingDequeEPFvPKNS_14WrapperVisitorEPKvEPFPNS_16HeapObjectHeaderES5_ES5_ + 168 3 Chromium Framework 0x0000000109be7926 _ZNK5blink16WebGLFramebuffer13traceWrappersEPKNS_14WrapperVisitorE + 342 4 Chromium Framework 0x0000000108445375 _ZN5blink22ScriptWrappableVisitor13TraceEpilogueEv + 373 5 Chromium Framework 0x0000000102fb3cc1 _ZN2v88internal20MarkCompactCollector15MarkLiveObjectsEv + 3473 6 Chromium Framework 0x0000000102fb2e77 _ZN2v88internal20MarkCompactCollector14CollectGarbageEv + 23 7 Chromium Framework 0x0000000102f89543 _ZN2v88internal4Heap11MarkCompactEv + 259 8 Chromium Framework 0x0000000102f879a7 _ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE + 1095 9 Chromium Framework 0x0000000102f86b26 _ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE + 1206 10 Chromium Framework 0x0000000102f85645 _ZN2v88internal4Heap15HandleGCRequestEv + 277 11 Chromium Framework 0x0000000102f2e9de _ZN2v88internal10StackGuard16HandleInterruptsEv + 110 12 Chromium Framework 0x0000000103266718 _ZN2v88internal17Runtime_InterruptEiPPNS0_6ObjectEPNS0_7IsolateE + 280 13 ??? 0x00000e67f590426e 0x0 + 15839664292462 (It was also affected by http://crbug.com/672367 which caused a couple of other spurious test failures.) I've re-CQ'd https://codereview.chromium.org/2561733002/ but don't know what is going on. Michael, can you take this over during your work day tomorrow and see what you can find? The bug might be reproducible on Linux Release builds with dcheck_always_on=true, but I haven't had any luck reproducing it. I think we might need to suppress this test failure while this is being investigated.
,
Dec 8 2016
,
Dec 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/758c15753b35b4ca7dd4d37a9c718ab2176504a4 commit 758c15753b35b4ca7dd4d37a9c718ab2176504a4 Author: kbr <kbr@chromium.org> Date: Thu Dec 08 06:53:59 2016 Mark conformance2/misc/uninitialized-test-2.html flaky. It's failing an assertion in the new wrapper tracing code, which is being investigated. Remove the old Linux/AMD failure suppression for this test. Checking the bots, it's not reliably failing there, so it must be flaky at worst. BUG= 671791 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=zmo@chromium.org NOTRY=true Review-Url: https://codereview.chromium.org/2557053003 Cr-Commit-Position: refs/heads/master@{#437200} [modify] https://crrev.com/758c15753b35b4ca7dd4d37a9c718ab2176504a4/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Dec 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/069f876903bc008aae66345a1e5d6686ff7a05b7 commit 069f876903bc008aae66345a1e5d6686ff7a05b7 Author: kbr <kbr@chromium.org> Date: Thu Dec 08 08:05:39 2016 Add WebGLContextObject::traceWrappers, and trace its context. It's not clear to me whether this could be the cause of Issue 671791 , but I can't find any other errors via code inspection. BUG= 671791 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel NOTRY=true Review-Url: https://codereview.chromium.org/2561733002 Cr-Commit-Position: refs/heads/master@{#437208} [modify] https://crrev.com/069f876903bc008aae66345a1e5d6686ff7a05b7/third_party/WebKit/Source/modules/webgl/WebGLContextObject.cpp [modify] https://crrev.com/069f876903bc008aae66345a1e5d6686ff7a05b7/third_party/WebKit/Source/modules/webgl/WebGLContextObject.h
,
Dec 8 2016
,
Dec 8 2016
I am sometimes able to reproduce in a single run on my mac using release build (with dchecks): content/test/gpu/run_gpu_integration_test.py webgl_conformance --show-stdout --browser=release --extra-browser-args="--js-flags=\"--trace-gc --max_semi_space_size=1 --trace-incremental-marking\" " --webgl-conformance-version=2.0.0 --test-filter=WebglConformance_conformance2_misc_uninitialized_test_2 By now I figured out that we crash in the verifier when we - trace a WebGLFrameBuffer which - points to a WebGLAttachment that is white, violating the marking invariant. Continuing now to figure out which attachment we are dealing with and why it is still white.
,
Dec 8 2016
The attachment is a WebGLTextureAttachment. The scenario is: (1) We create the attachment. (2) The containing WebGLFrameBuffer is still white, so the barrier doesn't fire. This is fine (3) When we reach the WebGLFrameBuffer for tracing we don't find the attachment in the attachment map, so we don't trace it. (4) The verifier finds the WebGLFrameBuffer and the attachment and complains that the attachment is still white while the buffer is black. The problem is that we don't find the attachment in step (3). Investigating further...
,
Dec 8 2016
The bug is the following (repeating myself to describe the full picture): (1) We create the attachment. (2) The containing WebGLFrameBuffer is still white, so the barrier doesn't fire. This is fine (3) HeapHashMap::add needs to expand its buffer, effectively scheduling an oilpan GC, *and* reporting external memory pressure to V8. (4) If we are unlucky and hit a limit in the external memory reporting API we will do an incremental marking step (or even a full GC) at this point. (5) The step marks the WebGLFrameBuffer, which doesn't yet contain the attachment (6) We ::add the attachment, but never mark it again, because the partent has been marked (7) The verifier rightfully complains that we missed marking the attachment. The bug is somewhere around (4) as doing a step in V8 now requires a consistent state in the oilpan heap.
,
Dec 8 2016
For completeness, this is the problematic call: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/heap/ThreadState.cpp?sq=package:chromium&dr=C&rcl=1481185182&l=748 Need to think a bit about how to tackle this one.
,
Dec 8 2016
One more note: As far as I see, the same problem already applies with object grouping but it is less likely because there we would require a full GC for the same scenario.
,
Dec 8 2016
Nice detective work! I think we need to add a GCForbiddenScope to a place where Oilpan is in an inconsistent state and prevent GCs (of V8 and Oilpan) from happening. For example, we already have a GCForbiddenScope in HashTable::swap because we need to guarantee the atomicity of the swap operation. https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/LinkedHashSet.h?q=gcforbiddenscope+file:%5Esrc/third_party/WebKit/Source/wtf/&sq=package:chromium&l=873&dr=C
,
Dec 8 2016
,
Dec 8 2016
Looks like on the Linux Release (AMD R5 230) bot, WebglConformance_conformance2_misc_uninitialized_test_2 fails consistently after Ken's patch (https://codereview.chromium.org/2557053003). First failing build: https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28AMD%20R5%20230%29/builds/316
,
Dec 8 2016
#23: The test should start passing again since I had to turn off the flag for the upcoming dev release. As for the bug: The fundamental problem here, even with object grouping, is that we can end up in a full V8 GC where there is still an object on the stack and not yet referenced from the heap. #21: Adding GCForbiddenScopes would work but I think there would be too many places for this to be feasible, e.g. all expands of vectors or hash tables. I think the only way to properly solve this is not call any V8 API that can do a GC from such inconsistent states. In this case move the call to ReportMemoryToV8 to some point where it's safe to do a GC (which doesn't have to be a safepoint though). (For the interested reader: If we would only have to deal with incremental marking steps, and not the finalizing GC, the problem is that the write barrier is executed before the object is linked into the graph. This could be solved by doing a second write barrier in the copy/move constructor. In any case this will not solve the problem of a full GC happening.)
,
Dec 8 2016
If this is limited to backing store re-allocations triggering a GC at the wrong time, we could track that on the Oilpan side as the source of the GC and instead of performing scheduleGCIfNeeded(), schedule a task to run it.
,
Dec 8 2016
Thanks for the input! The API function behind all this is AdjustAmountOfExternalMemory. It has to be able to make progress (we've had leaks in the past) so we cannot remove the capability of doing a V8 GC, effectively triggering wrapper tracing, in there. So this implies that *all* re-allocations of backing stores would need to request a task instead of doing a (conservative) GC at *all* times. While this would work, it seems like a hammer approach that significantly restricts Oilpan here. Do I miss something?
,
Dec 8 2016
No hammer involved? Conservative Oilpan GCs triggered by backing store out-of-line allocations could go ahead as per usual, but the scheduling of follow up GCs (=> reporting memory use to v8) will have to wait until back at the event loop and the task is scheduled.
,
Dec 8 2016
Thanks for the explanation. Yes, afaics this way would work.
,
Dec 8 2016
An artificial constraint to impose on the Oilpan GC though, so finding a better solution would be preferable.
,
Dec 9 2016
Does this mean that we should not call any V8 API in any constructor and destructor, because rehashing can call any constructor and destructor? If rehashing is the only problem, would it be an option to insert a GCForbiddenScope to the rehashing? V8 should take the GCForbiddenScope into account before triggering incremental marking and full GCs.
,
Dec 9 2016
mlippautz@: great diagnosis! Looking forward to a general fix. geofflang@: improving the suppression in https://codereview.chromium.org/2566603002/ .
,
Dec 9 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e535009c983cfe65f82ac47b624c37ab81bc98e0 commit e535009c983cfe65f82ac47b624c37ab81bc98e0 Author: kbr <kbr@chromium.org> Date: Fri Dec 09 05:37:47 2016 Improve precision of uninitialized-test-2.html suppression. The last change caused a reliable failure on Linux AMD. BUG= 671791 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=zmo@chromium.org,geofflang@chromium.org NOTRY=true Review-Url: https://codereview.chromium.org/2566603002 Cr-Commit-Position: refs/heads/master@{#437479} [modify] https://crrev.com/e535009c983cfe65f82ac47b624c37ab81bc98e0/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Dec 9 2016
Moved this to email to get some more feedback. The issue here (WebGL test flaky) should be solved. I will obviously only reland the feature after solving the stack issue.
,
Dec 10 2016
Thanks Michael. https://codereview.chromium.org/2558043009/ is removing the flaky suppression.
,
Dec 12 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/64967f4900a2bd54bd219deae185c793f7431117 commit 64967f4900a2bd54bd219deae185c793f7431117 Author: kbr <kbr@chromium.org> Date: Mon Dec 12 21:20:09 2016 Remove flaky suppression of uninitialized-test-2.html. Wrapper tracing's been disabled, and the underlying bug will be fixed before it's enabled again. BUG= 671791 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel NOTRY=true TBR=zmo@chromium.org Review-Url: https://codereview.chromium.org/2558043009 Cr-Commit-Position: refs/heads/master@{#437913} [modify] https://crrev.com/64967f4900a2bd54bd219deae185c793f7431117/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Dec 12 2016
|
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by senorblanco@chromium.org
, Dec 6 201653.9 KB
53.9 KB View Download