conformance2/query/occlusion-query.html became flaky on several bots on GPU FYI
Issue description

icu_56::RegexMatcher::regionStart [0x6BC81B80+0]
blink::WebGLObject::detachAndDeleteObject [0x6A29F193+51]
blink::WebGLFramebuffer::~WebGLFramebuffer [0x6A2A1373+19]
blink::WebGLFramebuffer::`scalar deleting destructor' [0x6A2A13E8+8]
blink::FinalizerTrait<blink::XPathNSResolver>::finalize [0x6A23A726+22]
blink::NormalPage::sweep [0x6A0C6D2E+158]
blink::BaseArena::completeSweep [0x6A0C5BDB+91]
blink::ThreadState::completeSweep [0x6A0C0887+215]
blink::ThreadState::scheduleV8FollowupGCIfNeeded [0x6A0C2868+24]
blink::V8GCController::gcEpilogue [0x6AAB8D5B+203]
v8::internal::Heap::CallGCEpilogueCallbacks [0x6BA81CC9+73]
v8::internal::Heap::PerformGarbageCollection [0x6BA8C0D4+1780]
v8::internal::Heap::CollectGarbage [0x6BA8245E+558]
v8::internal::ScavengeJob::IdleTask::RunInternal [0x6BF48DD2+306]
v8::internal::CancelableIdleTask::Run [0x6BA520FD+45]
blink::V8IdleTaskAdapter::run [0x6A7A76A8+24]
scheduler::WebSchedulerImpl::runIdleTask [0x6BDED6FC+44]
base::internal::Invoker<base::IndexSequence<0>,base::internal::BindState<base::internal::RunnableAdapter<void (__cdecl*)(std::unique_ptr<blink::WebThread::IdleTask,std::default_delete<blink::WebThread::IdleTask> >,base::TimeTicks)>,void __cdecl(std::uniqu [0x6BDE84A7+103]
scheduler::SingleThreadIdleTaskRunner::RunTask [0x6BDEE736+198]
base::internal::Invoker<base::IndexSequence<0,1>,base::internal::BindState<base::internal::RunnableAdapter<void (__thiscall scheduler::SingleThreadIdleTaskRunner::*)(base::Callback<void __cdecl(base::TimeTicks),1>)>,void __cdecl(scheduler::SingleThreadIdl [0x6BDEE65D+93]
base::debug::TaskAnnotator::RunTask [0x69BBAED7+247]
scheduler::TaskQueueManager::ProcessTaskFromWorkQueue [0x6BDF342A+826]
scheduler::TaskQueueManager::DoWork [0x6BDF2C43+387]
base::internal::Invoker<base::IndexSequence<0,1,2>,base::internal::BindState<base::internal::RunnableAdapter<void (__thiscall scheduler::TaskQueueManager::*)(base::TimeTicks,bool)>,void __cdecl(scheduler::TaskQueueManager *,base::TimeTicks,bool),base::Wea [0x6BDF386F+79]
base::debug::TaskAnnotator::RunTask [0x69BBAED7+247]
base::MessageLoop::RunTask [0x69B90FAD+765]
base::MessageLoop::DoDelayedWork [0x69B90767+215]
base::MessagePumpDefault::Run [0x69BBCAAA+58]
base::MessageLoop::RunHandler [0x69B90CA1+17]
base::RunLoop::Run [0x69BBCCC8+88]
base::MessageLoop::Run [0x69B90C7D+29]
content::RendererMain [0x6ACFA479+1241]
content::RunNamedProcessTypeMain [0x69B6C920+256]
content::ContentMainRunnerImpl::Run [0x69B6C7E6+134]
content::ContentMain [0x69B69CC3+35]
ChromeMain [0x69AD3F6B+107]
MainDllLoader::Launch [0x0026726C+748]
wWinMain [0x00266AF7+487]
__scrt_common_main_seh [0x002C0A96+253] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
BaseThreadInitThunk [0x7717338A+18]
RtlInitializeExceptionChain [0x77849A02+99]
RtlInitializeExceptionChain [0x778499D5+54]
,
Apr 13 2016
It became very flaky within 24 hours
,
Apr 13 2016
A few more affected builds:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29/builds/23943
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win8%20Release%20%28NVIDIA%29/builds/21593
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/3856

A ton of Oilpan-related changes landed over the past couple of days. I'm still going through them but it looks like one or more of them destabilized WebGL's object destruction.

Kentaro, can you please help us investigate this urgently? I realize you're not online right now and I'm continuing to dig into this.
,
Apr 13 2016
,
Apr 13 2016
More failures (trying to figure out what configuration to try to debug this on directly):
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28New%20Intel%29/builds/254

Linux Release (NVIDIA) and Linux Debug (NVIDIA) don't seem to be affected. I'm going to try to reproduce this locally on Windows Debug.
,
Apr 14 2016
Simply running the test in a loop with a Debug build on Windows does not reproduce the problem:

python content\test\gpu\run_gpu_test.py webgl_conformance --browser=debug --show-stdout --extra-browser-args="--enable-logging=stderr --js-flags=--expose-gc" --webgl-conformance-version=2.0.0 --webgl2-only=true --story-filter=conformance2_query_occlusion_query --page-repeat=1000 --max-failures=1

Expanding the tests run to try to catch the failure.
,
Apr 14 2016
The test conformance2_misc_uninitialized_test_2 runs just before conformance2_query_occlusion_query, and I suspect it's allocating a lot of garbage that, once cleaned up, intermittently provokes the failure. I am running the following on Windows in a loop:

python content\test\gpu\run_gpu_test.py webgl_conformance --browser=debug --show-stdout --extra-browser-args="--enable-logging=stderr --js-flags=--expose-gc" --webgl-conformance-version=2.0.0 --webgl2-only=true --story-filter="conformance2_misc|conformance2_query" --pageset-repeat=1000 --max-failures=1
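The two invocations quoted so far in this thread differ only in the story filter and the repeat flag. A minimal Python sketch that makes the difference explicit follows; the helper name `build_gpu_test_command` and its parameters are hypothetical, and only the flag names and values are copied from the commands in this thread.

```python
def build_gpu_test_command(story_filter, repeat_flag, repeat_count=1000):
    """Return the argv list for one looped webgl_conformance run.

    Hypothetical helper: only the flags come from the commands quoted
    in this thread; nothing here is part of run_gpu_test.py itself.
    """
    if repeat_flag not in ("--page-repeat", "--pageset-repeat"):
        raise ValueError("unexpected repeat flag: %s" % repeat_flag)
    browser_args = "--enable-logging=stderr --js-flags=--expose-gc"
    return [
        "python",
        r"content\test\gpu\run_gpu_test.py",
        "webgl_conformance",
        "--browser=debug",
        "--show-stdout",
        "--extra-browser-args=" + browser_args,
        "--webgl-conformance-version=2.0.0",
        "--webgl2-only=true",
        "--story-filter=" + story_filter,
        "%s=%d" % (repeat_flag, repeat_count),
        "--max-failures=1",
    ]


# First attempt: repeat the single failing page.
single = build_gpu_test_command(
    "conformance2_query_occlusion_query", "--page-repeat")

# Second attempt: repeat the whole filtered page set, so the garbage
# produced by the conformance2_misc tests precedes the query test.
widened = build_gpu_test_command(
    "conformance2_misc|conformance2_query", "--pageset-repeat")
```

Because the command is built as a list, it could be handed to `subprocess.run` without a shell, so the `|` in the story filter would need no extra quoting.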
,
Apr 14 2016
I've been running the above command for over an hour with no crashes, and am at a loss as to how to proceed. As far as I can tell from https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29?numbuilds=200 , the flakiness is still present, though there were several green builds in between the failing ones.

These are the GYP_DEFINES I used for my Debug build:

disable_nacl=1 component=shared_library build_angle_deqp_tests=1 archive_gpu_tests=1 proprietary_codecs=1 ffmpeg_branding=Chrome blink_logging_always_on=1 test_isolation_mode=prepare

Kentaro, may I please ask for your help here? I think that the many Oilpan-related changes that landed over the past couple of days contained an unintended behavioral change. This is the first failing build of this form I can find:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29/builds/23849

I am going to try adding a flaky suppression for this particular test to see if it will allow the remainder of the tests to run green.
,
Apr 14 2016
Must be tired -- I only just realized that this test is of course only failing on the FYI waterfall and cannot possibly be blocking the CQ bots. De-prioritizing.
,
Apr 14 2016
+sigbjorn

It looks like we're crashing in WebGLFramebuffer's destructor. All build logs show the crash there:

blink::WebGLObject::detachAndDeleteObject [0x6A29F193+51]
blink::WebGLFramebuffer::~WebGLFramebuffer [0x6A2A1373+19]

The first crash is observed in https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29/builds/23847. The only recent Oilpan-related change to webgl/ before that build is https://codereview.chromium.org/1878463002, but I'm not sure if it's related. Maybe some other Oilpan-related change altered the destruction timing of some objects, which triggered the crash.

Still looking, but any advice is welcome.
,
Apr 14 2016
Sigbjorn: I'm sorry to keep asking the same question, but why is it safe to touch WebGLContextObject::m_context in WebGLFramebuffer's destructor? It's touched via WebGLContextObject::getAGLInterface, which is called by WebGLObject::deleteObject, which is called by WebGLFramebuffer's destructor.
,
Apr 14 2016
Isn't that why WebGLRenderingContextBase uses EAGERLY_FINALIZE()? If WebGLRenderingContextBase is GC'd first, then detachAndRemoveAllObjects() iterates down all of its m_contextObjects calling detachContext(), which nulls out m_context in the WebGLContextObject. If WebGLContextObject is GC'd first, then it seems it's guaranteed that WebGLRenderingContextBase hasn't been GC'd in this cycle, yes? Otherwise WebGLRenderingContextBase's destructor would have been called first. So in that case WebGLContextObject::m_context is still valid.
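The ordering argument in this comment can be modelled in a few lines. The following is a simplified Python sketch, not Blink code: the class and method names are stand-ins for WebGLRenderingContextBase, WebGLContextObject, detachAndRemoveAllObjects() and detachContext(), explicit `finalize()` calls stand in for Oilpan running the destructors, and having an object remove itself from the context's list is an assumption made to keep the model consistent.

```python
class Context:
    """Stand-in for WebGLRenderingContextBase (simplified model)."""

    def __init__(self):
        self.context_objects = []
        self.deleted_names = []  # GL deletes issued through this context

    def finalize(self):
        # Models detachAndRemoveAllObjects(): detach every surviving
        # object so it can no longer reach the dead context.
        for obj in self.context_objects:
            obj.detach_context()
        self.context_objects = []


class ContextObject:
    """Stand-in for WebGLContextObject, e.g. a framebuffer."""

    def __init__(self, context, name):
        self.context = context
        self.name = name
        context.context_objects.append(self)

    def detach_context(self):
        # Models detachContext(): null out the back pointer.
        self.context = None

    def finalize(self):
        # Models the destructor path: touch the context only if it is
        # still attached (assumption: the object also unregisters itself).
        if self.context is not None:
            self.context.deleted_names.append(self.name)
            self.context.context_objects.remove(self)
            self.context = None


# Order A: the context is finalized first, the object afterwards.
ctx_a = Context()
fb_a = ContextObject(ctx_a, "fb_a")
ctx_a.finalize()
fb_a.finalize()  # back pointer already nulled, context is not touched

# Order B: the object is finalized first, while the context is alive.
ctx_b = Context()
fb_b = ContextObject(ctx_b, "fb_b")
fb_b.finalize()  # context still valid, the delete goes through
ctx_b.finalize()
```

In both orders the object never dereferences a finalized context, which is the invariant the eager finalization is meant to provide. The model says nothing about what happens if the two destructors run in an order the scheme does not anticipate.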
,
Apr 14 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/40c519e687363ca55328e6f2d8652ee4520cc41c

commit 40c519e687363ca55328e6f2d8652ee4520cc41c
Author: kbr <kbr@chromium.org>
Date: Thu Apr 14 05:42:05 2016

Suppress flaky Oilpan-related crashes in occlusion-query.html.

BUG= 603168
TBR=zmo@chromium.org
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review URL: https://codereview.chromium.org/1885963003
Cr-Commit-Position: refs/heads/master@{#387245}

[modify] https://crrev.com/40c519e687363ca55328e6f2d8652ee4520cc41c/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Apr 14 2016
Sorry about the slow response; #13 accurately explains #12. Same also holds for WebGL2RenderingContextBases wrt eager finalization, which I'm guessing is involved for this test.
,
Apr 14 2016
WebGL2RenderingContextBase keeps a WebGLFramebuffer reference set via bindFramebuffer(). How do we ensure that a framebuffer isn't created in one context and later bound to another? If that were achievable, it would confuse finalization.
,
Apr 14 2016
Resources are associated with a particular context. It's not currently possible in WebGL to use a resource allocated by one context with another context. (OpenGL-level resource sharing is difficult to implement portably and performantly.)
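As an illustration of that ownership rule, here is a small Python model of a bind call that rejects an object created by a different context. It is a sketch only: `GLContext`, `Framebuffer` and the `errors` list are hypothetical names, and recording GL_INVALID_OPERATION for the rejected bind is this sketch's assumption about how the failure is reported.

```python
INVALID_OPERATION = 0x0502  # numeric value of GL_INVALID_OPERATION


class Framebuffer:
    """Hypothetical resource that remembers which context created it."""

    def __init__(self, owner):
        self.owner = owner


class GLContext:
    """Hypothetical context model; not the Blink implementation."""

    def __init__(self):
        self.bound_framebuffer = None
        self.errors = []

    def create_framebuffer(self):
        return Framebuffer(self)

    def bind_framebuffer(self, framebuffer):
        # Reject objects owned by another context and leave the
        # current binding untouched.
        if framebuffer is not None and framebuffer.owner is not self:
            self.errors.append(INVALID_OPERATION)
            return
        self.bound_framebuffer = framebuffer


context_a = GLContext()
context_b = GLContext()
fb = context_a.create_framebuffer()

context_b.bind_framebuffer(fb)  # rejected: fb belongs to context_a
context_a.bind_framebuffer(fb)  # accepted
```

With a check like this in place, a context can only ever hold references to objects it created, so an object and the context that references it always agree on which context owns the object.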
,
Apr 14 2016
Thanks, so it would have failed right off the bat. Verified via https://codereview.chromium.org/1885303002/
,
Apr 15 2016
Hmm, I'm looking at the code but WebGLFramebuffer's destruction sequence looks correct to me. Has anyone reproduced the crash locally? It would be helpful to know the exact line where the crash is occurring.
,
Apr 18 2016
Unfortunately I haven't been able to reproduce the crash locally; see #9 above.
,
Apr 20 2016
Same assessment here as #19 from going over the code again. Re #9 and first failing build, https://codereview.chromium.org/1747283003 landed not long before -- has it been considered?
,
Apr 20 2016
@sigbjornf: that code path should only take effect if Chrome's GPU sub-process exits, which should not be happening in these tests. It would in the context_lost_tests step, for example.
,
Apr 21 2016
This might not be relevant, but when using the ANGLE OpenGL backend this test would also crash the GPU process because of missing state tracking for the PIXEL_UNPACK_BUFFER object. (We would call the driver's function without the buffer bound, causing a nullptr dereference.)
,
Apr 21 2016
Thanks for the note. I'd expect the GPU process to crash 100% of the time in that case so the test would be failing reliably instead of flakily.
,
Apr 22 2016
,
Jul 19 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/468fefb5fa8e23b66fd56e54a96ad13691272b12

commit 468fefb5fa8e23b66fd56e54a96ad13691272b12
Author: cwallez <cwallez@chromium.org>
Date: Tue Jul 19 20:19:25 2016

WebGL2 expectations: tighten up Linux expectations

TBR=zmo@chromium.org
BUG= 483282
BUG= 598902
BUG= 603168
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2161993003
Cr-Commit-Position: refs/heads/master@{#406366}

[modify] https://crrev.com/468fefb5fa8e23b66fd56e54a96ad13691272b12/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Apr 6 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c85136a913f508e2795af6b21873a9984ea721f7

commit c85136a913f508e2795af6b21873a9984ea721f7
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Apr 06 03:58:35 2018

Remove flaky expectation for occlusion-query.html.

This was flaking because of crashes in Oilpan. Since then, the WebGL context's object graph has been redesigned. This test should no longer be flaky.

TBR=kainino@chromium.org, jdarpinian@chromium.org
Bug: 603168
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: Ibafa9168250094c4b070342a8a08a6bb92002174
Reviewed-on: https://chromium-review.googlesource.com/996965
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#548660}

[modify] https://crrev.com/c85136a913f508e2795af6b21873a9984ea721f7/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Apr 6 2018
Calling this WontFix - no longer reproducible - at this point.
Comment 1 by zmo@chromium.org
, Apr 13 2016