
Issue 603168

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocked on:
issue 605988




conformance2/query/occlusion-query.html became flaky on several bots on GPU FYI

Project Member Reported by zmo@chromium.org, Apr 13 2016

Issue description

icu_56::RegexMatcher::regionStart [0x6BC81B80+0]
	blink::WebGLObject::detachAndDeleteObject [0x6A29F193+51]
	blink::WebGLFramebuffer::~WebGLFramebuffer [0x6A2A1373+19]
	blink::WebGLFramebuffer::`scalar deleting destructor' [0x6A2A13E8+8]
	blink::FinalizerTrait<blink::XPathNSResolver>::finalize [0x6A23A726+22]
	blink::NormalPage::sweep [0x6A0C6D2E+158]
	blink::BaseArena::completeSweep [0x6A0C5BDB+91]
	blink::ThreadState::completeSweep [0x6A0C0887+215]
	blink::ThreadState::scheduleV8FollowupGCIfNeeded [0x6A0C2868+24]
	blink::V8GCController::gcEpilogue [0x6AAB8D5B+203]
	v8::internal::Heap::CallGCEpilogueCallbacks [0x6BA81CC9+73]
	v8::internal::Heap::PerformGarbageCollection [0x6BA8C0D4+1780]
	v8::internal::Heap::CollectGarbage [0x6BA8245E+558]
	v8::internal::ScavengeJob::IdleTask::RunInternal [0x6BF48DD2+306]
	v8::internal::CancelableIdleTask::Run [0x6BA520FD+45]
	blink::V8IdleTaskAdapter::run [0x6A7A76A8+24]
	scheduler::WebSchedulerImpl::runIdleTask [0x6BDED6FC+44]
	base::internal::Invoker<base::IndexSequence<0>,base::internal::BindState<base::internal::RunnableAdapter<void (__cdecl*)(std::unique_ptr<blink::WebThread::IdleTask,std::default_delete<blink::WebThread::IdleTask> >,base::TimeTicks)>,void __cdecl(std::uniqu [0x6BDE84A7+103]
	scheduler::SingleThreadIdleTaskRunner::RunTask [0x6BDEE736+198]
	base::internal::Invoker<base::IndexSequence<0,1>,base::internal::BindState<base::internal::RunnableAdapter<void (__thiscall scheduler::SingleThreadIdleTaskRunner::*)(base::Callback<void __cdecl(base::TimeTicks),1>)>,void __cdecl(scheduler::SingleThreadIdl [0x6BDEE65D+93]
	base::debug::TaskAnnotator::RunTask [0x69BBAED7+247]
	scheduler::TaskQueueManager::ProcessTaskFromWorkQueue [0x6BDF342A+826]
	scheduler::TaskQueueManager::DoWork [0x6BDF2C43+387]
	base::internal::Invoker<base::IndexSequence<0,1,2>,base::internal::BindState<base::internal::RunnableAdapter<void (__thiscall scheduler::TaskQueueManager::*)(base::TimeTicks,bool)>,void __cdecl(scheduler::TaskQueueManager *,base::TimeTicks,bool),base::Wea [0x6BDF386F+79]
	base::debug::TaskAnnotator::RunTask [0x69BBAED7+247]
	base::MessageLoop::RunTask [0x69B90FAD+765]
	base::MessageLoop::DoDelayedWork [0x69B90767+215]
	base::MessagePumpDefault::Run [0x69BBCAAA+58]
	base::MessageLoop::RunHandler [0x69B90CA1+17]
	base::RunLoop::Run [0x69BBCCC8+88]
	base::MessageLoop::Run [0x69B90C7D+29]
	content::RendererMain [0x6ACFA479+1241]
	content::RunNamedProcessTypeMain [0x69B6C920+256]
	content::ContentMainRunnerImpl::Run [0x69B6C7E6+134]
	content::ContentMain [0x69B69CC3+35]
	ChromeMain [0x69AD3F6B+107]
	MainDllLoader::Launch [0x0026726C+748]
	wWinMain [0x00266AF7+487]
	__scrt_common_main_seh [0x002C0A96+253] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
	BaseThreadInitThunk [0x7717338A+18]
	RtlInitializeExceptionChain [0x77849A02+99]
	RtlInitializeExceptionChain [0x778499D5+54]
 

Comment 1 by zmo@chromium.org, Apr 13 2016

Here is another trace:

blink::DrawingBuffer::contextGL [0x162AE4F1+17]
	blink::IDBIndexParameters::setUnique [0x1F462361+724332]
	blink::IDBIndexParameters::setUnique [0x1F9E6AB0+6509755]
	blink::IDBIndexParameters::setUnique [0x1F9F0DFC+6551559]
	blink::IDBIndexParameters::setUnique [0x1F9F0F10+6551835]
	blink::IDBIndexParameters::setUnique [0x1F9EC3C6+6532561]
	blink::IDBIndexParameters::setUnique [0x1F9EC8C6+6533841]
	blink::IDBIndexParameters::setUnique [0x1F9AB0EE+6265593]
	blink::IDBIndexParameters::setUnique [0x1F9AB08E+6265497]
	blink::IDBIndexParameters::setUnique [0x1F9AB06C+6265463]
	blink::HeapObjectHeader::finalize [0x165C4515+85]
	blink::NormalPage::sweep [0x165C802F+447]
	blink::NormalPageArena::lazySweepPages [0x165C553C+204]
	blink::BaseArena::lazySweep [0x165C515D+349]
	blink::NormalPageArena::outOfLineAllocate [0x165C6D81+321]
	blink::NormalPageArena::allocateObject [0x15FD90EB+363]
	blink::Heap::allocateOnArenaIndex [0x15FD920E+174]
	blink::DOMURLUtilsReadOnly::search [0x250EC2F4+30116]
	blink::GarbageCollected<blink::ScriptRunner>::allocateObject [0x25108EC1+17]
	blink::GarbageCollected<blink::ScriptRunner>::operator new [0x25103CBE+14]
	blink::ScriptRunner::create [0x2510EC2B+27]
	blink::Document::Document [0x25101689+1689]
	blink::HTMLDocument::HTMLDocument [0x254779F7+119]
	blink::HTMLDocument::create [0x2414306A+58]
	blink::DOMImplementation::createDocument [0x250D0BBA+90]
	blink::LocalDOMWindow::createDocument [0x24D26D7A+106]
	blink::LocalDOMWindow::installNewDocument [0x24D2A55A+122]
	blink::DocumentLoader::createWriterFor [0x24EC6CDD+253]
	blink::DocumentLoader::ensureWriter [0x24EC7630+576]
	blink::DocumentLoader::commitData [0x24EC69DA+138]
	blink::DocumentLoader::processData [0x24EC9366+166]
	blink::DocumentLoader::dataReceived [0x24EC6F00+432]
	blink::RawResource::appendData [0x24C80A25+117]
	blink::ResourceLoader::didReceiveData [0x24CAC601+337]
	content::WebURLLoaderImpl::Context::OnReceivedData [0x1201DF1F+255]
	content::WebURLLoaderImpl::RequestPeerImpl::OnReceivedData [0x1201E01A+58]
	content::ResourceDispatcher::OnReceivedData [0x11F86D31+1233]
	??$DispatchToMethodImpl@PAVResourceDispatcher@content@@P812@AEXHHHH@ZHHHH$$Z$0A@$00$01$02@base@@YAXABQAVResourceDispatcher@content@@P812@AEXHHHH@ZABV?$tuple@HHHH@std@@U?$IndexSequence@$0A@$00$01$02@0@@Z [0x11F7D23E+110]
	base::DispatchToMethod<content::ResourceDispatcher *,void (__thiscall content::ResourceDispatcher::*)(int,int,int,int),int,int,int,int> [0x11F7CCE1+33]
	IPC::DispatchToMethod<content::ResourceDispatcher,void (__thiscall content::ResourceDispatcher::*)(int,int,int,int),void,std::tuple<int,int,int,int> > [0x11F7CEB4+20]
	IPC::MessageT<ResourceMsg_DataReceived_Meta,std::tuple<int,int,int,int>,void>::Dispatch<content::ResourceDispatcher,content::ResourceDispatcher,void,void (__thiscall content::ResourceDispatcher::*)(int,int,int,int)> [0x11F7C60D+253]
	content::ResourceDispatcher::DispatchMessageW [0x11F85526+1142]
	content::ResourceDispatcher::OnMessageReceived [0x11F86667+839]
	content::ResourceSchedulingFilter::DispatchMessageW [0x11F91777+39]
	std::unique_ptr<blink::WebTaskRunner,std::default_delete<blink::WebTaskRunner> >::release [0x11F94798+152]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E59CE26+419544]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E59BDDF+415377]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E59BD38+415210]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E59C5F3+417445]
	base::Callback<void __cdecl(void),1>::Run [0x100716EF+47]
	base::debug::TaskAnnotator::RunTask [0x100A5383+387]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E566153+195077]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E5644ED+187807]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E55A280+146226]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E55A1E4+146070]
	scheduler::RendererSchedulerImpl::OnRendererForegrounded [0x1E566D48+198138]
	base::Callback<void __cdecl(void),1>::Run [0x100716EF+47]
	base::debug::TaskAnnotator::RunTask [0x100A5383+387]
	base::MessageLoop::RunTask [0x10127E79+729]
	base::MessageLoop::DeferOrRunPendingTask [0x101255B4+52]
	base::MessageLoop::DoWork [0x10125D7D+221]
	base::MessagePumpDefault::Run [0x1012E174+244]

Comment 2 by zmo@chromium.org, Apr 13 2016

Cc: bajones@chromium.org
Status: Available (was: Untriaged)
It became very flaky within 24 hours

Comment 4 by kbr@chromium.org, Apr 13 2016

Cc: haraken@chromium.org sigbjo...@opera.com
Labels: -Pri-3 OS-All Pri-0
Owner: kbr@chromium.org
Status: Assigned (was: Available)
A few more affected builds:

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29/builds/23943

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win8%20Release%20%28NVIDIA%29/builds/21593

https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/3856

A ton of Oilpan-related changes landed over the past couple of days. I'm still going through them but it looks like one or more of them destabilized WebGL's object destruction. Kentaro, can you please help us investigate this urgently? I realize you're not online right now and I'm continuing to dig into this.

Comment 5 by kbr@chromium.org, Apr 13 2016

Components: Blink>MemoryAllocator>GarbageCollection

Comment 6 by kbr@chromium.org, Apr 13 2016

More failures (trying to figure out what configuration to try to debug this on directly):

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28New%20Intel%29/builds/254

Linux Release (NVIDIA) and Linux Debug (NVIDIA) don't seem to be affected.

I'm going to try to reproduce this locally on Windows Debug.

Comment 7 by kbr@chromium.org, Apr 14 2016

Simply running the test in a loop with a Debug build on Windows does not reproduce the problem:

python content\test\gpu\run_gpu_test.py webgl_conformance --browser=debug --show-stdout --extra-browser-args="--enable-logging=stderr --js-flags=--expose-gc" --webgl-conformance-version=2.0.0 --webgl2-only=true --story-filter=conformance2_query_occlusion_query --page-repeat=1000 --max-failures=1

Expanding the tests run to try to catch the failure.
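
The looped run above relies on the harness's own repeat flag. For tools that have no such flag, the same "rerun until the first failure" idea can be sketched generically. This is only an illustration: `run_until_failure` and the stand-in command are hypothetical names, not part of the GPU test harness.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs):
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits with a non-zero
    status, or None if every run exits with status 0.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None


if __name__ == "__main__":
    # Hypothetical stand-in command; substitute the real test invocation.
    always_passes = [sys.executable, "-c", "pass"]
    print(run_until_failure(always_passes, max_runs=3))
```

A run that never fails within `max_runs` says little about a rare flake, so the absence of a failure here is weak evidence at best.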

Comment 8 by kbr@chromium.org, Apr 14 2016

The test conformance2_misc_uninitialized_test_2 runs just before conformance2_query_occlusion_query, and I suspect it's allocating a lot of garbage that, once cleaned up, intermittently provokes the failure.

Am running the following on Windows in a loop:

python content\test\gpu\run_gpu_test.py webgl_conformance --browser=debug --show-stdout --extra-browser-args="--enable-logging=stderr --js-flags=--expose-gc" --webgl-conformance-version=2.0.0 --webgl2-only=true --story-filter="conformance2_misc|conformance2_query" --pageset-repeat=1000 --max-failures=1
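
The suspicion above is that garbage created by one test is only finalized while the next test is running. A minimal Python sketch of that effect follows, using Python's own cycle collector as a stand-in for Oilpan. The `Resource` class, `current_test`, and `run_sequence` are hypothetical names for illustration only; nothing here models Blink's actual heap.

```python
import gc

current_test = ["<none>"]  # name of the test that is running right now


class Resource:
    def __init__(self, created_in, log):
        self.created_in = created_in
        self.log = log
        self.cycle = self  # self-reference: only the cycle collector frees it

    def __del__(self):
        # Record which test created the object and which test was running
        # when its finalizer actually ran.
        self.log.append((self.created_in, current_test[0]))


def run_sequence(log):
    gc.collect()   # start from a clean slate
    gc.disable()   # make collection timing explicit for the demonstration
    try:
        current_test[0] = "test_a"
        for _ in range(3):
            Resource("test_a", log)  # unreachable right away, but not freed
        current_test[0] = "test_b"
        gc.collect()                 # the collection happens during test_b
    finally:
        gc.enable()
    return log
```

Every entry in the returned log pairs the creating test with the test that was running at finalization time, which is why a crash in a finalizer can be attributed to a test that did not create the object.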

Comment 9 by kbr@chromium.org, Apr 14 2016

Owner: haraken@chromium.org
I've been running the above command for over an hour with no crashes, and am at a loss as to how to proceed. As far as I can tell from https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29?numbuilds=200 , the flakiness is still present, though there were several green builds in between the failing ones.

These are the GYP_DEFINES I used for my Debug build:

disable_nacl=1 component=shared_library build_angle_deqp_tests=1 archive_gpu_tests=1 proprietary_codecs=1 ffmpeg_branding=Chrome blink_logging_always_on=1 test_isolation_mode=prepare

Kentaro, may I please ask for your help here? I think that the many Oilpan-related changes that landed over the past couple of days contained an unintended behavioral change.

This is the first failing build of this form I can find:

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29/builds/23849

I am going to try adding a flaky suppression for this particular test to see if it will allow the remainder of the tests to run green.
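
As a rough illustration of what a flaky suppression does, the sketch below retries a test that is marked flaky before reporting failure. It is a simplified model under stated assumptions: the `Expectations` class, its `flaky` method, and `run_with_expectations` are hypothetical and are not the API of webgl2_conformance_expectations.py.

```python
class Expectations:
    """Hypothetical stand-in for a test-expectations table."""

    def __init__(self):
        self._flaky = {}  # test path -> (bug number, max retries)

    def flaky(self, test_path, bug, max_retries=2):
        """Mark a test as flaky, tracked by the given bug number."""
        self._flaky[test_path] = (bug, max_retries)

    def retries_for(self, test_path):
        """Return how many extra attempts the test is allowed (0 if none)."""
        return self._flaky.get(test_path, (None, 0))[1]


def run_with_expectations(test_path, run_once, expectations):
    """Run a test, retrying if it is marked flaky. Returns True on a pass."""
    attempts = 1 + expectations.retries_for(test_path)
    for _ in range(attempts):
        if run_once():
            return True
    return False


expectations = Expectations()
expectations.flaky("conformance2/query/occlusion-query.html", bug=603168)
```

With this model, a test that crashes once and then passes is reported green, while an unmarked test still fails on its first crash.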

Comment 10 by kbr@chromium.org, Apr 14 2016

Labels: -Pri-0 Pri-2
Must be tired -- I only just realized that this test is of course only failing on the FYI waterfall and cannot possibly be blocking the CQ bots. De-prioritizing.

+sigbjorn

It looks like we're crashing in WebGLFramebuffer's destructor. All build logs are crashing there.

  blink::WebGLObject::detachAndDeleteObject [0x6A29F193+51]
  blink::WebGLFramebuffer::~WebGLFramebuffer [0x6A2A1373+19]

The first crash is observed in https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%29/builds/23847.

The only recent oilpan-related change to webgl/ before the build is https://codereview.chromium.org/1878463002, but I'm not sure if it's related. Maybe some other oilpan-related change altered the destruction timing of some objects, which triggered the crash.

Still looking, but any advice is welcome.

Sigbjorn: I'm sorry I've been asking the same question many times, but why is it safe to touch WebGLContextObject::m_context in WebGLFramebuffer's destructor? It's touched via WebGLContextObject::getAGLInterface, which is called by WebGLObject::deleteObject, which is called by WebGLFramebuffer's destructor.





Comment 13 by kbr@chromium.org, Apr 14 2016

Isn't that why WebGLRenderingContextBase uses EAGERLY_FINALIZE()?

If WebGLRenderingContextBase is GC'd first, then detachAndRemoveAllObjects() iterates down all of its m_contextObjects calling detachContext(), which nulls out m_context in the WebGLContextObject.

If WebGLContextObject is GC'd first, then it seems it's guaranteed that WebGLRenderingContextBase hasn't been GC'd in this cycle, yes? Otherwise WebGLRenderingContextBase's destructor would have been called first. So in this case WebGLContextObject::m_context would still be valid.
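
The ordering argument above can be sketched as a small model. It assumes, as described in this comment, that the context is finalized before other dead objects in the same sweep and that its finalizer detaches every object it owns. The class names `ModelContext` and `ModelContextObject` and the `sweep` function are hypothetical and are not Blink code.

```python
class ModelContext:
    """Models a context that is finalized eagerly, ahead of other objects."""
    eagerly_finalized = True

    def __init__(self):
        self.objects = []
        self.alive = True

    def finalize(self, log):
        # Detach every owned object so none keeps a pointer to a dead context.
        for obj in self.objects:
            obj.context = None
        self.alive = False
        log.append("context finalized")


class ModelContextObject:
    eagerly_finalized = False

    def __init__(self, context, name):
        self.context = context
        self.name = name
        context.objects.append(self)

    def finalize(self, log):
        if self.context is None:
            log.append(f"{self.name}: context already gone, skip GL delete")
        else:
            # Safe only because a dead context would have detached us first.
            assert self.context.alive, "touched a finalized context"
            log.append(f"{self.name}: GL delete via live context")


def sweep(dead_objects):
    """Finalize dead objects, eagerly finalized ones first (stable order)."""
    log = []
    for obj in sorted(dead_objects, key=lambda o: not o.eagerly_finalized):
        obj.finalize(log)
    return log
```

In this model the assertion in `ModelContextObject.finalize` can never fire, which is the invariant the thread is trying to confirm for the real destructors.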


Comment 14 by bugdroid1@chromium.org, Apr 14 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/40c519e687363ca55328e6f2d8652ee4520cc41c

commit 40c519e687363ca55328e6f2d8652ee4520cc41c
Author: kbr <kbr@chromium.org>
Date: Thu Apr 14 05:42:05 2016

Suppress flaky Oilpan-related crashes in occlusion-query.html.

BUG= 603168 
TBR=zmo@chromium.org
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review URL: https://codereview.chromium.org/1885963003

Cr-Commit-Position: refs/heads/master@{#387245}

[modify] https://crrev.com/40c519e687363ca55328e6f2d8652ee4520cc41c/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Sorry about the slow response; #13 accurately explains #12.

The same holds for WebGL2RenderingContextBase with respect to eager finalization, which I'm guessing is involved in this test.
WebGL2RenderingContextBase keeps a WebGLFramebuffer reference set via bindFramebuffer(). How do we ensure that a framebuffer isn't created in one context and later bound to another? It would confuse finalization, if achievable.

Comment 17 by kbr@chromium.org, Apr 14 2016

Resources are associated with a particular context. It's not currently possible in WebGL to use a resource allocated by one context with another context. (OpenGL-level resource sharing is difficult to implement portably and performantly.)
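
A minimal sketch of the per-context ownership check described above: binding an object created by a different context is rejected with INVALID_OPERATION and the binding is left unchanged. The classes `GLContext` and `GLFramebuffer` are hypothetical models for illustration, not the Blink implementation.

```python
INVALID_OPERATION = 0x0502  # value of the GL_INVALID_OPERATION enum


class GLFramebuffer:
    def __init__(self, context):
        self.context = context  # the one context that owns this object


class GLContext:
    def __init__(self):
        self.bound_framebuffer = None
        self.last_error = 0

    def create_framebuffer(self):
        return GLFramebuffer(self)

    def bind_framebuffer(self, framebuffer):
        # Reject objects that belong to another context; binding None
        # (the default framebuffer) is always allowed.
        if framebuffer is not None and framebuffer.context is not self:
            self.last_error = INVALID_OPERATION
            return
        self.bound_framebuffer = framebuffer
```

Because the foreign object is never stored, a context in this model only ever holds references to objects it created, so finalization order between different contexts does not come into play.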
Thanks, so it would have failed right off the bat. Verified via https://codereview.chromium.org/1885303002/

Hmm, I'm looking at the code but WebGLFramebuffer's destruction sequence looks correct to me.

Can anyone reproduce the crash locally? Maybe it would be helpful to know the exact line where the crash is occurring.


Comment 20 by kbr@chromium.org, Apr 18 2016

Unfortunately I haven't been able to reproduce the crash locally; see #9 above.

Same assessment here as #19 from going over the code again.

Re #9 and first failing build, https://codereview.chromium.org/1747283003 landed not long before -- has it been considered?

Comment 22 by kbr@chromium.org, Apr 20 2016

Cc: siev...@chromium.org
@sigbjornf: that code path should only take effect if Chrome's GPU sub-process exits, which should not be happening in these tests. It would in the context_lost_tests step, for example.

This might not be relevant, but when using the ANGLE OpenGL backend this test would also crash the GPU process because of missing state tracking for the PIXEL_UNPACK_BUFFER object: we would call the driver's function without the buffer bound, causing a nullptr dereference.

Comment 24 by kbr@chromium.org, Apr 21 2016

Thanks for the note. I'd expect the GPU process to crash 100% of the time in that case so the test would be failing reliably instead of flakily.

Comment 25 by kbr@chromium.org, Apr 22 2016

Blockedon: 605988

Comment 26 by bugdroid1@chromium.org, Jul 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/468fefb5fa8e23b66fd56e54a96ad13691272b12

commit 468fefb5fa8e23b66fd56e54a96ad13691272b12
Author: cwallez <cwallez@chromium.org>
Date: Tue Jul 19 20:19:25 2016

WebGL2 expectations: tighten up Linux expectations

TBR=zmo@chromium.org
BUG= 483282 
BUG= 598902 
BUG= 603168 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2161993003
Cr-Commit-Position: refs/heads/master@{#406366}

[modify] https://crrev.com/468fefb5fa8e23b66fd56e54a96ad13691272b12/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py


Comment 27 by bugdroid1@chromium.org, Apr 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c85136a913f508e2795af6b21873a9984ea721f7

commit c85136a913f508e2795af6b21873a9984ea721f7
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Apr 06 03:58:35 2018

Remove flaky expectation for occlusion-query.html.

This was flaking because of crashes in Oilpan. Since then, the WebGL
context's object graph has been redesigned. This test should no longer
be flaky.

TBR=kainino@chromium.org, jdarpinian@chromium.org

Bug:  603168 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: Ibafa9168250094c4b070342a8a08a6bb92002174
Reviewed-on: https://chromium-review.googlesource.com/996965
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#548660}
[modify] https://crrev.com/c85136a913f508e2795af6b21873a9984ea721f7/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Comment 28 by kbr@chromium.org, Apr 6 2018

Status: WontFix (was: Assigned)
Calling this WontFix - no longer reproducible - at this point.
