~WorkerSchedulerImpl releases a live registered TaskObserver |
||||||||||
Issue descriptionVersion: r415561 OS: All Got the following failure running with ASAN on WebGL 2.0 Conformance tests (as part of investigating Issue 609252 ). Looks like use after free for a TaskObserverAdapter on the Renderer thread. It occurs usually occurs within a minute, on a random test. To reproduce run with ASAN Mac build, with '--enable-unsafe-es3-apis' and run tests in: https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html?version=2.0.0 ==30774==ERROR: AddressSanitizer: use-after-poison on address 0x7ee4598f5a00 at pc 0x000264c18abb bp 0x7fff57de6af0 sp 0x7fff57de6ae8 READ of size 8 at 0x7ee4598f5a00 thread T0 ==30774==WARNING: invalid path to external symbolizer! ==30774==WARNING: Failed to use and restart external symbolizer! #0 0x264c18aba in blink::scheduler::WebThreadBase::TaskObserverAdapter::DidProcessTask(base::PendingTask const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x11318aba) #1 0x264bf09ef in blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue(blink::scheduler::internal::WorkQueue*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x112f09ef) #2 0x264beb49e in blink::scheduler::TaskQueueManager::DoWork(base::TimeTicks, bool) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x112eb49e) #3 0x25a7972e5 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6e972e5) #4 0x25a82dbb4 in base::MessageLoop::RunTask(base::PendingTask const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f2dbb4) #5 0x25a82e8a6 in base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f2e8a6) #6 0x25a82fcee in base::MessageLoop::DoWork() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f2fcee) #7 0x25a83a649 in base::MessagePumpCFRunLoopBase::RunWork() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f3a649) #8 0x25a804d19 in base::mac::CallWithEHFrame(void () block_pointer) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f04d19) #9 0x25a83831f in base::MessagePumpCFRunLoopBase::RunWorkSource(void*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f3831f) #10 0x7fff85873880 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0xaa880) #11 0x7fff85852fbb in __CFRunLoopDoSources0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0x89fbb) #12 0x7fff858524de in __CFRunLoopRun (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0x894de) #13 0x7fff85851ed7 in CFRunLoopRunSpecific (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0x88ed7) #14 0x7fff819c8ed8 in -[NSRunLoop(NSRunLoop) runMode:beforeDate:] (/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation+0x24ed8) #15 0x25a83c5ae in base::MessagePumpNSRunLoop::DoRun(base::MessagePump::Delegate*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f3c5ae) #16 0x25a839c8b in base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f39c8b) #17 0x25a8b8bbb in base::RunLoop::Run() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6fb8bbb) #18 0x26a9e8bc1 in content::RendererMain(content::MainFunctionParams const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x170e8bc1) #19 0x259582d4c in content::ContentMainRunnerImpl::Run() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x5c82d4c) #20 0x25957f333 in content::ContentMain(content::ContentMainParams const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x5c7f333) #21 0x2539054af in ChromeMain (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x54af) #22 0x107e16bd7 in main (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Helper.app/Contents/MacOS/Chromium Helper+0x100001bd7) #23 0x107e167c3 in start (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Helper.app/Contents/MacOS/Chromium Helper+0x1000017c3) AddressSanitizer can not describe address in more detail (wild memory access suspected). SUMMARY: AddressSanitizer: use-after-poison (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x11318aba) in blink::scheduler::WebThreadBase::TaskObserverAdapter::DidProcessTask(base::PendingTask const&) Shadow bytes around the buggy address: 0x1fdc8b31eaf0: 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 0x1fdc8b31eb00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 00 00 00 00 00 0x1fdc8b31eb10: 00 00 00 f7 00 00 00 00 00 00 00 00 00 00 00 00 0x1fdc8b31eb20: 00 00 00 f7 00 00 00 00 00 00 00 00 00 f7 00 00 0x1fdc8b31eb30: 00 00 00 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7 =>0x1fdc8b31eb40:[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 00 00 0x1fdc8b31eb50: 00 00 00 00 00 00 f7 00 00 00 00 00 00 00 00 f7 0x1fdc8b31eb60: 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7 0x1fdc8b31eb70: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 0x1fdc8b31eb80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 0x1fdc8b31eb90: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==30774==ABORTING Received signal 6 [0x00025a7957ea] [0x7fff87b2c52a] [0x000108b5c79b] [0x7fff949686df] [0x000107e8bac6] [0x000107e7217b] [0x000264c18abb] [0x000264bf09f0] [0x000264beb49f] [0x00025a7972e6] [0x00025a82dbb5] [0x00025a82e8a7] [0x00025a82fcef] [0x00025a83a64a] [0x00025a804d1a] [0x00025a838320] [0x7fff85873881] [0x7fff85852fbc] [0x7fff858524df] [0x7fff85851ed8] [0x7fff819c8ed9] [0x00025a83c5af] [0x00025a839c8c] [0x00025a8b8bbc] [0x00026a9e8bc2] [0x000259582d4d] [0x00025957f334] [0x0002539054b0] [0x000107e16bd8] [0x000107e167c4] [end of stack trace] To diagnose the issue I added debug tracking to TaskObservers, counting when they are registered & unregistered (AddObserver/RemoveObserver) and CHECK in base::MessageLoop::TaskObserver::~TaskObserver(). I got the following offending stack: [83998:34627:0906/181001:FATAL:message_loop.cc(120)] Check failed: registered_count == 0. 0 Chromium Framework 0x0000000111e85d23 _ZN4base5debug10StackTraceC1Ev + 19 1 Chromium Framework 0x0000000111eed382 _ZN7logging10LogMessageD2Ev + 402 2 Chromium Framework 0x0000000111f1f495 _ZN4base11MessageLoop12TaskObserverD2Ev + 277 3 Chromium Framework 0x000000011b2c91c1 _ZN5blink9scheduler19WorkerSchedulerImplD2Ev + 273 4 Chromium Framework 0x000000011b2c932e _ZN5blink9scheduler19WorkerSchedulerImplD0Ev + 14 5 Chromium Framework 0x000000011b2c8331 _ZN5blink9scheduler31WebThreadImplForWorkerScheduler29WillDestroyCurrentMessageLoopEv + 369 6 Chromium Framework 0x0000000111f21395 _ZN4base11MessageLoopD2Ev + 1413 7 Chromium Framework 0x0000000111f22a5e _ZN4base11MessageLoopD0Ev + 14 8 Chromium Framework 0x0000000112079bf1 _ZN4base6Thread10ThreadMainEv + 1505 9 Chromium Framework 0x0000000112060bbe _ZN4base12_GLOBAL__N_110ThreadFuncEPv + 206 10 libsystem_pthread.dylib 0x00007fff95d9c99d _pthread_body + 131 11 libsystem_pthread.dylib 0x00007fff95d9c91a _pthread_body + 0 12 libsystem_pthread.dylib 0x00007fff95d9a351 thread_start + 13
,
Sep 14 2016
I'm getting a slightly different error here -- still task runner related:
#0 0x7fef6ed2909e (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libblink_platform.so+0x70e09e)
#1 0x7fef6ed09e77 (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libblink_platform.so+0x6eee77)
#2 0x7fef6ed05d0e (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libblink_platform.so+0x6ead0e)
#3 0x7fef7f6602c1 (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libbase.so+0xe62c1)
#4 0x7fef7f6cb344 (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libbase.so+0x151344)
#5 0x7fef7f6cbfb5 (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libbase.so+0x151fb5)
#6 0x7fef7f6cd618 (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libbase.so+0x153618)
#7 0x7fef7f6d1e41 (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libbase.so+0x157e41)
#8 0x7fef7f73e666 (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libbase.so+0x1c4666)
#9 0x7fef7a87a29f (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libcontent.so+0x2dd629f)
#10 0x7fef7abd393c (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libcontent.so+0x312f93c)
#11 0x7fef7abd67dc (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libcontent.so+0x31327dc)
#12 0x7fef7abd2c1a (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libcontent.so+0x312ec1a)
#13 0x7fef80f5f573 (/usr/local/google/code3/clankium-master/src/out_linux/Release/chrome+0x1196573)
#14 0x7fef67ab3f44 (/lib/x86_64-linux-gnu/libc.so.6+0x21f44)
AddressSanitizer can not describe address in more detail (wild memory access suspected).
SUMMARY: AddressSanitizer: use-after-poison (/usr/local/google/code3/clankium-master/src/out_linux/Release/./libblink_platform.so+0x70e09e)
Shadow bytes around the buggy address:
0x0fdad7ae49b0: f7 f7 f7 f7 f7 00 00 00 00 00 00 00 00 00 f7 f7
0x0fdad7ae49c0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0fdad7ae49d0: f7 f7 f7 f7 f7 00 00 00 00 00 00 00 00 f7 00 00
0x0fdad7ae49e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 00 00
0x0fdad7ae49f0: 00 00 00 00 00 00 00 f7 00 00 00 00 00 00 00 00
=>0x0fdad7ae4a00: 00 00 00 00 f7 f7 f7 f7 f7[f7]f7 f7 f7 f7 f7 f7
0x0fdad7ae4a10: f7 f7 f7 f7 f7 f7 f7 00 00 00 00 00 00 00 00 f7
0x0fdad7ae4a20: 00 00 00 00 00 00 00 00 f7 00 00 00 00 00 00 00
0x0fdad7ae4a30: 00 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0fdad7ae4a40: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0fdad7ae4a50: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Which symbolizes to:
#0 0x7fef6ed2909e in ?? third_party/WebKit/Source/platform/scheduler/child/webthread_base.cc:32:16
#1 0x7fef6ed09e77 in ProcessTaskFromWorkQueue third_party/WebKit/Source/platform/scheduler/base/task_queue_manager.cc:346:5
#2 0x7fef6ed05d0e in DoWork third_party/WebKit/Source/platform/scheduler/base/task_queue_manager.cc:234:13
#3 0x7fef7f6602c1 in Run base/callback.h:64:12
#4 0x7fef7f6602c1 in RunTask base/debug/task_annotator.cc:54:0
#5 0x7fef7f6cb344 in RunTask base/message_loop/message_loop.cc:487:19
#6 0x7fef7f6cbfb5 in DeferOrRunPendingTask base/message_loop/message_loop.cc:496:5
#7 0x7fef7f6cd618 in DoDelayedWork base/message_loop/message_loop.cc:659:10
#8 0x7fef7f6d1e41 in Run base/message_loop/message_pump_default.cc:39:27
#9 0x7fef7f73e666 in Run base/run_loop.cc:35:10
#10 0x7fef7a87a29f in RendererMain content/renderer/renderer_main.cc:198:23
#11 0x7fef7abd393c in RunZygote content/app/content_main_runner.cc:343:14
#12 0x7fef7abd67dc in Run content/app/content_main_runner.cc:786:12
#13 0x7fef7abd2c1a in ContentMain content/app/content_main.cc:20:28
#14 0x7fef80f5f573 in ChromeMain chrome/app/chrome_main.cc:85:12
#15 0x7fef67ab3f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287:0
I'll try adding more instrumentation next.
,
Sep 14 2016
With some help from __asan_address_is_poisoned() I was able to narrow this down to the following WebThread::TaskObserver (call stack when the observer is registered): #0 0x7f4f077111e1 __interceptor_backtrace #1 0x7f4f05e95c03 base::debug::StackTrace::StackTrace() #2 0x7f4ef555fdc3 blink::scheduler::WebThreadBase::addTaskObserver() #3 0x7f4ee6fa6466 blink::EXTDisjointTimerQuery::endQueryEXT() #4 0x7f4ee665aebb blink::EXTDisjointTimerQueryV8Internal::endQueryEXTMethodCallback() #5 0x7f4ef9f5930a v8::internal::FunctionCallbackArguments::Call() #6 0x7f4efa0e7099 v8::internal::(anonymous namespace)::HandleApiCallHelper<>() #7 0x7f4efa0e492f v8::internal::Builtin_Impl_HandleApiCall() #8 0x7f4ebb0843a7 <unknown> Looks like its finalizer is never getting called for this observer, although it is a GarbageCollectedFinalized. Looking at the way the task observer is used, I wonder if we could just replace it with a simple PostTask that sets the same flag? Ken, what do you think?
,
Sep 15 2016
Sami, thank you for tracking that down. I thought the registering and unregistering code for these TaskObservers was "correct" but maybe there's some corner case I missed. Not sure how the TaskObserver is getting deleted without unregistering itself. The intent here was to determine when control returned to the main loop, but from talking with vmiura@, too, it looks like a simple PostTask will be sufficient -- as long as the conformance tests still pass. Taking this bug. Thanks again for your help investigating this.
,
Sep 15 2016
Actually, after talking with Victor -- given that the problems in Issue 609252 are also TaskObserver-related, and occur in the GPU process -- I think there is a non-zero probability that there is an actual bug in the management of the TaskObservers. Going to build ASAN and investigate this.
,
Sep 16 2016
Thanks to a tip from haraken@ the issue became obvious. """ 1) GC's marking phase finishes. 2) All dead objects are poisoned. At this point, no destructor has been called yet. 3) Lazy sweeping starts, which means that the sweeping phase runs incrementally while the user program runs. If you touch WebGLTimerQueryEXT in the lazy sweeping phase, you'll hit the poisoning error because the object is already poisoned. """ We still have the WebGLTimerQueryEXT registered as a task observer at #2. It needs to be unregistered in a pre-finalizer.
,
Sep 16 2016
As discussed offline, moving the unregistration to a pre-finalizer will solve the problem. That being said, do you really need to register WebGLTimerQueryEXT as a TaskObserver? WebThread::TaskObserver is discouraged because the task observation adds a non-negligible overhead to each task handling. (Remember that the observer callback needs to be called for each task.)
,
Sep 16 2016
kbr@ and I discussed the efficiency of this earlier. It is heavyweight. A PostTask would enable the needed functionality.
,
Sep 16 2016
Yes, adding one TaskObserver slows down all other task executions, so it should be avoided :)
,
Sep 16 2016
FYI I've got a patch here that replaces the TaskObserver with a posted task: https://codereview.chromium.org/2341043002/
,
Sep 16 2016
Thanks everyone for your help tracking this down and sorry for the bug I introduced. Per comment on https://codereview.chromium.org/2341043002/ I think it has the same basic problem. If a raw pointer to a GarbageCollectedFinalized object is ever exposed and there is a way of revoking that raw pointer in the object's destructor, then the revocation needs to be done in a pre-finalizer. Correct? If pre-finalizers are expensive then perhaps we should allocate separate C-heap objects which use WeakMembers to point to the GC'd objects. Or is that equally as expensive as adding a pre-finalizer?
,
Sep 17 2016
Sami, thanks for putting together the patch above -- since you're already working on this can I assign this to you again?
,
Sep 19 2016
FYI Sami is OOO till Thursday.
,
Sep 19 2016
OK, let me take this back and put up a slightly different patch.
,
Sep 22 2016
It turned out I was totally swamped this week and this will be the next thing I work on. Sami, if you get in on Thursday and want to revise your patch https://codereview.chromium.org/2341043002/ , please go ahead and take this back from me. Thanks.
,
Sep 22 2016
Sami, since you're already working with haraken@ to update https://codereview.chromium.org/2341043002/ let me please assign this to you again.
,
Sep 22 2016
sgtm.
,
Sep 23 2016
,
Sep 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fcfd66a5660a6ec631facdc050e8f8fc3ded7858 commit fcfd66a5660a6ec631facdc050e8f8fc3ded7858 Author: skyostil <skyostil@chromium.org> Date: Mon Sep 26 20:17:09 2016 webgl: Replace query task observer with a posted task This patch replaces the task observers used by WebGL queries (timer and regular) with an explicitly posted task. In general task observers should be used sparingly because they add overhead to every task and their lifetime must be explicitly managed. We now post a regular task, which runs only after the current task has finished, meaning it is safe to update the cached query availability flag again. BUG= 644575 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2341043002 Cr-Commit-Position: refs/heads/master@{#420975} [modify] https://crrev.com/fcfd66a5660a6ec631facdc050e8f8fc3ded7858/third_party/WebKit/Source/modules/webgl/WebGLQuery.cpp [modify] https://crrev.com/fcfd66a5660a6ec631facdc050e8f8fc3ded7858/third_party/WebKit/Source/modules/webgl/WebGLQuery.h [modify] https://crrev.com/fcfd66a5660a6ec631facdc050e8f8fc3ded7858/third_party/WebKit/Source/modules/webgl/WebGLTimerQueryEXT.cpp [modify] https://crrev.com/fcfd66a5660a6ec631facdc050e8f8fc3ded7858/third_party/WebKit/Source/modules/webgl/WebGLTimerQueryEXT.h
,
Sep 27 2016
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by kbr@chromium.org
, Sep 13 2016Status: Assigned (was: Untriaged)