Enabling CPU profiler hangs page animation |
|||
Issue description1. open https://coordinated-rest.surge.sh/ 2. open JavaScript Profiler in devtools 3. select the controller.js worker script 4. click start game, 5. notice the smooth animation of the pink ball 6. click 'Start' in profiler panel. 7. notice the animation is only updates every 2s 8. Click Stop. 9. notice the animation resumes 60fps While it's 0.5FPS with the JS profiler, it's 0FPS in the Performance panel. v8.cpu_profiler and v8.cpu_profiler.hires trace categories also have an effect on the page. Interesting discovery: tweaking `heapStatsUpdateIntervalMs` in devtools' IsolateSelector.js has an effect. The `heapUsage()` call within `_updateStats()` triggers a frame while the page is being profiled.
,
Jun 6 2018
Without sampling (pthread_kill signals) the worker thread does perform some work besides waiting in mach_msg_trap. Here's the profile:
2513 Thread_1855401: DedicatedWorker thread
+ 2513 thread_start (in libsystem_pthread.dylib) + 13 [0x7fff53c51bf9]
+ 2513 _pthread_start (in libsystem_pthread.dylib) + 377 [0x7fff53c5250d]
+ 2513 _pthread_body (in libsystem_pthread.dylib) + 340 [0x7fff53c52661]
+ 2513 base::(anonymous namespace)::ThreadFunc(void*) (in Google Chrome Framework) load address 0x109a32000 + 0x20ffae7 []
+ 2513 base::Thread::ThreadMain() (in Google Chrome Framework) load address 0x109a32000 + 0x2101bbb [lock.h:26]
+ 2513 <name omitted> (in Google Chrome Framework) load address 0x109a32000 + 0x20d0105 [run_loop.cc:136]
+ 2353 base::MessagePumpDefault::Run(base::MessagePump::Delegate*) (in Google Chrome Framework) load address 0x109a32000 + 0x20ab072 [message_pump_default.cc:63]
+ ! 2335 base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) (in Google Chrome Framework) load address 0x109a32000 + 0x20e5d35 [waitable_event_mac.cc:149]
+ ! : 2334 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff53a80724]
+ ! : | 2334 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff53a8120a]
+ ! : 1 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff53a80724]
+ ! 12 base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) (in Google Chrome Framework) load address 0x109a32000 + 0x20e5dc0 [activity_tracker.h:1315]
+ ! 5 base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) (in Google Chrome Framework) load address 0x109a32000 + 0x20e5dce [waitable_event_mac.cc:155]
+ ! : 2 base::ScopedBlockingCall::~ScopedBlockingCall() (in Google Chrome Framework) load address 0x109a32000 + 0x21000d0 [scoped_blocking_call.cc:41]
+ ! : 1 base::ScopedBlockingCall::~ScopedBlockingCall() (in Google Chrome Framework) load address 0x109a32000 + 0x2100142 [scoped_blocking_call.cc:44]
+ ! : | 1 base::ThreadLocalStorage::Slot::Set(void*) (in Google Chrome Framework) load address 0x109a32000 + 0x2102fd0 [thread_local_storage.cc:386]
+ ! : 1 base::ScopedBlockingCall::~ScopedBlockingCall() (in Google Chrome Framework) load address 0x109a32000 + 0x21000e4 [lazy_instance_helpers.h:78]
+ ! : 1 base::ScopedBlockingCall::~ScopedBlockingCall() (in Google Chrome Framework) load address 0x109a32000 + 0x2100136 [scoped_blocking_call.cc:43]
+ ! 1 base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) (in Google Chrome Framework) load address 0x109a32000 + 0x20e5d35 [waitable_event_mac.cc:149]
+ 128 base::MessagePumpDefault::Run(base::MessagePump::Delegate*) (in Google Chrome Framework) load address 0x109a32000 + 0x20ab03c [message_pump_default.cc:42]
+ ! 119 base::MessageLoop::DoDelayedWork(base::TimeTicks*) (in Google Chrome Framework) load address 0x109a32000 + 0x20aa786 [message_loop.cc:364]
+ ! : 118 base::MessageLoop::RunTask(base::PendingTask*) (in Google Chrome Framework) load address 0x109a32000 + 0x20aa174 [vector:639]
+ ! : | 118 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) (in Google Chrome Framework) load address 0x109a32000 + 0x2084d27 [callback_forward.h:11]
+ ! : | 88 blink::scheduler::internal::ThreadControllerImpl::DoWork(blink::scheduler::internal::SequencedTaskSource::WorkType) (in Google Chrome Framework) load address 0x109a32000 + 0x1b009f9 [weak_ptr.h:243]
+ ! : | + 88 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) (in Google Chrome Framework) load address 0x109a32000 + 0x2084d27 [callback_forward.h:11]
+ ! : | + 87 blink::TimerBase::RunInternal() (in Google Chrome Framework) load address 0x109a32000 + 0x4e4553e [trace_event.h:1106]
+ ! : | + ! 81 blink::DOMTimer::Fired() (in Google Chrome Framework) load address 0x109a32000 + 0x52a36f4 [member.h:88]
+ ! : | + ! : 79 blink::ScheduledAction::Execute(blink::WorkerGlobalScope*) (in Google Chrome Framework) load address 0x109a32000 + 0x4a7f174 [vector.h:424]
+ ! : | + ! : | 75 blink::V8ScriptRunner::CallFunction(v8::Local<v8::Function>, blink::ExecutionContext*, v8::Local<v8::Value>, int, v8::Local<v8::Value>*, v8::Isolate*) (in Google Chrome Framework) load address 0x109a32000 + 0x4a9bce1 [optional.h:62]
+ ! : | + ! : | + 75 v8::internal::Isolate::RunMicrotasks() (in Google Chrome Framework) load address 0x109a32000 + 0x1714692 [isolate.cc:3919]
+ ! : | + ! : | + 75 v8::internal::Execution::RunMicrotasks(v8::internal::Isolate*, v8::internal::Execution::MessageHandling, v8::internal::MaybeHandle<v8::internal::Object>*) (in Google Chrome Framework) load address 0x109a32000 + 0x1607cc4 [execution.cc:272]
+ ! : | + ! : | + 75 v8::internal::Execution::TryCall(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Execution::MessageHandling, v8::internal::MaybeHandle<v8::internal::Object>*, v8::internal::Execution::Target) (in Google Chrome Framework) load address 0x109a32000 + 0x1607ba2 [execution.cc:191]
+ ! : | + ! : | + 75 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::Object>, v8::internal::Execution::MessageHandling, v8::internal::Execution::Target) (in Google Chrome Framework) load address 0x109a32000 + 0x16079ad [simulator.h:110]
+ ! : | + ! : | + 75 ??? (in <unknown binary>) [0x3fbff84681]
However when we start sending pthread_kill signals it seems to start spending most of the time waiting in mach_msg_trap. The profile:
2721 Thread_1669620: DedicatedWorker thread
+ 2721 thread_start (in libsystem_pthread.dylib) + 13 [0x7fff53c51bf9]
+ 2721 _pthread_start (in libsystem_pthread.dylib) + 377 [0x7fff53c5250d]
+ 2721 _pthread_body (in libsystem_pthread.dylib) + 340 [0x7fff53c52661]
+ 2721 base::(anonymous namespace)::ThreadFunc(void*) (in Google Chrome Framework) load address 0x100b00000 + 0x20ffae7 []
+ 2721 base::Thread::ThreadMain() (in Google Chrome Framework) load address 0x100b00000 + 0x2101bbb [lock.h:26]
+ 2721 <name omitted> (in Google Chrome Framework) load address 0x100b00000 + 0x20d0105 [run_loop.cc:136]
+ 2721 base::MessagePumpDefault::Run(base::MessagePump::Delegate*) (in Google Chrome Framework) load address 0x100b00000 + 0x20ab072 [message_pump_default.cc:63]
+ 2721 base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) (in Google Chrome Framework) load address 0x100b00000 + 0x20e5d35 [waitable_event_mac.cc:149]
+ 2652 mach_msg (in libsystem_kernel.dylib) + 209 [0x7fff53a807b9]
+ ! 2652 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff53a8120a]
+ 30 ??? (in <unknown binary>) [0x7fc10b1df800]
+ ! 25 _sigtramp (in libsystem_platform.dylib) + 0 [0x7fff53c48f40]
+ ! 5 _sigtramp (in libsystem_platform.dylib) + 26 [0x7fff53c48f5a]
+ ! 3 v8::sampler::SignalHandler::HandleProfilerSignal(int, __siginfo*, void*) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e66f [sampler.cc:526]
+ ! 1 v8::sampler::SignalHandler::HandleProfilerSignal(int, __siginfo*, void*) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e6ef [sampler.cc:0]
+ ! | 1 v8::sampler::SamplerManager::DoSample(v8::RegisterState const&) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e79c [sampler.cc:306]
+ ! 1 v8::sampler::SignalHandler::HandleProfilerSignal(int, __siginfo*, void*) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e674 [sampler.cc:526]
+ 28 ??? (in <unknown binary>) [0x7fc10db55c00]
+ ! 26 _sigtramp (in libsystem_platform.dylib) + 0,13 [0x7fff53c48f40,0x7fff53c48f4d]
+ ! 2 _sigtramp (in libsystem_platform.dylib) + 26 [0x7fff53c48f5a]
+ ! 1 v8::sampler::SignalHandler::HandleProfilerSignal(int, __siginfo*, void*) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e6ef [sampler.cc:0]
+ ! | 1 v8::sampler::SamplerManager::DoSample(v8::RegisterState const&) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e7c5 [sampler.cc:310]
+ ! | 1 v8::Isolate::IsInUse() (in Google Chrome Framework) load address 0x100b00000 + 0x12e790c [isolate.h:547]
+ ! 1 v8::sampler::SignalHandler::HandleProfilerSignal(int, __siginfo*, void*) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e66f [sampler.cc:526]
+ 10 sandbox::mac::protobuf_seatbelt_2eproto::InitDefaults()::once (in Google Chrome Helper) load address 0x1009b2000 + 0xfffffffeff64e008 []
+ ! 9 _sigtramp (in libsystem_platform.dylib) + 0 [0x7fff53c48f40]
+ ! 1 _sigtramp (in libsystem_platform.dylib) + 26 [0x7fff53c48f5a]
+ ! 1 v8::sampler::SignalHandler::HandleProfilerSignal(int, __siginfo*, void*) (in Google Chrome Framework) load address 0x100b00000 + 0x1a0e66f [sampler.cc:526]
+ 1 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff53a80724]
+ 1 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff53a8120a]
Putting some logging into base::WaitableEvent::TimedWaitUntil around mach_msg call:
132721017222 0x7000163fa000 DedicatedWorker thread Wait for 7927 us
ts:132721017224 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100
ts:132721022761 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg done res:0x0
132721022769 0x7000163fa000 DedicatedWorker thread done wait_result=1
132721023877 0x7000163fa000 DedicatedWorker thread Wait for 1272 us
ts:132721023883 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg timeout_ms:2 options=0x102 MACH_RCV_TIMEOUT=0x100
ts:132721024402 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg done res:0x0
132721024409 0x7000163fa000 DedicatedWorker thread done wait_result=1
Start sampling 0x7000163fa000
Start profiler ======================================
132721031848 0x7000163fa000 DedicatedWorker thread Wait for 7964 us
ts:132721031858 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100
ts:132722702260 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg done res:0x0
132722702269 0x7000163fa000 DedicatedWorker thread done wait_result=1
132722702776 0x7000163fa000 DedicatedWorker thread Wait for 7969 us
ts:132722702784 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100
ts:132724704005 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg done res:0x0
132724704016 0x7000163fa000 DedicatedWorker thread done wait_result=1
Consider the last pair of mach_msg and 'mach_msg done' lines. The call duration is:
132724704005 - 132722702784 = 2001221 usec = 2sec !!!
That is despite the mach_msg was called with timeout=8msec argument.
,
Jun 7 2018
mach traps aren't BSD syscalls and aren't interrupted by signals. this works as "designed".
,
Jun 7 2018
It's not about mach traps being interrupted by a signal or not. It's about why sending signals to a thread residing in a trap causes it to disrespect timeout argument?
,
Jun 7 2018
and what I mean by that is that mach traps will also not look at the XNU AST for BSD signals most likely. so as a workaround, when in profile mode, do any cheap BSD syscall right after returning from that trap, this will cause the signal to be delivered, like e.g. getppid()/getpgrp()/.. which should be as cheap as a syscall can be and will run through the AST.
,
Jun 7 2018
@comment 4: I don't think it direspects the argument, I think you keep going into that mach_msg() and timeout but just the signals aren't delivered because it's mach code
,
Jun 7 2018
actually an even better call would be sigpending() which is exactly about that and more likely to remain an unoptimized syscall unlike getppid() (which is not optimized today but could be some day, like getpid is).
,
Jun 7 2018
Thanks for looking into it.
I tried calling sigpending at the end of signal handler, but it didn't help.
Actually I still did not get why you say mach_msg works as intended. the patch I applied is:
diff --git a/base/synchronization/waitable_event_mac.cc b/base/synchronization/waitable_event_mac.cc
index ad6f8cb0024d..2d10e86cd09f 100644
--- a/base/synchronization/waitable_event_mac.cc
+++ b/base/synchronization/waitable_event_mac.cc
@@ -142,8 +142,14 @@ bool WaitableEvent::TimedWaitUntil(const TimeTicks& end_time) {
rcv_size = 0;
}
+ fprintf(stderr, "ts:%llu tid:%p '%s' mach_msg timeout_ms:%u options=0x%x MACH_RCV_TIMEOUT=0x%x\n",
+ (TimeTicks::Now() - TimeTicks()).InMicroseconds(),
+ (void*)pthread_self(), PlatformThread::GetName(), timeout, options, MACH_RCV_TIMEOUT);
kern_return_t kr = mach_msg(&msg.header, options, 0, rcv_size,
receive_right_->Name(), timeout, MACH_PORT_NULL);
+ fprintf(stderr, "ts:%llu tid:%p '%s' mach_msg done res:0x%x\n",
+ (TimeTicks::Now() - TimeTicks()).InMicroseconds(),
+ (void*)pthread_self(), PlatformThread::GetName(), kr);
if (kr == KERN_SUCCESS) {
return true;
} else if (rcv_size == 0 && kr == MACH_RCV_TOO_LARGE) {
Which then produces the following output:
ts:132722702784 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100
ts:132724704005 tid:0x7000163fa000 'DedicatedWorker thread' mach_msg done res:0x0
Which basically says that it spent 2 seconds in mach_msg function while it should have timeout after 8 milliseconds. Why is this considered an expected behavior?
,
Jun 7 2018
Hmm ok we need to look into that more closely, that is indeed strange, I had not understood what the logging was exactly.
,
Jun 7 2018
Looks like a signal arriving during mach_msg wait does some kind of reset of the timeout countdown. I put a log into the signal handler and with 1ms rate of signals the log is: ts:155810380708 tid:0x7000154f7000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 155810381641 thread: 0x7000154f7000 Signal Handler 155810382917 thread: 0x7000154f7000 Signal Handler 155810384221 thread: 0x7000154f7000 Signal Handler ... lots of signal invocations ... 155810990037 thread: 0x7000154f7000 Signal Handler 155810991385 thread: 0x7000154f7000 Signal Handler 155810992568 thread: 0x7000154f7000 Signal Handler ts:155810993422 tid:0x7000154f7000 'DedicatedWorker thread' mach_msg done res:0x0 Reducing the signal rate down to 5ms: does not change the picture: ts:156333249493 tid:0x70001634c000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 156333249680 thread: 0x70001634c000 Signal Handler 156333254831 thread: 0x70001634c000 Signal Handler 156333260878 thread: 0x70001634c000 Signal Handler ... cut ... 156335235431 thread: 0x70001634c000 Signal Handler 156335240691 thread: 0x70001634c000 Signal Handler 156335247091 thread: 0x70001634c000 Signal Handler ts:156335250965 tid:0x70001634c000 'DedicatedWorker thread' mach_msg done res:0x0 However if I raise the rate above 8ms (typical delay for mach_msg call) it starts working as intended: ts:156706300047 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 ts:156706300053 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x0 ts:156706300063 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 156706302704 thread: 0x70001750d000 Signal Handler ts:156706311422 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x10004003 ts:156706311792 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:4 options=0x102 MACH_RCV_TIMEOUT=0x100 ts:156706311799 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x0 ts:156706311821 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:4 options=0x102 MACH_RCV_TIMEOUT=0x100 156706314588 thread: 0x70001750d000 Signal Handler ts:156706319173 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x10004003 ts:156706319231 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:1 options=0x102 MACH_RCV_TIMEOUT=0x100 ts:156706319238 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x0 ts:156706319251 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:1 options=0x102 MACH_RCV_TIMEOUT=0x100 ts:156706320501 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x10004003 ts:156706321307 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 ts:156706321321 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x0 ts:156706321378 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 156706325117 thread: 0x70001750d000 Signal Handler ts:156706334213 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x10004003 ts:156706334772 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 ts:156706334780 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x0 ts:156706334796 tid:0x70001750d000 'DedicatedWorker thread' mach_msg timeout_ms:8 options=0x102 MACH_RCV_TIMEOUT=0x100 156706336725 thread: 0x70001750d000 Signal Handler ts:156706345764 tid:0x70001750d000 'DedicatedWorker thread' mach_msg done res:0x10004003 When it exits because of the timeout (ret code 0x10004003) it always happens in timeout_ms since the [last] signal (not since the mach_msg entry). E.g.: t0=156706321378 mach_msg with TO=8ms t1=156706325117 t0+3739us signal handler t2=156706334213 t1+9096us mach_msg exited Also note when the signal rate is too high, e.g. 1ms it never exits mach_msg because of the timeout, it always happens because of a message eventually received.
,
Jun 7 2018
From what I can see here: https://github.com/apple/darwin-xnu/blob/5394bb038891708cd4ba748da79b90a33b19f82e/libsyscall/mach/mach_msg.c If the first call to MACH_MSG_TRAP at 103 exits with MACH_RCV_INTERRUPTED it will reenter MACH_MSG_TRAP int a loop at 118 effectively restarting the timeout countdown. I think we can workaround it by suppling MACH_RCV_INTERRUPT and implementing the proper looping outside the mach_msg. I'll try that.
,
Jun 7 2018
Yeah I was about to suggest that :)
,
Jun 12 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/73d10458de09c192eb31947b03dbf20aec4a1529 commit 73d10458de09c192eb31947b03dbf20aec4a1529 Author: Alexei Filippov <alph@chromium.org> Date: Tue Jun 12 07:14:37 2018 Make WaitableEvent::TimedWaitUntil properly handle timeout in presence of signals. If the thread is interrupted during mach_msg(), the system call will be restarted. However, the libsyscall wrapper does not adjust the timeout by the amount of time already waited. Using MACH_RCV_INTERRUPT will instead return from mach_msg(), so that the call can be retried with an adjusted timeout. BUG= 848114 Change-Id: Ia009ea40b0fac619575eb6aa2988ae75f0e97413 Reviewed-on: https://chromium-review.googlesource.com/1092014 Reviewed-by: Robert Sesek <rsesek@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Commit-Queue: Alexei Filippov <alph@chromium.org> Cr-Commit-Position: refs/heads/master@{#566347} [modify] https://crrev.com/73d10458de09c192eb31947b03dbf20aec4a1529/base/synchronization/waitable_event_mac.cc
,
Jun 12 2018
|
|||
►
Sign in to add a comment |
|||
Comment 1 by alph@chromium.org
, Jun 2 2018