Issue metadata
Sign in to add a comment
|
TaskSchedulerWorkerPoolBlockingTest.ThreadBlockedUnblocked/MAY_BLOCK flakes under Fuchsia/x64 bot |
||||||||||||||||||||||||
Issue descriptionIn https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia/13139 the TaskSchedulerWorkerPoolBlockingTest.ThreadBlockedUnblocked/MAY_BLOCK test got stuck waiting for the worker capacity to increase to the expected level. This was running in a 4-VCPU VM. Based on previous experience investigating TaskScheduler test flakes on Fuchsia, there is most likely some unusual timing (e.g. longer than usual scheduling delays) causing the expected behaviour not to materialize. The on-hang thread dumper kicked in, so we can see: [00526.316] 03899.04248> [3:932631393:0119/183713.885692:526316093:ERROR:process_fuchsia.cc(184)] zx_object_wait_one timed out, signals=0, deadline=526315366000, now=526315910000, delta=544000 [00526.319] 03899.04248> [3:932631393:0119/183713.887929:526318359:ERROR:test_launcher.cc(370)] about to process.Terminate() 34ded14f [00526.321] 03899.04248> [3:932631393:0119/183713.891073:526321490:ERROR:test_launcher.cc(376)] Dumping threads in process 32362 [00526.344] 55859.55885> Backtrace of threads of process 32362: base_unittests [00526.344] 55859.55885> arch: x86_64 [00526.344] 55859.55885> 8 thread(s) [00526.345] 55859.55885> dso: id=61ed0023843a671a29692cd9e1acc44f8da29864 base=0x763d2ad81000 name=liblaunchpad.so [00526.346] 55859.55885> dso: id=a64a2d45c2d07d9ebf83d133a66c69208a84dd12 base=0x5c2c28142000 name=<vDSO> [00526.346] 55859.55885> dso: id=96b27c2e66b29943 base=0x4f21954fc000 name=app:base_unittests [00526.347] 55859.55885> dso: id=9dbb16995a4a104c1fa86bed0147aac7e3d780c9 base=0x44f62ec0b000 name=libc.so [00526.347] 55859.55885> dso: id=e3391cbd9a52eac57ec374685b7a9d28bd85c140 base=0x5bce2ce6000 name=libfdio.so [00526.350] 55859.55885> <== Thread initial-thread[32411] ==> #01: pc 0x5c2c28148676 sp 0x3a12d5461858 (<vDSO>,0x6676) #02: pc 0x5c2c281484cc sp 0x3a12d5461860 (<vDSO>,0x64cc) #03: pc 0x44f62ec5128f sp 0x3a12d5461870 (libc.so,0x4628f) #04: base::PlatformThread::Sleep(base::TimeDelta) at ??:? #05: base::internal::TaskSchedulerWorkerPoolBlockingTest::ExpectWorkerCapacityIncreasesTo(unsigned long) at ??:? #06: base::internal::TaskSchedulerWorkerPoolBlockingTest_ThreadBlockedUnblocked_Test::TestBody() at ??:? #07: testing::Test::Run() at ??:? #08: testing::TestInfo::Run() at ??:? #09: testing::TestCase::Run() at ??:? #10: testing::internal::UnitTestImpl::RunAllTests() at ??:? #11: testing::UnitTest::Run() at ??:? #12: base::TestSuite::Run() at ??:? #13: base::(anonymous namespace)::LaunchUnitTestsInternal(base::RepeatingCallback<int ()> const&, unsigned long, int, bool, base::RepeatingCallback<void ()> const&) at ??:? #14: base::LaunchUnitTests(int, char**, base::RepeatingCallback<int ()> const&) at ??:? #15: main at ??:? #16: pc 0x44f62ec2614e sp 0x3a12d5461fe0 (libc.so,0x1b14e) #17: pc 0 sp 0x3a12d5462000 [00526.368] 55859.55885> <== Thread TaskSchedulerServiceThread[42380] ==> #01: pc 0x5c2c2814883b sp 0x614dec642ce8 (<vDSO>,0x683b) #02: pc 0x5c2c28148569 sp 0x614dec642cf0 (<vDSO>,0x6569) #03: pc 0x44f62ec4f571 sp 0x614dec642d10 (libc.so,0x44571) #04: pc 0x44f62ec22f88 sp 0x614dec642d50 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) at ??:? #07: base::WaitableEvent::Wait() at ??:? #08: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) at ??:? #09: base::RunLoop::Run() at ??:? #10: base::Thread::ThreadMain() at ??:? #11: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #12: pc 0x44f62ec238b6 sp 0x614dec642fe0 (libc.so,0x188b6) #13: pc 0x44f62ec996da sp 0x614dec642ff0 (libc.so,0x8e6da) #14: pc 0 sp 0x614dec643000 [00526.384] 55859.55885> <== Thread TaskSchedulerAWorker[42387] ==> #01: pc 0x5c2c2814883b sp 0x24c561e91988 (<vDSO>,0x683b) #02: pc 0x5c2c28148569 sp 0x24c561e91990 (<vDSO>,0x6569) #03: pc 0x44f62ec4f571 sp 0x24c561e919b0 (libc.so,0x44571) #04: pc 0x44f62ec22f88 sp 0x24c561e919f0 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) at ??:? #07: base::WaitableEvent::Wait() at ??:? #08: base::internal::Invoker<base::internal::BindState<base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}, base::internal::UnretainedWrapper<base::RepeatingCallback<void ()> >, base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}<base::WaitableEvent>, base::internal::(anonymous namespace)::NestedBlockingType>, void ()>::RunOnce(base::internal::BindStateBase*) at ??:? #09: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:? #10: base::internal::TaskTracker::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool) at ??:? #11: base::internal::TaskTracker::RunNextTask(scoped_refptr<base::internal::Sequence>, base::internal::CanScheduleSequenceObserver*) at ??:? #12: base::internal::SchedulerWorker::Thread::ThreadMain() at ??:? #13: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #14: pc 0x44f62ec238b6 sp 0x24c561e91fe0 (libc.so,0x188b6) #15: pc 0x44f62ec996da sp 0x24c561e91ff0 (libc.so,0x8e6da) #16: pc 0 sp 0x24c561e92000 [00526.400] 55859.55885> <== Thread TaskSchedulerAWorker[42392] ==> #01: pc 0x5c2c2814883b sp 0x6e450539f988 (<vDSO>,0x683b) #02: pc 0x5c2c28148569 sp 0x6e450539f990 (<vDSO>,0x6569) #03: pc 0x44f62ec4f571 sp 0x6e450539f9b0 (libc.so,0x44571) #04: pc 0x44f62ec22f88 sp 0x6e450539f9f0 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) at ??:? #07: base::WaitableEvent::Wait() at ??:? #08: base::internal::Invoker<base::internal::BindState<base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}, base::internal::UnretainedWrapper<base::RepeatingCallback<void ()> >, base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}<base::WaitableEvent>, base::internal::(anonymous namespace)::NestedBlockingType>, void ()>::RunOnce(base::internal::BindStateBase*) at ??:? #09: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:? #10: base::internal::TaskTracker::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool) at ??:? #11: base::internal::TaskTracker::RunNextTask(scoped_refptr<base::internal::Sequence>, base::internal::CanScheduleSequenceObserver*) at ??:? #12: base::internal::SchedulerWorker::Thread::ThreadMain() at ??:? #13: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #14: pc 0x44f62ec238b6 sp 0x6e450539ffe0 (libc.so,0x188b6) #15: pc 0x44f62ec996da sp 0x6e450539fff0 (libc.so,0x8e6da) #16: pc 0 sp 0x6e45053a0000 [00526.416] 55859.55885> <== Thread TaskSchedulerAWorker[42397] ==> #01: pc 0x5c2c2814883b sp 0x371375360d18 (<vDSO>,0x683b) #02: pc 0x5c2c28148569 sp 0x371375360d20 (<vDSO>,0x6569) #03: pc 0x44f62ec4f571 sp 0x371375360d40 (libc.so,0x44571) #04: pc 0x44f62ec22f88 sp 0x371375360d80 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) at ??:? #07: base::WaitableEvent::Wait() at ??:? #08: base::internal::SchedulerWorker::Delegate::WaitForWork(base::WaitableEvent*) at ??:? #09: base::internal::SchedulerWorker::Thread::ThreadMain() at ??:? #10: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #11: pc 0x44f62ec238b6 sp 0x371375360fe0 (libc.so,0x188b6) #12: pc 0x44f62ec996da sp 0x371375360ff0 (libc.so,0x8e6da) #13: pc 0 sp 0x371375361000 [00526.430] 55859.55885> <== Thread TaskSchedulerAWorker[42402] ==> #01: pc 0x5c2c2814883b sp 0x788d04a6b988 (<vDSO>,0x683b) #02: pc 0x5c2c28148569 sp 0x788d04a6b990 (<vDSO>,0x6569) #03: pc 0x44f62ec4f571 sp 0x788d04a6b9b0 (libc.so,0x44571) #04: pc 0x44f62ec22f88 sp 0x788d04a6b9f0 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) at ??:? #07: base::WaitableEvent::Wait() at ??:? #08: base::internal::Invoker<base::internal::BindState<base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}, base::internal::UnretainedWrapper<base::RepeatingCallback<void ()> >, base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}<base::WaitableEvent>, base::internal::(anonymous namespace)::NestedBlockingType>, void ()>::RunOnce(base::internal::BindStateBase*) at ??:? #09: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:? #10: base::internal::TaskTracker::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool) at ??:? #11: base::internal::TaskTracker::RunNextTask(scoped_refptr<base::internal::Sequence>, base::internal::CanScheduleSequenceObserver*) at ??:? #12: base::internal::SchedulerWorker::Thread::ThreadMain() at ??:? #13: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #14: pc 0x44f62ec238b6 sp 0x788d04a6bfe0 (libc.so,0x188b6) #15: pc 0x44f62ec996da sp 0x788d04a6bff0 (libc.so,0x8e6da) #16: pc 0 sp 0x788d04a6c000 [00527.695] 55859.55885> <== Thread TaskSchedulerAWorker[42458] ==> #01: pc 0x5c2c2814883b sp 0x38e2009a1988 (<vDSO>,0x683b) #02: pc 0x5c2c28148569 sp 0x38e2009a1990 (<vDSO>,0x6569) #03: pc 0x44f62ec4f571 sp 0x38e2009a19b0 (libc.so,0x44571) #04: pc 0x44f62ec22f88 sp 0x38e2009a19f0 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) at ??:? #07: base::WaitableEvent::Wait() at ??:? #08: base::internal::Invoker<base::internal::BindState<base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}, base::internal::UnretainedWrapper<base::RepeatingCallback<void ()> >, base::internal::TaskSchedulerWorkerPoolBlockingTest::SaturateWithBlockingTasks(base::internal::(anonymous namespace)::NestedBlockingType const&)::{lambda(base::RepeatingCallback<void ()>*, base::WaitableEvent*, base::internal::(anonymous namespace)::NestedBlockingType const&)#1}<base::WaitableEvent>, base::internal::(anonymous namespace)::NestedBlockingType>, void ()>::RunOnce(base::internal::BindStateBase*) at ??:? #09: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:? #10: base::internal::TaskTracker::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool) at ??:? #11: base::internal::TaskTracker::RunNextTask(scoped_refptr<base::internal::Sequence>, base::internal::CanScheduleSequenceObserver*) at ??:? #12: base::internal::SchedulerWorker::Thread::ThreadMain() at ??:? #13: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #14: pc 0x44f62ec238b6 sp 0x38e2009a1fe0 (libc.so,0x188b6) #15: pc 0x44f62ec996da sp 0x38e2009a1ff0 (libc.so,0x8e6da) #16: pc 0 sp 0x38e2009a2000 [00527.710] 55859.55885> <== Thread TaskSchedulerAWorker[42463] ==> #01: pc 0x5c2c2814883b sp 0x2ae6cb135d18 (<vDSO>,0x683b) #02: pc 0x5c2c28148569 sp 0x2ae6cb135d20 (<vDSO>,0x6569) #03: pc 0x44f62ec4f571 sp 0x2ae6cb135d40 (libc.so,0x44571) #04: pc 0x44f62ec22f88 sp 0x2ae6cb135d80 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) at ??:? #07: base::WaitableEvent::Wait() at ??:? #08: base::internal::SchedulerWorker::Delegate::WaitForWork(base::WaitableEvent*) at ??:? #09: base::internal::SchedulerWorker::Thread::ThreadMain() at ??:? #10: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #11: pc 0x44f62ec238b6 sp 0x2ae6cb135fe0 (libc.so,0x188b6) #12: pc 0x44f62ec996da sp 0x2ae6cb135ff0 (libc.so,0x8e6da) #13: pc 0 sp 0x2ae6cb136000 [00527.775] 03899.04248> [3:932631393:0119/183715.345318:527775749:ERROR:test_launcher.cc(398)] going to zx_task_kill(job) for 34ded14f [00527.779] 03899.04248> [3:932631393:0119/183715.348923:527779330:ERROR:test_launcher.cc(415)] removing 34ded14f from live process list
,
Jan 19 2018
IIUC this could result from the |service_thread_task_runner_| not getting scheduled in time, since it is responsible for increasing the worker-pool capacity - WDYT?
,
Jan 22 2018
@wez: neat! Love the on-hang thread dumper :) -- is it fuschia only? can its output be dropped in-place in the error snippet for the failing test ([1] in this case)? fdoray@ is the expert for ScopedBlockingCall (and related tests): can you PTaL? Looks like we fail to reach expectations to increase worker capacity on Fuschia. [1] https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F13139%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Flogs%2FTaskSchedulerWorkerPoolBlockingTest.ThreadBlockedUnblocked__x2f_MAY_BLOCK%2F0
,
Jan 22 2018
Re #3: This is currently using a Fuchsia-specific 'threads' command which iterates over the threads in a target process, suspending, dumping and resuming them in turn. We invoke the command when TestLauncher sees a job process take too long to execute; I'll check whether we can redirect its output into the sub-process' stdout/stderr pipe rather than direct to console stdout. Baking something equivalent into TestLauncher, cross-platform could also be useful, yes, but may mean introducing Crashpad/Breakpad dependencies in TestLauncher, for it to be able to dump threads from another process remotely.
,
Feb 26 2018
This flaked again on our Fuchsia/x64/Debug FYI bot at https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/16661.
,
Mar 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c commit 0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c Author: Gabriel Charette <gab@chromium.org> Date: Thu Mar 01 15:30:20 2018 [SchedulerWorkerPoolImpl] Introduce WaitForNWorkersCleanedUpForTesting() And use it instead of flaky sleep timings in tests that exercise the reclaim timeout. Bonus: this speeds up these tests from ~1500ms to ~500ms in the normal case. R=fdoray@chromium.org, robliao@chromium.org Bug: 735701 , 803900 Change-Id: I3b271fda6c7b42033ac7b3c128e8663e80acebb3 Reviewed-on: https://chromium-review.googlesource.com/937505 Commit-Queue: Gabriel Charette <gab@chromium.org> Reviewed-by: François Doray <fdoray@chromium.org> Reviewed-by: Robert Liao <robliao@chromium.org> Cr-Commit-Position: refs/heads/master@{#540162} [modify] https://crrev.com/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c/base/task_scheduler/scheduler_worker_pool_impl.cc [modify] https://crrev.com/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c/base/task_scheduler/scheduler_worker_pool_impl.h [modify] https://crrev.com/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
,
Mar 6 2018
Flaked again in https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20(dbg)/16915 and https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/16894 with a hang waiting for a capacity increase. In one failure there are eight threads in total (four in SaturateWithBlockingTasks, three idle, one initial-thread), in the other there are seven (four in SaturateWithBlockingTasks, two idle, one initial-thread).
,
May 3 2018
This was also caused by rogue SchedulerWorkers IMO. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by w...@chromium.org
, Jan 19 2018