Destabilized use of LockImplPosix starting around 3/21 |
||||||
Issue description
Destablized use of LockImplPosix starting around 3/21.
Failure log:
3/17 (Possibly Unrelated) base_unittests Exception Failure to Complete
TaskSchedulerSingleThreadTaskRunnerManagerTest.PostDelayedTask [STACK1]
3/21
16:42 First Recorded Failure
LockImplPosix: Check failed: rv == 0 (22 vs. 0). Invalid argument
https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14764
4 Successful Runs
19:44 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14769
2 Successful Runs
21:41 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14772
5 Successful Runs
04:44 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14778
3 Days Pass
3/24
11:52 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14828
TaskSchedulerSingleThreadTaskRunnerManagerTest.PrioritySetCorrectly [STACK2]
3/24
19:47 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14836
TaskSchedulerSingleThreadTaskRunnerManagerTest.PostDelayedTask [STACK1]
3/24
21:47 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14839
2 Successful Runs
3/25
10:53 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14843
7 Successful Runs
3/27
03:27 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14851
TaskSchedulerWorkerPoolCheckTlsReuse.CheckDetachedThreads [STACK3]
3/24
19:47 https://uberchromegw.corp.google.com/i/internal.bling.main/builders/smoke/builds/14854
[STACK1]
[ RUN ] TaskSchedulerSingleThreadTaskRunnerManagerTest.PostDelayedTask
[31612:125479:0321/164626.669783:151662621172523:FATAL:lock_impl_posix.cc(65)] Check failed: rv == 0 (22 vs. 0). Invalid argument
0 base_unittests 0x013f8be6 base::debug::StackTrace::StackTrace(unsigned long) + 134
1 base_unittests 0x013f8c64 base::debug::StackTrace::StackTrace(unsigned long) + 36
2 base_unittests 0x013f73e7 base::debug::StackTrace::StackTrace() + 39
3 base_unittests 0x0147239c logging::LogMessage::~LogMessage() + 92
4 base_unittests 0x014700d7 logging::LogMessage::~LogMessage() + 23
5 base_unittests 0x0159549e base::internal::LockImpl::Lock() + 334
6 base_unittests 0x002fe2cf base::Lock::Acquire() + 31
7 base_unittests 0x015b7a83 base::internal::SchedulerLockImpl::Acquire() + 35
8 base_unittests 0x00e7d2e4 base::internal::AutoSchedulerLock::AutoSchedulerLock(base::internal::SchedulerLock&) + 36
9 base_unittests 0x00e7d294 base::internal::AutoSchedulerLock::AutoSchedulerLock(base::internal::SchedulerLock&) + 36
10 base_unittests 0x015dd337 base::internal::TaskTracker::IsShutdownComplete() const + 39
11 base_unittests 0x015c441d base::internal::SchedulerWorker::ShouldExit() + 29
12 base_unittests 0x015c49cc base::internal::SchedulerWorker::Thread::ThreadMain() + 396
13 base_unittests 0x015ec577 base::(anonymous namespace)::ThreadFunc(void*) + 519
14 libsystem_pthread.dylib 0x0992e780 _pthread_body + 138
15 libsystem_pthread.dylib 0x0992e6f6 _pthread_body + 0
16 libsystem_pthread.dylib 0x0992bf7a thread_start + 34
[STACK2]
[ RUN ] TaskSchedulerSingleThreadTaskRunnerManagerTest.PrioritySetCorrectly
[57069:1803:0324/195225.144893:72225941318571:FATAL:lock_impl_posix.cc(53)] Check failed: rv == 0 (16 vs. 0). Resource busy
0 base_unittests 0x013d5796 base::debug::StackTrace::StackTrace(unsigned long) + 134
1 base_unittests 0x013d5814 base::debug::StackTrace::StackTrace(unsigned long) + 36
2 base_unittests 0x013d3f97 base::debug::StackTrace::StackTrace() + 39
3 base_unittests 0x0144ef4c logging::LogMessage::~LogMessage() + 92
4 base_unittests 0x0144cc87 logging::LogMessage::~LogMessage() + 23
5 base_unittests 0x015721b3 base::internal::LockImpl::~LockImpl() + 307
6 base_unittests 0x015721d7 base::internal::LockImpl::~LockImpl() + 23
7 base_unittests 0x01571683 base::Lock::~Lock() + 259
8 base_unittests 0x015716a7 base::Lock::~Lock() + 23
9 base_unittests 0x015946b0 base::internal::SchedulerLockImpl::~SchedulerLockImpl() + 64
10 base_unittests 0x00e00b37 base::internal::SchedulerLock::~SchedulerLock() + 23
11 base_unittests 0x00e005e7 base::internal::SchedulerLock::~SchedulerLock() + 23
12 base_unittests 0x015b97f2 base::internal::TaskTracker::~TaskTracker() + 274
13 base_unittests 0x015b9967 base::internal::TaskTracker::~TaskTracker() + 23
14 base_unittests 0x00e0d936 base::internal::(anonymous namespace)::TaskSchedulerSingleThreadTaskRunnerManagerTest::~TaskSchedulerSingleThreadTaskRunnerManagerTest() + 214
15 base_unittests 0x00e11097 base::internal::TaskSchedulerSingleThreadTaskRunnerManagerTest_PrioritySetCorrectly_Test::~TaskSchedulerSingleThreadTaskRunnerManagerTest_PrioritySetCorrectly_Test() + 23
16 base_unittests 0x00e0c137 base::internal::TaskSchedulerSingleThreadTaskRunnerManagerTest_PrioritySetCorrectly_Test::~TaskSchedulerSingleThreadTaskRunnerManagerTest_PrioritySetCorrectly_Test() + 23
17 base_unittests 0x00e0c15a base::internal::TaskSchedulerSingleThreadTaskRunnerManagerTest_PrioritySetCorrectly_Test::~TaskSchedulerSingleThreadTaskRunnerManagerTest_PrioritySetCorrectly_Test() + 26
18 base_unittests 0x017b5f76 testing::Test::DeleteSelf_() + 38
19 base_unittests 0x017df3d7 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 103
20 base_unittests 0x017b413a void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 106
21 base_unittests 0x017b5db8 testing::TestInfo::Run() + 328
22 base_unittests 0x017b7494 testing::TestCase::Run() + 260
[STACK3]
[ RUN ] TaskSchedulerWorkerPoolCheckTlsReuse.CheckDetachedThreads
[17234:118591:0327/080303.411960:21984564534804:FATAL:lock_impl_posix.cc(65)] Check failed: rv == 0 (22 vs. 0). Invalid argument
0 base_unittests 0x01406566 base::debug::StackTrace::StackTrace(unsigned long) + 134
1 base_unittests 0x014065e4 base::debug::StackTrace::StackTrace(unsigned long) + 36
2 base_unittests 0x01404d67 base::debug::StackTrace::StackTrace() + 39
3 base_unittests 0x0147fd1c logging::LogMessage::~LogMessage() + 92
4 base_unittests 0x0147da57 logging::LogMessage::~LogMessage() + 23
5 base_unittests 0x015a324e base::internal::LockImpl::Lock() + 334
6 base_unittests 0x002fa7df base::Lock::Acquire() + 31
7 base_unittests 0x015c5533 base::internal::SchedulerLockImpl::Acquire() + 35
8 base_unittests 0x00e85d64 base::internal::AutoSchedulerLock::AutoSchedulerLock(base::internal::SchedulerLock&) + 36
9 base_unittests 0x00e85d14 base::internal::AutoSchedulerLock::AutoSchedulerLock(base::internal::SchedulerLock&) + 36
10 base_unittests 0x015eade7 base::internal::TaskTracker::IsShutdownComplete() const + 39
11 base_unittests 0x015d1ecd base::internal::SchedulerWorker::ShouldExit() + 29
12 base_unittests 0x015d247c base::internal::SchedulerWorker::Thread::ThreadMain() + 396
13 base_unittests 0x015fa027 base::(anonymous namespace)::ThreadFunc(void*) + 519
14 libsystem_pthread.dylib 0x0998c780 _pthread_body + 138
15 libsystem_pthread.dylib 0x0998c6f6 _pthread_body + 0
16 libsystem_pthread.dylib 0x09989f7a thread_start + 34
,
Mar 27 2017
,
Mar 27 2017
Updated the description for accuracy.
,
Mar 27 2017
,
Mar 27 2017
I am able to repro this on device only with gtest_repeat. This has not yet reproed for me in the simulator.
,
Mar 27 2017
SchedulerWorkers are somehow outliving the test.
When the lock returns 22 (invalid argument), it's a simple case where the test has returned and the SchedulerWorker is still running
* thread #1: tid = 0xac1fd4, 0x0000000186085e1c libsystem_kernel.dylib`__psynch_cvwait + 8, queue = 'com.apple.main-thread'
frame #0: 0x0000000186085e1c libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018614ab74 libsystem_pthread.dylib`_pthread_cond_wait + 640
frame #2: 0x00000001011f2658 base_unittests`base::ConditionVariable::Wait(this=0x000000016fdb82e8) + 92 at condition_variable_posix.cc:69
frame #3: 0x00000001011f4ca0 base_unittests`base::WaitableEvent::TimedWaitUntil(this=0x000000016fdb89a0, end_time=0x000000016fdb83a8) + 548 at waitable_event_posix.cc:225
frame #4: 0x00000001011f49ac base_unittests`base::WaitableEvent::Wait(this=0x000000016fdb89a0) + 88 at waitable_event_posix.cc:157
frame #5: 0x0000000100ba0cc4 base_unittests`base::internal::TaskSchedulerSingleThreadTaskRunnerManagerTest_PrioritySetCorrectly_Test::TestBody(this=0x0000000103428cf0) + 1248 at scheduler_single_thread_task_runner_manager_unittest.cc:187
Both threads appear to reference different locks, suggesting this thread might also have outlived its test.
Test Thread
(lldb) print this->task_tracker.shutdown_lock_
(base::internal::SchedulerLock) $6 = {
base::internal::SchedulerLockImpl = {
lock_ = {
owning_thread_ref_ = {
id_ = 0x0000000000000000
}
lock_ = {
native_handle_ = (__sig = 1297437784, __opaque = char [56] @ 0x00007fb365801580)
}
}
}
}
Worker Thread
(lldb) print this->shutdown_lock_
(base::internal::SchedulerLock) $5 = {
base::internal::SchedulerLockImpl = {
lock_ = {
owning_thread_ref_ = {
id_ = 0x0000000000000000
}
lock_ = {
native_handle_ = (__sig = 1297437784, __opaque = char [56] @ 0x00007fb355f21020)
}
}
}
}
I have not yet reproed 16 (Resource busy).
,
Mar 27 2017
,
Mar 28 2017
This is likely caused by the SingleThreadTaskRunner unregistration path. In this path, we release the SchedulerWorker from the SchedulerSingleThreadTaskRunnerManager and instruct it to do a non-blocking termination. Because we release the reference, we can no longer perform a Join on the thread.
,
Mar 29 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/624dc15f7cb6b404e607f536aef0c18d325903ee commit 624dc15f7cb6b404e607f536aef0c18d325903ee Author: robliao <robliao@chromium.org> Date: Wed Mar 29 18:59:09 2017 Change SchedulerWorker::ShouldExit Checks Ordering A SchedulerWorker may outlive the test environment when it's released as a result of SingleThreadTaskRunner unregistration in the SchedulerSingleThreadTaskRunnerManager. As a result, SchedulerWorker should prefer checking its own fields first to validate that it should continue to run before consulting the environment provided TaskTracker. BUG= 705500 Review-Url: https://codereview.chromium.org/2781933003 Cr-Commit-Position: refs/heads/master@{#460475} [modify] https://crrev.com/624dc15f7cb6b404e607f536aef0c18d325903ee/base/task_scheduler/scheduler_worker.cc
,
Mar 29 2017
This should be fixed now. Will monitor the bot. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by robliao@chromium.org
, Mar 27 2017