New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 672534 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 650725



Sign in to add a comment

Clearing LocalDeviceInfoProvider causes race condition crash in DeviceInfoSyncBridge::ApplySyncChanges

Project Member Reported by s...@chromium.org, Dec 8 2016

Issue description

Found this crash running TwoClientAppsSyncTest.UnexpectedLaunchType but it should be theoretically possible to hit for real clients, although probably quite rare. Disabling sync / signing out / clearing local data, we clear the local device provider https://cs.chromium.org/chromium/src/components/browser_sync/profile_sync_service.cc?l=815

However, the current USS DeviceInfo implementation does not correctly account for this possibility. If we have outstanding foreign changes that are working their way across threads (sync -> ui), it's possible  DeviceInfoSyncBridge::ApplySyncChanges are about to get called right after the LocalDeviceInfoProvider::Clear call happens on the UI thread. We crash at https://cs.chromium.org/chromium/src/components/sync/device_info/device_info_sync_bridge.cc?l=176

While the DeviceInfoSyncBridge::ApplySyncChanges DCHECKs |has_provider_initialized_|, this doesn't get updated on LocalDeviceInfoProvider::Clear.

BrowserTestBase received signal: Segmentation fault. Backtrace:
#0 0x7fa8aaa4d1de base::debug::StackTrace::StackTrace()
#1 0x000004d98867 content::(anonymous namespace)::DumpStackTraceSignalHandler()
#2 0x7fa892bddcb0 <unknown>
#3 0x00000060a92c logging::CheckOpResult::message()
#4 0x00000060a815 std::string::_M_rep()
#5 0x00000060d6e5 std::string::size()
#6 0x000000640ba6 _ZSteqIcEN9__gnu_cxx11__enable_ifIXsr9__is_charIT_EE7__valueEbE6__typeERKSbIS2_St11char_traitsIS2_ESaIS2_EESA_
#7 0x0000050634a6 syncer::DeviceInfoSyncBridge::ApplySyncChanges()
#8 0x0000017bb3ab syncer::SharedModelTypeProcessor::OnUpdateReceived()
#9 0x0000009fcd24 _ZN4base8internal13FunctorTraitsIMN10extensions11ImageLoaderEFvRKNS_8CallbackIFvRKN3gfx11ImageFamilyEELNS0_8CopyModeE1ELNS0_10RepeatModeE1EEERKNSt7__debug6vectorINS3_10LoadResultESaISH_EEEEvE6InvokeIRKNS_7WeakPtrIS3_EEJSE_SL_EEEvSN_OT_DpOT0_
#10 0x0000009fe434 _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMN10extensions11ImageLoaderEFvRKNS_8CallbackIFvRKN3gfx11ImageFamilyEELNS0_8CopyModeE1ELNS0_10RepeatModeE1EEERKNSt7__debug6vectorINS5_10LoadResultESaISJ_EEEERKNS_7WeakPtrIS5_EEJSG_SN_EEEvOT_OT0_DpOT1_
#11 0x000001851cb4 _ZN4base8internal7InvokerINS0_9BindStateIMN6syncer18ModelTypeProcessorEFvRKN7sync_pb14ModelTypeStateERKNSt7__debug6vectorINS3_18UpdateResponseDataESaISB_EEEEJNS_7WeakPtrIS4_EES6_SD_EEEFvvEE7RunImplIRKSH_RKSt5tupleIJSJ_S6_SD_EEJLm0ELm1ELm2EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE
#12 0x000001851b8c _ZN4base8internal7InvokerINS0_9BindStateIMN6syncer18ModelTypeProcessorEFvRKN7sync_pb14ModelTypeStateERKNSt7__debug6vectorINS3_18UpdateResponseDataESaISB_EEEEJNS_7WeakPtrIS4_EES6_SD_EEEFvvEE3RunEPNS0_13BindStateBaseE
#13 0x7fa8aaa53001 _ZNO4base8internal8RunMixinINS_8CallbackIFvvELNS0_8CopyModeE0ELNS0_10RepeatModeE0EEEE3RunEv
#14 0x7fa8aaa529d2 base::debug::TaskAnnotator::RunTask()
#15 0x7fa8aaae38fa base::MessageLoop::RunTask()
#16 0x7fa8aaae3b84 base::MessageLoop::DeferOrRunPendingTask()
#17 0x7fa8aaae3e6e base::MessageLoop::DoWork()
#18 0x7fa8aaafc1cc base::MessagePumpGlib::HandleDispatch()
#19 0x7fa8aaafc991 base::(anonymous namespace)::WorkSourceDispatch()
#20 0x7fa8957d8e04 g_main_context_dispatch
#21 0x7fa8957d9048 <unknown>
#22 0x7fa8957d90ec g_main_context_iteration
#23 0x7fa8aaafc2cf base::MessagePumpGlib::Run()
#24 0x7fa8aaae347a base::MessageLoop::RunHandler()
#25 0x7fa8aab90153 base::RunLoop::Run()
#26 0x000004dc3413 content::RunThisRunLoop()
#27 0x000004dc415f content::MessageLoopRunner::Run()
#28 0x000004dc47fd content::WindowedNotificationObserver::Wait()
#29 0x00000178da44 InProcessBrowserTest::CloseBrowserSynchronously()
#30 0x000001774496 SyncTest::TearDownOnMainThread()
#31 0x00000178eaf0 InProcessBrowserTest::RunTestOnMainThreadLoop()
#32 0x000004d98583 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop()
#33 0x000000740b45 _ZN4base8internal13FunctorTraitsIMN18OAuth2TokenService7FetcherEFvvEvE6InvokeIPS3_JEEEvS5_OT_DpOT0_
#34 0x000000740a61 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKMN18OAuth2TokenService7FetcherEFvvEJPS5_EEEvOT_DpOT0_
#35 0x000004d99ad7 _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserTestBaseEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE7RunImplIRKS6_RKSt5tupleIJS8_EEJLm0EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE
#36 0x000004d99a1c _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserTestBaseEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#37 0x00000076ce3b base::internal::RunMixin<>::Run()
#38 0x0000023755d5 ChromeBrowserMainParts::PreMainMessageLoopRunImpl()
#39 0x0000023740e0 ChromeBrowserMainParts::PreMainMessageLoopRun()
#40 0x7fa8a40f2791 content::BrowserMainLoop::PreMainMessageLoopRun()
#41 0x7fa8a3718aa5 _ZN4base8internal13FunctorTraitsIMN7content22IndexedDBCallbacksImpl13InternalStateEFvvEvE6InvokeIPS4_JEEEvS6_OT_DpOT0_
#42 0x7fa8a40fad41 _ZN4base8internal12InvokeHelperILb0EiE8MakeItSoIRKMN7content15BrowserMainLoopEFivEJPS5_EEEiOT_DpOT0_
#43 0x7fa8a40face7 _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserMainLoopEFivEJNS0_17UnretainedWrapperIS4_EEEEEFivEE7RunImplIRKS6_RKSt5tupleIJS8_EEJLm0EEEEiOT_OT0_NS_13IndexSequenceIJXspT1_EEEE
#44 0x7fa8a40fac2c _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserMainLoopEFivEJNS0_17UnretainedWrapperIS4_EEEEEFivEE3RunEPNS0_13BindStateBaseE
#45 0x7fa8a360191b base::internal::RunMixin<>::Run()
#46 0x7fa8a4cc342b content::StartupTaskRunner::RunAllTasksNow()
#47 0x7fa8a40f0590 content::BrowserMainLoop::CreateStartupTasks()
#48 0x7fa8a40feb80 content::BrowserMainRunnerImpl::Initialize()
#49 0x7fa8a40ec7cf content::BrowserMain()
#50 0x7fa8a58f13c6 content::RunNamedProcessTypeMain()
#51 0x7fa8a58f37f2 content::ContentMainRunnerImpl::Run()
#52 0x7fa8a58f0712 content::ContentMain()
#53 0x000004d9828b content::BrowserTestBase::SetUp()
#54 0x00000178d154 InProcessBrowserTest::SetUp()
#55 0x000001770051 SyncTest::SetUp()
#56 0x00000189f0ce testing::internal::HandleSehExceptionsInMethodIfSupported<>()
#57 0x00000188ffb2 testing::internal::HandleExceptionsInMethodIfSupported<>()
#58 0x0000018849c4 testing::Test::Run()
#59 0x0000018851cd testing::TestInfo::Run()
#60 0x00000188576f testing::TestCase::Run()
#61 0x00000188aabc testing::internal::UnitTestImpl::RunAllTests()

 

Comment 1 by s...@chromium.org, Dec 8 2016

Blocking: 650725
Project Member

Comment 2 by bugdroid1@chromium.org, Dec 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/24268f797538e6add4338602ff1ebd78affe2648

commit 24268f797538e6add4338602ff1ebd78affe2648
Author: skym <skym@chromium.org>
Date: Mon Dec 12 18:22:04 2016

[Sync] Actively guard against provider being cleared in DeviceInfoSyncBridge.

When a user signs out and Sync starts to shut down, we immediately
clear the LocalDeviceInfoProvider. Many methods in DeviceInfoSyncBridge
depend on being able to access information in the provider, and can
crash at this point. The bridge should be disabled momentarily, but
this will require threads hops UI->Sync->UI. Meanwhile, various other
posted tasks can call into the bridge and hit this race condition.

We were previously guarding against this scenario when the pulse timer
would fire, but we were not guarding against processor's Merge/Apply
calls or ModelTypeStore async initialization. I've added nullptr checks
to all async entry points that mostly no-op when they detect this
scenario.

One slight hiccup is that the bridge currently leaves the pulse timer
running even when disabled. My intention is that this change will be
merged back into M56 so I wanted to keep this CL as small as possible.
When  crbug.com/672600  is addressed I think it will make the most sense
to disable the pulse timer while disabled, since it should always be
re-enabled with an accurate timer when things are working again.

Additionally, some state tracking such as has_provider_initialized_
should probably just be removed at this point. This should be caught in
 crbug.com/659263 

BUG= 672534 

Review-Url: https://codereview.chromium.org/2568543003
Cr-Commit-Position: refs/heads/master@{#437902}

[modify] https://crrev.com/24268f797538e6add4338602ff1ebd78affe2648/components/sync/device_info/device_info_sync_bridge.cc
[modify] https://crrev.com/24268f797538e6add4338602ff1ebd78affe2648/components/sync/device_info/device_info_sync_bridge_unittest.cc

Comment 3 by s...@chromium.org, Dec 12 2016

Status: Fixed (was: Started)

Comment 4 by s...@chromium.org, Dec 20 2016

Labels: Merge-Request-56

Comment 5 by dimu@chromium.org, Dec 20 2016

Labels: -Merge-Request-56 Merge-Approved-56 Hotlist-Merge-Approved
Your change meets the bar and is auto-approved for M56 (branch: 2924)
Project Member

Comment 6 by bugdroid1@chromium.org, Dec 20 2016

Labels: -merge-approved-56 merge-merged-2924
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b3d4540a88c89ed2232defdf55aacb62c35b3088

commit b3d4540a88c89ed2232defdf55aacb62c35b3088
Author: Sky Malice <skym@chromium.org>
Date: Tue Dec 20 23:52:35 2016

[Sync] Actively guard against provider being cleared in DeviceInfoSyncBridge.

When a user signs out and Sync starts to shut down, we immediately
clear the LocalDeviceInfoProvider. Many methods in DeviceInfoSyncBridge
depend on being able to access information in the provider, and can
crash at this point. The bridge should be disabled momentarily, but
this will require threads hops UI->Sync->UI. Meanwhile, various other
posted tasks can call into the bridge and hit this race condition.

We were previously guarding against this scenario when the pulse timer
would fire, but we were not guarding against processor's Merge/Apply
calls or ModelTypeStore async initialization. I've added nullptr checks
to all async entry points that mostly no-op when they detect this
scenario.

One slight hiccup is that the bridge currently leaves the pulse timer
running even when disabled. My intention is that this change will be
merged back into M56 so I wanted to keep this CL as small as possible.
When  crbug.com/672600  is addressed I think it will make the most sense
to disable the pulse timer while disabled, since it should always be
re-enabled with an accurate timer when things are working again.

Additionally, some state tracking such as has_provider_initialized_
should probably just be removed at this point. This should be caught in
 crbug.com/659263 

BUG= 672534 

Review-Url: https://codereview.chromium.org/2568543003
Cr-Commit-Position: refs/heads/master@{#437902}
(cherry picked from commit 24268f797538e6add4338602ff1ebd78affe2648)

Review-Url: https://codereview.chromium.org/2594863003 .
Cr-Commit-Position: refs/branch-heads/2924@{#568}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/b3d4540a88c89ed2232defdf55aacb62c35b3088/components/sync/device_info/device_info_sync_bridge.cc
[modify] https://crrev.com/b3d4540a88c89ed2232defdf55aacb62c35b3088/components/sync/device_info/device_info_sync_bridge_unittest.cc

Comment 7 by s...@chromium.org, Dec 21 2016

The merge into M56 was reverted, see https://codereview.chromium.org/2587343005/

Sign in to add a comment