Fix KeyedServices creating/getting from wrong threads |
|||||||||||||
Issue descriptionWhile fixing a problem with an abscence of debug assertion on KeyedService construction (https://bugs.chromium.org/p/chromium/issues/detail?id=701321), I've found that DCHECK(CalledOnValidThread()) is not called where it should be. After fixing that, few bugs appeared in NaCl and Sync service logic. These bugs need to be fixed after 701321 will be completed and DCHECK(CalledOnValidThread()) should be restored. NaCl: [22036:14544:0314/060338.808:FATAL:keyed_service_base_factory.cc(87)] Check failed: CalledOnValidThread(). Backtrace: base::debug::StackTrace::StackTrace [0x70473E40+32] base::debug::StackTrace::StackTrace [0x704732AD+13] logging::LogMessage::~LogMessage [0x7049DB11+49] KeyedServiceBaseFactory::AssertContextWasntDestroyed [0x6A4A62CC+90] BrowserContextKeyedServiceFactory::GetContextToUse [0x69DA3698+14] KeyedServiceFactory::GetServiceForContext [0x6A4A6858+196] extensions::ExtensionSystemFactory::GetForBrowserContext [0x02FB3615+23] extensions::ExtensionSystem::Get [0x01C9D707+27] NaClBrowserDelegateImpl::GetExtensionInfoMap [0x027A07D3+103] NaClBrowserDelegateImpl::MapUrlToLocalFilePath [0x027A0740+36] nacl::NaClHostMessageFilter::LaunchNaClContinuationOnIOThread [0x037EE56A+414] Sync service: (1) [11052:14732:0314/060126.726:FATAL:keyed_service_base_factory.cc(87)] Check failed: CalledOnValidThread(). Backtrace: base::debug::StackTrace::StackTrace [0x70963E40+32] base::debug::StackTrace::StackTrace [0x709632AD+13] logging::LogMessage::~LogMessage [0x7098DB11+49] KeyedServiceBaseFactory::AssertContextWasntDestroyed [0x6C3762CC+90] BrowserContextKeyedServiceFactory::GetContextToUse [0x6BDA3698+14] KeyedServiceFactory::GetServiceForContext [0x6C376858+196] extensions::StorageFrontend::Get [0x00DF60D8+28] extensions::settings_sync_util::GetSyncableService [0x01D361A5+193] browser_sync::ChromeSyncClient::GetSyncableServiceForType [0x0148324A+130] syncer::SharedChangeProcessor::Connect [0x02AA0023+311] syncer::SharedChangeProcessor::StartAssociation [0x02A9F80A+206] (2) [10612:11124:0314/060126.867:FATAL:keyed_service_base_factory.cc(87)] Check failed: CalledOnValidThread(). Backtrace: base::debug::StackTrace::StackTrace [0x70963E40+32] base::debug::StackTrace::StackTrace [0x709632AD+13] logging::LogMessage::~LogMessage [0x7098DB11+49] KeyedServiceBaseFactory::AssertContextWasntDestroyed [0x6C3762CC+90] BrowserContextKeyedServiceFactory::GetContextToUse [0x6BDA3698+14] KeyedServiceFactory::GetServiceForContext [0x6C376858+196] HistoryServiceFactory::GetForProfile [0x014A160A+110] browser_sync::ChromeSyncClient::GetSyncableServiceForType [0x014832C1+249] syncer::SharedChangeProcessor::Connect [0x02AA0023+311] syncer::SharedChangeProcessor::StartAssociation [0x02A9F80A+206]
,
Mar 15 2017
I've attached a list with all failing tests that I found in our build environment. Basically it contains browser_tests.*NaCl* and (I guess) almost everything from sync_integration_tests.
,
Mar 15 2017
I'm fine owning the fixing of ChromeSyncClient's access of extensions and history service factories, but the NaCl access is outside the scope of my knowledge. Hopefully someone more knowledge about NaCl can own that.
,
Mar 16 2017
It looks like SharedChangeProcessor has its own lock to facilitate having ::Disconnect() called on it when the ProfileSyncService shuts down (which owns ChromeSyncClient). It seems this makes using ChromeSyncClient/Profile to call BrowserContextKeyedServiceFactory objects safe-ish, in that we know the Profile object hasn't been destroyed yet. However, my understanding (which could be wrong) is that we're clearly violating thread safety agreements the BrowserContextKeyedServiceFactory have, they don't want to take locks to avoid double initializing their services. So we need to fix this. Because ChromeSyncClient seems to be only used to access SyncableServices on the model thread, which support weak pointers, we should be able to grab that instead on the UI thread, and then jump. WebData backed types don't fit this nicely, but I should be able to copy the same approach I used for the USS WebData controller. I'm hesitant to remove the locking/disconnect mechanism entirely from the processor. It's unclear to me if it's still needed at all, but can be done in a follow up task if possible.
,
Mar 24 2017
701321 is commited. I've created a CL with enabled DCHECKs for thread checking. You can find currently failed tests here: https://codereview.chromium.org/2775753002 bradnelson@ could you please take a look at the test failures in browser_tests: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.mac%2Fmac_chromium_rel_ng%2F414010%2F%2B%2Frecipes%2Fsteps%2Fbrowser_tests__with_patch_%2F0%2Fstdout There is a problem with accessing extensions::ExtensionSystem from an invalid thread in NaClBrowserDelegateImpl::GetExtensionInfoMap(). It needs to be fixed so we can safely enable currently disabled DCHECKs in base KeyedService factories.
,
Mar 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/31da0bbe6ad57750e8821aec9380a0dcd3ce0f54 commit 31da0bbe6ad57750e8821aec9380a0dcd3ce0f54 Author: skym <skym@chromium.org> Date: Tue Mar 28 16:52:20 2017 [Sync] Adding missing DependsOn to *ProfileSyncServiceFactory. It looks like most of the recently added model types have not been adding their associated KeyedService dependencies to *ProfileSyncServiceFactory. This change updates the dependencies to include everything used by both *ProfileSyncServiceFactory and *ChromeSyncClient, along with some alphabetization for the ios DependOn()s. Couldn't add a dependency to SupervisedUserServiceFactory because it apparently already depends on ProfileSyncServiceFactory, so added TODOs to fix this. BUG=701326 Review-Url: https://codereview.chromium.org/2768923005 Cr-Commit-Position: refs/heads/master@{#460133} [modify] https://crrev.com/31da0bbe6ad57750e8821aec9380a0dcd3ce0f54/chrome/browser/supervised_user/supervised_user_service_factory.cc [modify] https://crrev.com/31da0bbe6ad57750e8821aec9380a0dcd3ce0f54/chrome/browser/sync/chrome_sync_client.cc [modify] https://crrev.com/31da0bbe6ad57750e8821aec9380a0dcd3ce0f54/chrome/browser/sync/profile_sync_service_factory.cc [modify] https://crrev.com/31da0bbe6ad57750e8821aec9380a0dcd3ce0f54/docs/sync/model_api.md [modify] https://crrev.com/31da0bbe6ad57750e8821aec9380a0dcd3ce0f54/ios/chrome/browser/sync/ios_chrome_profile_sync_service_factory.cc
,
Mar 29 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cd6b31d4b9613087e1a68c620af83b6fe1892bdf commit cd6b31d4b9613087e1a68c620af83b6fe1892bdf Author: skym <skym@chromium.org> Date: Wed Mar 29 06:47:00 2017 [Sync] Stop accessing BrowserContextKeyedServiceFactory on non-UI thread. In the past the AsyncDirectoryTypeController was passing a raw pointer to a ChromeSyncClient to the model thread, where it had GetSyncableServiceForType called on it. This was needed for several model types, which has specific logic around only accessing their SyncableService on their model threads. However, this was also bad, as BrowserContextKeyedServiceFactorys were being used with a Profile object on a non-UI thread, which is against the rules. To fix this, GetSyncableServiceForType returns a callback which should only be run on the model thread, that will return the same WeakPtr<SyncableService> that we were used to. The various non-UI thread model types do slightly different things as each type has different rules and threading guarantees. The most involved model type in this CL was extensions, which did not have sufficient threading constraints to ensure safety. Added weak pointer support to SyncValueStoreCache to fix this. BUG=701326 Review-Url: https://codereview.chromium.org/2769113002 Cr-Commit-Position: refs/heads/master@{#460304} [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/chrome/browser/extensions/api/storage/settings_apitest.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/chrome/browser/extensions/api/storage/settings_sync_util.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/chrome/browser/extensions/api/storage/settings_sync_util.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/chrome/browser/extensions/api/storage/sync_value_store_cache.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/chrome/browser/extensions/api/storage/sync_value_store_cache.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/chrome/browser/sync/chrome_sync_client.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/chrome/browser/sync/chrome_sync_client.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/autofill/core/browser/webdata/autofill_profile_data_type_controller.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/autofill/core/browser/webdata/autofill_profile_data_type_controller.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/browser_sync/profile_sync_test_util.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/search_engines/search_engine_data_type_controller_unittest.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/async_directory_type_controller.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/async_directory_type_controller_unittest.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/directory_data_type_controller.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/directory_data_type_controller.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/fake_generic_change_processor.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/fake_generic_change_processor.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/fake_sync_client.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/fake_sync_client.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/generic_change_processor.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/generic_change_processor.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/generic_change_processor_factory.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/generic_change_processor_factory.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/generic_change_processor_unittest.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/shared_change_processor.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/shared_change_processor.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/shared_change_processor_unittest.cc [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/components/sync/driver/sync_client.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/ios/chrome/browser/sync/ios_chrome_sync_client.h [modify] https://crrev.com/cd6b31d4b9613087e1a68c620af83b6fe1892bdf/ios/chrome/browser/sync/ios_chrome_sync_client.mm
,
Mar 29 2017
+benwells@ Thanks skym@! I ran CL with enabled DCHECKs and now there are only NaCl problems left. https://codereview.chromium.org/2775753002 https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2775753002/1 benwells@, NaCl is currently accessing extensions::InfoMap from an invalid thread. Example: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.mac%2Fmac_chromium_rel_ng%2F417133%2F%2B%2Frecipes%2Fsteps%2Fbrowser_tests__with_patch_%2F0%2Flogs%2FAppBackgroundPageNaClTest.BackgroundKeepaliveActive%2F0 I'm not sure if we should wait for NaCl guys to fix this or maybe it's a bigger issue in the context of the Extensions subsystem and it would be better to fix this much quickly. Could you please help me with contacting right people on this?
,
Mar 29 2017
,
Apr 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a56dd49ba64abaf5300e5ccd8874447821e53da1 commit a56dd49ba64abaf5300e5ccd8874447821e53da1 Author: skym <skym@chromium.org> Date: Thu Apr 06 19:20:27 2017 Revert of [Sync] Stop accessing BrowserContextKeyedServiceFactory on non-UI thread. (patchset #8 id:140001 of https://codereview.chromium.org/2769113002/ ) Reason for revert: Caused a memory regression, see crbug.com/708666 , unclear to me why exactly this is. Progress investigating the memory regression seems tricky so far, so reverting to fix more quickly. Original issue's description: > [Sync] Stop accessing BrowserContextKeyedServiceFactory on non-UI thread. > > In the past the AsyncDirectoryTypeController was passing a raw pointer > to a ChromeSyncClient to the model thread, where it had > GetSyncableServiceForType called on it. This was needed for several > model types, which has specific logic around only accessing their > SyncableService on their model threads. However, this was also bad, as > BrowserContextKeyedServiceFactorys were being used with a Profile > object on a non-UI thread, which is against the rules. > > To fix this, GetSyncableServiceForType returns a callback which should > only be run on the model thread, that will return the same > WeakPtr<SyncableService> that we were used to. The various non-UI > thread model types do slightly different things as each type has > different rules and threading guarantees. > > The most involved model type in this CL was extensions, which did not > have sufficient threading constraints to ensure safety. Added weak > pointer support to SyncValueStoreCache to fix this. > > BUG=701326 > > Review-Url: https://codereview.chromium.org/2769113002 > Cr-Commit-Position: refs/heads/master@{#460304} > Committed: https://chromium.googlesource.com/chromium/src/+/cd6b31d4b9613087e1a68c620af83b6fe1892bdf TBR=pavely@chromium.org,rdevlin.cronin@chromium.org,mathp@chromium.org,vasilii@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=701326 Review-Url: https://codereview.chromium.org/2799653006 Cr-Commit-Position: refs/heads/master@{#462578} [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/chrome/browser/extensions/api/storage/settings_apitest.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/chrome/browser/extensions/api/storage/settings_sync_util.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/chrome/browser/extensions/api/storage/settings_sync_util.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/chrome/browser/extensions/api/storage/sync_value_store_cache.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/chrome/browser/extensions/api/storage/sync_value_store_cache.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/chrome/browser/sync/chrome_sync_client.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/chrome/browser/sync/chrome_sync_client.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/autofill/core/browser/webdata/autofill_profile_data_type_controller.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/autofill/core/browser/webdata/autofill_profile_data_type_controller.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/browser_sync/profile_sync_test_util.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/search_engines/search_engine_data_type_controller_unittest.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/async_directory_type_controller.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/async_directory_type_controller_unittest.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/directory_data_type_controller.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/directory_data_type_controller.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/fake_generic_change_processor.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/fake_generic_change_processor.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/fake_sync_client.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/fake_sync_client.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/generic_change_processor.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/generic_change_processor.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/generic_change_processor_factory.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/generic_change_processor_factory.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/generic_change_processor_unittest.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/shared_change_processor.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/shared_change_processor.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/shared_change_processor_unittest.cc [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/components/sync/driver/sync_client.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/ios/chrome/browser/sync/ios_chrome_sync_client.h [modify] https://crrev.com/a56dd49ba64abaf5300e5ccd8874447821e53da1/ios/chrome/browser/sync/ios_chrome_sync_client.mm
,
Apr 12 2017
Reverted due to memory concerns on mobile, see issue 708666 . Unclear if this commit was actually the cause of the memory regression. Additionally, the tangential switch from EXPLICIT_ACCESS to IMPLICIT_ACCESS when retrieving the HistoryService caused a ChromeOS crash, see issue 709610, created issue 710906 to ultimately fix the oddity, but initially we'll probably be able to avoid this by simply keeping EXPLICIT_ACCESS.
,
Apr 24 2017
,
Apr 24 2017
Any updates?
,
Apr 24 2017
Unfortunately, no. It doesn't look like the memory regression has improved since my revert, which likely means my CL wasn't the culprit. I'd consider re-landing as is if we figured out and fixed the memory regression, see issue 708666 . Otherwise I'd need to spend significant effort here, which I don't currently have the bandwidth to do.
,
Jun 16 2017
Users experienced this crash on the following builds: Android Dev 61.0.3129.3 - 10.74 CPM, 98 reports, 38 clients (signature WriteMinidump) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jan 17 2018
This really still needs to be done. We're doing bad things with the ChromeSyncClient right now. Might be as simple as re-landing #7 after fixing merge conflicts. May be more difficult. But as of today, the ChromeSyncClient access and passes around pointers in a clearly non-thread-safe way.
,
Jun 8 2018
Hey guys! Any news? I think we need a thread-safety owner here. Any candidates?
,
Jul 4
,
Aug 7
sync-triage ping: any updates?
,
Aug 7
,
Aug 7
FYI, we recently fixed a similar example with HistoryService.
,
Aug 7
Extensions not having migrated to USS yet, there fix mentioned above (https://chromium-review.googlesource.com/1148568) wouldn't be applicable. If anyone is willing the invest time, we could do something similar in AsyncDirectoryTypeController (+SharedChangeProcessor). Realistically, the crbug.com/870624 might be addressed soon, and this problem would go away. Except perhaps the first stacktrace in the original report, which I'm not familiar with.
,
Aug 13
Crash analysis has not encountered any reports for this bug for the past 90 days. We have added the label 'crash-BugNoRepro' Crash analysis will be automatically closing the bug in 10 days. If you do not want Crash analysis to automatically close the bug, please remove the label 'crash-BugNoRepro'. If you have any feedback on this feature, please contact pranavk@
,
Aug 14
Removing 'crash-BugNoRepro' because not everything is fixed here yet.
,
Oct 15
sync-triage ping: any updates?
,
Oct 15
crbug.com/870624 is doing good progress so I think this is our best hope right now.
,
Dec 17
sync-triage ping: Mikel, can we close this now?
,
Dec 17
Not yet: the new codepath is behind a feature toggle. Let me add the launchbug are blocking bug. |
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by s...@chromium.org
, Mar 14 2017