New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 901287 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 9
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug
Team-Security-UX

Blocking:
issue 900022



Sign in to add a comment

Flakiness in unit_test caused by SiteEngagementService::GetAllDetails

Project Member Reported by sebmarchand@chromium.org, Nov 2

Issue description

While looking at the logs of recent runs of unit_tests on the win7_chromium_rel_ng trybot (e.g. : https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8930979181890839536/+/steps/unit_tests__with_patch_/0/stdout ) I've realized that a lot (~45) of tests are crashing with the following stack trace:

  Received fatal exception EXCEPTION_ACCESS_VIOLATION

  Backtrace:

	HostContentSettingsMap::AddSettingsForOneType [0x0454C96F+51]
	HostContentSettingsMap::GetSettingsForOneType [0x0454C910+254]
	std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> >::reserve [0x03F3E247+97]
	SiteEngagementService::GetAllDetails [0x03F3CCB1+155]
	base::internal::Invoker<base::internal::BindState<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > (__cdecl*)(base::Clock *,HostContentSettingsMap *),base::internal::UnretainedWrapper<base::Clock>,base::internal::Unr [0x021803A4+20]
	base::RepeatingCallback<std::vector<std::unique_ptr<content::URLLoaderThrottle,std::default_delete<content::URLLoaderThrottle> >,std::allocator<std::unique_ptr<content::URLLoaderThrottle,std::default_delete<content::URLLoaderThrottle> > > > __cdecl(void)> [0x0170B8DF+47]
	base::internal::ReturnAsParamAdapter<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > > [0x03F3E670+32]
	base::internal::FunctorTraits<void (__cdecl*)(scoped_refptr<device::UsbDeviceHandle>,int),void>::Invoke<void (__cdecl*)(scoped_refptr<device::UsbDeviceHandle>,int),scoped_refptr<device::UsbDeviceHandle>,int> [0x0081A251+49]
	base::internal::Invoker<base::internal::BindState<void (__cdecl*)(scoped_refptr<device::UsbDeviceHandle>,int),scoped_refptr<device::UsbDeviceHandle>,int>,void __cdecl(void)>::RunOnce [0x0081A397+23]
	base::internal::PostTaskAndReplyImpl::PostTaskAndReply [0x050C9E49+985]
	base::internal::PostTaskAndReplyImpl::PostTaskAndReply [0x050CA201+1937]
	base::debug::TaskAnnotator::RunTask [0x050B74BA+298]
	base::internal::TaskTracker::RunOrSkipTask [0x04E16751+689]
	base::test::ScopedTaskEnvironment::TestTaskTracker::RunOrSkipTask [0x03B59BFF+175]
	base::internal::TaskTracker::RunAndPopNextTask [0x04E15B53+467]
	base::internal::SchedulerWorker::RunWorker [0x06A8A395+517]
	base::internal::SchedulerWorker::RunBackgroundPooledWorker [0x06A89F2C+28]
	base::PlatformThread::GetCurrentThreadPriority [0x03E41A05+581]
	BaseThreadInitThunk [0x7657338A+18]
	RtlInitializeExceptionChain [0x77169902+99]
	RtlInitializeExceptionChain [0x771698D5+54]



These tests then get retried at the end of the run and they pass on the second run. The second run is usually done in isolated process, which might mean that these tests are initially failing because some global variable gets leaked and reused in separate tests.

This type of issue slow down the CQ and makes it harder to run large batch of tests locally (as there's no automatic retry when running with "--gtest_filter").

Marking this as Win only as I've only observed this on the Win7 bot so far.
 
Owner: dominickn@google.com
Punting to dominickn@ who I believe worked on this last.
Components: Internals>Permissions>SiteEngagement
Owner: dominickn@chromium.org
Cc: raymes@chromium.org
SiteEngagementService::GetAllDetails() itself is only callable on the UI thread, but it calls SiteEngagementService::GetAllDetailsImpl(). That method is independently called in an AfterStartupTask on a background thread. I suspect the global state may be in HostContentSettingsMap or one of its providers, which is being hit concurrently?

This implies to me that HostContentSettingsMap::GetSettingsForOneType is not as thread-safe as its header comment suggests. +raymes for thoughts there.
Cc: msramek@chromium.org engedy@chromium.org
I'm guessing this is something to do with messing with state in HostContentSettingsMap in another test that runs before this one.

Could it be this? https://cs.chromium.org/chromium/src/chrome/browser/content_settings/host_content_settings_map_unittest.cc?q=file:test+RegisterUserModifiableProvider&sq=package:chromium&g=0&l=1719

+msramek and engedy
This seems to be caused by crrev.com/c/1307018 ( Issue 900022 ), which expects that the HostContentSettingsMap (a KeyedService) will surely outlive the posted background task. But this is not true in tests where nothing really holds a strong reference to the service, we play fast and loose with testing profiles, and the worker threads seem to slightly outlive individual test cases.

@Martin, it looks like HostContentSettingsMap is designed to handle the situation when it outlives its UI-thread dependencies (e.g. the PrefService). Would it be a good solution for GetAllDetailsImpl() to simply hold a strong reference to the HostContentSettingsMap?

FWIW, slightly better stack trace courtesy of ASAN:

==2500==ERROR: AddressSanitizer: heap-use-after-free on address 0x11d6774172b0 at pc 0x7ff7f31e8264 bp 0x00341e8fdea0 sp 0x00341e8fdee8
READ of size 8 at 0x11d6774172b0 thread T88
    #0 0x7ff7f31e8263 in HostContentSettingsMap::GetSettingsForOneType src\components\content_settings\core\browser\host_content_settings_map.cc:432
    #1 0x7ff7f23d3b3e in `anonymous namespace'::GetAllDetailsImpl src\chrome\browser\engagement\site_engagement_service.cc:100
    #2 0x7ff7f23db0fd in base::internal::Invoker<base::internal::BindState<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > (*)(base::Clock *, HostContentSettingsMap *),base::internal::UnretainedWrapper<base::Clock>,base::internal::UnretainedWrapper<HostContentSettingsMap> >,std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > ()>::RunOnce src\base\bind_internal.h:658
    #3 0x7ff7f23dc160 in base::OnceCallback<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > ()>::Run src\base\callback.h:99
    #4 0x7ff7f23db916 in base::internal::ReturnAsParamAdapter<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > > src\base\post_task_and_reply_with_result_internal.h:22
    #5 0x7ff7f23dbea4 in base::internal::Invoker<base::internal::BindState<void (*)(base::OnceCallback<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > ()>, std::unique_ptr<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> >,std::default_delete<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > > > *),base::OnceCallback<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > ()>,std::unique_ptr<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> >,std::default_delete<std::vector<mojom::SiteEngagementDetails,std::allocator<mojom::SiteEngagementDetails> > > > *>,void ()>::RunOnce src\base\bind_internal.h:658
    #6 0x7fffce2990bf in base::`anonymous namespace'::PostTaskAndReplyRelay::RunTaskAndPostReply src\base\threading\post_task_and_reply_impl.cc:79
    #7 0x7fffce299a05 in base::internal::Invoker<base::internal::BindState<void (*)(base::(anonymous namespace)::PostTaskAndReplyRelay),base::(anonymous namespace)::PostTaskAndReplyRelay>,void ()>::RunOnce src\base\bind_internal.h:658
    #8 0x7fffcdfa7d13 in base::debug::TaskAnnotator::RunTask src\base\debug\task_annotator.cc:99
    #9 0x7fffce27782c in base::internal::TaskTracker::RunOrSkipTask src\base\task\task_scheduler\task_tracker.cc:647
    #10 0x7ff7f1b92342 in base::test::ScopedTaskEnvironment::TestTaskTracker::RunOrSkipTask src\base\test\scoped_task_environment.cc:360
    #11 0x7fffce27508c in base::internal::TaskTracker::RunAndPopNextTask src\base\task\task_scheduler\task_tracker.cc:502
    #12 0x7fffce24f6c4 in base::internal::SchedulerWorker::RunWorker src\base\task\task_scheduler\scheduler_worker.cc:332
    #13 0x7fffce24e217 in base::internal::SchedulerWorker::RunBackgroundPooledWorker src\base\task\task_scheduler\scheduler_worker.cc:230
    #14 0x7fffce298778 in base::`anonymous namespace'::ThreadFunc src\base\threading\platform_thread_win.cc:100
    #15 0x7fffa8666e88 in _asan_print_accumulated_stats+0x1138 (src\out\Release\clang_rt.asan_dynamic-x86_64.dll+0x180036e88)
    #16 0x7ff80fa41fe3 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180011fe3)
    #17 0x7ff81200cb30 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18006cb30)

0x11d6774172b0 is located 48 bytes inside of 168-byte region [0x11d677417280,0x11d677417328)
freed by thread T0 here:
    #0 0x7fffa86682a0 in operator delete+0xb0 (src\out\Release\clang_rt.asan_dynamic-x86_64.dll+0x1800382a0)
    #1 0x7ff7f31f0ecc in HostContentSettingsMap::`vector deleting destructor'+0x22 (src\out\Release\unit_tests.exe+0x148b80ecc)
    #2 0x7fffdc71dad5 in impl::RefcountedKeyedServiceTraits::Destruct src\components\keyed_service\core\refcounted_keyed_service.cc:16
    #3 0x7fffdc722d22 in std::_Tree<std::_Tmap_traits<base::SupportsUserData *,scoped_refptr<RefcountedKeyedService>,std::less<base::SupportsUserData *>,std::allocator<std::pair<base::SupportsUserData *const,scoped_refptr<RefcountedKeyedService> > >,0> >::erase src\third_party\depot_tools\win_toolchain\vs_files\3bc0ec615cf20ee342f3bc29bc991b5ad66d8d2c\VC\Tools\MSVC\14.14.26428\include\xtree:1425
    #4 0x7fffdc71fb19 in RefcountedKeyedServiceFactory::Disassociate src\components\keyed_service\core\refcounted_keyed_service_factory.cc:108
    #5 0x7fffdc71ff00 in RefcountedKeyedServiceFactory::ContextDestroyed src\components\keyed_service\core\refcounted_keyed_service_factory.cc:120
    #6 0x7fffdc70e486 in DependencyManager::DestroyContextServices src\components\keyed_service\core\dependency_manager.cc:100
    #7 0x7ff7f1b64b02 in TestingProfile::~TestingProfile src\chrome\test\base\testing_profile.cc:510
    #8 0x7ff7ee67a036 in SessionRestoreStatsCollectorTest_DiscardDeferredTabs_Test::~SessionRestoreStatsCollectorTest_DiscardDeferredTabs_Test src\chrome\browser\sessions\session_restore_stats_collector_unittest.cc:662
    #9 0x7ff7eefd3b08 in testing::TestInfo::Run src\third_party\googletest\src\googletest\src\gtest.cc:2703
    #10 0x7ff7eefd4c07 in testing::TestCase::Run src\third_party\googletest\src\googletest\src\gtest.cc:2816
    #11 0x7ff7eefef30a in testing::internal::UnitTestImpl::RunAllTests src\third_party\googletest\src\googletest\src\gtest.cc:5182
    #12 0x7ff7eefee7d0 in testing::UnitTest::Run src\third_party\googletest\src\googletest\src\gtest.cc:4791
    #13 0x7ff7f1ba36e9 in base::TestSuite::Run src\base\test\test_suite.cc:294
    #14 0x7ff7ef872a49 in base::OnceCallback<int ()>::Run src\base\callback.h:99
    #15 0x7ff7f1baa655 in base::`anonymous namespace'::LaunchUnitTestsInternal src\base\test\launcher\unit_test_launcher.cc:225
    #16 0x7ff7f1baa239 in base::LaunchUnitTests src\base\test\launcher\unit_test_launcher.cc:575
    #17 0x7ff7f821c32e in main src\chrome\test\base\run_all_unittests.cc:30
    #18 0x7ff7f823ae2b in __scrt_common_main_seh f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:283
    #19 0x7ff80fa41fe3 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180011fe3)
    #20 0x7ff81200cb30 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18006cb30)

previously allocated by thread T0 here:
    #0 0x7fffa8667fe0 in operator new+0xb0 (src\out\Release\clang_rt.asan_dynamic-x86_64.dll+0x180037fe0)
    #1 0x7ff7f231853c in HostContentSettingsMapFactory::BuildServiceInstanceFor src\chrome\browser\content_settings\host_content_settings_map_factory.cc:83
    #2 0x7fffdc784175 in RefcountedBrowserContextKeyedServiceFactory::BuildServiceInstanceFor src\components\keyed_service\content\refcounted_browser_context_keyed_service_factory.cc:99
    #3 0x7fffdc71ed59 in RefcountedKeyedServiceFactory::GetServiceForContext src\components\keyed_service\core\refcounted_keyed_service_factory.cc:88
    #4 0x7fffdc7840c4 in RefcountedBrowserContextKeyedServiceFactory::GetServiceForBrowserContext src\components\keyed_service\content\refcounted_browser_context_keyed_service_factory.cc:60
    #5 0x7ff7f23182a1 in HostContentSettingsMapFactory::GetForProfile src\chrome\browser\content_settings\host_content_settings_map_factory.cc:59
    #6 0x7ff7f2317f07 in CookieSettingsFactory::BuildServiceInstanceFor src\chrome\browser\content_settings\cookie_settings_factory.cc:67
    #7 0x7fffdc784175 in RefcountedBrowserContextKeyedServiceFactory::BuildServiceInstanceFor src\components\keyed_service\content\refcounted_browser_context_keyed_service_factory.cc:99
    #8 0x7fffdc71ed59 in RefcountedKeyedServiceFactory::GetServiceForContext src\components\keyed_service\core\refcounted_keyed_service_factory.cc:88
    #9 0x7fffdc7840c4 in RefcountedBrowserContextKeyedServiceFactory::GetServiceForBrowserContext src\components\keyed_service\content\refcounted_browser_context_keyed_service_factory.cc:60
    #10 0x7ff7f2317aaf in CookieSettingsFactory::GetForProfile src\chrome\browser\content_settings\cookie_settings_factory.cc:27
    #11 0x7ff7f6a87a45 in OneGoogleBarServiceFactory::BuildServiceInstanceFor src\chrome\browser\search\one_google_bar\one_google_bar_service_factory.cc:59
    #12 0x7fffdc783897 in BrowserContextKeyedServiceFactory::BuildServiceInstanceFor src\components\keyed_service\content\browser_context_keyed_service_factory.cc:105
    #13 0x7fffdc7145e0 in KeyedServiceFactory::GetServiceForContext src\components\keyed_service\core\keyed_service_factory.cc:88
    #14 0x7ff7f555addf in LocalNtpSource::LocalNtpSource src\chrome\browser\search\local_ntp_source.cc:664
    #15 0x7ff7f2a3e48d in InstantService::InstantService src\chrome\browser\search\instant_service.cc:232
    #16 0x7fffdc783897 in BrowserContextKeyedServiceFactory::BuildServiceInstanceFor src\components\keyed_service\content\browser_context_keyed_service_factory.cc:105
    #17 0x7fffdc7145e0 in KeyedServiceFactory::GetServiceForContext src\components\keyed_service\core\keyed_service_factory.cc:88
    #18 0x7ff7f22a378a in ChromeContentBrowserClient::IsSuitableHost src\chrome\browser\chrome_content_browser_client.cc:1654
    #19 0x7fffc6f6fea2 in content::RenderProcessHostImpl::IsSuitableHost src\content\browser\renderer_host\render_process_host_impl.cc:3758
    #20 0x7fffc732fd7b in content::SiteInstanceImpl::HasWrongProcessForURL src\content\browser\site_instance_impl.cc:273
    #21 0x7fffc68f2f71 in content::RenderFrameHostManager::DetermineSiteInstanceForURL src\content\browser\frame_host\render_frame_host_manager.cc:1239
    #22 0x7fffc68f1be4 in content::RenderFrameHostManager::GetSiteInstanceForNavigation src\content\browser\frame_host\render_frame_host_manager.cc:1045
    #23 0x7fffc68ed6b6 in content::RenderFrameHostManager::GetSiteInstanceForNavigationRequest src\content\browser\frame_host\render_frame_host_manager.cc:1988
    #24 0x7fffc68ebfe8 in content::RenderFrameHostManager::GetFrameHostForNavigation src\content\browser\frame_host\render_frame_host_manager.cc:500
    #25 0x7fffc68ebe1c in content::RenderFrameHostManager::DidCreateNavigationRequest src\content\browser\frame_host\render_frame_host_manager.cc:479
    #26 0x7fffc679f32f in content::FrameTreeNode::CreatedNavigationRequest src\content\browser\frame_host\frame_tree_node.cc:388
    #27 0x7fffc6821d44 in content::NavigatorImpl::Navigate src\content\browser\frame_host\navigator_impl.cc:357
    #28 0x7fffc67c2095 in content::NavigationControllerImpl::NavigateToExistingPendingEntry src\content\browser\frame_host\navigation_controller_impl.cc:2388
    #29 0x7fffc74233d2 in content::WebContentsImpl::WasShown src\content\browser\web_contents\web_contents_impl.cc:1670
By GetAllDetailsImpl() holding a strong reference to HCSM, do you mean SiteEngagementService, as a KeyedService, depending on HCSM in the keyed services dependency graph?

I think that's reasonable; in fact, likely an oversight that it doesn't today.
That's also probably something we should do, but I actually meant GetAllDetailsImpl being posted without base::Unretained, so that it holds a strong reference to HCSM while it's pending/executing on the background worker thread.
engedy@, thanks very much for the investigation! That detailed trace from Windows really makes it obvious what's happening here.

msramek: SiteEngagementService already depends on HCSM in the keyed services dependency graph.

This problem is because GetAllDetailsImpl is a method in the anonymous namespace; it's run on a background thread so if I understand correctly, we can't have a weak pointer to SiteEngagementService (weak pointers must always be derefed on the same sequence; in this case, the UI thread for site engagement).

I've thought a little and the solutions I can think of are:

1. wrap the background thread call to GetAllDetailsImpl() in an AddRef() and Release() on HCSM. That's pretty horrifying to be honest and I'd like to avoid this if possible.

2. find some way of avoiding running the AfterStartupTask in tests. It's honestly not necessary - the only purpose is for metrics recording. Eliminating the call could make a lot of tests run a little faster. The concern is that we wouldn't have that callpath explicitly tested, but we could add back a test that did exercise it in the way it's called in prod.

For example, what if I move the call to SiteEngagementService::RecordMetrics out of the SiteEngagementService constructor and into somewhere like the SiteEngagementServiceFactory, or the tab_helpers.cc file, or somewhere that would take it out of unit tests all together?
Dominick, I'm worried this is not a test-only problem, and could also cause shutdown crashes in production. Please do double-check this, but skimming through the code it looks like that profiles (and thus keyed services) are destroyed [1] before the worked threads are shut down and joined [2]. 

[1] https://cs.chromium.org/chromium/src/content/browser/browser_main_loop.cc?rcl=18a9ef4599aa9e6cb6fdc27484ac50fcfdda9482&l=1027

[2] https://cs.chromium.org/chromium/src/content/browser/browser_main_loop.cc?rcl=18a9ef4599aa9e6cb6fdc27484ac50fcfdda9482&l=1101
s/worked/worker/
> it's run on a background thread so if I understand correctly, we can't have a weak pointer to SiteEngagementService (weak pointers must always be derefed on the same sequence; in this case, the UI thread for site engagement).

I think engedy meant holding a strong reference (scoped_refptr). HCSM is a RefCountedThreadSafe so it should be fine to hold a strong reference to it on a background thread, as long as that's what makes sense from a lifetime perspective.
Status: Started (was: Untriaged)
Thanks. I'll add the scoped_refptr and remove the on startup call to record metrics (it'll still get recorded on the first engagement event after startup).
Project Member

Comment 13 by bugdroid1@chromium.org, Nov 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/edcb399f3af684f9b688783da003c52307e6f9ce

commit edcb399f3af684f9b688783da003c52307e6f9ce
Author: Dominick Ng <dominickn@chromium.org>
Date: Thu Nov 08 04:08:44 2018

Fix use-after-free due to SiteEngagementService background thread activity.

This CL:
 - stops recording site engagement metrics at startup
 - passes a scoped_refptr to HostContentSettingsMap in a background
   thread call to query the content settings.

This addresses a use-after-free due to the HostContentSettingsMap
being freed before the background thread execution completes.

BUG= 901287 

Change-Id: I7a1cd46a4628da490387aa98a7fc383ae5a5adb3
Reviewed-on: https://chromium-review.googlesource.com/c/1322339
Reviewed-by: calamity <calamity@chromium.org>
Commit-Queue: Dominick Ng <dominickn@chromium.org>
Cr-Commit-Position: refs/heads/master@{#606332}
[modify] https://crrev.com/edcb399f3af684f9b688783da003c52307e6f9ce/chrome/browser/engagement/site_engagement_service.cc

Plus the following revision should also refer to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f70115b4df17a675652ef65521ba6ddf81f59037

commit f70115b4df17a675652ef65521ba6ddf81f59037
Author: Balazs Engedy <engedy@chromium.org>
Date: Fri Nov 09 10:12:19 2018

Prevent use-after-free in SiteEngagementService.

The profile and its KeyedServices are normally destroyed before the
TaskScheduler shuts down background threads, so the background task to
record site engagement metrics needs to avoid using any members of
SiteEngagementService

Bug:  900022 
Change-Id: Ibdebbd5a64d59fad29b7715be3557eeed411d741
Reviewed-on: https://chromium-review.googlesource.com/c/1326441
Reviewed-by: Dominick Ng <dominickn@chromium.org>
Commit-Queue: Balazs Engedy <engedy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#606781}
[modify] https://crrev.com/f70115b4df17a675652ef65521ba6ddf81f59037/chrome/browser/engagement/site_engagement_service.cc
Blocking: 900022
Status: Fixed (was: Started)
Need to keep an eye on crash reports from Dev channel, but things stay quiet, no merges will be needed: the issue was introduced in 72.0.3597.0 by crrev.com/c/1307018, and both fixes made it to M72.



Sign in to add a comment