TSAN race in MultiThreadedCertVerifierTest |
|||||
Issue descriptionStarted with https://build.chromium.org/p/chromium.memory.full/builders/Linux%20TSan%20Tests/builds/3390 so probably r430866 ? WARNING: ThreadSanitizer: data race (pid=17819) Read of size 8 at 0x0000051f6ba8 by thread T1: #0 base::FeatureList::IsEnabled(base::Feature const&) base/feature_list.cc:137:8 (net_unittests+0x00000329c7ed) #1 net::CertVerifyProc::Verify(...) net/cert/cert_verify_proc.cc:492:9 (net_unittests+0x000002ec81d9) #2 net::DoVerifyOnWorkerThread(...) net/cert/multi_threaded_cert_verifier.cc:193:25 (net_unittests+0x000002ed8032) Previous write of size 8 at 0x0000051f6ba8 by main thread: #0 base::FeatureList::ClearInstanceForTesting() base/feature_list.cc:203:14 (net_unittests+0x00000329cc53) #1 base::TestSuite::Shutdown() base/test/test_suite.cc:388:5 (net_unittests+0x0000038b190e) #2 NetTestSuite::Shutdown() net/test/net_test_suite.cc:37:14 (net_unittests+0x00000327dcea) Location is global 'base::(anonymous namespace)::g_instance' of size 8 at 0x0000051f6ba8 (net_unittests+0x0000051f6ba8) Thread T1 'WorkerPool/1784' (tid=17843, running) created by main thread at: #0 pthread_create <null> (net_unittests+0x0000005cb045) #1 base::(anonymous namespace)::CreateThread(...) base/threading/platform_thread_posix.cc:110:13 (net_unittests+0x0000032f7717) ... #10 base::WorkerPool::PostTaskAndReply(...) base/threading/worker_pool.cc:116:51 (net_unittests+0x000003300b1f) #11 net::CertVerifierJob::Start(...) net/cert/multi_threaded_cert_verifier.cc:244:12 (net_unittests+0x000002ed897d) #12 net::MultiThreadedCertVerifier::Verify(...) net/cert/multi_threaded_cert_verifier.cc:375:19 (net_unittests+0x000002ed84ad) #13 net::MultiThreadedCertVerifierTest_CancelRequestThenQuit_Test::TestBody() net/cert/multi_threaded_cert_verifier_unittest.cc:165:23 (net_unittests+0x00000099fb8e)
,
Nov 9 2016
Probably the simplest fix would be to sample the base::FeatureList value when the CertVerifyProc is created? Though that would need confirming that base::FeatureList is set up at that time.
,
Nov 9 2016
Another possibility would be if base::TestSuite waited for base::WorkerPool threads to end before shutting down, since it does some things on shutdown that normally wouldn't happen in production code, causing base::FeatureList to violate its API promise (thread-safe after initialization).
,
Nov 9 2016
Is nothing sacred! Yes, this is a regression from Issue 653691 , and checking in the CVP constructor is probably fine - I'll do that.
,
Nov 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10 commit 9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10 Author: rsleevi <rsleevi@chromium.org> Date: Thu Nov 10 03:21:39 2016 Resolve a TSAN race in CertVerifyProc TSAN gets unhappy because MultiThreadedCertVerifier abandons its worker threads (because they may be blocked on OS APIs) when shutting down tests, which interferes with ScopedFeatureList shutdown code in //net. This simply moves the check from being on the worker thread when it's running, to being on the main thread when the CertVerifyProc is constructed. BUG= 663850 Review-Url: https://codereview.chromium.org/2493593003 Cr-Commit-Position: refs/heads/master@{#431164} [modify] https://crrev.com/9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10/net/cert/cert_verify_proc.cc [modify] https://crrev.com/9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10/net/cert/cert_verify_proc.h
,
Nov 10 2016
+cc next memory sheriff Hmm, this is now causing three reports on the TSan bots (https://build.chromium.org/p/chromium.memory.full/builders/Linux%20TSan%20Tests/builds/3437) E.g. report: [ RUN ] ReloadCacheControlBrowserTest.BypassingReload [21701:21701:1109/202614:11450637833:WARNING:audio_manager.cc(317)] Multiple instances of AudioManager detected [21701:21701:1109/202614:11450638044:WARNING:audio_manager.cc(278)] Multiple instances of AudioManager detected Xlib: extension "RANDR" missing on display ":9". ================== WARNING: ThreadSanitizer: data race (pid=21701) Write of size 8 at 0x0000090a5220 by main thread: #0 base::FeatureList::ClearInstanceForTesting() base/feature_list.cc:203:14 (content_browsertests+0x0000028a8173) #1 base::test::ScopedFeatureList::InitWithFeatureList(std::__1::unique_ptr<base::FeatureList, std::__1::default_delete<base::FeatureList> >) base/test/scoped_feature_list.cc:53:28 (content_browsertests+0x000002472e66) #2 base::test::ScopedFeatureList::InitFromCommandLine(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) base/test/scoped_feature_list.cc:62:3 (content_browsertests+0x000002472f83) #3 base::test::ScopedFeatureList::InitAndDisableFeature(base::Feature const&) base/test/scoped_feature_list.cc:70:3 (content_browsertests+0x0000024730d2) #4 content::(anonymous namespace)::ReloadCacheControlBrowserTest::SetUpOnMainThread() content/browser/loader/reload_cache_control_browsertest.cc:48:26 (content_browsertests+0x00000072b581) #5 content::ContentBrowserTest::RunTestOnMainThreadLoop() content/public/test/content_browser_test.cc:135:3 (content_browsertests+0x000002440a16) #6 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop() content/public/test/browser_test_base.cc:336:3 (content_browsertests+0x000002446fc5) #7 Invoke<content::BrowserTestBase *> base/bind_internal.h:214:12 (content_browsertests+0x000002447b15) #8 MakeItSo<void (content::BrowserTestBase::*const &)(), content::BrowserTestBase *> base/bind_internal.h:285 (content_browsertests+0x000002447b15) <snip> Previous read of size 8 at 0x0000090a5220 by thread T12: #0 base::FeatureList::IsEnabled(base::Feature const&) base/feature_list.cc:137:8 (content_browsertests+0x0000028a7ccd) #1 net::CertVerifyProc::CertVerifyProc() net/cert/cert_verify_proc.cc:394:32 (content_browsertests+0x0000033076d3) #2 net::CertVerifyProcNSS::CertVerifyProcNSS() net/cert/cert_verify_proc_nss.cc:783:20 (content_browsertests+0x000003308f5d) #3 net::CertVerifyProc::CreateDefault() net/cert/cert_verify_proc.cc:377:14 (content_browsertests+0x000003307673) #4 net::CertVerifier::CreateDefault() net/cert/cert_verifier.cc:86:11 (content_browsertests+0x0000032bf12a) #5 content::ShellURLRequestContextGetter::GetURLRequestContext() content/shell/browser/shell_url_request_context_getter.cc:142:33 (content_browsertests+0x0000024caca8) #6 content::ChromeAppCacheService::InitializeOnIOThread(base::FilePath const&, content::ResourceContext*, net::URLRequestContextGetter*, scoped_refptr<storage::SpecialStoragePolicy>) content/browser/appcache/chrome_appcache_service.cc:44:49 (content_browsertests+0x000001da276c) <snip>
,
Nov 10 2016
+asvitkine. At this point, this sounds like a problem with how base::FeatureList is set up in tests.
,
Nov 10 2016
So this is saying that there's basically no synchronization point (e.g. a lock) between IsEnabled() getting checked on the IOThread vs. later SetUpOnMainThread() being called which puts sets up a testing FeatureList instance. Do we know if the two are from different test runs? Or is it the same test run but SetUpOnMainThread() gets called after we've created the IO thread and possibly ran some code there? Do we just need to set up the test FeatureList instance earlier than SetUpOnMainThread() or is this reporting a case where there's a race between an earlier test and the new test both being run in the same process?
,
Nov 10 2016
The latter - SetUpOnMainThread is created after the IOThread is created (and other threads are running). So I agree with David that I think it's just shaking out an existing datarace caused by the test doing the FeatureList after threads are started, and that it should be moved earlier.
,
Nov 17 2016
,
Nov 17 2016
I think we could just move:
scoped_feature_list_.InitAndDisableFeature(
features::kNonValidatingReloadOnNormalReload);
from SetUpOnMain() to ReloadCacheControlBrowserTest() ctor.
I believe that should make it run before IO thread is created - which is what we want (i.e. matches browser behavior where all feature overrides are set up before any other threads are started).
I'm traveling today; rsleevi@, do you want to try the above?
,
Nov 18 2016
The suggestion in comment 11 works for me locally to fix: ReloadCacheControlBrowserTest.BypassingReload ReloadCacheControlBrowserTest.NormalReload ReloadCacheControlWithAnExperimentBrowserTest.ReloadMainResource - I needed to run with --no-sandbox, as described on the Chromium TSAN instructions page. - TSAN fails with some libdbus errors locally on Ubuntu 14.04 first. I needed to comment out the bluez::DBusBluezManagerWrapperLinux calls in content/shell/browser/shell_browser_main_parts.cc to avoid that.
,
Nov 18 2016
thestig: If you have the changes made already, do you want to just send the CL?
,
Nov 18 2016
Uh, I guess.
,
Nov 18 2016
Actually I think it might not be that simple. Assigning to myself (assuming thestig@ hasn't yet solved it!).
,
Nov 18 2016
I have not. It's all yours. :)
,
Nov 18 2016
Actually, maybe it *is* fine for content_browsertests. For just browser_tests there is a problem - because those run the Chrome startup code which creates a different FeatureList in ChromeBrowserMainParts::SetupFieldTrials() - so overriding state gets lost by that codepath being ran after the test ctor.
,
Nov 19 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f1c4940a6107ab9e26fcc63b57077d6fb0d918cf commit f1c4940a6107ab9e26fcc63b57077d6fb0d918cf Author: asvitkine <asvitkine@chromium.org> Date: Sat Nov 19 00:25:13 2016 Fix TSAN errors with reload_cache_control_browsertest.cc. Initializing the feature list in the ctor makes sure its initialized before threads get started that query its value. BUG= 663850 Review-Url: https://codereview.chromium.org/2516783002 Cr-Commit-Position: refs/heads/master@{#433347} [modify] https://crrev.com/f1c4940a6107ab9e26fcc63b57077d6fb0d918cf/content/browser/loader/reload_cache_control_browsertest.cc
,
Nov 19 2016
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by davidben@chromium.org
, Nov 9 2016