New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 663850 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug



Sign in to add a comment

TSAN race in MultiThreadedCertVerifierTest

Project Member Reported by thestig@chromium.org, Nov 9 2016

Issue description

Started with https://build.chromium.org/p/chromium.memory.full/builders/Linux%20TSan%20Tests/builds/3390 so probably r430866 ?

WARNING: ThreadSanitizer: data race (pid=17819)
  Read of size 8 at 0x0000051f6ba8 by thread T1:
    #0 base::FeatureList::IsEnabled(base::Feature const&) base/feature_list.cc:137:8 (net_unittests+0x00000329c7ed)
    #1 net::CertVerifyProc::Verify(...) net/cert/cert_verify_proc.cc:492:9 (net_unittests+0x000002ec81d9)
    #2 net::DoVerifyOnWorkerThread(...) net/cert/multi_threaded_cert_verifier.cc:193:25 (net_unittests+0x000002ed8032)

  Previous write of size 8 at 0x0000051f6ba8 by main thread:
    #0 base::FeatureList::ClearInstanceForTesting() base/feature_list.cc:203:14 (net_unittests+0x00000329cc53)
    #1 base::TestSuite::Shutdown() base/test/test_suite.cc:388:5 (net_unittests+0x0000038b190e)
    #2 NetTestSuite::Shutdown() net/test/net_test_suite.cc:37:14 (net_unittests+0x00000327dcea)

  Location is global 'base::(anonymous namespace)::g_instance' of size 8 at 0x0000051f6ba8 (net_unittests+0x0000051f6ba8)

  Thread T1 'WorkerPool/1784' (tid=17843, running) created by main thread at:
    #0 pthread_create <null> (net_unittests+0x0000005cb045)
    #1 base::(anonymous namespace)::CreateThread(...) base/threading/platform_thread_posix.cc:110:13 (net_unittests+0x0000032f7717)
    ...
    #10 base::WorkerPool::PostTaskAndReply(...) base/threading/worker_pool.cc:116:51 (net_unittests+0x000003300b1f)
    #11 net::CertVerifierJob::Start(...) net/cert/multi_threaded_cert_verifier.cc:244:12 (net_unittests+0x000002ed897d)
    #12 net::MultiThreadedCertVerifier::Verify(...) net/cert/multi_threaded_cert_verifier.cc:375:19 (net_unittests+0x000002ed84ad)
    #13 net::MultiThreadedCertVerifierTest_CancelRequestThenQuit_Test::TestBody() net/cert/multi_threaded_cert_verifier_unittest.cc:165:23 (net_unittests+0x00000099fb8e)


 
Looks like the cause is that MultiThreadedCertVerifierTest.CancelRequestThenQuit doesn't join all its workers before ending the test. This should be okay since all the callbacks and such are neutered, but the base::FeatureList global gets reset on test shutdown.
Probably the simplest fix would be to sample the base::FeatureList value when the CertVerifyProc is created? Though that would need confirming that base::FeatureList is set up at that time.
Another possibility would be if base::TestSuite waited for base::WorkerPool threads to end before shutting down, since it does some things on shutdown that normally wouldn't happen in production code, causing base::FeatureList to violate its API promise (thread-safe after initialization).
Is nothing sacred!

Yes, this is a regression from  Issue 653691 , and checking in the CVP constructor is probably fine - I'll do that.
Project Member

Comment 5 by bugdroid1@chromium.org, Nov 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10

commit 9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10
Author: rsleevi <rsleevi@chromium.org>
Date: Thu Nov 10 03:21:39 2016

Resolve a TSAN race in CertVerifyProc

TSAN gets unhappy because MultiThreadedCertVerifier abandons its
worker threads (because they may be blocked on OS APIs) when
shutting down tests, which interferes with ScopedFeatureList
shutdown code in //net.

This simply moves the check from being on the worker thread
when it's running, to being on the main thread when the
CertVerifyProc is constructed.

BUG= 663850 

Review-Url: https://codereview.chromium.org/2493593003
Cr-Commit-Position: refs/heads/master@{#431164}

[modify] https://crrev.com/9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10/net/cert/cert_verify_proc.cc
[modify] https://crrev.com/9d3cd28a62874bbdc27be9134dd6fe53f6b8cd10/net/cert/cert_verify_proc.h

Cc: glider@chromium.org
+cc next memory sheriff

Hmm, this is now causing three reports on the TSan bots (https://build.chromium.org/p/chromium.memory.full/builders/Linux%20TSan%20Tests/builds/3437)

E.g. report:
[ RUN      ] ReloadCacheControlBrowserTest.BypassingReload
[21701:21701:1109/202614:11450637833:WARNING:audio_manager.cc(317)] Multiple instances of AudioManager detected
[21701:21701:1109/202614:11450638044:WARNING:audio_manager.cc(278)] Multiple instances of AudioManager detected
Xlib:  extension "RANDR" missing on display ":9".
==================
WARNING: ThreadSanitizer: data race (pid=21701)
  Write of size 8 at 0x0000090a5220 by main thread:
    #0 base::FeatureList::ClearInstanceForTesting() base/feature_list.cc:203:14 (content_browsertests+0x0000028a8173)
    #1 base::test::ScopedFeatureList::InitWithFeatureList(std::__1::unique_ptr<base::FeatureList, std::__1::default_delete<base::FeatureList> >) base/test/scoped_feature_list.cc:53:28 (content_browsertests+0x000002472e66)
    #2 base::test::ScopedFeatureList::InitFromCommandLine(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) base/test/scoped_feature_list.cc:62:3 (content_browsertests+0x000002472f83)
    #3 base::test::ScopedFeatureList::InitAndDisableFeature(base::Feature const&) base/test/scoped_feature_list.cc:70:3 (content_browsertests+0x0000024730d2)
    #4 content::(anonymous namespace)::ReloadCacheControlBrowserTest::SetUpOnMainThread() content/browser/loader/reload_cache_control_browsertest.cc:48:26 (content_browsertests+0x00000072b581)
    #5 content::ContentBrowserTest::RunTestOnMainThreadLoop() content/public/test/content_browser_test.cc:135:3 (content_browsertests+0x000002440a16)
    #6 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop() content/public/test/browser_test_base.cc:336:3 (content_browsertests+0x000002446fc5)
    #7 Invoke<content::BrowserTestBase *> base/bind_internal.h:214:12 (content_browsertests+0x000002447b15)
    #8 MakeItSo<void (content::BrowserTestBase::*const &)(), content::BrowserTestBase *> base/bind_internal.h:285 (content_browsertests+0x000002447b15)

<snip>

  Previous read of size 8 at 0x0000090a5220 by thread T12:
    #0 base::FeatureList::IsEnabled(base::Feature const&) base/feature_list.cc:137:8 (content_browsertests+0x0000028a7ccd)
    #1 net::CertVerifyProc::CertVerifyProc() net/cert/cert_verify_proc.cc:394:32 (content_browsertests+0x0000033076d3)
    #2 net::CertVerifyProcNSS::CertVerifyProcNSS() net/cert/cert_verify_proc_nss.cc:783:20 (content_browsertests+0x000003308f5d)
    #3 net::CertVerifyProc::CreateDefault() net/cert/cert_verify_proc.cc:377:14 (content_browsertests+0x000003307673)
    #4 net::CertVerifier::CreateDefault() net/cert/cert_verifier.cc:86:11 (content_browsertests+0x0000032bf12a)
    #5 content::ShellURLRequestContextGetter::GetURLRequestContext() content/shell/browser/shell_url_request_context_getter.cc:142:33 (content_browsertests+0x0000024caca8)
    #6 content::ChromeAppCacheService::InitializeOnIOThread(base::FilePath const&, content::ResourceContext*, net::URLRequestContextGetter*, scoped_refptr<storage::SpecialStoragePolicy>) content/browser/appcache/chrome_appcache_service.cc:44:49 (content_browsertests+0x000001da276c)

 <snip>
Cc: asvitk...@chromium.org
+asvitkine. At this point, this sounds like a problem with how base::FeatureList is set up in tests.
So this is saying that there's basically no synchronization point (e.g. a lock) between IsEnabled() getting checked on the IOThread vs. later SetUpOnMainThread() being called which puts sets up a testing FeatureList instance.

Do we know if the two are from different test runs? Or is it the same test run but SetUpOnMainThread() gets called after we've created the IO thread and possibly ran some code there? Do we just need to set up the test FeatureList instance earlier than SetUpOnMainThread() or is this reporting a case where there's a race between an earlier test and the new test both being run in the same process?
The latter - SetUpOnMainThread is created after the IOThread is created (and other threads are running).

So I agree with David that I think it's just shaking out an existing datarace caused by the test doing the FeatureList after threads are started, and that it should be moved earlier.
Cc: fdoray@chromium.org toyoshim@chromium.org
 Issue 666304  has been merged into this issue.
I think we could just move:

    scoped_feature_list_.InitAndDisableFeature(
        features::kNonValidatingReloadOnNormalReload);

from SetUpOnMain() to ReloadCacheControlBrowserTest() ctor.

I believe that should make it run before IO thread is created - which is what we want (i.e. matches browser behavior where all feature overrides are set up before any other threads are started).

I'm traveling today; rsleevi@, do you want to try the above?
The suggestion in comment 11 works for me locally to fix:

ReloadCacheControlBrowserTest.BypassingReload
ReloadCacheControlBrowserTest.NormalReload
ReloadCacheControlWithAnExperimentBrowserTest.ReloadMainResource

- I needed to run with --no-sandbox, as described on the Chromium TSAN instructions page.
- TSAN fails with some libdbus errors locally on Ubuntu 14.04 first. I needed to comment out the bluez::DBusBluezManagerWrapperLinux calls in content/shell/browser/shell_browser_main_parts.cc to avoid that.
thestig: If you have the changes made already, do you want to just send the CL?
Uh, I guess.
Owner: asvitk...@chromium.org
Actually I think it might not be that simple. Assigning to myself (assuming thestig@ hasn't yet solved it!).
I have not. It's all yours. :)
Actually, maybe it *is* fine for content_browsertests.

For just browser_tests there is a problem - because those run the Chrome startup code which creates a different FeatureList in ChromeBrowserMainParts::SetupFieldTrials() - so overriding state gets lost by that codepath being ran after the test ctor.
Project Member

Comment 18 by bugdroid1@chromium.org, Nov 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f1c4940a6107ab9e26fcc63b57077d6fb0d918cf

commit f1c4940a6107ab9e26fcc63b57077d6fb0d918cf
Author: asvitkine <asvitkine@chromium.org>
Date: Sat Nov 19 00:25:13 2016

Fix TSAN errors with reload_cache_control_browsertest.cc.

Initializing the feature list in the ctor makes sure its
initialized before threads get started that query its
value.

BUG= 663850 

Review-Url: https://codereview.chromium.org/2516783002
Cr-Commit-Position: refs/heads/master@{#433347}

[modify] https://crrev.com/f1c4940a6107ab9e26fcc63b57077d6fb0d918cf/content/browser/loader/reload_cache_control_browsertest.cc

Status: Fixed (was: Assigned)

Sign in to add a comment