New issue
Advanced search Search tips

Issue 714789 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug



Sign in to add a comment

Data race in WebDataRequestManager::CancelRequest

Project Member Reported by ClusterFuzz, Apr 24 2017

Issue description

Detailed report: https://clusterfuzz.com/testcase?key=6664163662495744

Fuzzer: inferno_layout_test_unmodified
Job Type: linux_tsan_chrome_mp
Platform Id: linux

Crash Type: Data race WRITE 8
Crash Address: 0x7b0c00155290
Crash State:
  WebDataRequestManager::CancelRequest
  WebDatabaseService::CancelRequest
  WebDataServiceBase::CancelRequest
  
Sanitizer: thread (TSAN)

Regressed: https://clusterfuzz.com/revisions?job=linux_tsan_chrome_mp&range=466626:466630

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6664163662495744


Additional requirements: Requires Gestures

Issue filed automatically.

See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information.
 
Cc: rogerm@chromium.org fdoray@chromium.org a...@chromium.org
Components: Internals>Network
Labels: M-60 Test-Predator-Wrong
Predator and regression range did n't given any suspected CL. could someone  please take a look and help us to find correct owner?.
Thank you.

Comment 2 by eroman@chromium.org, Apr 25 2017

Cc: pkasting@chromium.org pwnall@chromium.org
Components: -Internals>Network UI>Browser>Autofill
Nothing in that range stands out.

Can a components/webdata OWNER take a look?
Owner: rogerm@chromium.org
Status: Assigned (was: Untriaged)
I didn't write any of this code, but the design looks unsafe to me.  WebDataRequests have a |manager_| field that can be read and written on different threads, but the class seems to have no locking.  Various consumers, such as WebDataRequestManager, have locking, but WebDataRequests are not always accessed through those classes, and can be manipulated directly, leading to problems like this one.  All WebDataRequest has is some sort of check of the lock inside its |manager_|.

I think this problem was caused by rogerm in https://codereview.chromium.org/2571923002 , which removed the request lock that was guarding access to this member.  The CL description says that this lock was ineffective, but AFAICT it would have prevented exactly this race.

Comment 4 by rogerm@chromium.org, Apr 25 2017

Status: Started (was: Assigned)

Comment 5 by rogerm@chromium.org, Apr 25 2017

looking

Comment 6 by rogerm@chromium.org, Apr 25 2017

TLDR; I think I was perhaps too "clever" for TSAN here. A "fix" is to just remove the IsActive() calls from the db reader/writer. It's slightly less efficient, but won't raise TSAN's hackles.

The pattern used here is the double-checked lock pattern.

  if (mgr is already null) {
    // checking that a pointer has reverted to NULL is a safe
    // optimization to skip over the locked region. It can be
    // done without holding the lock even if writes to mgr
    // aren't atomic because we'll check it again inside the
    // locked region if it's non-null here.
    return;
  }

  Acquire lock;
  DoStuffWithManager();

TSAN is *correctly* pointing out that there is a race with reading (but not using) the value of the manager pointer outside the lock. I failed to consider that TSAN would pick this up when I wrote that code; but, the "race" is by safe and by design.

The code is semantically equivalent, just less efficient with the unguarded nullptr check removed, which would satisfy TSAN.

Adding a lock to the WebDataRequest would also satisfy TSAN, but adds absolutely no additional safety or semantic value compared to the code as is, at the expense of additional contention.

Suppose a lock was re-added to WebDataRequest and held during the IsActive() so the manager pointer null check were atomic...

In the DB thread:

  // This check still either looks at the request after it is marked
  // as inactive or not. The lock would satisfy TSAN, but doesn't
  // affect the safety of this operation, at the expense of contention.
  if (!request->isActive())
    return;

  // At this point, we don't actually know if the request is active or
  // not, we just know that we missed an early abort opportunity. This
  // is the same with or without a lock in the request.

  // We post to a SingleThreadTaskRunner (it's not immediately clear if
  // this is always the same one). In the task body, we only manipulate
  // the request data while holding the manager's lock, though the handler
  // also has the same double checked lock pattern optimization. 
  request_manager->RequestCompleted(std::move(request), ...);

Comment 7 by eroman@chromium.org, Apr 25 2017

Thanks for working on this rogerm!

On a side-note for the importance of fixing this:

I am not a TSAN expert, but my understanding is that there is effectively no such thing as a "benign data race" in the face of an optimizing C++ compiler.

Check out: https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

Even given assumptions about the hardware this double-checked locking runs on, you don't actually know what crazy assembly the compiler will generate (it may not be what you think).

For instance, what if the compiler chose to re-use that memory slot for other temporary storage within the critical section (which it is free to do since it assumes code is race-free). If it did that and wrote spurious nullptrs, then this code may incorrectly see those temporary nullptrs and determine that !IsActive(), and subsequently delete the in-progress request.

That is a convoluted example, but the moral of the story is that proving "benign races" are safe is fraught with peril. Compilers are very good at breaking seemingly reasonable code :)

Comment 8 by rogerm@chromium.org, Apr 25 2017

Fair enough!  :)

I'll put together a CL to get rid of the unguarded nullptr checks.

Looking a bit deeper...

The thread transitions from the DB back to the manager are performed with PostTask to ThreadTaskRunnerHandle::Get(). It's not clear to me whether or not the code that's protected by the manager's lock might actually always be running on the same thread (i.e., is this code already thread and sequence affine?).

Comment 9 by rogerm@chromium.org, Apr 26 2017

I've re-added the lock to WebDataRequest.

I couldn't find any other way to satisfy TSAN short of removing the nullptr checks from the DB thread context, which would mean losing the ability to skip doing some work if a request is cancelled before we've started working on it.

... and I'm not prepared to make the case that the unguarded check is safe under any optimization a current or future compiler could apply.   :)

https://codereview.chromium.org/2845753002/
Side note: you could use atomic ops in place of a lock if you are concerned about the extra work the locking does.

i.e. check out base/synchronization/atomic_flag.h
Project Member

Comment 11 by bugdroid1@chromium.org, May 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/11cfe2438623a20f5ba74be863b6719b7525be18

commit 11cfe2438623a20f5ba74be863b6719b7525be18
Author: rogerm <rogerm@chromium.org>
Date: Sat May 06 00:07:19 2017

Remove data race in WebDataRequest::CancelRequest.

As an optimization, WebDataRequest::IsActive() was making an
(intentionally) unguarded nullptr check on its manager_ pointer
member in order to avoid doing some work if the request was
cancelled (which would set the pointer to nullptr in an
otherwise thread safe manner).

TSAN (correctly) detected and reported this race.

Given the latitude that an optimizing compiler is allowed in
rewriting code, it's not clear that this can be proven to be
a safe optimization... so, used atomic operations to protect
accesses to the manager_ member.

Additional related changes in this CL:

* Make the other members of WebDataRequest constants.
* Remove some thread safety debug checks that were validating
  the assumptions made by the optimization (which was removed).

BUG= 714789 

Review-Url: https://codereview.chromium.org/2845753002
Cr-Commit-Position: refs/heads/master@{#469821}

[modify] https://crrev.com/11cfe2438623a20f5ba74be863b6719b7525be18/components/webdata/common/web_data_request_manager.cc
[modify] https://crrev.com/11cfe2438623a20f5ba74be863b6719b7525be18/components/webdata/common/web_data_request_manager.h

Project Member

Comment 12 by ClusterFuzz, May 7 2017

ClusterFuzz has detected this issue as fixed in range 469797:469824.

Detailed report: https://clusterfuzz.com/testcase?key=6664163662495744

Fuzzer: inferno_layout_test_unmodified
Job Type: linux_tsan_chrome_mp
Platform Id: linux

Crash Type: Data race WRITE 8
Crash Address: 0x7b0c00155290
Crash State:
  WebDataRequestManager::CancelRequest
  WebDatabaseService::CancelRequest
  WebDataServiceBase::CancelRequest
  
Sanitizer: thread (TSAN)

Regressed: https://clusterfuzz.com/revisions?job=linux_tsan_chrome_mp&range=466626:466630
Fixed: https://clusterfuzz.com/revisions?job=linux_tsan_chrome_mp&range=469797:469824

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6664163662495744


Additional requirements: Requires Gestures

See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information.

If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
Project Member

Comment 13 by ClusterFuzz, May 7 2017

Labels: ClusterFuzz-Verified
Status: Verified (was: Started)
ClusterFuzz testcase 6664163662495744 is verified as fixed, so closing issue.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
Project Member

Comment 14 by bugdroid1@chromium.org, May 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/53b4d73d43f959faf2710241cdae3fd726602c6f

commit 53b4d73d43f959faf2710241cdae3fd726602c6f
Author: rogerm <rogerm@chromium.org>
Date: Tue May 09 20:31:38 2017

Simplify WebDataRequestManager::RequestCompletedOnThread()

This CL simplifies to WebDataRequestManager::RequestCompletedOnThread()
to call WebDataRequestManager::CancelRequest() to stop tracking the
request, instead of duplicating the contents of CancelRequest.

BUG= 714789 

Review-Url: https://codereview.chromium.org/2866813002
Cr-Commit-Position: refs/heads/master@{#470416}

[modify] https://crrev.com/53b4d73d43f959faf2710241cdae3fd726602c6f/components/webdata/common/web_data_request_manager.cc

Sign in to add a comment