Data race in WebDataRequestManager::CancelRequest |
|||||
Issue descriptionDetailed report: https://clusterfuzz.com/testcase?key=6664163662495744 Fuzzer: inferno_layout_test_unmodified Job Type: linux_tsan_chrome_mp Platform Id: linux Crash Type: Data race WRITE 8 Crash Address: 0x7b0c00155290 Crash State: WebDataRequestManager::CancelRequest WebDatabaseService::CancelRequest WebDataServiceBase::CancelRequest Sanitizer: thread (TSAN) Regressed: https://clusterfuzz.com/revisions?job=linux_tsan_chrome_mp&range=466626:466630 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6664163662495744 Additional requirements: Requires Gestures Issue filed automatically. See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information.
,
Apr 25 2017
Nothing in that range stands out. Can a components/webdata OWNER take a look?
,
Apr 25 2017
I didn't write any of this code, but the design looks unsafe to me. WebDataRequests have a |manager_| field that can be read and written on different threads, but the class seems to have no locking. Various consumers, such as WebDataRequestManager, have locking, but WebDataRequests are not always accessed through those classes, and can be manipulated directly, leading to problems like this one. All WebDataRequest has is some sort of check of the lock inside its |manager_|. I think this problem was caused by rogerm in https://codereview.chromium.org/2571923002 , which removed the request lock that was guarding access to this member. The CL description says that this lock was ineffective, but AFAICT it would have prevented exactly this race.
,
Apr 25 2017
,
Apr 25 2017
looking
,
Apr 25 2017
TLDR; I think I was perhaps too "clever" for TSAN here. A "fix" is to just remove the IsActive() calls from the db reader/writer. It's slightly less efficient, but won't raise TSAN's hackles.
The pattern used here is the double-checked lock pattern.
if (mgr is already null) {
// checking that a pointer has reverted to NULL is a safe
// optimization to skip over the locked region. It can be
// done without holding the lock even if writes to mgr
// aren't atomic because we'll check it again inside the
// locked region if it's non-null here.
return;
}
Acquire lock;
DoStuffWithManager();
TSAN is *correctly* pointing out that there is a race with reading (but not using) the value of the manager pointer outside the lock. I failed to consider that TSAN would pick this up when I wrote that code; but, the "race" is by safe and by design.
The code is semantically equivalent, just less efficient with the unguarded nullptr check removed, which would satisfy TSAN.
Adding a lock to the WebDataRequest would also satisfy TSAN, but adds absolutely no additional safety or semantic value compared to the code as is, at the expense of additional contention.
Suppose a lock was re-added to WebDataRequest and held during the IsActive() so the manager pointer null check were atomic...
In the DB thread:
// This check still either looks at the request after it is marked
// as inactive or not. The lock would satisfy TSAN, but doesn't
// affect the safety of this operation, at the expense of contention.
if (!request->isActive())
return;
// At this point, we don't actually know if the request is active or
// not, we just know that we missed an early abort opportunity. This
// is the same with or without a lock in the request.
// We post to a SingleThreadTaskRunner (it's not immediately clear if
// this is always the same one). In the task body, we only manipulate
// the request data while holding the manager's lock, though the handler
// also has the same double checked lock pattern optimization.
request_manager->RequestCompleted(std::move(request), ...);
,
Apr 25 2017
Thanks for working on this rogerm! On a side-note for the importance of fixing this: I am not a TSAN expert, but my understanding is that there is effectively no such thing as a "benign data race" in the face of an optimizing C++ compiler. Check out: https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong Even given assumptions about the hardware this double-checked locking runs on, you don't actually know what crazy assembly the compiler will generate (it may not be what you think). For instance, what if the compiler chose to re-use that memory slot for other temporary storage within the critical section (which it is free to do since it assumes code is race-free). If it did that and wrote spurious nullptrs, then this code may incorrectly see those temporary nullptrs and determine that !IsActive(), and subsequently delete the in-progress request. That is a convoluted example, but the moral of the story is that proving "benign races" are safe is fraught with peril. Compilers are very good at breaking seemingly reasonable code :)
,
Apr 25 2017
Fair enough! :) I'll put together a CL to get rid of the unguarded nullptr checks. Looking a bit deeper... The thread transitions from the DB back to the manager are performed with PostTask to ThreadTaskRunnerHandle::Get(). It's not clear to me whether or not the code that's protected by the manager's lock might actually always be running on the same thread (i.e., is this code already thread and sequence affine?).
,
Apr 26 2017
I've re-added the lock to WebDataRequest. I couldn't find any other way to satisfy TSAN short of removing the nullptr checks from the DB thread context, which would mean losing the ability to skip doing some work if a request is cancelled before we've started working on it. ... and I'm not prepared to make the case that the unguarded check is safe under any optimization a current or future compiler could apply. :) https://codereview.chromium.org/2845753002/
,
Apr 26 2017
Side note: you could use atomic ops in place of a lock if you are concerned about the extra work the locking does. i.e. check out base/synchronization/atomic_flag.h
,
May 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/11cfe2438623a20f5ba74be863b6719b7525be18 commit 11cfe2438623a20f5ba74be863b6719b7525be18 Author: rogerm <rogerm@chromium.org> Date: Sat May 06 00:07:19 2017 Remove data race in WebDataRequest::CancelRequest. As an optimization, WebDataRequest::IsActive() was making an (intentionally) unguarded nullptr check on its manager_ pointer member in order to avoid doing some work if the request was cancelled (which would set the pointer to nullptr in an otherwise thread safe manner). TSAN (correctly) detected and reported this race. Given the latitude that an optimizing compiler is allowed in rewriting code, it's not clear that this can be proven to be a safe optimization... so, used atomic operations to protect accesses to the manager_ member. Additional related changes in this CL: * Make the other members of WebDataRequest constants. * Remove some thread safety debug checks that were validating the assumptions made by the optimization (which was removed). BUG= 714789 Review-Url: https://codereview.chromium.org/2845753002 Cr-Commit-Position: refs/heads/master@{#469821} [modify] https://crrev.com/11cfe2438623a20f5ba74be863b6719b7525be18/components/webdata/common/web_data_request_manager.cc [modify] https://crrev.com/11cfe2438623a20f5ba74be863b6719b7525be18/components/webdata/common/web_data_request_manager.h
,
May 7 2017
ClusterFuzz has detected this issue as fixed in range 469797:469824. Detailed report: https://clusterfuzz.com/testcase?key=6664163662495744 Fuzzer: inferno_layout_test_unmodified Job Type: linux_tsan_chrome_mp Platform Id: linux Crash Type: Data race WRITE 8 Crash Address: 0x7b0c00155290 Crash State: WebDataRequestManager::CancelRequest WebDatabaseService::CancelRequest WebDataServiceBase::CancelRequest Sanitizer: thread (TSAN) Regressed: https://clusterfuzz.com/revisions?job=linux_tsan_chrome_mp&range=466626:466630 Fixed: https://clusterfuzz.com/revisions?job=linux_tsan_chrome_mp&range=469797:469824 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6664163662495744 Additional requirements: Requires Gestures See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information. If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
,
May 7 2017
ClusterFuzz testcase 6664163662495744 is verified as fixed, so closing issue. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
,
May 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/53b4d73d43f959faf2710241cdae3fd726602c6f commit 53b4d73d43f959faf2710241cdae3fd726602c6f Author: rogerm <rogerm@chromium.org> Date: Tue May 09 20:31:38 2017 Simplify WebDataRequestManager::RequestCompletedOnThread() This CL simplifies to WebDataRequestManager::RequestCompletedOnThread() to call WebDataRequestManager::CancelRequest() to stop tracking the request, instead of duplicating the contents of CancelRequest. BUG= 714789 Review-Url: https://codereview.chromium.org/2866813002 Cr-Commit-Position: refs/heads/master@{#470416} [modify] https://crrev.com/53b4d73d43f959faf2710241cdae3fd726602c6f/components/webdata/common/web_data_request_manager.cc |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by mummare...@chromium.org
, Apr 24 2017Components: Internals>Network
Labels: M-60 Test-Predator-Wrong