Windows ASan crashing all over the place due to StringImpl ref count issues |
||||||||||||||
Issue descriptionAfter bug 635715 , it appears that we are getting a large volume of crash reports on CF that weren't found on other platforms/configurations, even though the majority of these look like they shouldn't be specific to Windows. Some examples that I filed: bug 636457 , bug 636456 , bug 636453 . I can reproduce these with builds pulled from https://commondatastorage.googleapis.com/chromium-browser-asan/index.html?prefix=win32-release. However, none of these reproduce on Linux ASan builds (32 or 64-bit) or a locally built Windows Chrome without any sanitizers (with full pageheap turned on). Reid, could you please take a look into these to see if anything is going wrong with the ASan instrumentation on Windows?
,
Aug 10 2016
This is really bad, it is good we only filed a few reports as otherwise this would creating even more developer confusion due to false reports. e.g. bad reports https://cluster-fuzz.appspot.com/testcase?key=4930301359030272 https://cluster-fuzz.appspot.com/testcase?key=6093690592559104 https://cluster-fuzz.appspot.com/testcase?key=4556624876535808 https://cluster-fuzz.appspot.com/testcase?key=5618336430030848
,
Aug 10 2016
I can reliably repro locally with my local asan build, so this is not a goma thing.
,
Aug 10 2016
It could be related to the roll, since we didn't see these crashes before. but yes, any of these should reliably reproduce, and i also tried with blank asan_options (unlike CF), i still see same bad crash.
,
Aug 11 2016
I spent a while debugging this, and if you disable optimizations on SelectorChecker::checkPseudoElement with '#pragma clang optimize off', the report makes a little more sense:
#0 0x1ad8f194 in WTF::AtomicString::isEmpty C:\src\chromium\src\third_party\WebKit\Source\wtf\text\WTFString.h:107
#1 0x1ad89c28 in blink::SelectorChecker::checkPseudoClass C:\src\chromium\src\third_party\WebKit\Source\core\css\SelectorChecker.cpp:657
Now we see that we really are accessing a string that asan thinks has been freed. It is accessing offset 4 of a StringImpl, to pull out m_length, which is where we UAF.
I don't know why the debug location is off.
As for why it doesn't reproduce on other platforms... I don't know. It could either be a compiler bug in clang, or an environmental thing that we share with MSVC, like right-to-left argument evaluation order.
,
Aug 11 2016
Yeah, more and more I'm becoming convinced that this is a refcount getting dropped on the floor, maybe because of some compiler or platform-specific handle passing bug. These bytes make for a very very plausible freed StringImpl: 3:029> db 0x056228a0 056228a0 a7 00 80 58 54 00 00 00-00 00 00 ba 0a 41 41 41 ...XT........AAA 056228b0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 056228c0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 056228d0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 056228e0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 056228f0 41 41 41 41 41 41 41 41-41 41 41 41 41 20 63 0a AAAAAAAAAAAAA c. There's 12 bytes of StringImpl, with a screwy refcount, and 84 bytes of plausible ASCII data.
,
Aug 11 2016
These kind of bugs are showing up with one particular fuzzer bj_broddelwerk which is known to create recursive trees in editing. Could this be an indication of oom somehow which ASAN is incorrectly processing. Based on even more ton of reports today, it is hard to believe this as legit.
,
Aug 11 2016
free stack is common in many reports, could be legit too.
freed by thread T0 here:
#0 0xe224ea8 in free e:\b\build\slave\win_upload_clang\build\src\third_party\llvm\projects\compiler-rt\lib\asan\asan_malloc_win.cc:44
#1 0x793471f in WTF::StringImpl::destroyIfNotStatic third_party/WebKit/Source/wtf/text/StringImpl.cpp:286
#2 0xac7d6fb in blink::BreakingContext::handleText third_party/WebKit/Source/core/layout/line/BreakingContextInlineHeaders.h:915
#3 0xac6fa4f in blink::LineBreaker::nextLineBreak third_party/WebKit/Source/core/layout/line/LineBreaker.cpp:86
+cc some editing folks as well.
,
Aug 11 2016
,
Aug 11 2016
I think it's a legit UAF, and there are one of two possibilities: 1. This is a real, Windows-specific UAF that affects Chrome's official build with MSVC. This is what we set out to find in the first place. :) 2. This UAF is only present when using clang as the compiler, either because the code is relying on some undefined or implementation defined behavior, or because there is a bug in clang. I can easily imagine that the refcounting smart pointers used by webkit have some hidden gotchas.
,
Aug 11 2016
,
Aug 11 2016
,
Aug 11 2016
Issue 635047 has been merged into this issue.
,
Aug 11 2016
I've tried a lot of different things here. I've manually instrumented chrome so I can track the bad allocation refcount operations, I have a log of the refcount operations, and I'm diffing that log against MSVC and looking at the points in the log where things start to differ. It's slow going, though.
,
Aug 12 2016
At first I thought this might be the threaded HTML parser since I see that in a bunch of the stacks, but aarya@ has some that make no sense: https://cluster-fuzz.appspot.com/testcase?key=6025319368884224 which makes me think maybe this is a compiler/optimizer bug?
,
Aug 12 2016
I'm not sure it's the compiler's fault, I think it might be some subtle corner case in copy/move semantics. Clang is picking a different constructor overload to do something somewhere either in LazyLineBreakIterator or some password form autofill code (sorry, I have it in the debugger at work). It has something to do with WebPrivatePtr in WebString. BTW that code is a heinous ODR violation, I wouldn't be surprised if that's the issue.
,
Aug 12 2016
My evidence for these claims is that I have a log of all the refcount operations on the WTF::StringImpl that is used after free, and the refcount logs diverge between MSVC and Clang somewhere in the code that I mentioned. I'll have to dig more tomorrow.
,
Aug 12 2016
,
Aug 12 2016
I'm pretty sure the bug is inside blink::BreakingContext::handleText. In an asan build, we come out of that function with one less refcount than we do with MSVC.
,
Aug 12 2016
Now I'm pretty sure this is a clang bug. My bet is it's a bad interaction between ASan alloca instrumentation and Windows pass-by-value conventions.
,
Aug 12 2016
This looks like a good reduction:
struct Foo {
Foo();
Foo(const Foo &o);
~Foo();
int a;
};
__forceinline void h(Foo o) {}
__forceinline void g() { h(Foo()); }
void f() { g(); }
Clang miscompiles it. This comes up in BreakingContext::handleText because it calls lots of functions marked ALWAYS_INLINE, and one of them happens to contain an inalloca call (lastBreakablePositionForBreakAll).
Removing ALWAYS_INLINE makes the problem go away.
,
Aug 12 2016
Bug in the inliner, fixed in LLVM r278571.
,
Aug 13 2016
Thanks a lot Reid. Lets roll forward clang if possible early next week.
,
Aug 15 2016
,
Aug 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/29ecb32ba7b5ca183782c5f5d2b7a1ae23309edb commit 29ecb32ba7b5ca183782c5f5d2b7a1ae23309edb Author: hans <hans@chromium.org> Date: Thu Aug 18 04:31:59 2016 Roll clang 277962:278861 BUG= 636558 , 637866 Review-Url: https://codereview.chromium.org/2241413003 Cr-Commit-Position: refs/heads/master@{#412745} [modify] https://crrev.com/29ecb32ba7b5ca183782c5f5d2b7a1ae23309edb/base/threading/thread_unittest.cc [modify] https://crrev.com/29ecb32ba7b5ca183782c5f5d2b7a1ae23309edb/build/config/compiler/BUILD.gn [modify] https://crrev.com/29ecb32ba7b5ca183782c5f5d2b7a1ae23309edb/tools/clang/scripts/update.py
,
Aug 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/92c4889905b14456f84a4631eafb168328b5f8a5 commit 92c4889905b14456f84a4631eafb168328b5f8a5 Author: guidou <guidou@chromium.org> Date: Thu Aug 18 10:14:35 2016 Revert of Roll clang 277962:278861 (patchset #5 id:80001 of https://codereview.chromium.org/2241413003/ ) Reason for revert: This is suspect of breaking Mac 10.9 builder. See https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29/builds/28850 Sample logs: Retrying 1 test (retry #1) [0817/224954:ERROR:kill_posix.cc(84)] Unable to terminate process group 8687: No such process [ RUN ] ThreadTest.StartWithOptions_StackSize [2060/2060] ThreadTest.StartWithOptions_StackSize (CRASHED) Retrying 1 test (retry #2) [0817/224955:ERROR:kill_posix.cc(84)] Unable to terminate process group 8688: No such process [ RUN ] ThreadTest.StartWithOptions_StackSize [2061/2061] ThreadTest.StartWithOptions_StackSize (CRASHED) Retrying 1 test (retry #3) [0817/224955:ERROR:kill_posix.cc(84)] Unable to terminate process group 8689: No such process [ RUN ] ThreadTest.StartWithOptions_StackSize [2062/2062] ThreadTest.StartWithOptions_StackSize (CRASHED) 1 test crashed: ThreadTest.StartWithOptions_StackSize (../../base/threading/thread_unittest.cc:135) Tests took 21 seconds. Additional test environment: CHROME_DEVEL_SANDBOX=/opt/chromium/chrome_sandbox LANG=en_US.UTF-8 Command: ./base_unittests --brave-new-test-launcher --test-launcher-bot-mode --test-launcher-summary-output=/b/swarm_slave/w/ionk6XHB/output.json Original issue's description: > Roll clang 277962:278861 > > BUG= 636558 , 637866 > > Committed: https://crrev.com/29ecb32ba7b5ca183782c5f5d2b7a1ae23309edb > Cr-Commit-Position: refs/heads/master@{#412745} TBR=rnk@chromium.org,dcheng@chromium.org,dpranke@chromium.org,inferno@chromium.org,hans@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 636558 , 637866 Review-Url: https://codereview.chromium.org/2249163007 Cr-Commit-Position: refs/heads/master@{#412785} [modify] https://crrev.com/92c4889905b14456f84a4631eafb168328b5f8a5/base/threading/thread_unittest.cc [modify] https://crrev.com/92c4889905b14456f84a4631eafb168328b5f8a5/build/config/compiler/BUILD.gn [modify] https://crrev.com/92c4889905b14456f84a4631eafb168328b5f8a5/tools/clang/scripts/update.py
,
Aug 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/df7510f96322254211068a52c2b459c0fe0b2090 commit df7510f96322254211068a52c2b459c0fe0b2090 Author: hans <hans@chromium.org> Date: Thu Aug 18 21:49:41 2016 Roll clang 277962:278861 BUG= 636558 , 637866 Review-Url: https://codereview.chromium.org/2241413003 Cr-Commit-Position: refs/heads/master@{#412943} [modify] https://crrev.com/df7510f96322254211068a52c2b459c0fe0b2090/base/threading/thread_unittest.cc [modify] https://crrev.com/df7510f96322254211068a52c2b459c0fe0b2090/build/config/compiler/BUILD.gn [modify] https://crrev.com/df7510f96322254211068a52c2b459c0fe0b2090/tools/clang/scripts/update.py
,
Aug 19 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b557e12b23a45aaff07428fc164934020e10c4ff commit b557e12b23a45aaff07428fc164934020e10c4ff Author: hans <hans@chromium.org> Date: Fri Aug 19 01:24:02 2016 Revert of Roll clang 277962:278861 (patchset #6 id:100001 of https://codereview.chromium.org/2241413003/ ) Reason for revert: This broke the build on a certain Mac builder: https://build.chromium.org/p/chromium/builders/Mac/builds/18669 It looks like the build is timing out maybe? All other Mac builders seem happy though, which makes this confusing, but the buildbot logs very clearly implicate the roll: the same kind of error happened last time it landed, and it went away on the previous revert. Original issue's description: > Roll clang 277962:278861 > > BUG= 636558 , 637866 > > Committed: https://crrev.com/df7510f96322254211068a52c2b459c0fe0b2090 > Cr-Commit-Position: refs/heads/master@{#412943} TBR=rnk@chromium.org,dcheng@chromium.org,dpranke@chromium.org,inferno@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 636558 , 637866 Review-Url: https://codereview.chromium.org/2257333002 Cr-Commit-Position: refs/heads/master@{#412999} [modify] https://crrev.com/b557e12b23a45aaff07428fc164934020e10c4ff/base/threading/thread_unittest.cc [modify] https://crrev.com/b557e12b23a45aaff07428fc164934020e10c4ff/build/config/compiler/BUILD.gn [modify] https://crrev.com/b557e12b23a45aaff07428fc164934020e10c4ff/tools/clang/scripts/update.py
,
Aug 20 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3829c6ad5aeef24992be742340df28c324da9e61 commit 3829c6ad5aeef24992be742340df28c324da9e61 Author: Hans Wennborg <hans@chromium.org> Date: Sat Aug 20 14:39:20 2016 Roll clang 277962:278861 BUG= 636558 , 637866 R=dcheng@chromium.org, dpranke@chromium.org, inferno@chromium.org, rnk@chromium.org Review URL: https://codereview.chromium.org/2241413003 . Cr-Commit-Position: refs/heads/master@{#413337} [modify] https://crrev.com/3829c6ad5aeef24992be742340df28c324da9e61/base/threading/thread_unittest.cc [modify] https://crrev.com/3829c6ad5aeef24992be742340df28c324da9e61/build/config/compiler/BUILD.gn [modify] https://crrev.com/3829c6ad5aeef24992be742340df28c324da9e61/tools/clang/scripts/update.py [modify] https://crrev.com/3829c6ad5aeef24992be742340df28c324da9e61/tools/ipc_fuzzer/message_lib/message_file_reader.cc [modify] https://crrev.com/3829c6ad5aeef24992be742340df28c324da9e61/tools/ipc_fuzzer/message_lib/message_names.h
,
Aug 21 2016
,
Aug 21 2016
,
Nov 27 2016
This bug has been closed for more than 14 weeks. Removing security view restrictions. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot |
||||||||||||||
►
Sign in to add a comment |
||||||||||||||
Comment 1 by och...@chromium.org
, Aug 10 2016