Issue metadata
content_shell: Heap-use-after-free in net::NetLog::AddEntry
Issue description
Detailed report: https://clusterfuzz.com/testcase?key=5481419861393408

Fuzzer: mbarbella_js_mutation_layout
Job Type: windows_asan_content_shell
Platform Id: windows
Crash Type: Heap-use-after-free READ 4
Crash Address: 0x1c142b8c
Crash State:
  net::NetLog::AddEntry
  net::NetLogWithSource::AddEntry
  net::NetLogWithSource::EndEvent
Sanitizer: address (ASAN)
Recommended Security Severity: High

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=5481419861393408

Issue filed automatically. See https://github.com/google/clusterfuzz-tools for more information.

Note: This crash might not be reproducible with the provided testcase. That said, for the past 14 days we've been seeing this crash frequently. If you are unable to reproduce this, please try a speculative fix based on the crash stacktrace in the report. The fix can be verified by looking at the crash statistics in the report a day after the fix is deployed. If the fix resolved the issue, please close the bug by marking it as Fixed.
Aug 14 2017
tzik, you did some work (although months and months ago) with reference counting in entry_impl.cc, so you may have some experience in this area. Unfortunately, there's not a lot to go on in these "unreproducible" (i.e., unreliably reproducible) cases. Feel free to re-assign as appropriate. Reducing severity, as unreliability is a mitigating factor.
Aug 15 2017
Looks like a NetLog UAF during the browser shutdown phase. //net/disk_cache/blockfile stuff lives on a dedicated InternalCacheThread, and, as no one stops that thread, it seems to outlive the main-thread objects. We should probably replace it with base::TaskScheduler?
Aug 15 2017
The thread needs TYPE_IO, so I thought the scheduler stuff was a no-go?
Aug 15 2017
Anyway, this is likely a regression from me stopping use of the browser cache thread, which of course has an ordered shutdown wrt the I/O thread. What may also matter here is the timing of ~HttpCache, which really should be enough to make that thread idle, unless I am missing something.
Aug 15 2017
Actually I am not seeing content_shell deleting the HttpCache (or the URLRequestContext) at all?
Aug 15 2017
@mmenke: I may need your help w/this, as it's touching way too much code I am seeing for the first time...
Aug 16 2017
Let me pass this to morlovich. Hmm, the destruction of disk_cache::BackendImpl and its handling in disk_cache::EntryImpl look racy to me even before the CACHE thread elimination. BackendImpl has a WeakPtrFactory and is destroyed on the IO thread, while EntryImpl holds a WeakPtr to BackendImpl as a back pointer and checks it in its destructor (around entry_impl.cc:935), which runs on the background thread. If BackendImpl is destroyed after that back-pointer check, the result is a BackendImpl UAF.
Aug 16 2017
The HttpCache is owned by ProfileIOData (well, ProfileIOData owns the URLRequestContext, which owns the cache). The problem is presumably that we destroy the NetLog (owned by Chrome's BrowserProcess, I think?) before the task scheduler. So NetLog outlasts all named threads, but it may not outlast unnamed ones.
Aug 16 2017
We may have to make NetLog a singleton... something we've discussed before.
Aug 16 2017
re: comment #10:
There is a bunch of blocking in ~BackendImpl which ensures the background thread's work queue is empty; after that, an EntryImpl object will not be destroyed but rather leaked, due to:
void EntryImpl::Close() {
  if (background_queue_.get())
    background_queue_->CloseEntryImpl(this);
}
... which is called on the I/O thread, with background_queue_ being a WeakPtr as well. Not sure what the check in ~EntryImpl you pointed out is for, indeed; the code is a bit strange since it lets tests run with a different thread model, so it might only be useful when there is one thread.
Re: comment #11:
For a bit more context: I added logging output to ~HttpCache, ~URLRequestContext, and ~ShellURLRequestContextGetter, and none of them seem to run in the content_shell binary, though I can't be certain that's not an artifact of the logging itself (the constructors show up fine, as does destruction with the Chrome binary).
Also not sure how ordering wrt the TaskScheduler destructor matters, since it's a plain base::Thread doing the work.
Aug 16 2017
Hadn't realized this was content_shell, and not Chrome. I'm not familiar with content_shell's URLRequestContext ownership model, though skimming over it, it looks like the URLRequestContextGetter, which is refcounted, owns everything. With refcounting, you kinda lose all guarantees of sanity.
Aug 16 2017
Yeah, I tried to trace the refcount by shadowing AddRef/Release on URLRequestContextGetter, but for some reason couldn't upcall, which is partly why I was desperate for help getting the big picture. Hmm, I should probably do it via specialization or something.
Aug 16 2017
Rather than focusing on URLRequestContext teardown, it may be better to focus on NetLog teardown, and why it's being destroyed before block_shutdown tasks. Even if URLRequestContext teardown were fixed, I assume we'd still have a shutdown ordering issue. Or do we block HttpCache teardown on completing all pending cache-related tasks?
Aug 16 2017
In the case of the blockfile cache, yes: ~HttpCache does some blocking with that intent, since ~BackendImpl does. SimpleCache, OTOH, only ever uses NetLog on the I/O thread, though that could have its own issues, I suppose, if NetLog is destroyed before the I/O thread is?
Aug 16 2017
NetLog is guaranteed to outlive the IOThread in Chrome (Less sure about content_shell, though I'd guess it's also the case there). It outlives all named threads.
Aug 16 2017
It would reduce test coverage, but I suppose we could just use SimpleCache in content_shell.
Aug 16 2017
Got the refcount tracing on the getter working, but for some reason base::debug::StackTrace is being flaky with symbolization... (the addresses themselves seem fine per addr2line.)
Aug 16 2017
Preliminary: BackgroundFetchDelegateProxy::Core (which has a scoped_refptr<net::URLRequestContextGetter> request_context_) doesn't seem to get deleted even though the parent BackgroundFetchDelegateProxy does, and the parent owns the Core helper object via:
std::unique_ptr<Core, BrowserThread::DeleteOnUIThread> ui_core_;
Aug 16 2017
Can confirm the above as the root cause, but I am at a loss as to how to proceed --- why wouldn't DeleteOnUIThread from IO -> Browser work?
Aug 16 2017
Maybe we've already posted the task to shutdown the IO thread at that point?
Aug 16 2017
Well, the I/O thread does run ~BackgroundFetchDelegateProxy; it's the UI thread that doesn't seem to invoke ~BackgroundFetchDelegateProxy::Core via the DeleteOnUIThread deleter. The state table for browser threads had them all as Running, too. Interestingly, the subobject doesn't get destroyed with the proper Chrome binary either; it just doesn't make an obvious difference there, AFAICT, due to the different URLRequestContext ownership.
Aug 17 2017
Thinking some more, how hard would it be to make content_shell closer to Chrome when it comes to ownership of URLRequestContext?
Aug 17 2017
So... erm... with the network service stuff, that's kind of in flux. Going the pre-network-service route, you'd need to make the BrowserContext own an IO thread object that owns the URLRequestContext, and when it shuts down, you'd have to shut down the getter as well. You'd also have to modify the other content_shell embedders to do the right thing (like layout tests, which use a different URLRequestContextGetter). I don't think it would be too bad, but I haven't looked into it.
Aug 17 2017
Thanks, that helps understand the code a lot better. So ProfileImpl is the BrowserContext for Chrome, and it has a ProfileImplIOData::Handle io_data_. The getter shutdown happens like this:
UI thread:
  ~ProfileImpl
  ProfileImplIOData::~Handle
  ProfileIOData::ShutdownOnUIThread
Posted to IO thread:
  NotifyContextGettersOfShutdownOnIO
  ChromeURLRequestContextGetter::NotifyContextShuttingDown <--- makes getters null out the context pointer
  DeleteSoon of ProfileImplIOData
  ~URLRequestContext via a bunch of other layers
(That probably has some other ordering subtleties I am unaware of, though.)
Hmm, what's the deal with NotifyContextShuttingDown() only existing in Chromeland, too?
Aug 17 2017
It was added to prevent URLFetchers from causing crashes at shutdown. If your URLRequestContextGetter owns the URLRequestContext, you're already ensuring that requests from URLFetchers don't outlive the URLRequestContext (since URLFetchers hold a reference to the getter).
Aug 17 2017
Hmm, maybe something like https://chromium-review.googlesource.com/c/619169 ? (With expanded description, at least)
Aug 25 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4a89010ed41a2838acd43bc88db7f0d0a3361315

commit 4a89010ed41a2838acd43bc88db7f0d0a3361315
Author: Maks Orlovich <morlovich@chromium.org>
Date: Fri Aug 25 01:49:44 2017

Make URLRequestContext destruction in content_shell closer to that of Chrome proper, by having it fired off ~ShellBrowserContext, with URLRequestContextGetter::NotifyContextShuttingDown taking care of preventing dangling access. This avoids leaking it (and thus failing to shut down the cache in particular) when one of the many clients of URLRequestContextGetter gets leaked.

Bug: 755007
Change-Id: Ibf40073407c6a30758557326aa305570b7bc5bd5
Reviewed-on: https://chromium-review.googlesource.com/619169
Reviewed-by: Matt Menke <mmenke@chromium.org>
Reviewed-by: Mike West <mkwst@chromium.org>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/master@{#497287}

[modify] https://crrev.com/4a89010ed41a2838acd43bc88db7f0d0a3361315/content/shell/browser/shell_browser_context.cc
[modify] https://crrev.com/4a89010ed41a2838acd43bc88db7f0d0a3361315/content/shell/browser/shell_url_request_context_getter.cc
[modify] https://crrev.com/4a89010ed41a2838acd43bc88db7f0d0a3361315/content/shell/browser/shell_url_request_context_getter.h
Aug 25 2017
Should be fixed.
Dec 1 2017
This bug has been closed for more than 14 weeks. Removing security view restrictions. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 1 by sheriffbot@chromium.org, Aug 13 2017