
Issue 755007


Issue metadata

Status: Fixed
Closed: Aug 2017
OS: Windows
Pri: 2
Type: Bug-Security




content_shell: Heap-use-after-free in net::NetLog::AddEntry

Project Member Reported by ClusterFuzz, Aug 13 2017

Issue description

Detailed report: https://clusterfuzz.com/testcase?key=5481419861393408

Fuzzer: mbarbella_js_mutation_layout
Job Type: windows_asan_content_shell
Platform Id: windows

Crash Type: Heap-use-after-free READ 4
Crash Address: 0x1c142b8c
Crash State:
  net::NetLog::AddEntry
  net::NetLogWithSource::AddEntry
  net::NetLogWithSource::EndEvent
  
Sanitizer: address (ASAN)

Recommended Security Severity: High

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=5481419861393408

Issue filed automatically.

See https://github.com/google/clusterfuzz-tools for more information.

Note: This crash might not be reproducible with the provided testcase. That said, for the past 14 days we've been seeing this crash frequently. If you are unable to reproduce this, please try a speculative fix based on the crash stacktrace in the report. The fix can be verified by looking at the crash statistics in the report, a day after the fix is deployed. If the fix resolved the issue, please close the bug by marking as Fixed.
 
Project Member

Comment 1 by sheriffbot@chromium.org, Aug 13 2017

Labels: Pri-1

Comment 2 by tsepez@chromium.org, Aug 14 2017

Labels: -Pri-1 -Security_Severity-High Security_Severity-Medium M-62 Pri-2
Owner: tzik@chromium.org
Status: Assigned (was: Untriaged)
tzik, you did some work (although months and months ago) with reference counting in entry_impl.cc, so you may have some experience in this area.  Unfortunately, there's not a lot to go on in these "unreproducible" (i.e. unreliably reproducible) cases. Feel free to re-assign as appropriate.

Reducing severity as unreliability is a mitigating factor.

Comment 3 by tsepez@chromium.org, Aug 14 2017

Components: Internals>Network>Cache
Labels: Security_Impact-Stable

Comment 4 by tzik@chromium.org, Aug 15 2017

Cc: jkarlin@chromium.org mmenke@chromium.org
Looks like a NetLog UAF during the browser shutdown phase.
The //net/disk_cache/blockfile code lives on a dedicated InternalCacheThread, and, since no one stops that thread, it seems to outlive the main-thread objects.

We should probably replace it with base::TaskScheduler?
Cc: morlovich@chromium.org
The thread needs TYPE_IO, so I thought scheduler stuff was no go?


Anyway, this is likely a regression from my change that stopped using the browser cache thread, which of course has an ordered shutdown with respect to the I/O thread.

What may also matter here is the timing of ~HttpCache; that really should be enough to make that thread idle, unless I am missing something.

Actually I am not seeing content_shell deleting the HttpCache (or the URLRequestContext) at all? 

@mmenke: I may need your help w/this, as it's touching way too much code I am seeing for the first time...

Comment 10 by tzik@chromium.org, Aug 16 2017

Owner: morlovich@chromium.org
Let me pass this to morlovich.

Hm, the destruction of disk_cache::BackendImpl and its handling in disk_cache::EntryImpl look racy to me even before the CACHE thread elimination.
BackendImpl has a WeakPtrFactory and is destroyed on the IO thread, while EntryImpl holds a WeakPtr to BackendImpl as a back pointer and checks it in its destructor (around entry_impl.cc:935), which runs on the background thread.
If BackendImpl is destroyed after that back pointer check, the result is a BackendImpl UAF.
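The race described above can be sketched as follows. These are simplified, hypothetical stand-ins for BackendImpl/EntryImpl, not the real Chromium classes; note that Chromium's base::WeakPtr only offers a check, whereas std::weak_ptr::lock() (used here) actually pins the object, which is exactly the guarantee the real code lacks:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for BackendImpl: tracks open entries.
struct Backend {
  int open_entries = 0;
  void OnEntryDestroyed() { --open_entries; }
};

// Hypothetical stand-in for EntryImpl: holds a weak back pointer to the
// backend and checks it in the destructor.
struct Entry {
  std::weak_ptr<Backend> backend_;
  explicit Entry(std::weak_ptr<Backend> b) : backend_(std::move(b)) {
    if (auto be = backend_.lock()) ++be->open_entries;
  }
  ~Entry() {
    // Time-of-check / time-of-use: with base::WeakPtr the IO thread could
    // delete the backend right after the check, before the use. lock()
    // pins the object here, so this sketch has no such window.
    if (auto be = backend_.lock())
      be->OnEntryDestroyed();
  }
};
```

With a non-pinning weak pointer, the gap between "check" and "use" is where the BackendImpl UAF can occur.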
The HttpCache is owned by ProfileIOData (well, ProfileIOData owns the URLRequestContext, which owns the cache).  The problem is presumably that we destroy the NetLog (owned by Chrome's BrowserProcess, I think?) before the task scheduler.

So NetLog outlasts all named threads, but it may not outlast unnamed ones.
Cc: eroman@chromium.org
We may have to make NetLog a singleton... something we've discussed before.
re: comment #10:
There is a bunch of blocking in ~BackendImpl which ensures the background thread's work queue is empty; after that, EntryImpl objects will not be destroyed but rather leaked, due to:

void EntryImpl::Close() {
  if (background_queue_.get())
    background_queue_->CloseEntryImpl(this);
}
... which is called on the I/O thread, with background_queue_ being a WeakPtr as well. Not sure what the check in ~EntryImpl you pointed out is for, indeed; the code is a bit strange since it lets tests run with a different thread model, so it might only be useful when there is one thread.
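The leak path quoted above can be modeled with a minimal sketch (hypothetical types; the real code holds background_queue_ as a WeakPtr, modeled here as a nullable raw pointer): Close() delegates the final Release() to the background queue, so if the queue is already gone, nothing releases the entry and it leaks.

```cpp
#include <cassert>

struct Entry;

// Hypothetical stand-in for InFlightBackendIO: performs the final release.
struct BackgroundQueue {
  void CloseEntryImpl(Entry* entry);
};

// Hypothetical refcounted entry, mirroring the quoted EntryImpl::Close().
struct Entry {
  int ref_count = 1;
  BackgroundQueue* background_queue = nullptr;  // null == WeakPtr invalidated
  void Release() {
    if (--ref_count == 0) delete this;
  }
  void Close() {
    if (background_queue)
      background_queue->CloseEntryImpl(this);  // final release via the queue
    // else: the entry is silently leaked, as described above
  }
};

void BackgroundQueue::CloseEntryImpl(Entry* entry) {
  entry->Release();
}
```

Leaking rather than destroying on the wrong thread is the deliberate trade-off here, but it explains why entries survive once the queue's WeakPtr is dead.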

Re: comment #11:
For a bit more context, I added logging output to ~HttpCache, ~URLRequestContext, ~ShellURLRequestContextGetter, and none seem to happen for the content_shell binary, though I can't be certain that's not because of some logging stuff (constructors show up fine, as does destruction with the Chrome binary).

Also not sure how ordering with respect to the TaskScheduler destructor matters, since it's a plain base::Thread doing things.

Hadn't realized this was content shell, and not Chrome.  I'm not familiar with content_shell's URLRequestContext ownership model, though skimming over it, looks like the URLRequestContextGetter, which is refcounted, owns everything.  With refcounting, you kinda lose all guarantees of sanity.
Summary: content_shell: Heap-use-after-free in net::NetLog::AddEntry (was: Heap-use-after-free in net::NetLog::AddEntry)
Yeah, I tried to trace the ref count via shadowing of AddRef/Release for URLRequestContextGetter, but for some reason couldn't upcall, which is partly why I was desperate for help getting the big picture. Hmm, I should probably do it via specialization or something. 

Rather than focusing on URLRequestContext teardown, it may be better to focus on NetLog teardown, and why it's being destroyed before block_shutdown tasks.  Even if URLRequestContext teardown were fixed, I assume we'd still have a shutdown ordering issue.  Or do we block HttpCache teardown on completing all pending cache-related tasks?
In the case of the blockfile cache, yes: ~HttpCache does some blocking with that intent, because ~BackendImpl does.

SimpleCache, OTOH, only ever uses NetLog on the I/O thread, though that could have its own issues, I suppose, if NetLog is destroyed before the I/O thread is?


NetLog is guaranteed to outlive the IOThread in Chrome (Less sure about content_shell, though I'd guess it's also the case there).  It outlives all named threads.
It would reduce test coverage, but I suppose we could just use SimpleCache in content_shell.
Got refcount tracing on the getter working, but for some reason base::debug::StackTrace is being flaky with symbolization... (the addresses themselves seem fine per addr2line).

preliminary: BackgroundFetchDelegateProxy::Core (which has a scoped_refptr<net::URLRequestContextGetter> request_context_) doesn't seem to be deleted even though the parent BackgroundFetchDelegateProxy is, and owns the Core helper object via: std::unique_ptr<Core, BrowserThread::DeleteOnUIThread> ui_core_;

Can confirm the above as the root cause, but I am at a loss as to how to proceed --- why wouldn't DeleteOnUIThread from IO -> Browser work?
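A miniature model of the suspected failure mode (hypothetical names; BrowserThread::PostTask and the real message loops are reduced to a toy queue): the DeleteOnUIThread deleter does not delete directly but posts a deletion task to the UI thread, so if that queue never runs the task, the Core object is silently leaked --- which would match ~Core never showing up in the logging.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>

// Toy stand-in for a browser thread's task queue.
struct TaskQueue {
  std::queue<std::function<void()>> tasks;
  bool accepting = true;
  bool Post(std::function<void()> task) {
    if (!accepting) return false;  // dropped: the delete never happens
    tasks.push(std::move(task));
    return true;
  }
  void RunAll() {
    while (!tasks.empty()) { tasks.front()(); tasks.pop(); }
  }
};

// Toy stand-in for BackgroundFetchDelegateProxy::Core, with a live count.
struct Core {
  static int live;
  Core() { ++live; }
  ~Core() { --live; }
};
int Core::live = 0;

// Toy stand-in for BrowserThread::DeleteOnUIThread: posts, never deletes.
struct DeleteOnUIThread {
  TaskQueue* ui_queue;
  void operator()(Core* core) const {
    ui_queue->Post([core] { delete core; });  // leak if Post() fails
  }
};
```

The interesting case is when the posting succeeds but the UI queue never drains, or when posting fails outright during shutdown: either way ~Core is never invoked.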

Maybe we've already posted the task to shutdown the IO thread at that point?
Well, the I/O thread is running ~BackgroundFetchDelegateProxy, at least; it's the UI thread that doesn't seem to invoke ~BackgroundFetchDelegateProxy::Core via the DeleteOnUIThread. The state table for browser threads had them all as Running, too.

Interestingly, it seems like the subobject doesn't get destroyed with the proper Chrome binary either; it just doesn't make an obvious difference there, AFAICT, due to the different URLRequestContext ownership.

Thinking some more, how hard would it be to make content_shell closer to Chrome when it comes to ownership of URLRequestContext? 
So...erm...with the network service stuff, that's kind of in flux.

Going with the pre-network-service route, you'd need to make the BrowserContext own an IO thread object that owns the URLRequestContext, and when it shuts down, you'd have to shut down the getter as well.  You'd also have to modify other content shell embedders to do the right thing (like layout tests, which use a different URLRequestContextGetter).  I don't think it would be too bad, but I haven't looked into it.
Thanks, that helps me understand the code a lot better. So ProfileImpl is the BrowserContext for Chrome, and it has a ProfileImplIOData::Handle io_data_. The getter shutdown happens like this:

UI thread:
  ~ProfileImpl
  ProfileImplIOData::~Handle
  ProfileIOData::ShutdownOnUIThread
Posted to IO thread:
  NotifyContextGettersOfShutdownOnIO
  ChromeURLRequestContextGetter::NotifyContextShuttingDown <--- makes getters null out the
     context pointer.

  DeleteSoon of ProfileImplIOData
  ~URLRequestContext via a bunch of other layers.

(That probably has some other ordering subtleties I am unaware of, though)

Hmm, what's the deal with NotifyContextShuttingDown() only existing in Chromeland, too?

It was added to prevent URLFetchers causing crashes on shutdown.  If your URLRequestContextGetter owns the URLRequestContext, you're already ensuring that requests from URLFetchers don't outlive the URLRequestContext (since URLFetchers hold onto a reference to the getter).
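The NotifyContextShuttingDown() pattern discussed here boils down to something like the following sketch (hypothetical, heavily reduced types): the refcounted getter holds a raw pointer to a context it does not own, and the context's owner nulls that pointer out at shutdown, so a straggling client (e.g. a URLFetcher) sees nullptr instead of a dangling pointer into freed memory.

```cpp
#include <cassert>

// Toy context; in reality this owns caches, resolvers, etc.
struct URLRequestContext {};

// Toy stand-in for URLRequestContextGetter.
struct ContextGetter {
  URLRequestContext* context = nullptr;  // raw, not owned
  URLRequestContext* GetURLRequestContext() { return context; }
  // Called by the context's owner just before it destroys the context.
  void NotifyContextShuttingDown() { context = nullptr; }
};
```

Callers still have to handle the nullptr, but that converts a use-after-free into an ordinary early-return.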
Hmm, maybe something like https://chromium-review.googlesource.com/c/619169 ? 
(With expanded description, at least)
Project Member

Comment 31 by bugdroid1@chromium.org, Aug 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4a89010ed41a2838acd43bc88db7f0d0a3361315

commit 4a89010ed41a2838acd43bc88db7f0d0a3361315
Author: Maks Orlovich <morlovich@chromium.org>
Date: Fri Aug 25 01:49:44 2017

Make URLRequestContext destruction in content_shell closer to that
of Chrome proper, by having it fired off ~ShellBrowserContext, with 
URLRequestContextGetter::NotifyContextShuttingDown taking care of 
preventing dangling access.

This avoids leaking it (and thus failing to shut down the cache in 
particular), when one of the many clients of URLRequestContextGetter
gets leaked.

Bug:  755007 
Change-Id: Ibf40073407c6a30758557326aa305570b7bc5bd5
Reviewed-on: https://chromium-review.googlesource.com/619169
Reviewed-by: Matt Menke <mmenke@chromium.org>
Reviewed-by: Mike West <mkwst@chromium.org>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/master@{#497287}
[modify] https://crrev.com/4a89010ed41a2838acd43bc88db7f0d0a3361315/content/shell/browser/shell_browser_context.cc
[modify] https://crrev.com/4a89010ed41a2838acd43bc88db7f0d0a3361315/content/shell/browser/shell_url_request_context_getter.cc
[modify] https://crrev.com/4a89010ed41a2838acd43bc88db7f0d0a3361315/content/shell/browser/shell_url_request_context_getter.h

Status: Fixed (was: Assigned)
Should be fixed.
Project Member

Comment 33 by sheriffbot@chromium.org, Aug 25 2017

Labels: -Restrict-View-SecurityTeam Restrict-View-SecurityNotify
Labels: Release-0-M62
Project Member

Comment 35 by sheriffbot@chromium.org, Dec 1 2017

Labels: -Restrict-View-SecurityNotify allpublic
This bug has been closed for more than 14 weeks. Removing security view restrictions.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
