New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 814334 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Potential memory leak in browser ResourceLoader

Project Member Reported by etienneb@chromium.org, Feb 21 2018

Issue description

Out of Process Heap Profiling is a project that, for a very small set users, records heap allocations and sends back anonymized heap dumps. We've obtained a heap dump that suggests that there's a potential memory leak. We do not have repro steps. Small allocations [in size or frequency] were pruned from the heap dump before it was uploaded.

I've attached the symbolized trace. For instructions on how to use it, see https://chromium.googlesource.com/chromium/src/+/lkcr/docs/memory-infra/heap_profiler.md.

In this heap dump, there are ~276k objects created, using about 500M of memory.
We received in the last 4 days, 23 reports with this specific memory leak.
(see attachment).

A quick look to the code points to this code:
https://cs.chromium.org/chromium/src/content/browser/loader/resource_loader.cc?type=cs&q=content::ResourceLoader::OnAuthRequired&sq=package:chromium&l=450



  // Create a login dialog on the UI thread to get authentication data, or pull
  // from cache and continue on the IO thread.

  DCHECK(!login_delegate_.get())
      << "OnAuthRequired called with login_delegate pending";
  login_delegate_ = delegate_->CreateLoginDelegate(this, auth_info);
  if (!login_delegate_.get())
    request_->CancelAuth();


I suspect the DCHECK to not always be true.

 
bug.png
120 KB View Download
By looking to albratros results (crbug - label:Hotlist-Dcheck), I was not able to find instance of that DCHECK failing.
Description: Show this description
Here are the symbolized traces and some information part of the trace.

  reportid: e51731314e0249d1
  66.0.3334.4
  Report TimeThu, 15 Feb 2018 15:25:21 GMT
trace1.png
34.6 KB View Download
trace2.png
11.4 KB View Download
trace3.png
45.0 KB View Download
trace-e51731314e0249d1.gz
81.8 KB Download
Labels: -Pri-3 M-66 Pri-1
Owner: csharrison@chromium.org
Status: Assigned (was: Available)
The trace shows that there are 276k calls to new from ChromeResourceDispatcherHostDelegate::CreateLoginDelegate that are never deleted. The code is optimized, but it's clear that the objects are instances of LoginHandler.

There are another 276k calls to new in content::ResourceRequestInfoImpl::GetWebContentsGetterForRequest, and another 276k calls to new in net::HttpAuthController::PopulateAuthChallenge that are never deleted.

I tried tracing through the logic to see what could be leaking the LoginHandlers, but this is ref-counted object with many places that add and remove refs. Over to csharrison for further investigation.
Cc: csharrison@chromium.org
Components: Internals>Network>Auth
Owner: ----
Status: Available (was: Assigned)
To Internals>Network>Auth for triage. I don't really know this code very well. It looks like the LoginInterstitialDelegate holds a reference to the LoginHandler via |closure_|. I don't think that's the leak because the allocation of |closure_| wasn't in the dump.

Comment 6 by ssid@chromium.org, Feb 21 2018

Cc: xunji...@chromium.org
The diff_heap_dump script is giving us more stackframes that are leaking. See attachment.

Open the json file in a JSon viewer to see every potential leaked objects.
process_8912_malloc-leaks.json
79.4 KB View Download
stack1.png
52.0 KB View Download
stack2.png
52.3 KB View Download
stack3.png
55.1 KB View Download
stack4.png
25.2 KB View Download
Status: Untriaged (was: Available)
Not sure what's going on. I will bring this up during triage meeting today.

Comment 9 by mmenke@chromium.org, Feb 21 2018

Cc: -xunji...@chromium.org
Labels: Network-Triaged
Owner: xunji...@chromium.org
Status: Assigned (was: Untriaged)
I am not familiar with the code, but this doesn't seem like a leak to me. 
It's entirely possible that these LoginHandler and AuthInfo are alive because the callbacks are retaining the refs and the callbacks haven't been executed.

chrome/browser/ui/login/login_handler.cc
  LoginHandler* handler = LoginHandler::Create(auth_info, request);
  BrowserThread::PostTask(
      BrowserThread::UI, FROM_HERE,
      base::BindOnce(&LoginHandler::LoginDialogCallback, request->url(),
                     base::RetainedRef(auth_info), base::RetainedRef(handler),
                     is_main_frame));
  return handler;

The Json data seem to suggest so -- the count of allocation stacks for LogicHandler and AuthInfo match, but there aren't any other objects that have allocation stacks with this much "count."
Cc: asanka@chromium.org davidben@chromium.org
//net MemoryDumpProvider says there is only 1 outstanding URLRequest when the memory dump is taken, so it's rather weird that we have so many LoginHandler and AuthChallengeInfo. The heap profiler relies on the UI thread working, so presumably the user's UI thread is still responsive.

+asanka who used to work on Auth, and davidben@ who is an owner of LoginHandler: Any thought?


Screen Shot 2018-02-21 at 7.33.30 PM.png
48.3 KB View Download
Just a quick note on #10, LoginHandler has a reference to the AuthChallengeInfo, so that object doesn't necessarily have another reference in a closure.
Labels: OS-Windows
Yea, I initially thought that it was a proxy requesting auth challenges. So if a user has a large number of pending requests (e.g. during session restore), we can have all those URLRequests calling back to the UI thread, with each creating a LoginHandler. //net MemoryDumpProvider says there is exactly 1 outstanding URLRequest, so it disproved #10. 

+ Windows label, this trace is taken on windows.

If a request is cancelled, we may not pro-actively kill the auth challenge information on the UI thread.  So perhaps it could be users who are elsewhere, and a site that continuously makes XHRs in the background, cancelling them after some timeout.
Re#14. I wrote a test page to simulate this behavior, but wasn't able to repro the leak. The test page continuously issued XHRs and canceled each request after a timeout, while the test page is in a background tab. New LoginHandlers are being created but are deleted when the corresponding request is canceled.
I have a theory for how this could happen. First, observations:

We have 276k LoginHandlers. We also have 276k web_contents_getter_'s, 276k registrar_'s, and 276k auth_info_'s. However, we do not have 276k LoginHandlerViews::dialog_.

Furthermore, note that the registrar_ keeps the LoginHandler alive. This registrar is reset after the dialog_ is closed. This suggests either:

1) The dialog_ is being created, and closed, but somehow the registrar is never reset.
2) The dialog_ is never created, thus never closed, and thus the registrar is never reset.

This now leads us to a potential sequence of events that would cause this bug:

1) CreateLoginPrompt is called, creating the LoginHandler. A LoginHandler::LoginDialogCallback task is posted to the UI thread.
2) Auth is canceled, via LoginHandler::DoCancelAuth.
3) LoginHandler::LoginDialogCallback is run, and early-outs because "handler->WasAuthHandled" returns true.

(3) is supposed to create dialog_, but since (3) early outs, the dialog is never created. 
We would expect (2) to appropriately clean up "registrar_", but ...

LoginHandler::DoCancelAuth calls
LoginHandler::CloseContentsDeferred calls
LoginHandlerViews::CloseDialog, which early outs if "dialog_" doesn't exist.

 void CloseDialog() override {
    // The hosting widget may have been freed.
    if (dialog_)
      dialog_->Close();
  }

Thus, leak. I have no idea how to intentionally trigger this series of events, but this should be easy to verify in a unit test. I think the root problem [aside from confusing ownership semantics] is that if the dialog is never created, then the LoginHandler will always leak.  
Cc: vabr@chromium.org mea...@chromium.org
So looks like a LoginHandler issue?

[+davidben, +meacer, +vabr]:  Looks like you guys own chrome/browser/ui/login.  Any of you want to take this bug?  (I suspect davidben should no longer be an owner, but could be wrong).
Cc: xunji...@chromium.org
Owner: ----
Status: Available (was: Assigned)
Could you help me understand this part: If the LoginHandler goes away, the registrar goes away too (because unique_ptr). So how do we have 276k LoginHandlers in the first place?

I understand that releasing the registrar if there is no auth dialog is a potential fix, but is that actually going to fix the problem with that many LoginHandlers hanging around? Is the idea that the registrar keeps creating more LoginHandlers? Or is this just a corner case that manifested the bug?
Sorry, could you please clarify what you mean? Here's what I think is happening.

Some process [sorry, I don't know anything about the underlying system] is creating lots of LoginHandlers. There happens to be a code path that will make it so that LoginHandlers are never destroyed [if the dialog is never shown]. I suspect that is half of the problem, and fixing it will fix the memory leak.

As to what causes 276k Loginhandlers to be created...I have no idea. 
Ah, okay. So the fact that the dialog is never closed (because it's never shown) is also preventing the registrar (hence the LoginHandler) from being destroyed. You pointed this (that registrar keeps LoginHandler alive) in c16 but I missed it.
How does registrar_ keep the LoginHandler alive? It doesn't take a reference?
Ah, my mistake, I thought that we were passed a strong-reference to the registrar's observer list, but we're actually passing a raw pointer. So then that raises the question - what is keeping the Loginhandler alive? 

And the answer it - itself. 

https://cs.chromium.org/chromium/src/chrome/browser/ui/login/login_handler.cc?type=cs&l=117

The only time the object is released is when we call LoginHandler::ReleaseSoon, which is never called because we never make the dialog.
Yup. That's what I realized as well. The chain from LoginHandlerViews::CloseDialog() ->  LoginHandlerViews::DeleteDelegate() could be broken.

It appears there's also a suppression for this memory leak associated with the incorrectly closed  issue 84265 , which may explain why we didn't see it on the bots.

I noticed this reading through it, but I don't think it could cause a leak:
https://chromium-review.googlesource.com/#/c/chromium/src/+/940021
Project Member

Comment 26 by bugdroid1@chromium.org, Mar 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9a5ca91778f04309cb6982697ae80a2fc89fd94b

commit 9a5ca91778f04309cb6982697ae80a2fc89fd94b
Author: David Benjamin <davidben@chromium.org>
Date: Thu Mar 01 23:23:29 2018

Don't return raw pointers when creating LoginHandlers.

Keep them tagged as scoped_refptr. Returning a raw pointer sort of
works, but because some refcounts have already been acquired in the
process, we're relying on at least one of those refcounts sticking
around long enough for the raw pointer to be cashed in.

LoginHandler is rather a mess and really should get rewritten (move
UI/IO hop within //content, surface out of WebContentsDelegate,
get rid of NotificationRegistrar and ref-counting), but in the
meantime fix up this local issue.

Tagging with  https://crbug.com/814334  in case it's related, but I
don't think it is. (This isn't turning an owned raw pointer to a
scoped_refptr. It's turning a quasi-borrowed raw pointer to a
scoped_refptr.)

Bug:  814334 
Change-Id: I542799c20f7453b76f46c1cfad582a8de833a684
Reviewed-on: https://chromium-review.googlesource.com/940021
Commit-Queue: David Benjamin <davidben@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Reviewed-by: Nick Carter <nick@chromium.org>
Reviewed-by: Richard Coles <torne@chromium.org>
Reviewed-by: Charlie Harrison <csharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540341}
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/android_webview/browser/aw_content_browser_client.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/android_webview/browser/aw_content_browser_client.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/chrome_content_browser_client.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/chrome_content_browser_client.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/android/login_handler_android.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/browser_dialogs.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/cocoa/login_handler_cocoa.mm
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/login/login_handler.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/login/login_handler.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/views/browser_dialogs_views.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/views/login_handler_views.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_dispatcher_host_impl.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_dispatcher_host_impl.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_loader_delegate.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_loader_unittest.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/public/browser/content_browser_client.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/public/browser/content_browser_client.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/layout_test/layout_test_content_browser_client.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/layout_test/layout_test_content_browser_client.h
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/shell_content_browser_client.cc
[modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/shell_content_browser_client.h

Labels: Hotlist-HeapProfilingInField
Project Member

Comment 28 by bugdroid1@chromium.org, Mar 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/470ff7ff522a1613d567a6aad3e66c1fb7aeb008

commit 470ff7ff522a1613d567a6aad3e66c1fb7aeb008
Author: Erik Chen <erikchen@chromium.org>
Date: Sat Mar 10 01:39:24 2018

Tentative fix for LoginHandler leak.

The ownership semantics for LoginHandler are very confusing. The class is
self-owned, and is only destroyed when LoginHandler::ReleaseSoon() is called.
But this relies on the assumption that the LoginHandlerView is shown, which isn't
always the case. This CL adds tracking for whether the LoginHandlerView has been
shown, and if it has never been shown but CancelAuth() is called, then
LoginHandler::ReleaseSoon() will also be called.

This relies on the assumption that all codes paths must either show the
LoginHandlerView or else call CancelAuth().

Ideally, the whole class should be refactored to have saner ownership
semantics.

Bug:  814334 
Change-Id: I28085ca948a5629f11e6263153938e5c87f42518
Reviewed-on: https://chromium-review.googlesource.com/957283
Reviewed-by: Mustafa Emre Acer <meacer@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542305}
[modify] https://crrev.com/470ff7ff522a1613d567a6aad3e66c1fb7aeb008/chrome/browser/ui/login/login_handler.cc
[modify] https://crrev.com/470ff7ff522a1613d567a6aad3e66c1fb7aeb008/chrome/browser/ui/login/login_handler.h

Owner: erikc...@chromium.org
Assigning to Erik, as he landed the most recent patch here.
Status: Fixed (was: Available)
We're no longer getting heap dumps with this signature.

Furthermore, this is a giant drop in memory usage that corresponds to the release that this fix was first included (C48). 
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=f54df0d6f572300b0fbf74d0aef14a73


Marking this as fixed.
Screen Shot 2018-03-26 at 8.41.27 PM.png
127 KB View Download
I'm skeptical that the fix is the cause of that - auth just isn't that common (And when it is used, this shouldn't generally trigger that often, I believe).  Or am I missing something?
> auth just isn't that common (And when it is used, this shouldn't generally trigger that often, I believe).  Or am I missing something?

This was one of the most common memory leaks we observed via heap profiling in the wild, and it regularly consumed >500MB of live memory, with over 250k+ leaked instances.

While I don't have a local repro, I suspect that the same logic pattern that would cause the leak to occur [e.g. immediately hitting the cancellation path] happens to align with logic that causes us to attempt to reshow the dialog, thus quickly leaking many instances.
One interesting thing to note, which may be relevant to other memory UMA is that:
1. Filtering by clients with the dominant or latest version completely eliminates this drop (and %iles are reduced by a lot). This indicates that out-of-date clients are skewing this metric.
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=49811baf12815ba7663da8cd177cf7cb


2. This win is not noticeable on dev channel, which should have picked up the fix by now, afaict.
It's a drop in the 95th percentile, not the 50th percentile, which is consistent with auth not being all that common. I'm actually surprised even 5% of folks see HTTP auth (I can't remember when I last hit it), but maybe?

Nice to see such a dramatic drop!
I've attached a screenshot of Windows Dev channel, 99th percentile [which is still lower in MB than 95th percentile on Canary].

The same drop is visible at the first release that included this change.

https://uma.googleplex.com/p/chrome/timeline_v2/?sid=5b45047c6c4548d143435f43a84a0dab


> This indicates that out-of-date clients are skewing this metric.

This is expected. Users that have the latest version of Canary are restarting their browser about once a day. That gives memory leaks less time to build up. Users that keep a Canary running for multiple days will have higher memory usage.

> I'm actually surprised even 5% of folks see HTTP auth

Keep in mind, this isn't 5% of users, this is 5% of UMA-pings. Users who leave their devices online for multiple days in a row will get in many more UMA-pings than users who shut down Chrome after they're done using it. [Thus allowing for updates on restart].
Screen Shot 2018-03-27 at 11.06.20 AM.png
113 KB View Download
Thanks Erik that makes sense. Hopefully we will see a more dramatic drop in dev as older clients start updating. Thanks for working on this!
> Hopefully we will see a more dramatic drop in dev as older clients start updating.

Perhaps! What I find really surprising is that Canary users [prior to this change] at 95th/99th percentiles were sitting at double the memory usage of Dev users. This suggests a rather large difference in populations being sampled. [e.g. Maybe Canary users pathologically encounter certain memory bugs like this one more frequently than Dev users]. Not sure what's going on here.

Looking at all channels, Canary uses the most memory, followed by Beta. Then Dev and Stable mirror each other almost perfectly. Weird.
Screen Shot 2018-03-27 at 11.26.38 AM.png
156 KB View Download
Cc: -vabr@chromium.org

Sign in to add a comment