Issue metadata
Sign in to add a comment
|
Potential memory leak in browser ResourceLoader |
||||||||||||||||||||||
Issue descriptionOut of Process Heap Profiling is a project that, for a very small set users, records heap allocations and sends back anonymized heap dumps. We've obtained a heap dump that suggests that there's a potential memory leak. We do not have repro steps. Small allocations [in size or frequency] were pruned from the heap dump before it was uploaded. I've attached the symbolized trace. For instructions on how to use it, see https://chromium.googlesource.com/chromium/src/+/lkcr/docs/memory-infra/heap_profiler.md. In this heap dump, there are ~276k objects created, using about 500M of memory. We received in the last 4 days, 23 reports with this specific memory leak. (see attachment). A quick look to the code points to this code: https://cs.chromium.org/chromium/src/content/browser/loader/resource_loader.cc?type=cs&q=content::ResourceLoader::OnAuthRequired&sq=package:chromium&l=450 // Create a login dialog on the UI thread to get authentication data, or pull // from cache and continue on the IO thread. DCHECK(!login_delegate_.get()) << "OnAuthRequired called with login_delegate pending"; login_delegate_ = delegate_->CreateLoginDelegate(this, auth_info); if (!login_delegate_.get()) request_->CancelAuth(); I suspect the DCHECK to not always be true.
,
Feb 21 2018
,
Feb 21 2018
Here are the symbolized traces and some information part of the trace. reportid: e51731314e0249d1 66.0.3334.4 Report TimeThu, 15 Feb 2018 15:25:21 GMT
,
Feb 21 2018
The trace shows that there are 276k calls to new from ChromeResourceDispatcherHostDelegate::CreateLoginDelegate that are never deleted. The code is optimized, but it's clear that the objects are instances of LoginHandler. There are another 276k calls to new in content::ResourceRequestInfoImpl::GetWebContentsGetterForRequest, and another 276k calls to new in net::HttpAuthController::PopulateAuthChallenge that are never deleted. I tried tracing through the logic to see what could be leaking the LoginHandlers, but this is ref-counted object with many places that add and remove refs. Over to csharrison for further investigation.
,
Feb 21 2018
To Internals>Network>Auth for triage. I don't really know this code very well. It looks like the LoginInterstitialDelegate holds a reference to the LoginHandler via |closure_|. I don't think that's the leak because the allocation of |closure_| wasn't in the dump.
,
Feb 21 2018
,
Feb 21 2018
The diff_heap_dump script is giving us more stackframes that are leaking. See attachment. Open the json file in a JSon viewer to see every potential leaked objects.
,
Feb 21 2018
Not sure what's going on. I will bring this up during triage meeting today.
,
Feb 21 2018
,
Feb 21 2018
I am not familiar with the code, but this doesn't seem like a leak to me.
It's entirely possible that these LoginHandler and AuthInfo are alive because the callbacks are retaining the refs and the callbacks haven't been executed.
chrome/browser/ui/login/login_handler.cc
LoginHandler* handler = LoginHandler::Create(auth_info, request);
BrowserThread::PostTask(
BrowserThread::UI, FROM_HERE,
base::BindOnce(&LoginHandler::LoginDialogCallback, request->url(),
base::RetainedRef(auth_info), base::RetainedRef(handler),
is_main_frame));
return handler;
The Json data seem to suggest so -- the count of allocation stacks for LogicHandler and AuthInfo match, but there aren't any other objects that have allocation stacks with this much "count."
,
Feb 22 2018
//net MemoryDumpProvider says there is only 1 outstanding URLRequest when the memory dump is taken, so it's rather weird that we have so many LoginHandler and AuthChallengeInfo. The heap profiler relies on the UI thread working, so presumably the user's UI thread is still responsive. +asanka who used to work on Auth, and davidben@ who is an owner of LoginHandler: Any thought?
,
Feb 22 2018
Just a quick note on #10, LoginHandler has a reference to the AuthChallengeInfo, so that object doesn't necessarily have another reference in a closure.
,
Feb 22 2018
Yea, I initially thought that it was a proxy requesting auth challenges. So if a user has a large number of pending requests (e.g. during session restore), we can have all those URLRequests calling back to the UI thread, with each creating a LoginHandler. //net MemoryDumpProvider says there is exactly 1 outstanding URLRequest, so it disproved #10. + Windows label, this trace is taken on windows.
,
Feb 22 2018
If a request is cancelled, we may not pro-actively kill the auth challenge information on the UI thread. So perhaps it could be users who are elsewhere, and a site that continuously makes XHRs in the background, cancelling them after some timeout.
,
Feb 22 2018
Re#14. I wrote a test page to simulate this behavior, but wasn't able to repro the leak. The test page continuously issued XHRs and canceled each request after a timeout, while the test page is in a background tab. New LoginHandlers are being created but are deleted when the corresponding request is canceled.
,
Feb 22 2018
I have a theory for how this could happen. First, observations:
We have 276k LoginHandlers. We also have 276k web_contents_getter_'s, 276k registrar_'s, and 276k auth_info_'s. However, we do not have 276k LoginHandlerViews::dialog_.
Furthermore, note that the registrar_ keeps the LoginHandler alive. This registrar is reset after the dialog_ is closed. This suggests either:
1) The dialog_ is being created, and closed, but somehow the registrar is never reset.
2) The dialog_ is never created, thus never closed, and thus the registrar is never reset.
This now leads us to a potential sequence of events that would cause this bug:
1) CreateLoginPrompt is called, creating the LoginHandler. A LoginHandler::LoginDialogCallback task is posted to the UI thread.
2) Auth is canceled, via LoginHandler::DoCancelAuth.
3) LoginHandler::LoginDialogCallback is run, and early-outs because "handler->WasAuthHandled" returns true.
(3) is supposed to create dialog_, but since (3) early outs, the dialog is never created.
We would expect (2) to appropriately clean up "registrar_", but ...
LoginHandler::DoCancelAuth calls
LoginHandler::CloseContentsDeferred calls
LoginHandlerViews::CloseDialog, which early outs if "dialog_" doesn't exist.
void CloseDialog() override {
// The hosting widget may have been freed.
if (dialog_)
dialog_->Close();
}
Thus, leak. I have no idea how to intentionally trigger this series of events, but this should be easy to verify in a unit test. I think the root problem [aside from confusing ownership semantics] is that if the dialog is never created, then the LoginHandler will always leak.
,
Feb 22 2018
So looks like a LoginHandler issue? [+davidben, +meacer, +vabr]: Looks like you guys own chrome/browser/ui/login. Any of you want to take this bug? (I suspect davidben should no longer be an owner, but could be wrong).
,
Feb 22 2018
,
Feb 27 2018
Could you help me understand this part: If the LoginHandler goes away, the registrar goes away too (because unique_ptr). So how do we have 276k LoginHandlers in the first place? I understand that releasing the registrar if there is no auth dialog is a potential fix, but is that actually going to fix the problem with that many LoginHandlers hanging around? Is the idea that the registrar keeps creating more LoginHandlers? Or is this just a corner case that manifested the bug?
,
Feb 27 2018
Sorry, could you please clarify what you mean? Here's what I think is happening. Some process [sorry, I don't know anything about the underlying system] is creating lots of LoginHandlers. There happens to be a code path that will make it so that LoginHandlers are never destroyed [if the dialog is never shown]. I suspect that is half of the problem, and fixing it will fix the memory leak. As to what causes 276k Loginhandlers to be created...I have no idea.
,
Feb 27 2018
Ah, okay. So the fact that the dialog is never closed (because it's never shown) is also preventing the registrar (hence the LoginHandler) from being destroyed. You pointed this (that registrar keeps LoginHandler alive) in c16 but I missed it.
,
Feb 27 2018
How does registrar_ keep the LoginHandler alive? It doesn't take a reference?
,
Feb 27 2018
Ah, my mistake, I thought that we were passed a strong-reference to the registrar's observer list, but we're actually passing a raw pointer. So then that raises the question - what is keeping the Loginhandler alive? And the answer it - itself. https://cs.chromium.org/chromium/src/chrome/browser/ui/login/login_handler.cc?type=cs&l=117 The only time the object is released is when we call LoginHandler::ReleaseSoon, which is never called because we never make the dialog.
,
Feb 27 2018
Yup. That's what I realized as well. The chain from LoginHandlerViews::CloseDialog() -> LoginHandlerViews::DeleteDelegate() could be broken. It appears there's also a suppression for this memory leak associated with the incorrectly closed issue 84265 , which may explain why we didn't see it on the bots.
,
Feb 27 2018
I noticed this reading through it, but I don't think it could cause a leak: https://chromium-review.googlesource.com/#/c/chromium/src/+/940021
,
Mar 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9a5ca91778f04309cb6982697ae80a2fc89fd94b commit 9a5ca91778f04309cb6982697ae80a2fc89fd94b Author: David Benjamin <davidben@chromium.org> Date: Thu Mar 01 23:23:29 2018 Don't return raw pointers when creating LoginHandlers. Keep them tagged as scoped_refptr. Returning a raw pointer sort of works, but because some refcounts have already been acquired in the process, we're relying on at least one of those refcounts sticking around long enough for the raw pointer to be cashed in. LoginHandler is rather a mess and really should get rewritten (move UI/IO hop within //content, surface out of WebContentsDelegate, get rid of NotificationRegistrar and ref-counting), but in the meantime fix up this local issue. Tagging with https://crbug.com/814334 in case it's related, but I don't think it is. (This isn't turning an owned raw pointer to a scoped_refptr. It's turning a quasi-borrowed raw pointer to a scoped_refptr.) Bug: 814334 Change-Id: I542799c20f7453b76f46c1cfad582a8de833a684 Reviewed-on: https://chromium-review.googlesource.com/940021 Commit-Queue: David Benjamin <davidben@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Reviewed-by: Nick Carter <nick@chromium.org> Reviewed-by: Richard Coles <torne@chromium.org> Reviewed-by: Charlie Harrison <csharrison@chromium.org> Cr-Commit-Position: refs/heads/master@{#540341} [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/android_webview/browser/aw_content_browser_client.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/android_webview/browser/aw_content_browser_client.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/chrome_content_browser_client.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/chrome_content_browser_client.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/android/login_handler_android.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/browser_dialogs.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/cocoa/login_handler_cocoa.mm [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/login/login_handler.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/login/login_handler.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/views/browser_dialogs_views.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/chrome/browser/ui/views/login_handler_views.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_dispatcher_host_impl.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_dispatcher_host_impl.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_loader_delegate.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/browser/loader/resource_loader_unittest.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/public/browser/content_browser_client.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/public/browser/content_browser_client.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/layout_test/layout_test_content_browser_client.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/layout_test/layout_test_content_browser_client.h [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/shell_content_browser_client.cc [modify] https://crrev.com/9a5ca91778f04309cb6982697ae80a2fc89fd94b/content/shell/browser/shell_content_browser_client.h
,
Mar 2 2018
,
Mar 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/470ff7ff522a1613d567a6aad3e66c1fb7aeb008 commit 470ff7ff522a1613d567a6aad3e66c1fb7aeb008 Author: Erik Chen <erikchen@chromium.org> Date: Sat Mar 10 01:39:24 2018 Tentative fix for LoginHandler leak. The ownership semantics for LoginHandler are very confusing. The class is self-owned, and is only destroyed when LoginHandler::ReleaseSoon() is called. But this relies on the assumption that the LoginHandlerView is shown, which isn't always the case. This CL adds tracking for whether the LoginHandlerView has been shown, and if it has never been shown but CancelAuth() is called, then LoginHandler::ReleaseSoon() will also be called. This relies on the assumption that all codes paths must either show the LoginHandlerView or else call CancelAuth(). Ideally, the whole class should be refactored to have saner ownership semantics. Bug: 814334 Change-Id: I28085ca948a5629f11e6263153938e5c87f42518 Reviewed-on: https://chromium-review.googlesource.com/957283 Reviewed-by: Mustafa Emre Acer <meacer@chromium.org> Commit-Queue: Erik Chen <erikchen@chromium.org> Cr-Commit-Position: refs/heads/master@{#542305} [modify] https://crrev.com/470ff7ff522a1613d567a6aad3e66c1fb7aeb008/chrome/browser/ui/login/login_handler.cc [modify] https://crrev.com/470ff7ff522a1613d567a6aad3e66c1fb7aeb008/chrome/browser/ui/login/login_handler.h
,
Mar 19 2018
Assigning to Erik, as he landed the most recent patch here.
,
Mar 27 2018
We're no longer getting heap dumps with this signature. Furthermore, this is a giant drop in memory usage that corresponds to the release that this fix was first included (C48). https://uma.googleplex.com/p/chrome/timeline_v2/?sid=f54df0d6f572300b0fbf74d0aef14a73 Marking this as fixed.
,
Mar 27 2018
I'm skeptical that the fix is the cause of that - auth just isn't that common (And when it is used, this shouldn't generally trigger that often, I believe). Or am I missing something?
,
Mar 27 2018
> auth just isn't that common (And when it is used, this shouldn't generally trigger that often, I believe). Or am I missing something? This was one of the most common memory leaks we observed via heap profiling in the wild, and it regularly consumed >500MB of live memory, with over 250k+ leaked instances. While I don't have a local repro, I suspect that the same logic pattern that would cause the leak to occur [e.g. immediately hitting the cancellation path] happens to align with logic that causes us to attempt to reshow the dialog, thus quickly leaking many instances.
,
Mar 27 2018
One interesting thing to note, which may be relevant to other memory UMA is that: 1. Filtering by clients with the dominant or latest version completely eliminates this drop (and %iles are reduced by a lot). This indicates that out-of-date clients are skewing this metric. https://uma.googleplex.com/p/chrome/timeline_v2/?sid=49811baf12815ba7663da8cd177cf7cb 2. This win is not noticeable on dev channel, which should have picked up the fix by now, afaict.
,
Mar 27 2018
It's a drop in the 95th percentile, not the 50th percentile, which is consistent with auth not being all that common. I'm actually surprised even 5% of folks see HTTP auth (I can't remember when I last hit it), but maybe? Nice to see such a dramatic drop!
,
Mar 27 2018
I've attached a screenshot of Windows Dev channel, 99th percentile [which is still lower in MB than 95th percentile on Canary]. The same drop is visible at the first release that included this change. https://uma.googleplex.com/p/chrome/timeline_v2/?sid=5b45047c6c4548d143435f43a84a0dab > This indicates that out-of-date clients are skewing this metric. This is expected. Users that have the latest version of Canary are restarting their browser about once a day. That gives memory leaks less time to build up. Users that keep a Canary running for multiple days will have higher memory usage. > I'm actually surprised even 5% of folks see HTTP auth Keep in mind, this isn't 5% of users, this is 5% of UMA-pings. Users who leave their devices online for multiple days in a row will get in many more UMA-pings than users who shut down Chrome after they're done using it. [Thus allowing for updates on restart].
,
Mar 27 2018
,
Mar 27 2018
Thanks Erik that makes sense. Hopefully we will see a more dramatic drop in dev as older clients start updating. Thanks for working on this!
,
Mar 27 2018
> Hopefully we will see a more dramatic drop in dev as older clients start updating. Perhaps! What I find really surprising is that Canary users [prior to this change] at 95th/99th percentiles were sitting at double the memory usage of Dev users. This suggests a rather large difference in populations being sampled. [e.g. Maybe Canary users pathologically encounter certain memory bugs like this one more frequently than Dev users]. Not sure what's going on here. Looking at all channels, Canary uses the most memory, followed by Beta. Then Dev and Stable mirror each other almost perfectly. Weird.
,
Nov 29
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by etienneb@chromium.org
, Feb 21 2018