New issue
Advanced search Search tips
Starred by 8 users
Status: Fixed
Owner:
Closed: Feb 7
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 1
Type: Bug



Sign in to add a comment
Browser hang with execessive download attempts
Project Member Reported by jam@chromium.org, Feb 6 Back to list
Similar to  bug 394296 , the browser is hanging when the attached repro is run.


 
repro.html
574 bytes View Download
Cc: palmer@chromium.org roc...@chromium.org gab@chromium.org dcheng@chromium.org
With the last bug, the message loop was just not pumping UI tasks although it was free most of the time. Investigating now whether this is the case here as well. Also +cc a bunch of people.
Cc: robliao@chromium.org
Components: UI>Browser>Downloads
I would have thought that we'd at least have the download request limiter trigger for these.
https://cs.chromium.org/chromium/src/chrome/browser/download/download_request_limiter.h?rcl=5157f928d980f7913d23144c597438d4c4f65699&l=31

// DownloadRequestLimiter is responsible for determining whether a download
// should be allowed or not. It is designed to keep pages from downloading
// multiple files without user interaction.
@csharrison: it doesn't appear to be doing any downloading. The main thread is just swamped.
This smells like an IPC flood.
The UI thread message queue is full of messages (like 1000s). 

On the UI thread
0:000> ?? this->triage_tasks_.queue_.c
class base::circular_deque<base::PendingTask>
   +0x000 buffer_          : base::internal::VectorBuffer<base::PendingTask>
   +0x008 begin_           : 0x49e
   +0x00c end_             : 0x1c48  (6058 tasks)
   +0x010 generation_      : 0x27ed

Waiting to be sent to the UI thread
0:000> ?? this->incoming_queue_.c
class base::circular_deque<base::PendingTask>
   +0x000 buffer_          : base::internal::VectorBuffer<base::PendingTask>
   +0x008 begin_           : 0
   +0x00c end_             : 0xa1f   (2591 tasks)

Also, we're doing re2 on the UI thread, which also seems bad.

A sampling of the stack:
0:000> kn200
 # ChildEBP RetAddr  
00 0104bab8 77a8ff15 ntdll!RtlpEnterCriticalSectionContended+0x72
01 0104bac4 552b38ad ntdll!RtlEnterCriticalSection+0x45
02 0104bad0 552fce59 MSVCP140D!_Mtxlock+0xd
03 0104bae0 0f986184 MSVCP140D!std::_Lockit::_Lockit+0x39
04 0104baf4 11bea5ef chrome!std::_Iterator_base12::~_Iterator_base12+0x14
05 0104bb00 11bea58f chrome!std::_Tree_unchecked_const_iterator<std::_Tree_val<std::_Tree_simple_types<re2::RuneRange> >,std::_Iterator_base12>::~_Tree_unchecked_const_iterator<std::_Tree_val<std::_Tree_simple_types<re2::RuneRange> >,std::_Iterator_base12>+0xf
06 0104bb0c 11bebf23 chrome!std::_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<re2::RuneRange> > >::~_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<re2::RuneRange> > >+0xf
07 0104bc00 11bfa115 chrome!re2::CharClassBuilder::AddRange+0x1b3
08 0104bc40 11bf740a chrome!re2::Regexp::ParseState::PushLiteral+0xe5
09 0104bdb8 11bc051f chrome!re2::Regexp::Parse+0x22a
0a 0104bfe0 11bbebf0 chrome!re2::RE2::Init+0x16f
0b 0104c014 13c180eb chrome!re2::RE2::RE2+0x90
0c 0104c50c 13c17d09 chrome!FlashDownloadInterception::ShouldStopFlashDownloadAction+0x1eb
0d 0104c688 113e60c9 chrome!FlashDownloadInterception::MaybeCreateThrottleFor+0x209
0e 0104c728 05b41440 chrome!ChromeContentBrowserClient::CreateThrottlesForNavigation+0x59
0f 0104c740 050a4f74 content!content::WebContentsImpl::CreateThrottlesForNavigation+0x30
10 0104c7b4 050a6dde content!content::NavigationHandleImpl::RegisterNavigationThrottles+0x44
11 0104c85c 050acf73 content!content::NavigationHandleImpl::WillStartRequest+0x13e
12 0104cb48 050b7271 content!content::NavigationRequest::BeginNavigation+0x2c3
13 0104cc3c 050fd6e1 content!content::NavigatorImpl::OnBeginNavigation+0x241
14 0104cf04 03e14bf7 content!content::RenderFrameHostImpl::BeginNavigation+0x3d1
15 0104dd14 050fc38a content!content::mojom::FrameHostStubDispatch::Accept+0x8b7
16 0104dd28 51d6bea5 content!content::mojom::FrameHostStub<mojo::RawPtrImplRefTraits<content::mojom::FrameHost> >::Accept+0x3a
17 0104df1c 51d6aba6 bindings!mojo::InterfaceEndpointClient::HandleValidatedMessage+0x435
18 0104df2c 51d62506 bindings!mojo::InterfaceEndpointClient::HandleIncomingMessageThunk::Accept+0x16
19 0104e008 51d6ba43 bindings!mojo::FilterChain::Accept+0xe6
1a 0104e0d0 51c2e5eb bindings!mojo::InterfaceEndpointClient::HandleIncomingMessage+0x93
1b 0104e3c4 51c25f4e ipc!IPC::`anonymous namespace'::ChannelAssociatedGroupController::AcceptOnProxyThread+0x26b
1c 0104e408 51c263a5 ipc!base::internal::FunctorTraits<void (__thiscall IPC::`anonymous namespace'::ChannelAssociatedGroupController::*)(mojo::Message),void>::Invoke<scoped_refptr<IPC::`anonymous namespace'::ChannelAssociatedGroupController> const &,mojo::Message>+0x2e
1d 0104e420 51c2674a ipc!base::internal::InvokeHelper<0,void>::MakeItSo<void (__thiscall IPC::`anonymous namespace'::ChannelAssociatedGroupController::*const &)(mojo::Message),scoped_refptr<IPC::`anonymous namespace'::ChannelAssociatedGroupController> const &,mojo::Message>+0x35
1e 0104e46c 51c31fd4 ipc!base::internal::Invoker<base::internal::BindState<void (__thiscall IPC::`anonymous namespace'::ChannelAssociatedGroupController::*)(mojo::Message),scoped_refptr<IPC::`anonymous namespace'::ChannelAssociatedGroupController>,base::internal::PassedWrapper<mojo::Message> >,void __cdecl(void)>::RunImpl<void (__thiscall IPC::`anonymous namespace'::ChannelAssociatedGroupController::*const &)(mojo::Message),std::tuple<scoped_refptr<IPC::`anonymous namespace'::ChannelAssociatedGroupController>,base::internal::PassedWrapper<mojo::Message> > const &,0,1>+0x5a
1f 0104e488 555200c5 ipc!base::internal::Invoker<base::internal::BindState<void (__thiscall IPC::`anonymous namespace'::ChannelAssociatedGroupController::*)(mojo::Message),scoped_refptr<IPC::`anonymous namespace'::ChannelAssociatedGroupController>,base::internal::PassedWrapper<mojo::Message> >,void __cdecl(void)>::Run+0x24
20 0104e4a0 55576293 base!base::OnceCallback<void __cdecl(void)>::Run+0x35
21 0104e57c 555f2dd8 base!base::debug::TaskAnnotator::RunTask+0x1e3
22 0104e648 555fcaf0 base!base::internal::IncomingTaskQueue::RunTask+0x88
23 0104e798 555fabf8 base!base::MessageLoop::RunTask+0x200
24 0104e7b4 555fb243 base!base::MessageLoop::DeferOrRunPendingTask+0x28
25 0104e868 55601cc3 base!base::MessageLoop::DoWork+0x163
26 0104e87c 55602c1b base!base::MessagePumpForUI::DoRunLoop+0x43
27 0104e8b0 555fc7ca base!base::MessagePumpWin::Run+0x7b
28 0104ea44 556af1b4 base!base::MessageLoop::Run+0x17a
29 0104ec80 114a71bf base!base::RunLoop::Run+0x164

And as expected in the land of the renderer, lots of these:
6:048> kn20
 # ChildEBP RetAddr  
00 010fac28 04c46186 content!content::mojom::FrameHostProxy::BeginNavigation+0x13
01 010fb808 04c539a1 content!content::RenderFrameImpl::BeginNavigation+0x896
02 010fbac0 185a74a2 content!content::RenderFrameImpl::DidStartProvisionalLoad+0x281
03 010fbaec 1900c76c blink_core!blink::LocalFrameClientImpl::DispatchDidStartProvisionalLoad+0x82
04 010fbea0 190090e4 blink_core!blink::FrameLoader::StartLoad+0x4ac
05 010fc278 1887547a blink_core!blink::FrameLoader::Load+0x634
06 010fc6ac 18874f3f blink_core!blink::HTMLAnchorElement::HandleClick+0x39a
07 010fc6bc 183c28c2 blink_core!blink::HTMLAnchorElement::DefaultEventHandler+0x8f
08 010fc870 183c1ee4 blink_core!blink::EventDispatcher::DispatchEventPostProcess+0x2e2
09 010fcb30 183c3002 blink_core!blink::EventDispatcher::Dispatch+0x4e4
0a 010fcbcc 1832edbc blink_core!blink::EventDispatcher::DispatchSimulatedClick+0x1e2
0b 010fcbe8 18899dd5 blink_core!blink::Node::DispatchSimulatedClick+0x1c
0c 010fcc00 1968b8b8 blink_core!blink::HTMLElement::click+0x15
0d 010fcc10 1968b922 blink_core!blink::HTMLElementV8Internal::clickMethod+0x28
0e 010fcc5c 0bc062d3 blink_core!blink::V8HTMLElement::clickMethodCallback+0x52
0f 010fccb4 0bd1b50a v8!v8::internal::FunctionCallbackArguments::Call+0xd3
10 010fcd1c 0bd1ce58 v8!v8::internal::`anonymous namespace'::HandleApiCallHelper<0>+0x2ea
11 010fcd68 0bd1c88e v8!v8::internal::Builtin_Impl_HandleApiCall+0x198
12 010fcdc8 238062aa v8!v8::internal::Builtin_HandleApiCall+0x16e

Summary: Browser hang with execessive download attempts (was: Browser hang with execessive msSaveOrOpenBlob)
Is getting the regex compilation off the main thread a useful thing to do? That looks like a five line patch to me.
Components: Internals>SequenceManager
Here's a flame graph of the main thread while this is happening.

All the time is spent in various phases of RenderFrameHostImpl::BeginNavigation()

26% of the time is indeed spent in RE2, would probably make navigation that much faster in the general case if we can move it off the main thread.

But even if we fixed that, it wouldn't address this very issue which is that MessageLoop currently has a FIFO policy for application tasks.

As Rob said, long term we want to solve this (and other priority issues on the UI and IO threads) through sequence funneling by bringing the Blink scheduler to //base (alongside task_scheduler) and allowing multiple sequences to share a single underlying sequence (thread). Tracked in issue 793069 and issue 783309.

However, this would merely make work from this site accumulate rather than cause a DOS. It would still bloat memory and leave the CPU running at 100% though (maybe that's okay, at least the user can kill the bad site if they know how to find which one is at fault).

Ideally in this case we would rate-limit abusive renderers. But maybe this breaks valid use cases..?
main_thread_during_download_DOS.svg
2.3 MB Download
Also a significant amount of time (7%) is spent in chrome.dll!ToolbarActionsBar::Update(). Feels this should be a no-op more often than it appears to be?
Cc: tommycli@chromium.org
Owner: jam@chromium.org
Status: Started
Thanks all for the investigation.

It looks like the re2 code is mostly to blame. If I disable it, then the UI, while laggy, is still usable. I added an early optimization to that code to check for adobe.com, as all downloads are there and the problem seems to go away.

+Tommy: do we still need flash_download_interception?
Also interestingly enough, with skipping re2 the fix for 394296 doesn't seem to be needed.
For future reference (even just to me :) ), I finally got a sampling profiler to work. Using Visual Studio 2017:
1) Analyze menu -> Performance Profiler
2) Choose Target -> Performance Wizard -> Start
3) CPU sampling
4) set the path and command line args


The Call Tree view works pretty well.
Labels: ReleaseBlock-Stable M-65
 Issue 810031  has been merged into this issue.
Cc: pbomm...@chromium.org
Cc: awhalley@chromium.org
+awhalley@ (Security TPM) as FYI
Pls apply applicable OSs. Thank you.
The issue seems to affect Windows OS the most, although the user experience on Mac is also not ideal. As reported above, I did not not see DownloadRequestLimiter trigger on Mac OS. See screenshots attached.
downloads_failed.png
47.6 KB View Download
Mac.png
150 KB View Download
Labels: OS-Chrome OS-Linux OS-Mac OS-Windows
Project Member Comment 21 by bugdroid1@chromium.org, Feb 7
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/05ae570847c6db85548ed1cbda3ea376a752dc27

commit 05ae570847c6db85548ed1cbda3ea376a752dc27
Author: John Abd-El-Malek <jam@chromium.org>
Date: Wed Feb 07 18:04:41 2018

Optimize the Flash download check.

On every navigation we're calling RE2 to check if the URL is a Flash download page. Since the check is using RE2, it takes a lot of CPU (~35% of browser process). Add an early optimization to skip it if the url isn't on adobe.com or macromedia.com.

Bug:  809775 
Change-Id: I78c39b7075cadc077dc39db408adda084e9a39fd
Reviewed-on: https://chromium-review.googlesource.com/906829
Reviewed-by: Tommy Li <tommycli@chromium.org>
Commit-Queue: John Abd-El-Malek <jam@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535059}
[modify] https://crrev.com/05ae570847c6db85548ed1cbda3ea376a752dc27/chrome/browser/plugins/flash_download_interception.cc

Labels: Merge-Request-65
Status: Fixed
Curious: Shouldn't the target_url_str check happen before we do the work to rewrite the source URL? Is there a reason to look in the composed url string instead of the gurl's host()?
@elwarence: yeah you're right that can be done. That call is basically free compared to re2, so in practice it won't make a difference.
I'm feeling very unsettled here.

Yes, removing the RE2 compilation makes the browser not entirely jam up. But we still have an issue where untrusted render processes have the ability to jam up the browser process. This is the second time it's come up and the second workaround we've done.

Where are we re a deeper fix in scheduling?
The deeper fix here is to allow sequence funneling. A single frame shouldn't be allowed to monopolize the queue. At the same time, if there's nothing else going on, maybe that's okay.

The idea here is that the UI MessageLoop thread would establish sources for messages and those sources are internally sequenced. However, the UI MessageLoop is free to choose different sources as it processes tasks.

UI MessageLoop <---- Internal Browser Tasks
               <---- Frame 1 Tasks
               <---- Frame 2 Tasks
               <---- etc.

The scheme above is one arrangement of the system, but we're free to rearrange task sources as we see fit.

In this world, each individual source is processed in order, but there is no ordering guarantee between sources. That allows the MessageLoop to choose expedite things like the internal browser tasks. If a single frame (like in this case) is flooding the queue, it only floods its own queue and other tasks can still make forward progress ahead of it.

The Blink Scheduler + Task Scheduler merge along with the MessageLoop redirection to the Task Scheduler are both projects that should enable this scenario.
I don't see how that solves the overwhelm problem, though, and now that I think about it I'm not sure scheduling is the best place to deal with it. Even with the best scheduling in the world, a render process, or a handful of them, can overwhelm the browser process and take it out.

We say "render processes are not to be trusted", and "validate all the data coming from them", yet if they spew 1000 qps of IPCs at the browser process we're all like "sure thing m8".

The browser process needs some way of limiting its exposure to bad renderers. Why is backpressure something we can't do? We can find a value for a channel's "number of messages in flight" knob that works well, but at some point, if a render process gets out of line, we have to find a way that it and only it is affected.
@Avi: this discussion has come up a number of times before, so perhaps we should have a FAQ. While it would be great if we can have some automatic way to stop a lot of IPCs, implementing this would be very difficult. The heuristic would have to handle both legitimate scenarios (i.e. a game like doom in the browser that's using high CPU and sending a lot of IPCs), and would have to deal with high end devices (i.e. z840) and low-end (i.e. Android One).
I'm actually OK with the scheduling approach as a long-term fix for this. I think everyone agrees that a malicious renderer (or multiple malicious renderers) shouldn't be able to jam up the browser process and make it impossible to notify and kill those bad processes. The scheduling approach ensures that the browser can make forward progress on *other* tasks, like notifying the user that a renderer is using a lot of CPU and whether or not they want to kill it. Ideally, once the process is killed, we'd be able to drain the sequences for that renderer without dispatching the rest of the pending tasks.

As a shorter-term fix, I think it is worth exploring whether or not Mojo can track the number of in-flight messages on a per-process basis and simply avoid processing additional messages off the wire if there are too many in flight. But I'm not sure how complex that would be to implement.

(There's an email thread for this, I'd suggest we move additional discussions there to keep things somewhat centralized)
Cc: sc00335...@techmahindra.com
Labels: Needs-Feedback
Tested the issue on 66.0.3343.0 using Windows 10, Ubuntu 14.04 and Mac 10.12.6 below are the observations.

In Ubuntu 14.04 and Windows 10 observed successful download of multiple file without any browser hang. -- But delay is seen to get multiple download popup.

But in Mac 10.12.6 didn't observe any file downloads and page unresponsive dialog is seen. Attaching screencast for reference.

@jam: Please check the screencast ,let us know the expected behavior and please confirm the fix.
809775.mp4
1.0 MB View Download
Labels: -Needs-Feedback
@sc00335628: this is fine, the important thing is that the tab can be closed.
Project Member Comment 32 by sheriffbot@chromium.org, Feb 8
Labels: -Merge-Request-65 Hotlist-Merge-Approved Merge-Approved-65
Your change meets the bar and is auto-approved for M65. Please go ahead and merge the CL to branch 3325 manually. Please contact milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member Comment 33 by bugdroid1@chromium.org, Feb 8
Labels: -merge-approved-65 merge-merged-3325
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1053f1db4260cb147e5867ccd74492853ad4f7b3

commit 1053f1db4260cb147e5867ccd74492853ad4f7b3
Author: John Abd-El-Malek <jam@chromium.org>
Date: Thu Feb 08 19:41:44 2018

Optimize the Flash download check.

On every navigation we're calling RE2 to check if the URL is a Flash download page. Since the check is using RE2, it takes a lot of CPU (~35% of browser process). Add an early optimization to skip it if the url isn't on adobe.com or macromedia.com.

Bug:  809775 
Change-Id: I78c39b7075cadc077dc39db408adda084e9a39fd
Reviewed-on: https://chromium-review.googlesource.com/906829
Reviewed-by: Tommy Li <tommycli@chromium.org>
Commit-Queue: John Abd-El-Malek <jam@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#535059}(cherry picked from commit 05ae570847c6db85548ed1cbda3ea376a752dc27)
Reviewed-on: https://chromium-review.googlesource.com/909351
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Cr-Commit-Position: refs/branch-heads/3325@{#385}
Cr-Branched-From: bc084a8b5afa3744a74927344e304c02ae54189f-refs/heads/master@{#530369}
[modify] https://crrev.com/1053f1db4260cb147e5867ccd74492853ad4f7b3/chrome/browser/plugins/flash_download_interception.cc

If the cost is in compiling the regex, rather than evaluating it, it might make sense to stuff the regex into a LazyInstance<RE2>. (RE2::PartialMatch actually take a const RE2&, not a string. There's just an implicit RE2 constructor being called there.)
Cc: -sc00335...@techmahindra.com sindhu.chelamcherla@chromium.org
Labels: Needs-Feedback
Tested the issue on 65.0.3325.73 using Windows 10, Ubuntu 14.04 and Mac 10.12.6 below are the observations.

In Ubuntu 14.04 and Mac 10.12.6 observed successful download of multiple file without any browser hang. As per comment#31 able to close tab successfully.

But in Windows 10 didn't observe any file downloads and unable to close tab as well. Waited for 2 minutes. Attaching screencast for reference.

@jam: Please check the screencast and help in confirming the fix.
809775 in win.mp4
1.4 MB View Download
809775 in Mac.mp4
1.9 MB View Download
Unable to reproduce the issue on Chrome 65.0.3325.70/CrOS 10323.32.0 - kip.
While running the repro.html, I'm able to open/switch/close the tabs without any browser hang.  
Sign in to add a comment