A hung renderer process with a hidden OOPIF never shows hung renderer dialog |
||||||||
Issue descriptionA hung renderer dialog is shown when a page doesn't respond to input events in some duration of time (30 seconds?). The dialog works for OOPIFs, but only if the OOPIFs are visible and receive some input events that are not ack'd. 0-size or hidden OOPIFs might hang and never show a hung renderer dialog. (Unlike issue 810633, though, the process does go away when closing the tab.) We should look for ways to recover from a hung process when it doesn't receive input events. Maybe the timer shouldn't be input event specific, or maybe we should have timers on other events (e.g., attempts to commit a navigation in the process, which currently just hang).
,
Jun 1 2018
,
Jun 1 2018
CC'ing falken@ from issue 810633, in case we're able to find any common solution to hidden OOPIFs and ServiceWorkers.
,
Jun 1 2018
To be fair, this is can happen even with visible OOPIFs when the user doesn't interact with them. The timer only starts when an input event is targeted at the iframe. We probably want to have more types of events that can trigger the dialog, including attempts to put a new page into that process. It's especially bad in that case, because even the input event timer doesn't appear to kick in when the user interacts with an iframe before its page commits (which it won't when the process is hung).
,
Jun 4 2018
We've chatted a bit and think that it probably makes sense to start a timer when a navigation starts to commit in a process, to make sure we hear the commit in a reasonable amount of time. If not, we can put up the same hung renderer dialog, allowing the user to recover from a hung renderer when there's no input events involved (including hidden OOPIFs, OOPIFs that haven't been interacted with, and ServiceWorkers). There's other long term possibilities (e.g., general watchdog timers, trying to avoid putting a new page in a known unresponsive process, marking processes as hung in the task manager even when no dialog is shown), but this seems like the simplest and most effective. I think lukasza@ is taking a look. Thanks!
,
Jun 6 2018
WIP CL @ https://crrev.com/c/1089797 Status: - Done: - Implemented 10s timeout from ready-to-commit until commit - Manually tested (using steps from https://crbug.com/810633#c14) that a "hung renderer dialog" appears - Added a regression test (testing via WebContentsObserverOnRendererUnresponsive) - Not yet done: - The regression test shouldn't actually require waiting 10 seconds (need to either mock advancing the time, or introduce a SetCommitTimeoutForTesting; I am asking about this on chromium-dev@) - Even after killing the hung renderer, the navigation spinner keeps spinning during the manual test - I am guessing that the main frame never finishes loading.
,
Jun 7 2018
Thanks lukasza@! To follow up about issue 810633, this CL also sounds like it's the right fix for the steps in comment 14 of issue 810633, where a navigation doesn't commit when it tries to join an unresponsive ServiceWorker process.
,
Jun 8 2018
I just ran across issue 716431 which is very similar problem, though it manifests in the frame not going away when the tab is closed. In addition committing navigations as a trigger to timers, I wonder if we should also add some form of ack on deleting frames and hook it in the same hung renderer detection.
,
Jun 11 2018
Just leaving a breadcrumb for myself - the spinner keeps spinning, because while FrameTreeNode::DidStopLoading gets called for the killed subframe, the main frame's FrameTreeNode continues thinking that it is still loading.
void FrameTreeNode::DidStopLoading() {
// Set final load progress and update overall progress. This will notify
// the WebContents of the load progress change.
DidChangeLoadProgress(kLoadingProgressDone);
// Notify the WebContents.
if (!frame_tree_->IsLoading()) // <- MAIN FRAME STILL THINKS THAT IT IS LOADING HERE
navigator()->GetDelegate()->DidStopLoading();
,
Jun 20 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b commit 52b9372f38d405bfa566b9c3c4df4f2a26a0c69b Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Wed Jun 20 16:11:39 2018 Refactor RWH::RestartHang...Timeout... into a generic base::Closure. There is a desire to add more kinds of hang timeouts - in addition to having an input-event-ack-timeout we also want to have navigation-commit-timeout and possibly other timeouts. This means that HungRendererDialogView::Accept (handling "Wait" button in the "hung renderer dialog") shouldn't be hardcoded to always call RenderWidgetH::RestartHangMonitorTimeoutIfNecessary, but should instead call a generic callback that will vary depending on the original source of the dialog. This CL enables future CLs (e.g. https://crrev.com/c/1089797) that need to provide different ways of restarting the hang timeout. This CL refactors RWH::RestartHangMonitorTimeoutIfNecessary into a generic base::RepeatingClosure passed as an argument of WebContentsDelegate::RendererUnresponsive method. I've tested this CL manually (on Linux) by: 1. Navigating 1st tab to https://peteblois.github.io/tmp/iframe_scrolling2/ 2. Opening a second tab, navigating it to https://peteblois.github.io/tmp/iframe_kill/ and switching back to the 1st tab. 3. Interacting with the OOPIF (for example double-clicking to select some of the words) and waiting 30 seconds for the hung renderer dialog to appear. 4. Clicking "Wait" button in the dialog 5. Verifying that the dialog reappears after 30 seconds Bug: 848821 Change-Id: I2d133483832ca3d640a74c9f23fc9677f478721d Reviewed-on: https://chromium-review.googlesource.com/1096362 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Trent Apted <tapted@chromium.org> Reviewed-by: Lucas Gadani <lfg@chromium.org> Reviewed-by: Bo <boliu@chromium.org> Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#568861} [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/browser.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/browser.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/cocoa/hung_renderer_controller.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/cocoa/hung_renderer_controller.mm [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/cocoa/tab_dialogs_cocoa.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/cocoa/tab_dialogs_cocoa.mm [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/hung_renderer/hung_renderer_interactive_uitest.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/tab_dialogs.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/views/hung_renderer_view.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/views/hung_renderer_view.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/views/hung_renderer_view_browsertest.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/views/tab_dialogs_views.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/chrome/browser/ui/views/tab_dialogs_views.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/components/web_contents_delegate_android/web_contents_delegate_android.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/components/web_contents_delegate_android/web_contents_delegate_android.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/browser/devtools/devtools_manager_unittest.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/browser/renderer_host/render_widget_host_delegate.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/browser/renderer_host/render_widget_host_impl.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/browser/renderer_host/render_widget_host_impl.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/browser/renderer_host/render_widget_host_unittest.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/browser/web_contents/web_contents_impl.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/browser/web_contents/web_contents_impl.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/public/browser/render_widget_host.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/public/browser/web_contents_delegate.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/public/test/browser_test_utils.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/shell/browser/shell.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/content/shell/browser/shell.h [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/extensions/browser/guest_view/web_view/web_view_guest.cc [modify] https://crrev.com/52b9372f38d405bfa566b9c3c4df4f2a26a0c69b/extensions/browser/guest_view/web_view/web_view_guest.h
,
Aug 6
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5527323788b6dc575acbd601778350ca0b221699 commit 5527323788b6dc575acbd601778350ca0b221699 Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Mon Aug 06 19:23:29 2018 Show "hung renderer dialog" after a navigation commit timeout (10s). In addition to adding a content_browsertest, I've also manually tested the changes using steps from https://crbug.com/810633#c14. Note that after this CL a "hang renderer dialog" will appear if a navigation cannot commit because it tries to reuse a hung renderer. This is definitely an improvement. But also note that after using the dialog to kill the renderer, the navigation spinner will keep spinning - there is additional work needed to fix this. Bug: 848821 Change-Id: I9d6801dbd8b2532f2427509c938db5556a4fb133 Reviewed-on: https://chromium-review.googlesource.com/1089797 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Avi Drissman <avi@chromium.org> Reviewed-by: Charlie Reis <creis@chromium.org> Cr-Commit-Position: refs/heads/master@{#580945} [modify] https://crrev.com/5527323788b6dc575acbd601778350ca0b221699/chrome/browser/ui/hung_renderer/hung_renderer_core.cc [modify] https://crrev.com/5527323788b6dc575acbd601778350ca0b221699/content/browser/frame_host/navigation_handle_impl.cc [modify] https://crrev.com/5527323788b6dc575acbd601778350ca0b221699/content/browser/frame_host/navigation_handle_impl.h [modify] https://crrev.com/5527323788b6dc575acbd601778350ca0b221699/content/browser/renderer_host/render_widget_host_impl.cc [modify] https://crrev.com/5527323788b6dc575acbd601778350ca0b221699/content/browser/renderer_host/render_widget_host_impl.h [modify] https://crrev.com/5527323788b6dc575acbd601778350ca0b221699/content/browser/site_per_process_browsertest.cc
,
Aug 8
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3d62b3b2040506828111f59bad9f03a42c01018a commit 3d62b3b2040506828111f59bad9f03a42c01018a Author: Łukasz Anforowicz <lukasza@chromium.org> Date: Wed Aug 08 00:40:15 2018 Revert "Show "hung renderer dialog" after a navigation commit timeout (10s)." This reverts commit 5527323788b6dc575acbd601778350ca0b221699. Reason for revert: Caused https://crbug.com/871704 and https://crbug.com/871730 Original change's description: > Show "hung renderer dialog" after a navigation commit timeout (10s). > > In addition to adding a content_browsertest, I've also manually tested > the changes using steps from https://crbug.com/810633#c14. > > Note that after this CL a "hang renderer dialog" will appear if a > navigation cannot commit because it tries to reuse a hung renderer. > This is definitely an improvement. But also note that after using the > dialog to kill the renderer, the navigation spinner will keep spinning - > there is additional work needed to fix this. > > Bug: 848821 > Change-Id: I9d6801dbd8b2532f2427509c938db5556a4fb133 > Reviewed-on: https://chromium-review.googlesource.com/1089797 > Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> > Reviewed-by: Avi Drissman <avi@chromium.org> > Reviewed-by: Charlie Reis <creis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#580945} TBR=avi@chromium.org,creis@chromium.org,lukasza@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 848821, 871730, 871704 Change-Id: I2dee152ba283fbba332437d5c470205604ea4691 Reviewed-on: https://chromium-review.googlesource.com/1165902 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org> Cr-Commit-Position: refs/heads/master@{#581414} [modify] https://crrev.com/3d62b3b2040506828111f59bad9f03a42c01018a/chrome/browser/ui/hung_renderer/hung_renderer_core.cc [modify] https://crrev.com/3d62b3b2040506828111f59bad9f03a42c01018a/content/browser/frame_host/navigation_handle_impl.cc [modify] https://crrev.com/3d62b3b2040506828111f59bad9f03a42c01018a/content/browser/frame_host/navigation_handle_impl.h [modify] https://crrev.com/3d62b3b2040506828111f59bad9f03a42c01018a/content/browser/renderer_host/render_widget_host_impl.cc [modify] https://crrev.com/3d62b3b2040506828111f59bad9f03a42c01018a/content/browser/renderer_host/render_widget_host_impl.h [modify] https://crrev.com/3d62b3b2040506828111f59bad9f03a42c01018a/content/browser/site_per_process_browsertest.cc
,
Aug 8
I have a WIP CL with (for now) just a browser test that shows the navigation spinner problem: https://crrev.com/c/1164466. This CL will have to wait until r580945 can be relanded (after fixing the YouTube problem raised in issue 871704 ).
,
Aug 9
,
Aug 10
Status update: - On its way to CQ later today: https://crrev.com/c/1169865: Avoid leaking NavigationRequest when committing an error page. - Ready to reland after Dev cut next week: https://crrev.com/c/1171381: [reland] Show "hung renderer dialog" after a navigation commit timeout. - WIP CL for fixing the spinner: https://crrev.com/c/1164466: Stop the navigation spinner if pending navigation's target process dies.
,
Aug 15
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5a92f05c6270f88307116012a1f7e3488bfd67e0 commit 5a92f05c6270f88307116012a1f7e3488bfd67e0 Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Wed Aug 15 20:38:47 2018 [reland] Show "hung renderer dialog" after a navigation commit timeout. This is a reland of r580945 that is possible, because https://crbug.com/872803 is fixed now. Original CL description follows below... The newly introduced commit timeout defaults to 10 seconds. In addition to adding a content_browsertest, I've also manually tested the changes using steps from https://crbug.com/810633#c14. Note that after this CL a "hang renderer dialog" will appear if a navigation cannot commit because it tries to reuse a hung renderer. This is definitely an improvement. But also note that after using the dialog to kill the renderer, the navigation spinner will keep spinning - there is additional work needed to fix this (see https://crrev.com/c/1164466). Bug: 848821 Change-Id: I85ec7042e2dfe782e7f925dcede0ed81be633709 Tbr: Avi Drissman <avi@chromium.org> Tbr: Charlie Reis <creis@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1171381 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Avi Drissman <avi@chromium.org> Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org> Cr-Commit-Position: refs/heads/master@{#583378} [modify] https://crrev.com/5a92f05c6270f88307116012a1f7e3488bfd67e0/chrome/browser/ui/hung_renderer/hung_renderer_core.cc [modify] https://crrev.com/5a92f05c6270f88307116012a1f7e3488bfd67e0/content/browser/frame_host/navigation_handle_impl.cc [modify] https://crrev.com/5a92f05c6270f88307116012a1f7e3488bfd67e0/content/browser/frame_host/navigation_handle_impl.h [modify] https://crrev.com/5a92f05c6270f88307116012a1f7e3488bfd67e0/content/browser/renderer_host/render_widget_host_impl.cc [modify] https://crrev.com/5a92f05c6270f88307116012a1f7e3488bfd67e0/content/browser/renderer_host/render_widget_host_impl.h [modify] https://crrev.com/5a92f05c6270f88307116012a1f7e3488bfd67e0/content/browser/site_per_process_browsertest.cc
,
Aug 24
Status update: 1. The new commit timeout resulted in a considerable (~3x) spike in CrashExitCodes.Renderer/RESULT_CODE_HUNG. 1.1. nasko@ and me discussed this and for M70 we think it might be prudent to conservatively increase the timeout from the current 10s to 30s (i.e. matching the input event ack timeout that used to be the only source of the hang renderer dialog). 1.2. In M71 we should change the bucketing for Navigation.ReadyToCommitUntilCommit (currently the overflow bucket starts at 10s) and based on the newly gathered data we can tweak the commit timeout further. 2. I am still working in the background on making sure the loading spinner stops spinning after the hung renderer is killed (this doesn't work for subframes). WIP CL @ https://crrev.com/c/1164466 helps with the spinner, but apparently makes quite a bit of other tests red... :-(
,
Aug 24
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3f72c41f07be3fb6ea2947b0235777f169fee20e commit 3f72c41f07be3fb6ea2947b0235777f169fee20e Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Fri Aug 24 23:34:25 2018 Change commit timeout from 10s to 30s. Bug: 848821 Change-Id: I5e0dd225cdf7163e159286352a096c9cffe1d32b Reviewed-on: https://chromium-review.googlesource.com/1188965 Reviewed-by: Nasko Oskov <nasko@chromium.org> Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Cr-Commit-Position: refs/heads/master@{#586053} [modify] https://crrev.com/3f72c41f07be3fb6ea2947b0235777f169fee20e/content/browser/frame_host/navigation_handle_impl.cc
,
Sep 5
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1b020ae4a4dc50495aeffeed80108a1211a5a95c commit 1b020ae4a4dc50495aeffeed80108a1211a5a95c Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Wed Sep 05 22:20:09 2018 Change UMA overflow bucket from 10s to 180s for Navigation histograms. The overflow bucket for Navigation.ReadyToCommitUntilCommit accounted for around half a percent of all data - this seems like too much. Additionally, some Navigation.* metrics (like Navigation.StartToCommit) used a 3 minutes cut-off, while others (like Navigation.ReadyToCommitUntilCommit and Navigation.TimeToReadyToCommit) used a 10s cut-off - this seems needlessly inconsistent (given that both TimeToReadyToCommit and StartToCommit will likely spend most of their time in network-related activities - e.g. waiting for responses from remote hosts). Because of the above, this CL consistently sets the overflow bucket for Navigation histograms to 180s (which is based on UMA_HISTOGRAM_MEDIUM_TIMES). Note that this CL retains Navigation-specific underflow bucket which is set at 1ms (like UMA_HISTOGRAM_TIMES or UMA_HISTOGRAM_LONG_TIMES and unlike UMA_HISTOGRAM_MEDIUM_TIMES). Bug: 848821 Change-Id: I4765d8c7dd6da076fb01e209cd02c524b5da9dbf Reviewed-on: https://chromium-review.googlesource.com/1188977 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Nasko Oskov <nasko@chromium.org> Cr-Commit-Position: refs/heads/master@{#589029} [modify] https://crrev.com/1b020ae4a4dc50495aeffeed80108a1211a5a95c/content/browser/frame_host/navigation_handle_impl.cc
,
Sep 10
lukasza@: please take a look at all renderer hangs that might be caused by r583378 (e.g., summary:"[Renderer hang]" opened>2018-08-22) and comment on them. I believe that two I've been investigating have been caused by the CL with the short timeout. It'd be very helpful if you could comment on or even close out any that you think are related. It'll save folks like me a whole bunch of time. Thanks.
,
Jan 4
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a44a4625465dfe0706fea085698350e461ec9786 commit a44a4625465dfe0706fea085698350e461ec9786 Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Fri Jan 04 23:06:56 2019 Re-bucketed navigation-timing histograms need to be renamed. r589029 changed bucketing of some navigation-timing histograms, and broke data continuity. To fix things going forward, the re-bucketed histograms need to be renamed. The old CL changed the overflow-bucket boundary of the following histograms to 3 minutes: 1. Navigation.TimeToReadyToCommit.* 2. Navigation.StartToCommit.* 3. Navigation.ReadyToCommitUntilCommit.* Histogram #2 above wasn't affected (the old overflow-bucket boundary was already at 3 minutes). Histograms #1 and #3 need to have their names changed - the current CL is adding a "2" suffix: - Navigation.TimeToReadyToCommit2.* - Navigation.ReadyToCommitUntilCommit2.* Bug: 848821 Change-Id: I6029d88df863ba3ed3edd2327c46f4ec20e6ce6d Reviewed-on: https://chromium-review.googlesource.com/c/1394783 Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Reviewed-by: Nasko Oskov <nasko@chromium.org> Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Cr-Commit-Position: refs/heads/master@{#620098} [modify] https://crrev.com/a44a4625465dfe0706fea085698350e461ec9786/content/browser/frame_host/navigation_handle_impl.cc [modify] https://crrev.com/a44a4625465dfe0706fea085698350e461ec9786/content/browser/frame_host/navigation_handle_impl_browsertest.cc [modify] https://crrev.com/a44a4625465dfe0706fea085698350e461ec9786/tools/metrics/histograms/histograms.xml |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by creis@chromium.org
, Jun 1 2018Components: UI>Browser>Navigation
Labels: OS-Chrome OS-Linux OS-Mac OS-Windows
Summary: A hung renderer process with a hidden OOPIF never shows hung renderer dialog (was: An unusable renderer process with a hung OOPIF can live forever)