Issue 810633

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Windows, Chrome, Mac
Pri: 3
Type: Bug

Blocked on:
issue 716609



Closing tabs does not kill render processes with service workers, leaving them spinning for 5 minutes

Project Member Reported by blois@google.com, Feb 9 2018

Issue description

UserAgent: Mozilla/5.0 (X11; CrOS x86_64 10032.86.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.140 Safari/537.36

Steps to reproduce the problem:
1. On Linux, navigate to https://peteblois.github.io/tmp/iframe_kill/
2. Open Chrome's task manager, note the process ID of the task using 100% of the CPU
3. Close the tab
4. Observe that the task no longer appears in task manager.
5. Use 'top' or ps and observe that the task is still running.
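
Step 5 can also be checked from a script. This is a minimal sketch, assuming a POSIX system (Linux, Mac, Chrome OS) and the PID noted in step 2; `process_is_running` is a hypothetical helper name, not anything Chrome provides.

```python
import os


def process_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; the call only performs the existence
        # and permission checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Replace os.getpid() with the renderer PID from Chrome's task manager.
    print(process_is_running(os.getpid()))
```

Polling this after closing the tab shows how long the orphaned renderer survives, without relying on Chrome's own task manager.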

What is the expected behavior?
When the page is closed, the task is either terminated or continues to appear in Chrome's task manager.

What went wrong?
The process appears to run for at least a few minutes until it's terminated. During this time, it won't appear in Task Manager.

Did this work before? N/A 

Chrome version: 63.0.3239.140  Channel: stable
OS Version: 
Flash Version: 

This could be related to site isolation.
 

Comment 1 by nick@chromium.org, Feb 9 2018

Components: UI>TaskManager
Owner: nick@chromium.org
This is something that "--task-manager-show-extra-renderers" would probably help with: https://cs.chromium.org/chromium/src/chrome/common/chrome_switches.cc?type=cs&q=task-manager-show-extra-renderers&sq=package:chromium&l=689

I can try to revive the CLs necessary to enable that mode by default (it introduces some ordering chaos in its current state).

Comment 2 by nick@chromium.org, Feb 9 2018

Also, there are two things going wrong here: the existence of the orphaned process, and the fact that the orphaned process doesn't show in the task manager.
Labels: Needs-Triage-M63
Components: -Platform>DevTools
Cc: sindhu.chelamcherla@chromium.org
Labels: Triaged-ET Needs-Feedback
@nick: Any updates on this issue? Does it require any triage from the TE end? If not, please change the status of the bug to Untriaged/Assigned to remove it from the triage bucket.

Thanks!

Comment 6 by a...@chromium.org, Mar 23 2018

OP: to be clear in your original steps, "3. Close the tab" means click the close button on the tab itself, not the "end process" button in the task manager window?

Nick, what was wrong with the show-extra-renderers patch? Adding a bit of churn would be much preferred to leaving out information we have but don't show.

I could look at that as well, but I'll poke at the first issue of the close button not actually killing the process.

Comment 7 by blois@google.com, Mar 23 2018

Yes- step 3 is to click the Close button on the tab.

Comment 8 by sheriffbot@chromium.org, Mar 23 2018

Labels: -Needs-Feedback
Thank you for providing more feedback. Adding the requester to the cc list.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 9 by a...@chromium.org, Mar 23 2018

This isn't an oopif issue per se. In a local Chromium build, I don't have oopif turned on, so loading that page in the original description dumps all the frames in the renderer, and then locks up that render process. Closing that page leaves the render process running. Sometimes that render process expires in a few seconds, sometimes it runs for minutes and minutes until it dies.

This is disturbing. If the user closes the last tab of a hung renderer, that render process should die. If the browser needs to kill it to make it die, it should. But this isn't an oopif blocker, as this happens oopif or not.

Comment 10 by a...@chromium.org, Mar 23 2018

This is really really weird.

I instrumented RenderProcessHostImpl::~RenderProcessHostImpl and ChildProcessLauncher::~ChildProcessLauncher. Then:

1. Open the page.
2. Open the Task Manager.
3. Find the PID of the render process; verify that it's 100% CPU in the Task Manager and the Activity Monitor.
4. Close the tab.
** Those destructors were NOT hit **
5. Watch the process not appear in the Task Manager, but still appear in the Activity Monitor.
6. Kill the process in the Activity Monitor.
** Instantly the RenderProcessHostImpl and ChildProcessLauncher destructors logged **

This is very not good.

Comment 11 by a...@chromium.org, Mar 23 2018

Labels: OS-Mac
Summary: Closing tabs does not kill hung render processes (was: Running tasks may not be displayed by Task Manager)
The fact that the Task Manager doesn't show all processes is bug 739782; follow that bug for progress there. The fact that closing a tab with a hung process doesn't actually kill that hung process is very disturbing and needs to be fixed. This bug will be that.

Comment 12 by a...@chromium.org, Mar 23 2018

Status: Untriaged (was: Unconfirmed)

Comment 13 by a...@chromium.org, Mar 23 2018

Summary: Closing tabs does not kill render processes with service workers, leaving them spinning forever (was: Closing tabs does not kill hung render processes)
Nick points out that this might be a service worker.

If we go to chrome://serviceworker-internals/ then we see that there is indeed a serviceworker keeping the process alive. If, on that page, we click the stop button, it spews an error ("Error: {"columnNumber":-1,"lineNumber":-1,"message":"DETACH_STALLED_IN_STOPPING","sourceURL":""}") but then the process spinning goes away.

Can we make a smaller change to the Task Manager to show service workers?
Cc: a...@chromium.org nick@chromium.org alex...@chromium.org lukasza@chromium.org creis@chromium.org
Components: -UI>TaskManager Blink>ServiceWorker UI>Browser>Navigation Internals>Core
Labels: -Pri-2 OS-Chrome OS-Windows Pri-1
Owner: falken@chromium.org
Status: Assigned (was: Untriaged)
Sorry that this fell off the radar-- trying to revive it now after a recent ping from blois@.

Additional repro steps:
1) Visit https://peteblois.github.io/tmp/iframe_kill/ (which starts a ServiceWorker running at 100% CPU).
2) In another tab, visit https://peteblois.github.io/tmp/iframe_scrolling2/ (which tries to put a subframe in the hung process).

The second tab never finishes loading, even if you close the first tab and reload the second tab.  It only works if you kill the hung process.

I think this boils down to a ServiceWorker problem, where we don't have a way to recover from hung ServiceWorker processes.  falken@, do you know if there's something that's supposed to catch that case?  Can we add something?

For comparison, hung renderer processes (including OOPIF processes) will put up a hung renderer dialog to the user if an input event isn't acked within some time period (30 seconds, I think?).  ServiceWorkers don't get input events and won't get coverage from this timer.  Maybe there should be some other timer that can either kill the worker or put up a similar dialog?  (I note that even the "Stop" button in chrome://serviceworker-internals/ doesn't work in this case, showing a DETACH_STALLED_IN_STOPPING error.)

It's true that Chrome's Task Manager should do a better job showing the Service Worker process, which is tracked in issue 716609.  However, I don't think that's the real solution here, since users shouldn't have to go hunting through Chrome's Task Manager to recover from a hung worker.  (For reference, that's assigned to nick@, who is currently on paternity leave.)

Also, as Avi notes in comment 9, this isn't a Site Isolation specific issue-- it seems like it can affect any case where a ServiceWorker becomes unresponsive, especially when it shares a process with another tab.

falken@, can we look for a way to recover on the ServiceWorker side?  Thanks!
For reference, lukasza@ filed a followup bug for cases with hung processes in hidden OOPIFs, which probably face a similar problem (given that they don't have input events to trigger the dialog).  See issue 848821.

Not sure what options we have there and whether we want to handle the ServiceWorker case specially here, or find something that works for both.
There's a lot of info here. But a service worker is expected to eventually time out (it may take a few minutes) and stop automatically and then release its process reference. If that's not happening it's a bug.
I was able to close the tab and reload, so I think the SW is releasing the process. I did wait several minutes.

DETACH_STALLED_IN_STOPPING isn't a hard error: it means the browser process sent the "Stop" IPC to the service worker, and the SW never ack'd back. That's expected since the process is hung. In that case the browser forcibly drops the SW ("detaches"), so the process ref held by the SW is dropped. After that I would expect reloading the second tab to work.
Thanks.  For reference, do you know where that timeout is defined, just so we know how to test it?
See kPingTimeout and kRequestTimeout in ServiceWorkerVersion, i.e., code around here:
https://cs.chromium.org/chromium/src/content/browser/service_worker/service_worker_version.cc?l=288&rcl=c7f3b63534f3688ed73190104f1ce7dfabda3132

For each running service worker, the browser process has a timer fire every 30 seconds. On each heartbeat, it sends a "Ping" IPC. If the service worker didn't respond with "Pong" by the next heartbeat, it's considered unresponsive and terminated.

Termination involves sending a "Stop" IPC. If the service worker doesn't ACK that it stopped within a short delay (I think 10 seconds), the browser does the "detach" operation seen above.
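
To make the timing concrete, here is a rough model of the heartbeat described above. It is only a sketch of the behavior as described in this comment, not the code in ServiceWorkerVersion: `simulate`, the event names, and the constant names are made up for illustration, and the 10 second stop delay is the guess from the previous paragraph.

```python
PING_INTERVAL_S = 30   # heartbeat period described above
STOP_ACK_DELAY_S = 10  # assumed; the comment only says "I think 10 seconds"


def simulate(responsive: bool, duration_s: int):
    """Return (time_s, event) pairs for one worker over duration_s seconds."""
    events = []
    ping_outstanding = False
    for t in range(PING_INTERVAL_S, duration_s + 1, PING_INTERVAL_S):
        if ping_outstanding:
            # No "Pong" arrived since the previous heartbeat: the worker is
            # considered unresponsive and is asked to stop.
            events.append((t, "stop_sent"))
            # A hung worker never acks the stop, so the browser detaches it
            # after the short delay.
            events.append((t + STOP_ACK_DELAY_S, "detached"))
            break
        events.append((t, "ping_sent"))
        # A responsive worker answers right away; a hung one never does.
        ping_outstanding = not responsive
    return events


if __name__ == "__main__":
    print(simulate(responsive=False, duration_s=120))
    print(simulate(responsive=True, duration_s=90))
```

Under these assumptions, a worker that is already hung when the first ping goes out is detached roughly 70 seconds after it started running, while a responsive worker just keeps getting pinged.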

Also, the first time the service worker is run (before it is installed), it is granted a longer time to run, since installation can take a while (it has to be fetched from the network). In this case it's granted 5 minutes of end-to-end startup time (kStartNewWorkerTimeout), but we still apply the 30-second responsiveness check once the service worker reports that the script finished loading. So depending on how the busy SW is implemented, it might take around 5 minutes to see the timeout.

Note that we don't terminate a service worker if DevTools is attached to it, which might be why others didn't see the service worker terminate.

We also detect service workers that aren't busy spinning but staying alive by keeping a promise open via ExtendableEvent.waitUntil() or similar API. Each request to the service worker extends the service worker's lifetime by a max of kRequestTimeout by default. When kRequestTimeout expires, the browser no longer respects waitUntil() and is free to terminate the service worker if it has no more requests coming in.
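
The request-timeout rule can be sketched the same way. This is an illustrative model only: `may_terminate` is a hypothetical function, and the timeout is passed in as a parameter because the value of kRequestTimeout isn't stated in this thread.

```python
def may_terminate(now_s: float, request_start_times_s, request_timeout_s: float) -> bool:
    """True when no in-flight request is still inside its timeout window.

    A request older than request_timeout_s no longer keeps the worker alive,
    even if its waitUntil() promise is still pending.
    """
    return all(
        now_s - start_s >= request_timeout_s
        for start_s in request_start_times_s
    )


if __name__ == "__main__":
    # The 300 second timeout here is just an example value.
    print(may_terminate(400, [0, 50], 300))   # both requests have expired
    print(may_terminate(400, [0, 200], 300))  # the second request is still live
```

A worker with no in-flight requests at all is also eligible for termination in this model, which matches the "no more requests coming in" condition.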
Labels: -Pri-1 Pri-3
Hi creis, blois: does my comment match your experience? Should this be WontFixed or is there any follow-up work to do?
Blockedon: 716609
Summary: Closing tabs does not kill render processes with service workers, leaving them spinning for 5 minutes (was: Closing tabs does not kill render processes with service workers, leaving them spinning forever)
falken@: Thanks for the extra details.  The 5 minute timeout and the DevTools disabling of the timeout may explain the behavior seen here.

For the original repro steps in this bug, it sounds like the request boils down to showing the service worker process in the task manager (issue 716609).  As noted in comment 1, running Chrome with --task-manager-show-extra-renderers is a temporary workaround for seeing the ServiceWorker process until issue 716609 is fixed.  (I confirmed that the service worker process does go away 5 minutes after the tab is closed, using the --task-manager-show-extra-renderers flag to see it happen.)
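
For anyone who wants to try the workaround, a small launcher sketch follows. The flag is the one from comment 1; the binary name `google-chrome` and the `chrome_command` helper are assumptions for illustration, so adjust the binary for your platform or build.

```python
import subprocess

SHOW_EXTRA_RENDERERS_FLAG = "--task-manager-show-extra-renderers"


def chrome_command(binary: str = "google-chrome", url: str = ""):
    """Build the argument list for launching Chrome with the workaround flag."""
    command = [binary, SHOW_EXTRA_RENDERERS_FLAG]
    if url:
        command.append(url)
    return command


if __name__ == "__main__":
    # Launches the browser on the repro page; the binary must be on PATH.
    subprocess.Popen(chrome_command(url="https://peteblois.github.io/tmp/iframe_kill/"))
```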

For the steps in comment 14 from blois@, that should be resolved in issue 848821 by giving users a dialog with a shorter (10s?) timeout when a navigation tries to join an unresponsive process, whether due to a ServiceWorker or anything else.

blois@: Does that sound reasonable to you?  If so, I'll dupe this into issue 716609 instead of WontFix'ing it.
