Issue 683047

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Add UMA to see how long service workers run for.

Project Member Reported by falken@chromium.org, Jan 20 2017

Issue description

Spin-off of issue 648836. We still need UMA to track the problem in the wild.

- We don't want the web to start relying on being able to keep a SW alive indefinitely.
- We don't want people writing botnet-like service workers.
 

Comment 1 by falken@chromium.org, Feb 20 2017

Cc: ojan@chromium.org
Status: Started (was: Assigned)
+ojan would the following UMA make sense?

There are two problems we want to detect:
- A worker that never stops.
- A worker that stops but is soon able to restart itself.

We don’t care about the lifetime of SWs with controllees. A controllee can keep its SW running as long and as often as it wants. So in the following UMA, "running" means "running while there is no controllee".

Proposed UMA:

ServiceWorker.RunningTime (seconds)
- Recorded when a SW stops: the time it was running.
- Also recorded every 5 minutes for a SW that’s been running over 5 minutes (so this will double-count a SW running for e.g., 1 hour)
ServiceWorker.CumulativeRunningTime (percent)
- Every 30 minutes, record the percent of time the worker was running, for each worker that ever ran in the past 30 min.
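To make the percent computation concrete, here is a minimal sketch of how the share of a 30-minute window spent running could be calculated. The names RunInterval and RunningPercent are hypothetical, times are plain offsets in seconds, and the intervals are assumed not to overlap; nothing here is taken from Chromium code.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

using Seconds = std::chrono::seconds;

// One continuous stretch during which a worker ran with no controllee.
// Hypothetical representation: offsets from an arbitrary epoch.
struct RunInterval {
  Seconds start;
  Seconds stop;
};

// Returns the percent (0 to 100) of the window [window_start, window_end)
// during which the worker was running. Assumes the intervals do not overlap
// each other. Intervals that straddle a window edge are clipped to the window.
double RunningPercent(const std::vector<RunInterval>& intervals,
                      Seconds window_start, Seconds window_end) {
  const Seconds window = window_end - window_start;
  if (window <= Seconds{0})
    return 0.0;

  Seconds running{0};
  for (const RunInterval& interval : intervals) {
    const Seconds begin = std::max(interval.start, window_start);
    const Seconds end = std::min(interval.stop, window_end);
    if (end > begin)
      running += end - begin;
  }
  return 100.0 * static_cast<double>(running.count()) /
         static_cast<double>(window.count());
}
```

For example, a worker that ran for two separate 7.5-minute stretches inside one 30-minute window would be reported as 50 percent.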

Comment 2 by falken@chromium.org, Feb 22 2017

Some teammates expressed concern that the proposed UMA is too complex and that, even if implemented, it still would not give us a way to distinguish between valid and invalid use of SW. For example, a heavy push notification worker (maybe Messenger implemented using SW) could conceivably be running for a long time without a controlled page.

I think we will start simpler. New proposal:
ServiceWorker.RunningTime
  - recorded when a SW stops
ServiceWorker.StillRunningTime
  - recorded every 5 minutes for SWs that have been running over 5 minutes
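The two recording points in this simpler proposal can be sketched as below. This is only an illustration: LifetimeTrackerSketch, its methods, and the integer worker IDs are hypothetical, samples are collected into vectors instead of real histograms, and the caller is assumed to invoke RecordStillRunning from a repeating 5-minute timer.

```cpp
#include <chrono>
#include <map>
#include <vector>

using Seconds = std::chrono::seconds;

// The 5-minute threshold from the proposal.
constexpr Seconds kStillRunningThreshold{5 * 60};

// Hypothetical tracker; the current time is passed in so the sketch stays
// deterministic and needs no real clock or timer.
class LifetimeTrackerSketch {
 public:
  void StartTracking(int worker_id, Seconds now) {
    start_times_[worker_id] = now;
  }

  // Stand-in for ServiceWorker.RunningTime: one sample when a SW stops.
  void StopTracking(int worker_id, Seconds now) {
    auto it = start_times_.find(worker_id);
    if (it == start_times_.end())
      return;  // Unknown worker: nothing to record.
    running_time_samples.push_back(now - it->second);
    start_times_.erase(it);
  }

  // Stand-in for ServiceWorker.StillRunningTime: one sample per worker that
  // has been running for more than 5 minutes when the timer fires.
  void RecordStillRunning(Seconds now) {
    for (const auto& entry : start_times_) {
      const Seconds elapsed = now - entry.second;
      if (elapsed > kStillRunningThreshold)
        still_running_time_samples.push_back(elapsed);
    }
  }

  std::vector<Seconds> running_time_samples;
  std::vector<Seconds> still_running_time_samples;

 private:
  std::map<int, Seconds> start_times_;  // worker_id -> start time
};
```

Note that a worker that never stops contributes only StillRunningTime samples, and contributes one per timer tick, so the second metric counts long-running workers repeatedly.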

We could refine these later and try to surface how long it's been running without a controllee or somehow characterize the history of events that kept it alive.

Comment 3 by ojan@chromium.org, Feb 22 2017

Do you have target values in mind? I ask because UMA doesn't deal well with unpopular/malicious sites that are doing abusive things unless you set goals like: ServiceWorkers should *never* run more than XX minutes or something like that. The usage in popular sites drowns out the long tail unless you care about the 100th percentile.
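The "drowning out" effect can be illustrated with a small sketch on synthetic data. The sample set below (9,999 runs of 30 seconds plus a single 18-hour run) is invented purely for illustration and is not real UMA data; NearestRankPercentile and MakeSyntheticRuntimes are hypothetical helper names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
// Requires a non-empty vector and 0 < p <= 100.
double NearestRankPercentile(std::vector<double> samples, double p) {
  std::sort(samples.begin(), samples.end());
  const double n = static_cast<double>(samples.size());
  std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * n));
  rank = std::max<std::size_t>(rank, 1);
  rank = std::min(rank, samples.size());
  return samples[rank - 1];
}

// Synthetic runtimes in seconds: many short-lived workers and one outlier.
std::vector<double> MakeSyntheticRuntimes() {
  std::vector<double> runtimes(9999, 30.0);  // typical workers: 30 s each
  runtimes.push_back(18.0 * 60.0 * 60.0);    // one worker alive for 18 hours
  return runtimes;
}
```

With this data the 50th, 99th, and 99.9th percentiles are all 30 seconds; the 18-hour worker only shows up at the 100th percentile, which is the point being made about aggregate histograms.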

Comment 4 by falken@chromium.org, Feb 23 2017

That's a good point. Five minutes was chosen because it's the default time limit for a single event. However, there is no value XX for which SW should never run for more than XX time, since SW can be kept alive as long as there are events to process. For example, if a controlled page continuously does network requests, the SW will not be terminated until the page is closed. Similarly, if the browser continuously gets push notifications for a SW, that SW can be kept alive indefinitely.

I guess the root issue is there is no known way to distinguish abuse vs valid use. Adding UMA to characterize the history of events for long running workers may be useful but is probably still insufficient.

Comment 5 by bugdroid1@chromium.org, Mar 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f93cb21f636207e7f0573bf99681bfce4951e0f7

commit f93cb21f636207e7f0573bf99681bfce4951e0f7
Author: falken <falken@chromium.org>
Date: Fri Mar 17 03:05:30 2017

Add UMA for how long service workers run for.

We don't have any UMA for how long a SW stays alive, so this
is a step toward fixing that. This adds UMA:
* ServiceWorker.Runtime
  - How long the worker ran for (wall time). Recorded when
a service worker stops.

Ideally we'd additionally like to know 1) about SW's that
never stop, and 2) why a SW stays alive particularly
if it's for a long time, and also be able to distinguish
valid use from abusive use, but there's some complexity
there so we're starting simple.

BUG=683047

Review-Url: https://codereview.chromium.org/2706923003
Cr-Commit-Position: refs/heads/master@{#457669}

[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/BUILD.gn
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/embedded_worker_instance.cc
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/embedded_worker_instance.h
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/embedded_worker_instance_unittest.cc
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/embedded_worker_registry.cc
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/embedded_worker_registry.h
[add] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/service_worker_lifetime_tracker.cc
[add] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/service_worker_lifetime_tracker.h
[add] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/service_worker_lifetime_tracker_unittest.cc
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/service_worker_metrics.cc
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/service_worker_metrics.h
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/browser/service_worker/service_worker_version.cc
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/content/test/BUILD.gn
[modify] https://crrev.com/f93cb21f636207e7f0573bf99681bfce4951e0f7/tools/metrics/histograms/histograms.xml

Comment 6 by falken@chromium.org, Apr 13 2017

Labels: M-59
Status: Fixed (was: Started)
Marking this one fixed from comment 5.

Comment 7 by ojan@chromium.org, May 8 2017

Is there a bug to track next steps? A quick look at the UMA suggests 1/10000 live for 18 hours and 1/1000 live for 3 hours. Perhaps we could use UKM to figure out which sites?
