New issue
Advanced search Search tips

Issue 671084 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android , Windows
Pri: 1
Type: Bug



Sign in to add a comment

TIMEOUT on ServiceWorker.StartWorker.Status is increased

Project Member Reported by shimazu@chromium.org, Dec 5 2016

Issue description

The number of TIMEOUT is now about 200 though that was less than 15 before Nov 27.
The number of unique users is also increated from around 10 to 85.
https://uma.googleplex.com/timeline_v2?sid=195ad8ebbe3bcfebcd5ee77be91fced3

The total number is not so big, but something might happen between 57.0.2931..2936
diff: https://chromium.googlesource.com/chromium/src/+log/57.0.2931.0..57.0.2936.0?pretty=fuller&n=10000

 
I'm now suspecting CHECK for debugging might be possible to be the culprit:
https://codereview.chromium.org/2528333002
FetchEvent.MainResource.Status looks also increased at the same position
https://uma.googleplex.com/timeline_v2?sid=74f21a064da4a87d6506976fe0f6fcaf


Labels: OS-Windows
We have several patches in this term: 

  ServiceWorker: Change DCHECK to CHECK for debugging process allocation
  Worker: Use per-frame task runners from a worker thread 
  Worker: Deprecate PostTaskToMainExecutionContext 
  Worker: Provide a way to access parent frame task runners from a worker thread 
  ServiceWorker: mojofy ipcs sent from SWNetworkProvider 
  Remove InterfaceRegistry/Provider from service workers 
  service worker: Persist NavigationPreloadState 
https://crrev.com/2563583002 will revert "Change DCHECK to CHECK for debugging process allocation"

Comment 6 Deleted

I forgot it has not landed yet due to trybot failure... I'm trying to commit it again.
This graph shows the range of failure seems 2935..2936 because the total StartWorker count were the same but the number of failures increased.
https://uma.googleplex.com/timeline_v2?sid=ce9f04bc4d45929ef43df579d94c3ce9
diff: https://chromium.googlesource.com/chromium/src/+log/57.0.2935.0..57.0.2936.0?pretty=fuller&n=10000

This release also has several patches related to workers:
  ServiceWorker: Change DCHECK to CHECK for debugging process allocation
  Worker: Use per-frame task runners from a worker thread 
  Worker: Deprecate PostTaskToMainExecutionContext 
  Worker: Provide a way to access parent frame task runners from a worker thread 
  ServiceWorker: mojofy ipcs sent from SWNetworkProvider


Additionally, ServiceWorker.StartWorker.TimeoutPhase shows SENT_START_WORKER and SCRIPT_READ_FINISHED started increasing from 2936 and THREAD_STARTED and SENT_START_WORKER are the two top of failure reason from 2944. I guess another thing happened between 2943 .. 2944
https://chromium.googlesource.com/chromium/src/+log/57.0.2943.0..57.0.2944.0?pretty=fuller&n=10000

The culprit of failures after 57.0.2936 might be https://crrev.com/2529833002. 
The number of failures before Dec 6 looks larger when MemoryCoordinator v0 is enabled by finch experiment:
https://uma.googleplex.com/p/chrome/variations/?sid=cd7e5af10a1b3d87149708841b3052e9

However, failures on Control seems increasing after Dec 7. It's the same when 57.0.2944.0 was released. There might be another reason.

Labels: Restrict-View-Google
Cc: bashi@chromium.org
bashi-san: Is it possible to cause this issue by your patch?
Currently starting up seems failing when loading worker's script. I'm guessing something suspends a renderer process which has the worker thread while the thread is trying to load the worker's script from the browser process. 

Comment 12 by bashi@chromium.org, Dec 13 2016

Cc: tasak@google.com
Yeah, I guess it's possible.

Sakamoto-san: does purge+suspend stops worker threads?

I'm not familiar with service workers but shouldn't we suspend a renderer when there are service workers on the renderer?
This graph shows 57.0.2943 has less timeouts than other versions.
https://uma.googleplex.com/timeline_v2?sid=80401f0cca68d6a43d9796887d3eb437

This means failure might have started from 57.0.2943. I'll take a look the following too.
https://chromium.googlesource.com/chromium/src/+log/57.0.2942.0..57.0.2943.0?pretty=oneline&n=10000
Hmm, less timeouts but mostly the same number of usage. 

Comment 15 by horo@chromium.org, Dec 13 2016

The timeout after THREAD_STARTED looks like caused by this change.
https://codereview.chromium.org/2543723005
This landed in 57.0.2944.0.

Comment 16 by horo@chromium.org, Dec 13 2016

Cc: tzik@chromium.org
+tzik@ the author of https://codereview.chromium.org/2543723005.
[CURRENT SUMMARY OF THIS ISSUE]

Currently there are 2 known issues: (1) worker tasks can accidentally be suspended by MemoryCoordinator, and (2) throttled by task runners after [1]. Reg (1), shimazu-san has promising fixes and they're under reviews [2, 3]. Reg (2), tzik-san is now starting fixing it.

[1] https://codereview.chromium.org/2543723005/
[2] https://codereview.chromium.org/2569963002/
[3] https://codereview.chromium.org/2574663002/

Comment 18 by bashi@chromium.org, Dec 13 2016

Out of curiously but what happens when a tab is discarded? Is there the same problem when a tab is discarded?
These 2 issues would be entangled. This is just my reasoning...

(1) A frame of worker's shadow page has accidentally used non-empty scheduler that can be throttled.
(2) MemoryCoordinator started suspending tasks associated backgrounded tabs. This is done by calling RendererSchedulerImpl::SuspendRenderer() that makes the timer queue suspended. The tasks posted to the timer task runner started to be suspended and the TIMEOUT rate increased (TIMEOUT after SENT_START_WORKER and SCRIPT_READ_FINISHED)
(3) Unspecified tasks started to be suspended by [1]. This changes timing of TIMEOUT (TIMEOUT after THRAD_STARTED)
Addendum to comment 17: (1) is caused by https://codereview.chromium.org/2529833002/ and (2) is caused by https://codereview.chromium.org/2543723005/
REG (1) on c#19, this should be the root cause and fixed by introducing ParentFrameTaskRunners for WorkerLoaderProxyProvider::postTaskToLoader ( issue 667310 , this is now blocked by other issues to remove ExecutionContextTask). ParentFrameTaskRunners never throttles ServiceWorker/SharedWorker's tasks.
Re. #19: I'm still wondering why it happens even when the finch flag of MemoryCoordinator is disabled (Control). Is the scheduler possible to be suspended by other reasons?
c#22: IIUC, that's still possible when a process is backgrounded via a following path:

RenderThreadImpl::OnProcessBackgrounded
  RendererSchedulerImpl::OnRendererBackgrounded
    // Called as a delayed task
    RendererSchedulerImpl::SuspendTimerQueueWhenBackgrounded
bashi@: Are there any differences between tab close and tab discard? Does tab discard de-prioritize a renderer process, throttle task runners, etc?
Project Member

Comment 25 by bugdroid1@chromium.org, Dec 13 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/200baa21d49f1a2688a33bb64f4b689fdae69119

commit 200baa21d49f1a2688a33bb64f4b689fdae69119
Author: tzik <tzik@chromium.org>
Date: Tue Dec 13 12:59:34 2016

Use ParentFrameTaskRunners in WorkerLoaderProxyProvider::postTaskToLoader impls

Implementations of WorkerLoaderProxyProvider::postTaskToLoader have used
ExecutionContext::postTask to post tasks to the main thread. However,
the posted tasks are unexpectedly throttled, and that delays the startup
time of service workers.
This CL replaces ExecutionContext::postTask there with ParentFrameTaskRunners
and WebTaskRunner::postTask, so that the posted task is routed to suitable
task runners.

BUG= 667310 ,  671084 

Review-Url: https://codereview.chromium.org/2574703002
Cr-Commit-Position: refs/heads/master@{#438146}

[modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/core/dom/ExecutionContextTask.h
[modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/core/loader/ThreadableLoaderTest.cpp
[modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/core/workers/ThreadedMessagingProxyBase.cpp
[modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/web/WebEmbeddedWorkerImpl.cpp
[modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/web/WebSharedWorkerImpl.cpp

The number of timeouts seems decreasing after tzik-san's patch (57.0.2951):
https://uma.googleplex.com/timeline_v2?sid=fa8522c5c879cbead922f886a288aba4
Yes comment 25 (200baa21d4) landed in 57.0.2951.0 and helped a lot:

57.0.2950.0 => 57.0.2950.1
Win ServiceWorker.StartWorker.Time (50p): 48.04 => 34.50
Win ServiceWorker.StartWorker.Status == TIMEOUT: 00.26% => 00.34%
Android ServiceWorker.StartWorker.Time (50p): 633.1 => 200.5
Android ServiceWorker.StartWorker.Status == TIMEOUT: 03.70% => 00.61%
Project Member

Comment 28 by bugdroid1@chromium.org, Dec 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/14169abdc6f1d31468cb1de6212ff256c89b92b2

commit 14169abdc6f1d31468cb1de6212ff256c89b92b2
Author: shimazu <shimazu@chromium.org>
Date: Mon Dec 19 03:48:05 2016

MemoryCoordinator checks if ServiceWorker exists on the suspending process

If the suspending process has a worker thread, currently the worker won't
respond with events any more. This patch is to keep the process running which
has workers.

BUG= 671084 

Review-Url: https://codereview.chromium.org/2569963002
Cr-Commit-Position: refs/heads/master@{#439399}

[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/memory/memory_coordinator.cc
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/memory/memory_coordinator.h
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/renderer_host/render_process_host_impl.cc
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/renderer_host/render_process_host_impl.h
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/service_worker/service_worker_browsertest.cc
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/service_worker/service_worker_process_manager_unittest.cc
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/public/browser/render_process_host.h
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/public/test/mock_render_process_host.cc
[modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/public/test/mock_render_process_host.h

Cc: shimazu@chromium.org
Issue 674345 has been merged into this issue.
Status: Fixed (was: Started)
As seen in the following UMA, c#25 (57.0.2951) and c#28 (57.0.2957) fix the problem. :)
https://uma.googleplex.com/timeline_v2?sid=c0b9c67f8301190864572b221aaf26bb

Labels: M-57
Great! Nice work handling this regression everyone.
Labels: -Restrict-View-Google

Sign in to add a comment