TIMEOUT on ServiceWorker.StartWorker.Status is increased |
|||||||||
Issue descriptionThe number of TIMEOUT is now about 200 though that was less than 15 before Nov 27. The number of unique users is also increated from around 10 to 85. https://uma.googleplex.com/timeline_v2?sid=195ad8ebbe3bcfebcd5ee77be91fced3 The total number is not so big, but something might happen between 57.0.2931..2936 diff: https://chromium.googlesource.com/chromium/src/+log/57.0.2931.0..57.0.2936.0?pretty=fuller&n=10000
,
Dec 5 2016
FetchEvent.MainResource.Status looks also increased at the same position https://uma.googleplex.com/timeline_v2?sid=74f21a064da4a87d6506976fe0f6fcaf
,
Dec 5 2016
,
Dec 8 2016
We have several patches in this term: ServiceWorker: Change DCHECK to CHECK for debugging process allocation Worker: Use per-frame task runners from a worker thread Worker: Deprecate PostTaskToMainExecutionContext Worker: Provide a way to access parent frame task runners from a worker thread ServiceWorker: mojofy ipcs sent from SWNetworkProvider Remove InterfaceRegistry/Provider from service workers service worker: Persist NavigationPreloadState
,
Dec 8 2016
https://crrev.com/2563583002 will revert "Change DCHECK to CHECK for debugging process allocation"
,
Dec 12 2016
I forgot it has not landed yet due to trybot failure... I'm trying to commit it again.
,
Dec 12 2016
This graph shows the range of failure seems 2935..2936 because the total StartWorker count were the same but the number of failures increased. https://uma.googleplex.com/timeline_v2?sid=ce9f04bc4d45929ef43df579d94c3ce9 diff: https://chromium.googlesource.com/chromium/src/+log/57.0.2935.0..57.0.2936.0?pretty=fuller&n=10000 This release also has several patches related to workers: ServiceWorker: Change DCHECK to CHECK for debugging process allocation Worker: Use per-frame task runners from a worker thread Worker: Deprecate PostTaskToMainExecutionContext Worker: Provide a way to access parent frame task runners from a worker thread ServiceWorker: mojofy ipcs sent from SWNetworkProvider Additionally, ServiceWorker.StartWorker.TimeoutPhase shows SENT_START_WORKER and SCRIPT_READ_FINISHED started increasing from 2936 and THREAD_STARTED and SENT_START_WORKER are the two top of failure reason from 2944. I guess another thing happened between 2943 .. 2944 https://chromium.googlesource.com/chromium/src/+log/57.0.2943.0..57.0.2944.0?pretty=fuller&n=10000
,
Dec 12 2016
The culprit of failures after 57.0.2936 might be https://crrev.com/2529833002. The number of failures before Dec 6 looks larger when MemoryCoordinator v0 is enabled by finch experiment: https://uma.googleplex.com/p/chrome/variations/?sid=cd7e5af10a1b3d87149708841b3052e9 However, failures on Control seems increasing after Dec 7. It's the same when 57.0.2944.0 was released. There might be another reason.
,
Dec 13 2016
,
Dec 13 2016
bashi-san: Is it possible to cause this issue by your patch? Currently starting up seems failing when loading worker's script. I'm guessing something suspends a renderer process which has the worker thread while the thread is trying to load the worker's script from the browser process.
,
Dec 13 2016
Yeah, I guess it's possible. Sakamoto-san: does purge+suspend stops worker threads? I'm not familiar with service workers but shouldn't we suspend a renderer when there are service workers on the renderer?
,
Dec 13 2016
This graph shows 57.0.2943 has less timeouts than other versions. https://uma.googleplex.com/timeline_v2?sid=80401f0cca68d6a43d9796887d3eb437 This means failure might have started from 57.0.2943. I'll take a look the following too. https://chromium.googlesource.com/chromium/src/+log/57.0.2942.0..57.0.2943.0?pretty=oneline&n=10000
,
Dec 13 2016
Hmm, less timeouts but mostly the same number of usage.
,
Dec 13 2016
The timeout after THREAD_STARTED looks like caused by this change. https://codereview.chromium.org/2543723005 This landed in 57.0.2944.0.
,
Dec 13 2016
,
Dec 13 2016
[CURRENT SUMMARY OF THIS ISSUE] Currently there are 2 known issues: (1) worker tasks can accidentally be suspended by MemoryCoordinator, and (2) throttled by task runners after [1]. Reg (1), shimazu-san has promising fixes and they're under reviews [2, 3]. Reg (2), tzik-san is now starting fixing it. [1] https://codereview.chromium.org/2543723005/ [2] https://codereview.chromium.org/2569963002/ [3] https://codereview.chromium.org/2574663002/
,
Dec 13 2016
Out of curiously but what happens when a tab is discarded? Is there the same problem when a tab is discarded?
,
Dec 13 2016
These 2 issues would be entangled. This is just my reasoning... (1) A frame of worker's shadow page has accidentally used non-empty scheduler that can be throttled. (2) MemoryCoordinator started suspending tasks associated backgrounded tabs. This is done by calling RendererSchedulerImpl::SuspendRenderer() that makes the timer queue suspended. The tasks posted to the timer task runner started to be suspended and the TIMEOUT rate increased (TIMEOUT after SENT_START_WORKER and SCRIPT_READ_FINISHED) (3) Unspecified tasks started to be suspended by [1]. This changes timing of TIMEOUT (TIMEOUT after THRAD_STARTED)
,
Dec 13 2016
Addendum to comment 17: (1) is caused by https://codereview.chromium.org/2529833002/ and (2) is caused by https://codereview.chromium.org/2543723005/
,
Dec 13 2016
REG (1) on c#19, this should be the root cause and fixed by introducing ParentFrameTaskRunners for WorkerLoaderProxyProvider::postTaskToLoader ( issue 667310 , this is now blocked by other issues to remove ExecutionContextTask). ParentFrameTaskRunners never throttles ServiceWorker/SharedWorker's tasks.
,
Dec 13 2016
Re. #19: I'm still wondering why it happens even when the finch flag of MemoryCoordinator is disabled (Control). Is the scheduler possible to be suspended by other reasons?
,
Dec 13 2016
c#22: IIUC, that's still possible when a process is backgrounded via a following path:
RenderThreadImpl::OnProcessBackgrounded
RendererSchedulerImpl::OnRendererBackgrounded
// Called as a delayed task
RendererSchedulerImpl::SuspendTimerQueueWhenBackgrounded
,
Dec 13 2016
bashi@: Are there any differences between tab close and tab discard? Does tab discard de-prioritize a renderer process, throttle task runners, etc?
,
Dec 13 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/200baa21d49f1a2688a33bb64f4b689fdae69119 commit 200baa21d49f1a2688a33bb64f4b689fdae69119 Author: tzik <tzik@chromium.org> Date: Tue Dec 13 12:59:34 2016 Use ParentFrameTaskRunners in WorkerLoaderProxyProvider::postTaskToLoader impls Implementations of WorkerLoaderProxyProvider::postTaskToLoader have used ExecutionContext::postTask to post tasks to the main thread. However, the posted tasks are unexpectedly throttled, and that delays the startup time of service workers. This CL replaces ExecutionContext::postTask there with ParentFrameTaskRunners and WebTaskRunner::postTask, so that the posted task is routed to suitable task runners. BUG= 667310 , 671084 Review-Url: https://codereview.chromium.org/2574703002 Cr-Commit-Position: refs/heads/master@{#438146} [modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/core/dom/ExecutionContextTask.h [modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/core/loader/ThreadableLoaderTest.cpp [modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/core/workers/ThreadedMessagingProxyBase.cpp [modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/web/WebEmbeddedWorkerImpl.cpp [modify] https://crrev.com/200baa21d49f1a2688a33bb64f4b689fdae69119/third_party/WebKit/Source/web/WebSharedWorkerImpl.cpp
,
Dec 19 2016
The number of timeouts seems decreasing after tzik-san's patch (57.0.2951): https://uma.googleplex.com/timeline_v2?sid=fa8522c5c879cbead922f886a288aba4
,
Dec 19 2016
Yes comment 25 (200baa21d4) landed in 57.0.2951.0 and helped a lot: 57.0.2950.0 => 57.0.2950.1 Win ServiceWorker.StartWorker.Time (50p): 48.04 => 34.50 Win ServiceWorker.StartWorker.Status == TIMEOUT: 00.26% => 00.34% Android ServiceWorker.StartWorker.Time (50p): 633.1 => 200.5 Android ServiceWorker.StartWorker.Status == TIMEOUT: 03.70% => 00.61%
,
Dec 19 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/14169abdc6f1d31468cb1de6212ff256c89b92b2 commit 14169abdc6f1d31468cb1de6212ff256c89b92b2 Author: shimazu <shimazu@chromium.org> Date: Mon Dec 19 03:48:05 2016 MemoryCoordinator checks if ServiceWorker exists on the suspending process If the suspending process has a worker thread, currently the worker won't respond with events any more. This patch is to keep the process running which has workers. BUG= 671084 Review-Url: https://codereview.chromium.org/2569963002 Cr-Commit-Position: refs/heads/master@{#439399} [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/memory/memory_coordinator.cc [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/memory/memory_coordinator.h [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/renderer_host/render_process_host_impl.cc [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/renderer_host/render_process_host_impl.h [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/service_worker/service_worker_browsertest.cc [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/browser/service_worker/service_worker_process_manager_unittest.cc [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/public/browser/render_process_host.h [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/public/test/mock_render_process_host.cc [modify] https://crrev.com/14169abdc6f1d31468cb1de6212ff256c89b92b2/content/public/test/mock_render_process_host.h
,
Dec 27 2016
Issue 674345 has been merged into this issue.
,
Jan 5 2017
As seen in the following UMA, c#25 (57.0.2951) and c#28 (57.0.2957) fix the problem. :) https://uma.googleplex.com/timeline_v2?sid=c0b9c67f8301190864572b221aaf26bb
,
Jan 6 2017
Great! Nice work handling this regression everyone.
,
Mar 1 2017
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by shimazu@chromium.org
, Dec 5 2016