Issue metadata
Sign in to add a comment
|
Increase in ABORT errors for starting a new service worker |
||||||||||||||||||||
Issue descriptionRegression between 56.0.2888.0 and 56.0.2889.0 that has remained. 2888: ABORT 0.02%, EXISTS 93.80% 2889: ABORT 93.36%, EXISTS 0.02% https://uma.googleplex.com/p/chrome/histograms/?endDate=20161017&dayCount=7&histograms=ServiceWorker.StartNewWorker.Status&fixupData=true&showMax=true&filters=channel%2Ceq%2C1%2Csimple_version%2Ceq%2C56.0.2888.0%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial https://uma.googleplex.com/p/chrome/timeline_v2/?sid=3c3c51483bfb5c6bdf6c21c36a5622ec I guess the EXISTS errors became ABORT errors. Probably doesn't have real user impact but the metrics have gotten confused. Change log: https://chromium.googlesource.com/chromium/src/+log/56.0.2888.0..56.0.2889.0?pretty=fuller&n=10000
,
Dec 22 2016
I guess nothing ever guaranteed it was going to stay alive. ServiceWorkerRegisterJob calls Stop() in Complete() and then releases the reference when its destructed.
,
Dec 22 2016
I think I see the cause. We switched the order of "WorkerStopped" and "ReleaseProcessRefCount" messages in https://codereview.chromium.org/2307543002 at EmbeddedWorkerDispatcher::WorkerContextDestroyed. The SWProviderHost for the SW would keep it alive until it stopped, but since the ref count hits 0 before that, the SWProviderHost itself gets destroyed and destructs the worker. These ordering assumptions are really fragile...
,
Dec 22 2016
More notes. Previously the order was: (1) EmbeddedWorkerHostMsg_WorkerStopped (2) ChildProcessHostMsg_ShutdownRequest Now the order is reversed. It appears ShutdownRequest is triggering ServiceWorkerProviderHost to be destructed before WorkerStopped arrives. But we can't fix this just by changing the ordering, since WorkerStopped is now sent via Mojo. We would need some "AssociatedInterface" work. It might not even be enough to guarantee the ordering. If the start or stop callbacks are not invoked synchronously or involve async tasks, the ServiceWorkerVersion can unexpectedly be destroyed once the ShutdownRequest message arrives. I'm not 100% sure the ordering was already guaranteed actually. WorkerStopped is sent via RenderThreadImpl::current()->thread_safe_sender() and ShutdownRequest is sent via ChildThreadImpl::Send(). Actually now I'm wondering why ScopedChildProcessReference is needed for service worker, since SWProcessManager also keeps track of running workers and keeps a ref count on the RenderProcessHost.
,
Dec 22 2016
Hum, SW's ScopedChildProcessReference was added in Nov 2013 https://codereview.chromium.org/54573002 and ServiceWorkerProcessManager in Apr 2014 https://codereview.chromium.org/238043002/
,
Dec 22 2016
OK ScopedChildProcessReference is a red herring. What's actually happening is the previous order:
(1) EmbeddedWorkerHostMsg_WorkerStopped
(2) ServiceWorkerHostMsg_ProviderDestroyed
got reversed. Now it's:
WorkerContextDestroyed()
- unregisters the worker
- WorkerWrapper is destroyed
- WebEmbeddedWorkerImpl is destroyed
- ServiceWorkerNetworkProvider is destroyed
SWNetworkProvider's dtor sends ServiceWorkerHostMsg_ProviderDestroyed in IPC or calls SWDispatcherHost::OnProviderDestroyed() in Mojo.
We can't easily guarantee the ordering in Mojo since EmbeddedWorkerInstanceClientImpl's |stop_callback_| uses a different pipe than ServiceWorkerNetworkProvider's |dispatcher_host_|.
,
Dec 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/af96c9a9e8d491929f32f2e693fd61fd1b93557d commit af96c9a9e8d491929f32f2e693fd61fd1b93557d Author: falken <falken@chromium.org> Date: Thu Dec 22 07:48:03 2016 service worker: Attempt to send WorkerStopped before ProviderDestroyed. Previously in IPC when a worker stops, the renderer sends the browser a WorkerStopped message followed by a ProviderDestroyed message. This ordering got reversed in https://crrev.com/6ea839368fd4dbf, causing an unexpected case where a ServiceWorkerProviderHost is destroyed, releasing a ServiceWorkerVersion before OnStopped can be invoked on it. This patch returns to the previous ordering. However even with this patch, for Mojo the ordering is still not guaranteed, since the pipes are different. Since the IPC land is no longer the default, the real fix is to figure out how to do this in Mojo. This patch is just a mitigation, so no tests are added yet. BUG= 629701 , 676526 Review-Url: https://codereview.chromium.org/2596153002 Cr-Commit-Position: refs/heads/master@{#440356} [modify] https://crrev.com/af96c9a9e8d491929f32f2e693fd61fd1b93557d/content/renderer/service_worker/embedded_worker_dispatcher.cc [modify] https://crrev.com/af96c9a9e8d491929f32f2e693fd61fd1b93557d/content/renderer/service_worker/embedded_worker_instance_client_impl.cc
,
Dec 26 2016
UMA is back to normal after af96c9a9e8d491929f32f. But there's still no guarantee that WorkerStopped is sent before ProviderDestroyed since the Mojo pipes are different. I spun off Issue 676983 to track that.
,
Dec 26 2016
Request merge to 56. This is a simple mitigation that cycled through canary with no crashes. After the change landed our UMA is back to normal.
,
Dec 26 2016
Your change meets the bar and is auto-approved for M56 (branch: 2924)
,
Dec 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a0e0b3cc4d1a3318da54a62f84274919583fd3cb commit a0e0b3cc4d1a3318da54a62f84274919583fd3cb Author: Matt Falkenhagen <falken@chromium.org> Date: Mon Dec 26 04:03:26 2016 (M56) service worker: Attempt to send WorkerStopped before ProviderDestroyed. Previously in IPC when a worker stops, the renderer sends the browser a WorkerStopped message followed by a ProviderDestroyed message. This ordering got reversed in https://crrev.com/6ea839368fd4dbf, causing an unexpected case where a ServiceWorkerProviderHost is destroyed, releasing a ServiceWorkerVersion before OnStopped can be invoked on it. This patch returns to the previous ordering. However even with this patch, for Mojo the ordering is still not guaranteed, since the pipes are different. Since the IPC land is no longer the default, the real fix is to figure out how to do this in Mojo. This patch is just a mitigation, so no tests are added yet. BUG= 629701 , 676526 Review-Url: https://codereview.chromium.org/2596153002 Cr-Commit-Position: refs/heads/master@{#440356} (cherry picked from commit af96c9a9e8d491929f32f2e693fd61fd1b93557d) Review-Url: https://codereview.chromium.org/2605593002 . Cr-Commit-Position: refs/branch-heads/2924@{#618} Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059} [modify] https://crrev.com/a0e0b3cc4d1a3318da54a62f84274919583fd3cb/content/renderer/service_worker/embedded_worker_dispatcher.cc [modify] https://crrev.com/a0e0b3cc4d1a3318da54a62f84274919583fd3cb/content/renderer/service_worker/embedded_worker_instance_client_impl.cc |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by falken@chromium.org
, Dec 22 2016