New issue
Advanced search Search tips

Issue 676526 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Increase in ABORT errors for starting a new service worker

Project Member Reported by falken@chromium.org, Dec 22 2016

Issue description

Comment 1 by falken@chromium.org, Dec 22 2016

SeviceWorkerVersion's destructor is getting called before the start callbacks are called. So it calls RecordResult(ABORT). Somehow previously it stayed alive long enough for the callbacks to be called.

Comment 2 by falken@chromium.org, Dec 22 2016

I guess nothing ever guaranteed it was going to stay alive. ServiceWorkerRegisterJob calls Stop() in Complete() and then releases the reference  when its destructed.

Comment 3 by falken@chromium.org, Dec 22 2016

I think I see the cause. We switched the order of "WorkerStopped" and "ReleaseProcessRefCount" messages in https://codereview.chromium.org/2307543002 at EmbeddedWorkerDispatcher::WorkerContextDestroyed. The SWProviderHost for the SW would keep it alive until it stopped, but since the ref count hits 0 before that, the SWProviderHost itself gets destroyed and destructs the worker.

These ordering assumptions are really fragile...


Comment 4 by falken@chromium.org, Dec 22 2016

More notes. Previously the order was:
(1) EmbeddedWorkerHostMsg_WorkerStopped
(2) ChildProcessHostMsg_ShutdownRequest

Now the order is reversed. It appears ShutdownRequest is triggering ServiceWorkerProviderHost to be destructed before WorkerStopped arrives.

But we can't fix this just by changing the ordering, since WorkerStopped is now sent via Mojo. We would need some "AssociatedInterface" work.

It might not even be enough to guarantee the ordering. If the start or stop callbacks are not invoked synchronously or involve async tasks, the ServiceWorkerVersion can unexpectedly be destroyed once the ShutdownRequest message arrives.

I'm not 100% sure the ordering was already guaranteed actually. WorkerStopped is sent via RenderThreadImpl::current()->thread_safe_sender() and ShutdownRequest is sent via ChildThreadImpl::Send().

Actually now I'm wondering why ScopedChildProcessReference is needed for service worker, since SWProcessManager also keeps track of running workers and keeps a ref count on the RenderProcessHost.

Comment 5 by falken@chromium.org, Dec 22 2016

Hum, SW's ScopedChildProcessReference was added in Nov 2013 https://codereview.chromium.org/54573002 and ServiceWorkerProcessManager in Apr 2014 https://codereview.chromium.org/238043002/

Comment 6 by falken@chromium.org, Dec 22 2016

OK ScopedChildProcessReference is a red herring. What's actually happening is the previous order:
(1) EmbeddedWorkerHostMsg_WorkerStopped
(2) ServiceWorkerHostMsg_ProviderDestroyed

got reversed. Now it's:
WorkerContextDestroyed()
- unregisters the worker
- WorkerWrapper is destroyed
  - WebEmbeddedWorkerImpl is destroyed
    - ServiceWorkerNetworkProvider is destroyed

SWNetworkProvider's dtor sends ServiceWorkerHostMsg_ProviderDestroyed in IPC or calls SWDispatcherHost::OnProviderDestroyed() in Mojo.

We can't easily guarantee the ordering in Mojo since EmbeddedWorkerInstanceClientImpl's |stop_callback_| uses a different pipe than ServiceWorkerNetworkProvider's |dispatcher_host_|.
Project Member

Comment 7 by bugdroid1@chromium.org, Dec 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/af96c9a9e8d491929f32f2e693fd61fd1b93557d

commit af96c9a9e8d491929f32f2e693fd61fd1b93557d
Author: falken <falken@chromium.org>
Date: Thu Dec 22 07:48:03 2016

service worker: Attempt to send WorkerStopped before ProviderDestroyed.

Previously in IPC when a worker stops, the renderer sends the browser a
WorkerStopped message followed by a ProviderDestroyed message. This
ordering got reversed in https://crrev.com/6ea839368fd4dbf, causing an
unexpected case where a ServiceWorkerProviderHost is destroyed,
releasing a ServiceWorkerVersion before OnStopped can be invoked on it.
This patch returns to the previous ordering.

However even with this patch, for Mojo the ordering is still not
guaranteed, since the pipes are different. Since the IPC land is no
longer the default, the real fix is to figure out how to do this in
Mojo. This patch is just a mitigation, so no tests are added yet.

BUG= 629701 , 676526 

Review-Url: https://codereview.chromium.org/2596153002
Cr-Commit-Position: refs/heads/master@{#440356}

[modify] https://crrev.com/af96c9a9e8d491929f32f2e693fd61fd1b93557d/content/renderer/service_worker/embedded_worker_dispatcher.cc
[modify] https://crrev.com/af96c9a9e8d491929f32f2e693fd61fd1b93557d/content/renderer/service_worker/embedded_worker_instance_client_impl.cc

Comment 8 by falken@chromium.org, Dec 26 2016

Status: Fixed (was: Started)
UMA is back to normal after af96c9a9e8d491929f32f. But there's still no guarantee that WorkerStopped is sent before ProviderDestroyed since the Mojo pipes are different. I spun off  Issue 676983  to track that.

Comment 9 by falken@chromium.org, Dec 26 2016

Labels: Merge-Request-56
Request merge to 56. This is a simple mitigation that cycled through canary with no   crashes. After the change landed our UMA is back to normal.

Comment 10 by dimu@chromium.org, Dec 26 2016

Labels: -Merge-Request-56 Merge-Approved-56 Hotlist-Merge-Approved
Your change meets the bar and is auto-approved for M56 (branch: 2924)
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 26 2016

Labels: -merge-approved-56 merge-merged-2924
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a0e0b3cc4d1a3318da54a62f84274919583fd3cb

commit a0e0b3cc4d1a3318da54a62f84274919583fd3cb
Author: Matt Falkenhagen <falken@chromium.org>
Date: Mon Dec 26 04:03:26 2016

(M56) service worker: Attempt to send WorkerStopped before ProviderDestroyed.

Previously in IPC when a worker stops, the renderer sends the browser a
WorkerStopped message followed by a ProviderDestroyed message. This
ordering got reversed in https://crrev.com/6ea839368fd4dbf, causing an
unexpected case where a ServiceWorkerProviderHost is destroyed,
releasing a ServiceWorkerVersion before OnStopped can be invoked on it.
This patch returns to the previous ordering.

However even with this patch, for Mojo the ordering is still not
guaranteed, since the pipes are different. Since the IPC land is no
longer the default, the real fix is to figure out how to do this in
Mojo. This patch is just a mitigation, so no tests are added yet.

BUG= 629701 , 676526 

Review-Url: https://codereview.chromium.org/2596153002
Cr-Commit-Position: refs/heads/master@{#440356}
(cherry picked from commit af96c9a9e8d491929f32f2e693fd61fd1b93557d)

Review-Url: https://codereview.chromium.org/2605593002 .
Cr-Commit-Position: refs/branch-heads/2924@{#618}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/a0e0b3cc4d1a3318da54a62f84274919583fd3cb/content/renderer/service_worker/embedded_worker_dispatcher.cc
[modify] https://crrev.com/a0e0b3cc4d1a3318da54a62f84274919583fd3cb/content/renderer/service_worker/embedded_worker_instance_client_impl.cc

Sign in to add a comment