New issue
Advanced search Search tips

Issue 738310 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug

Blocking:
issue 629701
issue 668633
issue 676983
issue 683037



Sign in to add a comment

Create ServiceWorkerProviderHost for the service worker on the browser process before starting the worker

Project Member Reported by shimazu@chromium.org, Jun 30 2017

Issue description

The patch was landed (https://crrev.com/2779763004) but it caused 
 an Issue 736649.

This is an issue to track the fix of the bug and reland the patch.

 
I've added a CHECK https://crbug.com/736649#c9 , but it doesn't cause any crashes. 
The problem is that there is no dispatcher_host at SendStartWorker. 

Comment 2 by falken@chromium.org, Jun 30 2017

I believe it also caused issue 737070.
Project Member

Comment 4 by bugdroid1@chromium.org, Jul 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/28232bff5f48813f417f3d808a68a0858f1765ad

commit 28232bff5f48813f417f3d808a68a0858f1765ad
Author: Matt Falkenhagen <falken@chromium.org>
Date: Wed Jul 05 10:57:32 2017

service worker: Handle renderer crash by checking for SWDispacherHost

Before this patch, if the renderer crash was detected by
EmbeddedWorkerRegistry in between OnProcessAllocated and SendStartWorker,
the embedded worker would never know because
EmbeddedWorkerRegistry::RemoveProcess would not know to call
EmbeddedWorkerInstance::OnDetach() because the worker was not yet
registered for the process via EmbeddedWorkerRegistry::BindWorkerToProcess.
See bug for more details.

The solution is to check for the existence of ServiceWorkerDispatcherHost
in SendStartWorker. My reasoning is as follows:

1) If the ServiceWorkerDispatcherHost is removed before SendStartWorker,
we’ll discover it in SendStartWorker and bail.

2) If the ServiceWorkerDispatcherHost is removed after SendStartWorker,
then EmbeddedWorkerRegistry will call Detach() since SendStartWorker
called BindWorkerToProcess.

3) We expect that ServiceWorkerDispatcherHost be created and registered
to ServiceWorkerContextCore by the time SendStartWorker() is called.
This is because we expect that RenderProcessHostImpl::Init() to have
been called already, which constructs the DispatcherHost and posts a
task to the IO thread to initialize the DispatcherHost. So by the time
OnProcessAllocated() is called on the IO thread, which eventually leads
to SendStartWorker() on the IO thread, the DispatcherHost should have
been created.

Bug:  732729 ,  738310 
Change-Id: I4e46a2265c059724849a90bfbfc68b3bfdbf5be8
Reviewed-on: https://chromium-review.googlesource.com/558599
Commit-Queue: Matt Falkenhagen <falken@chromium.org>
Reviewed-by: Tsuyoshi Horo <horo@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Reviewed-by: Makoto Shimazu <shimazu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#484238}
[modify] https://crrev.com/28232bff5f48813f417f3d808a68a0858f1765ad/content/browser/service_worker/embedded_worker_instance.cc
[modify] https://crrev.com/28232bff5f48813f417f3d808a68a0858f1765ad/content/browser/service_worker/embedded_worker_instance.h
[modify] https://crrev.com/28232bff5f48813f417f3d808a68a0858f1765ad/content/browser/service_worker/embedded_worker_instance_unittest.cc

Owner: falken@chromium.org
Project Member

Comment 6 by bugdroid1@chromium.org, Jul 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/daf95c19c43619e2b49d807e2b31e771512f2a4b

commit daf95c19c43619e2b49d807e2b31e771512f2a4b
Author: Matt Falkenhagen <falken@chromium.org>
Date: Fri Jul 07 04:23:01 2017

service worker: Remove some unnecessary thread hops during startup.

Before this patch embedded worker startup did "allocate process" (UI),
"handle result" (IO), "register to devtools" (UI), "complete" (IO).

Now it does: "allocate process and register to devtools" (UI),
"complete" (IO).

This fixes a failing CHECK(rph) when the allocated RenderProcessHost has
been destroyed by the time we return to the UI thread to check it.

Bug:  738310 ,  732729 , 561209
Change-Id: I70b9b2512a01997e56db64121a5e9f3543d3f37a
Reviewed-on: https://chromium-review.googlesource.com/561027
Commit-Queue: Matt Falkenhagen <falken@chromium.org>
Reviewed-by: Makoto Shimazu <shimazu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#484831}
[modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/embedded_worker_instance.cc
[modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/embedded_worker_instance.h
[modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/embedded_worker_instance_unittest.cc
[modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/service_worker_process_manager.cc
[modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/service_worker_process_manager.h
[modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/service_worker_process_manager_unittest.cc
[modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/tools/metrics/histograms/enums.xml

The NextAction date has arrived: 2017-07-07
Blocking: 683037
This possibly blocks script streaming, per shimazu@.
Status: Started (was: Assigned)
Still investigating why browser-side provider host caused the crash regression in issue 737070.

It looks like even before the patch, we would hit that UAF.

One theory is that the patch allowed holding on the SWVersion earlier by setting the hosted version in SWProviderHost earlier. So the Version would normally be destructed before getting stalled in stopping.

But when I apply the patch and exercise the stopped in stopping code path, a crash occurs under ASAN that doesn't happen without the patch. So something did seem to regress here, somehow.
I'm picking this up again.

I'm confused why the RemoveProviderHost Mojo connection error callback is called upon EmbeddedWorkerInstance destruction after the browser-side provider host creation patch, but not before the patch.

It's called synchronously in EmbeddedWorkerInstance's destructor, so it's not about the renderer process dying and triggering the callback.

My mental model was that the SWProviderHost's |binding_| is tied to the renderer, so when the renderer disconnects it, the error callback is called. But according to this crash, the browser side is calling the callback right when EWInstance gets destroyed.

#1 0x7fbefa2c5c47 content::(anonymous namespace)::RemoveProviderHost()
#2 0x7fbefa2b9604 _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_7WeakPtrIN7content24ServiceWorkerInternalsUIEEEiNS4_23ServiceWorkerStatusCodeEEJS6_iS7_EEEFvvEE3RunEPNS0_13BindStateBaseE
#3 0x7fbefbcce4ed mojo::InterfaceEndpointClient::NotifyError()
#4 0x7fbefbcd783f mojo::internal::MultiplexRouter::ProcessNotifyErrorTask()
#5 0x7fbefbcd4f5f mojo::internal::MultiplexRouter::ProcessTasks()
#6 0x7fbefbcd377e mojo::internal::MultiplexRouter::OnPipeConnectionError()
#7 0x7fbefbcd602b mojo::internal::MultiplexRouter::CloseMessagePipe()
#8 0x7fbefbccfa1b mojo::internal::InterfacePtrStateBase::~InterfacePtrStateBase()
#9 0x7fbefa264a35 content::EmbeddedWorkerInstance::~EmbeddedWorkerInstance()
#10 0x7fbefa264ad9 content::EmbeddedWorkerInstance::~EmbeddedWorkerInstance()

I guess somehow when EWInstance did this:
  client_->StartWorker(*params, std::move(pending_dispatcher_request_),
                       std::move(pending_installed_scripts_info_),
                       std::move(host_ptr_info),
                       std::move(provider_info));

The |provider_info| starts "living" in |client_|, so when |client_| gets destroyed it triggers the callback?

Ah yep that was it! Manually resetting client_ causes the callback to be called.

So that's probably why it happens with this patch and not without. |client_| starts owning |provider_info| with this patch. Whereas without this patch, we must have the ServiceWorkerNetworkProvider die or render process get killed which isn't going to synchronously happen just by destructing EmbeddedWorkerInstance.

Comment 12 by leon....@intel.com, Jul 16 2017

Another possible crash path from EmbeddedWorkerInstance dtor:

EmbeddedWorkerInstance has a member ProviderInfoGetter provider_info_getter_, which may be non-null for some while(be set at Start() and used on SendStartWorker()), during this period if EmbeddedWorkerInstance got destroyed, provider_info_getter_ will destroy and the tied ServiceWorkerProviderHost instance will also destroy, thus ServiceWorkerProviderHost's |binding_| will destroy, triggering the connection  error handler RemoveProviderHost().

The above OnceCallback is created by following codes:
"
std::unique_ptr<ServiceWorkerProviderHost> pending_provider_host =
    ServiceWorkerProviderHost::PreCreateForController(context());

base::BindOnce(&CompleteProviderHostPreparation, base::Unretained(this),
    base::Passed(&pending_provider_host), context()),
"
I suppose we should avoid binding ServiceWorkerProviderHost's |binding_| inside ServiceWorkerProviderHost::PreCreateForController()?
Seems the binding should be delayed until CompleteProviderHostPreparation() execution.

The lifetimes and bindings are so complicated :(

Is the crash the DCHECK in RemoveProviderHost()?

  if (!context->GetProviderHost(process_id, provider_id)) {
    DCHECK(IsBrowserSideNavigationEnabled() &&
           ServiceWorkerUtils::IsBrowserAssignedProviderId(provider_id));
  }

I think we could just remove that DCHECK and say sometimes the function is called even though the provider host isn't added yet.

For the crash where provider host destruction causes service worker version destruction, I think we just have to be careful that whenever resetting SWVersion's |embedded_worker_|, we protect |this| with a ref ptr.
FYI, I tried to repro the crash described by destructing the EWInstance between Start() and SendStartWorker() and could not get it to occur.
Labels: -M-61 M-62
NextAction: ----
Planning to reland in M62.

Comment 16 by shimazu@google.com, Jul 24 2017

Re c#12:
I might be missing something, but IIUC |binding_| is connected in CompleteStartWorkerPreparation which is called in SWVersion::CompleteProviderHostPreparation (and it's the |provider_info_getter|.). That might be just what you mentioned. 

Comment 18 by leon....@intel.com, Jul 26 2017

Sorry for my misunderstanding about c#12, actually ServiceWorkerProviderHost::PreCreateForController() does not bind ServiceWorkerProviderHost's |binding_| because of following codes inside ServiceWorkerProviderHost ctor:

  // |client_| and |binding_| will be bound on CompleteNavigationInitialized
  // (PlzNavigate) or on CompleteStartWorkerPreparation (providers for
  // controller).
  if (!info_.client_ptr_info.is_valid() && !info_.host_request.is_pending()) {
    DCHECK(IsBrowserSideNavigationEnabled() ||
           info_.type == SERVICE_WORKER_PROVIDER_FOR_CONTROLLER);
    return;
  }
NextAction: 2017-07-31
There was a spike of unrelated crashes probably due to issue 748996.

One possible regression: there are suddenly 2 crashes from 2 clients for:[Renderer kill 143] content::ServiceWorkerDispatcherHost::OnEnableNavigationPreload:

https://crash.corp.google.com/browse?q=Product.name%20contains%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BRenderer%20kill%20143%5D%20content%3A%3AServiceWorkerDispatcherHost%3A%3AOnEnableNavigationPreload%27&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=

Still going to watch the dashboard.
The NextAction date has arrived: 2017-07-31
NextAction: ----
I think the reland is sticking. I'm mostly worried about the OnEnableNavigationPreload crashes trickling in. but that can be tracked in another bug.
Status: Fixed (was: Started)
Thanks a lot for relanding it with fixing weird bugs!

Cc: falken@chromium.org
Owner: shimazu@chromium.org
No worries. Ah, shimazu@ was the real fixer of the bug with an impressive patch, so updating owner :)

Sign in to add a comment