Create ServiceWorkerProviderHost for the service worker on the browser process before starting the worker |
|||||||||
Issue descriptionThe patch was landed (https://crrev.com/2779763004) but it caused an Issue 736649. This is an issue to track the fix of the bug and reland the patch.
,
Jun 30 2017
I believe it also caused issue 737070.
,
Jul 5 2017
Actually the CHECK(rph) is in fact firing in a couple crashes: https://crash.corp.google.com/browse?q=product.version%20%3E%3D%20%2761.0.3143.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3A%60anonymous%20namespace%5C%27%3A%3ASetupOnUI%27%20OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27SetupOnUI%27)%20%3D%200&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=&stbtiq=&reportid=&index=0 Verified that CHECK(pending_dispatcher_request_.is_pending()) has never fired in a crash report. https://crash.corp.google.com/browse?q=product.version%20%3E%3D%20%2761.0.3143.0%27%20OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27SendStartWorker%27)%20%3D%200&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=&stbtiq=&reportid=&index=0
,
Jul 5 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/28232bff5f48813f417f3d808a68a0858f1765ad commit 28232bff5f48813f417f3d808a68a0858f1765ad Author: Matt Falkenhagen <falken@chromium.org> Date: Wed Jul 05 10:57:32 2017 service worker: Handle renderer crash by checking for SWDispacherHost Before this patch, if the renderer crash was detected by EmbeddedWorkerRegistry in between OnProcessAllocated and SendStartWorker, the embedded worker would never know because EmbeddedWorkerRegistry::RemoveProcess would not know to call EmbeddedWorkerInstance::OnDetach() because the worker was not yet registered for the process via EmbeddedWorkerRegistry::BindWorkerToProcess. See bug for more details. The solution is to check for the existence of ServiceWorkerDispatcherHost in SendStartWorker. My reasoning is as follows: 1) If the ServiceWorkerDispatcherHost is removed before SendStartWorker, we’ll discover it in SendStartWorker and bail. 2) If the ServiceWorkerDispatcherHost is removed after SendStartWorker, then EmbeddedWorkerRegistry will call Detach() since SendStartWorker called BindWorkerToProcess. 3) We expect that ServiceWorkerDispatcherHost be created and registered to ServiceWorkerContextCore by the time SendStartWorker() is called. This is because we expect that RenderProcessHostImpl::Init() to have been called already, which constructs the DispatcherHost and posts a task to the IO thread to initialize the DispatcherHost. So by the time OnProcessAllocated() is called on the IO thread, which eventually leads to SendStartWorker() on the IO thread, the DispatcherHost should have been created. Bug: 732729 , 738310 Change-Id: I4e46a2265c059724849a90bfbfc68b3bfdbf5be8 Reviewed-on: https://chromium-review.googlesource.com/558599 Commit-Queue: Matt Falkenhagen <falken@chromium.org> Reviewed-by: Tsuyoshi Horo <horo@chromium.org> Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Reviewed-by: Makoto Shimazu <shimazu@chromium.org> Cr-Commit-Position: refs/heads/master@{#484238} [modify] https://crrev.com/28232bff5f48813f417f3d808a68a0858f1765ad/content/browser/service_worker/embedded_worker_instance.cc [modify] https://crrev.com/28232bff5f48813f417f3d808a68a0858f1765ad/content/browser/service_worker/embedded_worker_instance.h [modify] https://crrev.com/28232bff5f48813f417f3d808a68a0858f1765ad/content/browser/service_worker/embedded_worker_instance_unittest.cc
,
Jul 6 2017
,
Jul 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/daf95c19c43619e2b49d807e2b31e771512f2a4b commit daf95c19c43619e2b49d807e2b31e771512f2a4b Author: Matt Falkenhagen <falken@chromium.org> Date: Fri Jul 07 04:23:01 2017 service worker: Remove some unnecessary thread hops during startup. Before this patch embedded worker startup did "allocate process" (UI), "handle result" (IO), "register to devtools" (UI), "complete" (IO). Now it does: "allocate process and register to devtools" (UI), "complete" (IO). This fixes a failing CHECK(rph) when the allocated RenderProcessHost has been destroyed by the time we return to the UI thread to check it. Bug: 738310 , 732729 , 561209 Change-Id: I70b9b2512a01997e56db64121a5e9f3543d3f37a Reviewed-on: https://chromium-review.googlesource.com/561027 Commit-Queue: Matt Falkenhagen <falken@chromium.org> Reviewed-by: Makoto Shimazu <shimazu@chromium.org> Cr-Commit-Position: refs/heads/master@{#484831} [modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/embedded_worker_instance.cc [modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/embedded_worker_instance.h [modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/embedded_worker_instance_unittest.cc [modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/service_worker_process_manager.cc [modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/service_worker_process_manager.h [modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/content/browser/service_worker/service_worker_process_manager_unittest.cc [modify] https://crrev.com/daf95c19c43619e2b49d807e2b31e771512f2a4b/tools/metrics/histograms/enums.xml
,
Jul 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/16a04784ff29ac7f8caf4c6fc52031807f6683ae commit 16a04784ff29ac7f8caf4c6fc52031807f6683ae Author: Matt Falkenhagen <falken@chromium.org> Date: Fri Jul 07 08:13:54 2017 service worker: Cleanups in EmbeddedWorkerInstance::StartTask Refactoring following https://chromium-review.googlesource.com/561027 Bug: 738310 , 732729 Change-Id: Ia3f7a7bf561cad7c405bf2b9e899f8d9e42aded7 Reviewed-on: https://chromium-review.googlesource.com/562938 Commit-Queue: Matt Falkenhagen <falken@chromium.org> Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Reviewed-by: Makoto Shimazu <shimazu@chromium.org> Reviewed-by: Hiroki Nakagawa <nhiroki@chromium.org> Cr-Commit-Position: refs/heads/master@{#484857} [modify] https://crrev.com/16a04784ff29ac7f8caf4c6fc52031807f6683ae/content/browser/service_worker/embedded_worker_instance.cc [modify] https://crrev.com/16a04784ff29ac7f8caf4c6fc52031807f6683ae/content/browser/service_worker/embedded_worker_instance.h [modify] https://crrev.com/16a04784ff29ac7f8caf4c6fc52031807f6683ae/content/browser/service_worker/service_worker_process_manager.cc [modify] https://crrev.com/16a04784ff29ac7f8caf4c6fc52031807f6683ae/content/browser/service_worker/service_worker_process_manager.h
,
Jul 7 2017
The NextAction date has arrived: 2017-07-07
,
Jul 7 2017
,
Jul 10 2017
Still investigating why browser-side provider host caused the crash regression in issue 737070. It looks like even before the patch, we would hit that UAF. One theory is that the patch allowed holding on the SWVersion earlier by setting the hosted version in SWProviderHost earlier. So the Version would normally be destructed before getting stalled in stopping. But when I apply the patch and exercise the stopped in stopping code path, a crash occurs under ASAN that doesn't happen without the patch. So something did seem to regress here, somehow.
,
Jul 14 2017
I'm picking this up again.
I'm confused why the RemoveProviderHost Mojo connection error callback is called upon EmbeddedWorkerInstance destruction after the browser-side provider host creation patch, but not before the patch.
It's called synchronously in EmbeddedWorkerInstance's destructor, so it's not about the renderer process dying and triggering the callback.
My mental model was that the SWProviderHost's |binding_| is tied to the renderer, so when the renderer disconnects it, the error callback is called. But according to this crash, the browser side is calling the callback right when EWInstance gets destroyed.
#1 0x7fbefa2c5c47 content::(anonymous namespace)::RemoveProviderHost()
#2 0x7fbefa2b9604 _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_7WeakPtrIN7content24ServiceWorkerInternalsUIEEEiNS4_23ServiceWorkerStatusCodeEEJS6_iS7_EEEFvvEE3RunEPNS0_13BindStateBaseE
#3 0x7fbefbcce4ed mojo::InterfaceEndpointClient::NotifyError()
#4 0x7fbefbcd783f mojo::internal::MultiplexRouter::ProcessNotifyErrorTask()
#5 0x7fbefbcd4f5f mojo::internal::MultiplexRouter::ProcessTasks()
#6 0x7fbefbcd377e mojo::internal::MultiplexRouter::OnPipeConnectionError()
#7 0x7fbefbcd602b mojo::internal::MultiplexRouter::CloseMessagePipe()
#8 0x7fbefbccfa1b mojo::internal::InterfacePtrStateBase::~InterfacePtrStateBase()
#9 0x7fbefa264a35 content::EmbeddedWorkerInstance::~EmbeddedWorkerInstance()
#10 0x7fbefa264ad9 content::EmbeddedWorkerInstance::~EmbeddedWorkerInstance()
I guess somehow when EWInstance did this:
client_->StartWorker(*params, std::move(pending_dispatcher_request_),
std::move(pending_installed_scripts_info_),
std::move(host_ptr_info),
std::move(provider_info));
The |provider_info| starts "living" in |client_|, so when |client_| gets destroyed it triggers the callback?
Ah yep that was it! Manually resetting client_ causes the callback to be called.
So that's probably why it happens with this patch and not without. |client_| starts owning |provider_info| with this patch. Whereas without this patch, we must have the ServiceWorkerNetworkProvider die or render process get killed which isn't going to synchronously happen just by destructing EmbeddedWorkerInstance.
,
Jul 16 2017
Another possible crash path from EmbeddedWorkerInstance dtor:
EmbeddedWorkerInstance has a member ProviderInfoGetter provider_info_getter_, which may be non-null for some while(be set at Start() and used on SendStartWorker()), during this period if EmbeddedWorkerInstance got destroyed, provider_info_getter_ will destroy and the tied ServiceWorkerProviderHost instance will also destroy, thus ServiceWorkerProviderHost's |binding_| will destroy, triggering the connection error handler RemoveProviderHost().
The above OnceCallback is created by following codes:
"
std::unique_ptr<ServiceWorkerProviderHost> pending_provider_host =
ServiceWorkerProviderHost::PreCreateForController(context());
base::BindOnce(&CompleteProviderHostPreparation, base::Unretained(this),
base::Passed(&pending_provider_host), context()),
"
I suppose we should avoid binding ServiceWorkerProviderHost's |binding_| inside ServiceWorkerProviderHost::PreCreateForController()?
Seems the binding should be delayed until CompleteProviderHostPreparation() execution.
,
Jul 20 2017
The lifetimes and bindings are so complicated :(
Is the crash the DCHECK in RemoveProviderHost()?
if (!context->GetProviderHost(process_id, provider_id)) {
DCHECK(IsBrowserSideNavigationEnabled() &&
ServiceWorkerUtils::IsBrowserAssignedProviderId(provider_id));
}
I think we could just remove that DCHECK and say sometimes the function is called even though the provider host isn't added yet.
For the crash where provider host destruction causes service worker version destruction, I think we just have to be careful that whenever resetting SWVersion's |embedded_worker_|, we protect |this| with a ref ptr.
,
Jul 20 2017
FYI, I tried to repro the crash described by destructing the EWInstance between Start() and SendStartWorker() and could not get it to occur.
,
Jul 20 2017
Planning to reland in M62.
,
Jul 24 2017
Re c#12: I might be missing something, but IIUC |binding_| is connected in CompleteStartWorkerPreparation which is called in SWVersion::CompleteProviderHostPreparation (and it's the |provider_info_getter|.). That might be just what you mentioned.
,
Jul 26 2017
Hah, I had three bugs associated with the CL but not this one. Landed in 62.0.3166.0 by https://chromium-review.googlesource.com/c/564860/ So far crash dashboard seems OK, will check back tomorrow: https://crash.corp.google.com/browse?q=Product.name%20contains%20%27Chrome%27%20AND%20Product.Version%20%3E%3D%20%2762.0.3166.0%27%20OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27ServiceWorker%27)%20%3D%200&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=#-property-selector,-samplereports:5,magicsignature,+magicsignature2,magicsignaturesorted
,
Jul 26 2017
Sorry for my misunderstanding about c#12, actually ServiceWorkerProviderHost::PreCreateForController() does not bind ServiceWorkerProviderHost's |binding_| because of following codes inside ServiceWorkerProviderHost ctor:
// |client_| and |binding_| will be bound on CompleteNavigationInitialized
// (PlzNavigate) or on CompleteStartWorkerPreparation (providers for
// controller).
if (!info_.client_ptr_info.is_valid() && !info_.host_request.is_pending()) {
DCHECK(IsBrowserSideNavigationEnabled() ||
info_.type == SERVICE_WORKER_PROVIDER_FOR_CONTROLLER);
return;
}
,
Jul 27 2017
There was a spike of unrelated crashes probably due to issue 748996. One possible regression: there are suddenly 2 crashes from 2 clients for:[Renderer kill 143] content::ServiceWorkerDispatcherHost::OnEnableNavigationPreload: https://crash.corp.google.com/browse?q=Product.name%20contains%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BRenderer%20kill%20143%5D%20content%3A%3AServiceWorkerDispatcherHost%3A%3AOnEnableNavigationPreload%27&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt= Still going to watch the dashboard.
,
Jul 31 2017
The NextAction date has arrived: 2017-07-31
,
Aug 2 2017
I think the reland is sticking. I'm mostly worried about the OnEnableNavigationPreload crashes trickling in. but that can be tracked in another bug.
,
Aug 2 2017
,
Aug 2 2017
Thanks a lot for relanding it with fixing weird bugs!
,
Aug 2 2017
No worries. Ah, shimazu@ was the real fixer of the bug with an impressive patch, so updating owner :) |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by shimazu@chromium.org
, Jun 30 2017