
Issue 620845

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
please use my google.com address
Closed: Jun 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug




Move GPU process mojo initialization from GpuChildThread::Init to GpuChildThread::OnInitialize

Project Member Reported by piman@chromium.org, Jun 16 2016

Issue description

https://codereview.chromium.org/1297953004 added mojo initialization to GpuChildThread::Init so that mojo can be used with the GPU process.

The problem is that this is too early. Among other things, we haven't received the GpuPreferences yet, so we can't fully initialize. This can cause races: for example, ArcGpuVideoDecodeAccelerator::Initialize could be called over mojo, which in turn calls GpuVideoDecodeAcceleratorFactory::CreateVDA, which in turn calls GpuChildThread::current()->gpu_preferences() - this may or may not be valid depending on the order between the mojo request and the GpuMsg_Initialize message.

There are other reasons to wait for full initialization: for example, we may be unable to initialize GL, at which point we will terminate the GPU process. Starting to process mojo requests before we know whether we'll be staying alive is counter-productive, since it only delays the fallback.
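To make the race concrete, here is a minimal, hypothetical sketch (simplified stand-ins for the real content/gpu classes; only the calls named above are represented):

// Hypothetical, simplified sketch of the race described above.
#include <cassert>

struct GpuPreferences {};

class GpuChildThread {
 public:
  static GpuChildThread* current() {
    static GpuChildThread instance;
    return &instance;
  }

  // Runs when GpuMsg_Initialize arrives from the browser.
  void OnInitialize(const GpuPreferences& prefs) {
    gpu_preferences_ = prefs;
    initialized_ = true;
  }

  // Callers assume this is always valid, but it is only meaningful after
  // OnInitialize() has run.
  const GpuPreferences& gpu_preferences() const {
    assert(initialized_);  // Fires if a mojo request wins the race.
    return gpu_preferences_;
  }

 private:
  bool initialized_ = false;
  GpuPreferences gpu_preferences_;
};

// Mojo-dispatched entry point (sketch of the chain named above): if the
// browser's interface request is dispatched before GpuMsg_Initialize, the
// preferences read here have not been set yet.
void ArcGpuVideoDecodeAccelerator_Initialize() {
  const GpuPreferences& prefs = GpuChildThread::current()->gpu_preferences();
  (void)prefs;  // ...would go on to GpuVideoDecodeAcceleratorFactory::CreateVDA...
}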


I do not fully understand the implications of moving this code: will mojo requests be delayed until the GpuProcessControlImpl is created? Or do we need to delay creation of the MojoApplication and/or the MojoShellConnectionImpl initialization that is done in ChildThreadImpl?
 

Comment 1 by xhw...@chromium.org, Jun 20 2016

Cc: sande...@chromium.org
On the browser side, we try to connect to the service through ServiceRegistry() [1] immediately after calling GpuProcessHost::Get() [2]. On the GPU side, since MojoApplication is created in ChildThreadImpl::Init() [3], and MojoApplication hosts the ServiceRegistryImpl [4], I suppose the request could arrive any time after ChildThreadImpl::Init() is called. If the service is not registered, the request will be dropped, causing a failure [5]. (I vaguely remember that previously ServiceRegistryImpl could save the requests and handle them later, but that does not seem to be the case today.)

That's why we currently register the service in GpuChildThread::Init().

To fix this, we could require that all mojo service connections happen only after the GPU process is fully initialized (i.e. after OnInitialize()). This means that instead of connecting to the service immediately after calling GpuProcessHost::Get(), we should wait and/or check whether the GPU process has been fully initialized (after GpuProcessHost::OnInitialized()). But with the current API I don't see how we can wait for OnInitialized()...

If we decide to do that, maybe GpuProcessHost::GetServiceRegistry() should only return a non-null value after GpuProcessHost::OnInitialized(), so that we can enforce this rule (rough sketch after the links below).

[1] https://cs.chromium.org/chromium/src/content/browser/mojo/mojo_shell_context.cc?rcl=0&l=106
[2] https://cs.chromium.org/chromium/src/content/browser/mojo/mojo_shell_context.cc?rcl=0&l=94
[3] https://cs.chromium.org/chromium/src/content/child/child_thread_impl.cc?rcl=1466421383&l=409
[4] https://cs.chromium.org/chromium/src/content/child/mojo/mojo_application.h?rcl=1466421383&l=31
[5] https://cs.chromium.org/chromium/src/content/common/mojo/service_registry_impl.cc?rcl=0&l=84
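
A rough sketch of that rule on the browser side (hypothetical; GpuProcessHost's real interface differs):

// Hypothetical sketch: GpuProcessHost exposes its ServiceRegistry only once
// the GPU process has reported full initialization.
class ServiceRegistry;

class GpuProcessHost {
 public:
  // Called when the GPU process acks initialization.
  void OnInitialized() { initialized_ = true; }

  // Returns null until OnInitialized(); callers must check and defer their
  // connection attempts instead of connecting too early.
  ServiceRegistry* GetServiceRegistry() {
    return initialized_ ? service_registry_ : nullptr;
  }

 private:
  bool initialized_ = false;
  ServiceRegistry* service_registry_ = nullptr;
};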

Comment 2 by piman@chromium.org, Jun 20 2016

Ok, from reading the code, it looks like:
1- ServiceRegistryImpl::ConnectToRemoteService (and therefore RequestGpuProcessControl) is asynchronous
2- ServiceRegistryImpl::ConnectToRemoteService doesn't forward the request until it has a remote_provider_
3- remote_provider_ is set up when ServiceRegistryImpl::BindRemoteServiceProvider is called
4- ServiceRegistryImpl::BindRemoteServiceProvider is called when ApplicationSetupImpl::ExchangeInterfaceProviders is called, where ApplicationSetupImpl is created by the MojoApplicationHost
5- ApplicationSetupImpl::ExchangeInterfaceProviders is a mojo RPC, called by MojoApplication::InitWithToken, which is currently called from ChildThreadImpl::Init

So there are 2 possibilities:
A- Since RequestGpuProcessControl is asynchronous, it could delay the request until after OnInitialize is called.
B- If we delay MojoApplication::InitWithToken until GpuChildThread::OnInitialize, that should ensure the connection request isn't sent too early.


I think I prefer B, because it makes the calling code more scalable as we expose more services through mojo. Thoughts?
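
A rough sketch of what (B) could look like on the GPU side (hypothetical restructuring, using the method names discussed above):

// Hypothetical sketch of option (B): MojoApplication is still created in
// Init(), but InitWithToken() -- which triggers ExchangeInterfaceProviders
// and lets connection requests flow -- is deferred to OnInitialize().
#include <memory>
#include <string>
#include <utility>

class MojoApplication {
 public:
  void InitWithToken(std::string token) { /* binds ApplicationSetup, etc. */ }
};

class GpuChildThread {
 public:
  void Init(std::string mojo_token) {
    mojo_application_ = std::make_unique<MojoApplication>();
    mojo_token_ = std::move(mojo_token);
    // Note: no InitWithToken() here any more.
  }

  void OnInitialize(/* GpuPreferences, GL init, ... */) {
    // Only now is the GPU process ready to service mojo requests.
    mojo_application_->InitWithToken(std::move(mojo_token_));
  }

 private:
  std::unique_ptr<MojoApplication> mojo_application_;
  std::string mojo_token_;
};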


There is a remote third possibility: if we ensure the ordering of mojo requests relative to IPC messages (both are sent from the IO thread), then we're essentially safe, because GpuMsg_Initialize is sent from GpuProcessHost::Init and therefore would be sent before GpuProcessHost::Get returns, and the ordering would guarantee that the mojo requests only happen after GpuChildThread::OnInitialize is called. @rockot: is that a valid assumption?

Comment 3 by roc...@chromium.org, Jun 20 2016

It's not currently a valid assumption, since we aren't using the mojo IPC::Channel implementation for the GPU process. If we were, it would technically be a valid assumption, but it would rely on a subtle implementation detail -- namely that both types of messages go over the same OS pipe.

However, until we have additional support for associating interfaces (i.e. sharing a message pipe) with the IPC Channel -- which I'm hoping to land this week -- it still isn't enough just to guarantee that the IPCs are *sent* in the right order. They may still be dispatched in a different order if they're on different logical message pipes.

I may not fully understand the problem, but to me it seems like a reasonable solution would be to only change the gpu side of things: when RequestGpuProcessControl is dispatched, if the process has been sufficiently initialized it can bind the request immediately. Otherwise it can stash the request and bind it once initialization is complete.
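
A rough sketch of that stash-and-bind approach (hypothetical stand-ins for the mojo request plumbing):

// Hypothetical sketch of the suggestion above: the GPU-side handler either
// binds the incoming request immediately or stashes it until initialization
// completes.
#include <utility>
#include <vector>

struct ProcessControlRequest {};  // stand-in for the mojo pipe handle

class GpuProcessControlBinder {
 public:
  void OnRequestGpuProcessControl(ProcessControlRequest request) {
    if (initialized_)
      Bind(std::move(request));
    else
      pending_requests_.push_back(std::move(request));
  }

  // Called from GpuChildThread::OnInitialize().
  void OnInitialize() {
    initialized_ = true;
    for (ProcessControlRequest& request : pending_requests_)
      Bind(std::move(request));
    pending_requests_.clear();
  }

 private:
  void Bind(ProcessControlRequest request) { /* create GpuProcessControlImpl */ }

  bool initialized_ = false;
  std::vector<ProcessControlRequest> pending_requests_;
};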

Comment 4 by xhw...@chromium.org, Jun 20 2016

mojo_application_->InitWithToken() is done in ChildThreadImpl, so changing that would affect all processes using ChildThreadImpl, e.g. the renderer process. I am not sure whether we want to do that.

It seems to me (A) would be the simplest solution. Though I agree it's not as generic as (B).

Another possibility is for individual services running in the GPU process to check whether the process has been initialized, and/or whether GpuChildThread::current()->gpu_preferences() is available. If not, the service could wait for the process to be initialized before starting or initializing itself. Is that possible?

Comment 5 by piman@chromium.org, Jun 20 2016

It looks like there are many callers of ConnectToRemoteService for the GPU process, so option (A) seems bad. Each of them would need a special fix... In particular, content::GetGpuServiceRegistry could not work as such - if we can't delay requests, we cannot satisfy the ServiceRegistry contract.

So I suggest moving forward with option (B).

Comment 6 by piman@chromium.org, Jun 20 2016

There's only a very limited number of process types. Each of them can make sure to call a ChildThreadImpl::InitializeMojo (which would do the mojo_application_->InitWithToken() ) explicitly.

It's also a possibility to add a delay_initialize_mojo flag to ChildThreadImpl::Init so that only the GPU process is affected.
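
A rough sketch of that flag-based variant (hypothetical; the real ChildThreadImpl constructor and options differ):

// Hypothetical sketch: ChildThreadImpl takes a delay_initialize_mojo flag so
// that only the GPU process defers mojo initialization, and exposes an
// explicit hook that GpuChildThread::OnInitialize() would call.
class ChildThreadImpl {
 public:
  explicit ChildThreadImpl(bool delay_initialize_mojo)
      : delay_initialize_mojo_(delay_initialize_mojo) {}

  void Init() {
    // ... channel setup, etc. ...
    if (!delay_initialize_mojo_)
      InitializeMojo();  // unchanged behavior for renderer/utility processes
  }

  // Called explicitly by GpuChildThread::OnInitialize() when delayed.
  void InitializeMojo() { /* mojo_application_->InitWithToken(...) */ }

 private:
  const bool delay_initialize_mojo_;
};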

I do not want every single service to have to do special handling. That is a sure way to miss things and cause races, doubly so because there are no tests. It doesn't scale. Let's make sure the low-level infrastructure gives the right semantics from the get-go, instead of pushing complexity to the higher levels.

Comment 7 by xhw...@chromium.org, Jun 20 2016

I agree a generic solution would be nicer if it's not too complicated.

I won't have time to work on this any time soon. piman@, would you like to take this issue?

Comment 8 by piman@chromium.org, Jun 21 2016

xhwang: I don't really have cycles either. Who is the maintainer for mojo-in-gpu-process?

Comment 9 by xhw...@chromium.org, Jun 21 2016

I don't think there's an official maintainer for this. I added it because we need it to run media mojo services in the GPU process.

Actually, how will ArcGpuVideoDecodeAccelerator be blocked by this? Is this related to the VDAv2 (mojo video decoder) work that's being implemented by sandersd@? I wonder how blocked you are and what the timeline for the fix is?

Comment 10 by piman@chromium.org, Jun 21 2016

1- There is a race in production code. We need to fix it.
2- We need to have a good foundation to be able to move GPU code to mojo. Both ArcGpuVideoDecodeAccelerator and VDAv2 are affected, as well as anything else that wants to build on mojo.
3- We have several GPU process startup races (e.g.  crbug.com/613229 ), so I would like to be able to remove all causes of flakiness. This is one of them.
4- There is refactoring that depends on this, e.g.  crbug.com/597150 , which blocks further refactoring (I want to remove GpuChildThread::current() because it's a bad pattern and a suspected cause of some other bugs).

> I added it because we need it to run media mojo services in the GPU process.
Well, there is a bug in the implementation. Can you please take ownership of your code, or find someone else who can?
Cc: -roc...@chromium.org xhw...@chromium.org
Owner: roc...@chromium.org
Status: Assigned (was: Untriaged)
I will take a look at this within ~a week if nobody else has cycles. I haven't looked too closely at it yet and I'd like to understand what's going on anyway. I suspect a suitable fix is not very complicated at all.
Labels: OS-All
Thanks rockot for taking this!

Re #10: Ah, now I see what the problem is (in production). I had no idea that ArcGpuVideoDecodeAccelerator existed. It's registered as a mojo service directly in GpuChildThread's ServiceRegistry [1]. So this is entirely parallel to the ProcessControl service I added. (I got confused since GpuProcessControlImpl was mentioned in the original bug and I was assigned.)

In other words, the root issue is that ServiceRegistryImpl is initialized at ChildThreadImpl::Init() time, and for the GPU process this is too early for some services, e.g. GpuArcVideoService.

This has nothing to do with GpuProcessControlImpl, and I don't think we currently have similar issues with mojo:media services in production. But as piman@ said, we could run into similar issues in the future, given we'll run VDAv2 with GpuProcessControlImpl, which could have similar requirements to ArcGpuVideoDecodeAccelerator.

Given that, fixing only GpuProcessControlImpl won't work at all. (B) seems the only viable solution: at least for some process types, ServiceRegistryImpl initialization should be delayed until the process is fully initialized.

[1] https://cs.chromium.org/chromium/src/chrome/gpu/chrome_content_gpu_client.cc?rcl=0&l=37
Cc: j.iso...@samsung.com
I took a look at this and there is indeed a very simple solution, which is to simply not bind the ServiceRegistry's InterfaceProvider pipe until we're actually ready to start servicing interface requests. This doesn't require changing anything about how/when interfaces are registered in the GPU process or when connection requests are issued from GPU clients.
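
A rough sketch of the shape of that fix (hypothetical simplification; the actual change is in the CL referenced in comment 15):

// Hypothetical sketch: the incoming InterfaceProvider pipe is held unbound,
// so interface requests simply queue on the pipe until the GPU process is
// ready to service them. Nothing changes about how interfaces are registered
// or how clients issue requests.
struct InterfaceProviderRequest {};  // stand-in for the mojo pipe endpoint

class ServiceRegistry {
 public:
  // Called at ChildThreadImpl::Init() time; the pipe is kept, not bound.
  void SetPendingRequest(InterfaceProviderRequest request) {
    pending_request_ = request;
  }

  // Called from GpuChildThread::OnInitialize(): only now do we bind the pipe
  // and start dispatching the interface requests queued on it.
  void BindPendingRequest() { /* bind pending_request_, start dispatching */ }

 private:
  InterfaceProviderRequest pending_request_;
};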

Unless this is really urgent though -- and it doesn't sound like it is unless we're branching unexpectedly early -- I'm going to wait to fix this until Ben lands a relatively large refactoring patch that's nearly ready. So a day or two probably.

Comment 15 by bugdroid1@chromium.org, Jun 29 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/caccf78eb490815d2cf6f0daea68b6cf4c153302

commit caccf78eb490815d2cf6f0daea68b6cf4c153302
Author: rockot <rockot@chromium.org>
Date: Wed Jun 29 23:39:16 2016

Defer Mojo interface binding in the gpu process until OnInitialize

Prevents the GPU process's browser-facing InterfaceRegistry from
dispatching incoming interface requests until
GpuChildThread::OnInitialize has been invoked.

BUG= 620845 
R=ben@chromium.org

Review-Url: https://codereview.chromium.org/2108343002
Cr-Commit-Position: refs/heads/master@{#402996}

[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/content/gpu/gpu_child_thread.cc
[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/content/gpu/gpu_child_thread.h
[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/services/shell/public/cpp/interface_registry.h
[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/services/shell/public/cpp/lib/interface_registry.cc

Status: Fixed (was: Assigned)

Comment 17 by bugdroid1@chromium.org, Jun 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/baef2ce0540b155429c40995bd88000bb9b6487f

commit baef2ce0540b155429c40995bd88000bb9b6487f
Author: kcwu <kcwu@chromium.org>
Date: Thu Jun 30 15:41:00 2016

Fix racing which may pause gpu process's interface registry forever

AcceptConnection and OnInitialize are racing. Only pause the interface
registry if OnInitialize is not invoked.

TEST=arc video playback well
BUG= 620845 

Review-Url: https://codereview.chromium.org/2112493003
Cr-Commit-Position: refs/heads/master@{#403185}

[modify] https://crrev.com/baef2ce0540b155429c40995bd88000bb9b6487f/content/gpu/gpu_child_thread.cc


Comment 18 by bugdroid1@chromium.org, Jun 30 2016

Labels: merge-merged-2784
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/380808c5d58d14b23a3c4673e8c7bc6940ccaafc

commit 380808c5d58d14b23a3c4673e8c7bc6940ccaafc
Author: Grace Kihumba <gkihumba@google.com>
Date: Thu Jun 30 19:21:22 2016

Fix racing which may pause gpu process's interface registry forever

AcceptConnection and OnInitialize are racing. Only pause the interface
registry if OnInitialize is not invoked.

TEST=arc video playback well
BUG= 620845 

Review-Url: https://codereview.chromium.org/2112493003
Cr-Commit-Position: refs/heads/master@{#403185}
(cherry picked from commit baef2ce0540b155429c40995bd88000bb9b6487f)

Review URL: https://codereview.chromium.org/2116623002 .

Cr-Commit-Position: refs/branch-heads/2784@{#4}
Cr-Branched-From: 642ae4ba1dc7b04a1d480d733cdced28de720e00-refs/heads/master@{#403038}

[modify] https://crrev.com/380808c5d58d14b23a3c4673e8c7bc6940ccaafc/content/gpu/gpu_child_thread.cc
