Move GPU process mojo initialization from GpuChildThread::Init to GpuChildThread::OnInitialize
Issue description

https://codereview.chromium.org/1297953004 added mojo initialization to GpuChildThread::Init so that mojo can be used with the GPU process. The problem is that this is too early: among other things, we haven't received the GpuPreferences yet, so we can't fully initialize. This can cause races. For example, ArcGpuVideoDecodeAccelerator::Initialize could be called over mojo, which in turn calls GpuVideoDecodeAcceleratorFactory::CreateVDA, which in turn calls GpuChildThread::current()->gpu_preferences() - this may or may not be valid depending on the ordering between the mojo request and the GpuMsg_Initialize message.

There are other reasons to wait for full initialization. For example, we may be unable to initialize GL, at which point we will terminate the GPU process - starting to process mojo requests before we know whether we'll be staying alive is counter-productive, since it delays the fallback.

I do not fully understand the implications of moving this code: will mojo requests be delayed until GpuProcessControlImpl is created? Or do we need to delay creation of the MojoApplication and/or the MojoShellConnectionImpl initialization that is done in ChildThreadImpl?
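To make the race concrete, here is a minimal sketch; the types and signatures are simplified, hypothetical stand-ins, not the real Chromium code:

```cpp
// Minimal, hypothetical sketch of the race described above. Types and
// signatures are simplified stand-ins, not the real Chromium code.
#include <optional>

struct GpuPreferences {
  bool enable_accelerated_video_decode = true;
};

class GpuChildThread {
 public:
  // Runs early, during process startup. After codereview 1297953004, mojo
  // requests can start being serviced as soon as this returns.
  void Init() { /* mojo initialized here - too early */ }

  // Runs later, when the browser sends GpuMsg_Initialize.
  void OnInitialize(const GpuPreferences& prefs) { gpu_preferences_ = prefs; }

  // Called from e.g. GpuVideoDecodeAcceleratorFactory::CreateVDA. If a mojo
  // request (such as ArcGpuVideoDecodeAccelerator::Initialize) is dispatched
  // before OnInitialize, this dereferences an empty optional: the race.
  const GpuPreferences& gpu_preferences() const { return *gpu_preferences_; }

 private:
  std::optional<GpuPreferences> gpu_preferences_;  // unset until OnInitialize
};
```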
Jun 20 2016
Ok, from reading the code, it looks like:

1- ServiceRegistryImpl::ConnectToRemoteService (therefore RequestGpuProcessControl) is asynchronous.
2- ServiceRegistryImpl::ConnectToRemoteService doesn't forward the request until it has a remote_provider_.
3- remote_provider_ is set up when ServiceRegistryImpl::BindRemoteServiceProvider is called.
4- ServiceRegistryImpl::BindRemoteServiceProvider is called when ApplicationSetupImpl::ExchangeInterfaceProviders is called, where ApplicationSetupImpl is created by the MojoApplicationHost.
5- ApplicationSetupImpl::ExchangeInterfaceProviders is a mojo RPC, called by MojoApplication::InitWithToken, which is currently called from ChildThreadImpl::Init.

So there are two possibilities:

A- RequestGpuProcessControl being asynchronous, it could delay the request until after OnInitialize is called.
B- If we delay MojoApplication::InitWithToken until GpuChildThread::OnInitialize, that should ensure the connection request isn't sent too early.

I think I prefer B, because it makes the calling code more scalable as we expose more services through mojo. Thoughts?

There is a remote third possibility: if we ensure the ordering of mojo requests relative to IPC messages (both sent from the IO thread), then we're essentially safe, because GpuMsg_Initialize is sent from GpuProcessHost::Init, therefore would be sent before GpuProcessHost::Get returns, and the ordering would guarantee that the mojo requests would only happen after GpuChildThread::OnInitialize is called. @rockot: is that a valid assumption?
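A minimal sketch of option (B), with illustrative names; the real plumbing spans ChildThreadImpl, MojoApplication and ServiceRegistryImpl:

```cpp
// Hypothetical sketch of option (B): keep the MojoApplication, but defer
// InitWithToken() - and therefore ExchangeInterfaceProviders - until
// OnInitialize. Names are illustrative, not the real Chromium code.
#include <memory>
#include <string>

class MojoApplication {
 public:
  void InitWithToken(const std::string& token) {
    // Kicks off ApplicationSetupImpl::ExchangeInterfaceProviders. Only after
    // this does ServiceRegistryImpl get a remote_provider_ and start
    // forwarding queued ConnectToRemoteService requests.
  }
};

class GpuChildThread {
 public:
  void Init(const std::string& mojo_token) {
    mojo_application_ = std::make_unique<MojoApplication>();
    mojo_token_ = mojo_token;  // Deliberately not calling InitWithToken() yet.
  }

  void OnInitialize(/* GpuPreferences, GL initialization, ... */) {
    // The process is now fully initialized; it is safe to let interface
    // requests through.
    mojo_application_->InitWithToken(mojo_token_);
  }

 private:
  std::unique_ptr<MojoApplication> mojo_application_;
  std::string mojo_token_;
};
```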
Jun 20 2016
It's not currently a valid assumption, since we aren't using the mojo IPC::Channel implementation for the GPU process. If we were, it would technically be a valid assumption, but it would rely on a subtle implementation detail -- namely that both types of messages are going over the same OS pipe. However, until we have additional support for associating interfaces (i.e. sharing a message pipe) with the IPC Channel, which I'm hoping to land this week, it still isn't enough just to guarantee that the IPCs are *sent* in the right order. They may still be dispatched in a different order if they're on different logical message pipes. I may not fully understand the problem, but to me it seems like a reasonable solution would be to only change the gpu side of things: when RequestGpuProcessControl is dispatched, if the process has been sufficiently initialized it can bind the request immediately. Otherwise it can stash the request and bind it once initialization is complete.
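A sketch of that stash-and-bind idea for one service; names are illustrative, not the actual Chromium interfaces:

```cpp
// Hypothetical sketch of stash-and-bind for a single service.
#include <memory>
#include <utility>
#include <vector>

struct ProcessControlRequest {};  // Stand-in for a mojo InterfaceRequest.

class GpuProcessControlBinder {
 public:
  // Invoked when RequestGpuProcessControl is dispatched.
  void OnRequest(std::unique_ptr<ProcessControlRequest> request) {
    if (initialized_)
      Bind(std::move(request));                // Safe: prefs etc. are ready.
    else
      pending_.push_back(std::move(request));  // Too early: stash it.
  }

  // Invoked from GpuChildThread::OnInitialize.
  void OnProcessInitialized() {
    initialized_ = true;
    for (auto& request : pending_)
      Bind(std::move(request));
    pending_.clear();
  }

 private:
  void Bind(std::unique_ptr<ProcessControlRequest> request) {
    // Would bind the request to a GpuProcessControlImpl instance.
  }

  bool initialized_ = false;
  std::vector<std::unique_ptr<ProcessControlRequest>> pending_;
};
```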
Jun 20 2016
mojo_application_->InitWithToken() is done in ChildThreadImpl, so changing that would change all processes using ChildThreadImpl, e.g. the render process. I am not sure whether we want to do that. It seems to me (A) would be the simplest solution, though I agree it's not as generic as (B). Another possibility is for individual services running in the GPU process to check whether the process has been initialized, and/or whether GpuChildThread::current()->gpu_preferences() is available. If not, they could wait for the process to be initialized before starting or initializing. Is that possible?
Jun 20 2016
It looks like there are many callers of ConnectToRemoteService for the GPU process, so option (A) seems bad: each of them would need a special fix. In particular, content::GetGpuServiceRegistry could not work as such - if we can't delay requests, we cannot satisfy the ServiceRegistry contract. So I suggest moving forward with option (B).
Jun 20 2016
There's only a very limited number of process types. Each of them can make sure to call ChildThreadImpl::InitializeMojo (which would do the mojo_application_->InitWithToken()) explicitly. Another possibility is to add a delay_initialize_mojo flag to ChildThreadImpl::Init so that only the GPU process is affected (see the sketch below). I do not want every single service to have to do special handling - that's a sure way to miss things and cause races, doubly so because there's no test coverage. It doesn't scale. Let's make sure the low-level infrastructure gives the right semantics from the get-go, instead of pushing complexity to the higher levels.
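A hedged sketch of the delay_initialize_mojo flag idea; only the GPU process would pass true, and the names here are hypothetical:

```cpp
// Hypothetical sketch of the delay_initialize_mojo flag. Only the GPU
// process would pass true; every other ChildThreadImpl user keeps today's
// behavior.
#include <string>

class ChildThreadImpl {
 public:
  void Init(const std::string& mojo_token, bool delay_initialize_mojo) {
    mojo_token_ = mojo_token;
    // ... existing ChildThreadImpl setup ...
    if (!delay_initialize_mojo)
      InitializeMojo();  // Current behavior: mojo comes up during Init.
  }

  // The GPU process calls this explicitly from GpuChildThread::OnInitialize.
  void InitializeMojo() {
    // mojo_application_->InitWithToken(mojo_token_);
  }

 private:
  std::string mojo_token_;
};
```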
Jun 20 2016
I agree a generic solution would be nicer if it's not too complicated. I won't have time to work on this any time soon. piman@, would you like to take this issue?
Jun 21 2016
xhwang: I don't really have cycles either. Who is the maintainer for mojo-in-gpu-process?
Jun 21 2016
I don't think there's an official maintainer for this. I added it because we need it to run media mojo services in the GPU process. Actually, how will ArcGpuVideoDecodeAccelerator be blocked by this? Is this related to the VDAv2 (mojo video decoder) work that's being implemented by sandersd@? I wonder how blocked you are and what the timeline is for the fix?
Jun 21 2016
1- There is a race in production code. We need to fix it.
2- We need a good foundation to be able to move GPU code to mojo. Both ArcGpuVideoDecodeAccelerator and VDAv2 are affected, as well as anything else that wants to build on mojo.
3- We have several GPU process startup races (e.g. crbug.com/613229), so I would like to be able to remove all causes of flakiness. This is one.
4- There is refactoring that depends on it, e.g. crbug.com/597150, which blocks further refactoring (I want to remove GpuChildThread::current() because it's a bad pattern and a suspected cause of some other bugs).

> I added it because we need it to run media mojo services in the GPU process.

Well, there is a bug in the implementation. Can you please take ownership of your code, or find someone else who can?
Jun 21 2016
I will take a look at this within ~a week if nobody else has cycles. I haven't looked too closely at it yet and I'd like to understand what's going on anyway. I suspect a suitable fix is not very complicated at all.
Jun 21 2016
Thanks rockot for taking this!

Re #10: Ah, now I see what the problem is (in production). I totally had no idea of the existence of ArcGpuVideoDecodeAccelerator. It's registered as a mojo service in GpuChildThread's ServiceRegistry directly [1], so it is entirely parallel to the ProcessControl service I added. (I got confused since GpuProcessControlImpl was mentioned in the original bug and I was assigned.)

In other words, the root issue is that ServiceRegistryImpl is initialized at ChildThreadImpl::Init() time, and for the GPU thread this is too early for some services, e.g. GpuArcVideoService. This has nothing to do with GpuProcessControlImpl, and I don't think we currently have similar issues with mojo:media services in production. But as piman@ said, we could run into similar issues in the future, given that we'll run VDAv2 with GpuProcessControlImpl, which could have similar requirements as ArcGpuVideoDecodeAccelerator. With that, fixing only GpuProcessControlImpl won't work at all.

(B) seems the only viable solution: at least for some process types, ServiceRegistryImpl initialization should be delayed until the process is fully initialized.

[1] https://cs.chromium.org/chromium/src/chrome/gpu/chrome_content_gpu_client.cc?rcl=0&l=37
Jun 24 2016
I took a look at this and there is indeed a very simple solution, which is to simply not bind the ServiceRegistry's InterfaceProvider pipe until we're actually ready to start servicing interface requests. This doesn't require changing anything about how/when interfaces are registered in the GPU process, or when connection requests are issued from GPU clients. A sketch of the idea follows. Unless this is really urgent though -- and it doesn't sound like it is, unless we're branching unexpectedly early -- I'm going to wait to fix this until Ben lands a relatively large refactoring patch that's nearly ready. So a day or two, probably.
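A hedged sketch of that idea, with stand-in types. It relies on the fact that a mojo message pipe buffers messages written to it until the receiving end is bound, so requests queue in order and no caller has to change:

```cpp
// Hedged sketch of deferring the InterfaceProvider pipe binding;
// InterfaceProviderRequest is a stand-in for the mojo pipe endpoint.
#include <utility>

struct InterfaceProviderRequest {};

class ServiceRegistry {
 public:
  // Called early (ChildThreadImpl::Init time): just hold on to the pipe,
  // do not bind it, so nothing gets dispatched yet.
  void OnConnection(InterfaceProviderRequest request) {
    pending_request_ = std::move(request);
  }

  // Called from GpuChildThread::OnInitialize: binding now flushes any
  // interface requests the browser queued in the meantime, in order.
  void BindPendingRequest() {
    // binding_.Bind(std::move(pending_request_));
  }

 private:
  InterfaceProviderRequest pending_request_;
};
```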
Jun 29 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/caccf78eb490815d2cf6f0daea68b6cf4c153302

commit caccf78eb490815d2cf6f0daea68b6cf4c153302
Author: rockot <rockot@chromium.org>
Date: Wed Jun 29 23:39:16 2016

    Defer Mojo interface binding in the gpu process until OnInitialize

    Prevents the GPU process's browser-facing InterfaceRegistry from
    dispatching incoming interface requests until
    GpuChildThread::OnInitialize has been invoked.

    BUG=620845
    R=ben@chromium.org

    Review-Url: https://codereview.chromium.org/2108343002
    Cr-Commit-Position: refs/heads/master@{#402996}

[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/content/gpu/gpu_child_thread.cc
[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/content/gpu/gpu_child_thread.h
[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/services/shell/public/cpp/interface_registry.h
[modify] https://crrev.com/caccf78eb490815d2cf6f0daea68b6cf4c153302/services/shell/public/cpp/lib/interface_registry.cc
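A hedged sketch of the shape of this fix. The exact API added to the services/shell InterfaceRegistry isn't quoted in this thread, so Pause()/Resume() and the queueing below are illustrative:

```cpp
// Illustrative pause/resume registry; not the actual InterfaceRegistry API.
#include <functional>
#include <utility>
#include <vector>

class InterfaceRegistry {
 public:
  void Pause() { paused_ = true; }

  void Resume() {
    paused_ = false;
    for (auto& bind_request : queued_)
      bind_request();  // Dispatch in arrival order.
    queued_.clear();
  }

  // Incoming interface requests are queued while paused.
  void OnInterfaceRequest(std::function<void()> bind_request) {
    if (paused_)
      queued_.push_back(std::move(bind_request));
    else
      bind_request();
  }

 private:
  bool paused_ = false;
  std::vector<std::function<void()>> queued_;
};

// GpuChildThread would Pause() the registry at startup and Resume() it once
// GpuMsg_Initialize has been handled in OnInitialize.
```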
Jun 30 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/baef2ce0540b155429c40995bd88000bb9b6487f

commit baef2ce0540b155429c40995bd88000bb9b6487f
Author: kcwu <kcwu@chromium.org>
Date: Thu Jun 30 15:41:00 2016

    Fix racing which may pause gpu process's interface registry forever

    AcceptConnection and OnInitialize are racing. Only pause the
    interface registry if OnInitialize is not invoked.

    TEST=arc video playback well
    BUG=620845

    Review-Url: https://codereview.chromium.org/2112493003
    Cr-Commit-Position: refs/heads/master@{#403185}

[modify] https://crrev.com/baef2ce0540b155429c40995bd88000bb9b6487f/content/gpu/gpu_child_thread.cc
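The race being fixed: the pause happens when the shell connection is accepted, and if OnInitialize has already run by then, the matching resume has already happened and the registry would stay paused forever. A sketch of the guard, reusing the illustrative (hypothetical) pause/resume registry from the previous comment:

```cpp
// Illustrative sketch of the guard. OnAcceptConnection and OnInitialize can
// run in either order; only pause when the Resume() in OnInitialize is
// still guaranteed to come later.
class InterfaceRegistry {  // Minimal stand-in; see previous sketch.
 public:
  void Pause() {}
  void Resume() {}
};

class GpuChildThread {
 public:
  void OnAcceptConnection(InterfaceRegistry* registry) {
    registry_ = registry;
    // If initialized_ is already true, do not pause: nobody would resume.
    if (!initialized_)
      registry_->Pause();
  }

  void OnInitialize() {
    initialized_ = true;
    if (registry_)
      registry_->Resume();
  }

 private:
  bool initialized_ = false;
  InterfaceRegistry* registry_ = nullptr;
};
```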
Jun 30 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/380808c5d58d14b23a3c4673e8c7bc6940ccaafc

commit 380808c5d58d14b23a3c4673e8c7bc6940ccaafc
Author: Grace Kihumba <gkihumba@google.com>
Date: Thu Jun 30 19:21:22 2016

    Fix racing which may pause gpu process's interface registry forever

    AcceptConnection and OnInitialize are racing. Only pause the
    interface registry if OnInitialize is not invoked.

    TEST=arc video playback well
    BUG=620845

    Review-Url: https://codereview.chromium.org/2112493003
    Cr-Commit-Position: refs/heads/master@{#403185}
    (cherry picked from commit baef2ce0540b155429c40995bd88000bb9b6487f)

    Review URL: https://codereview.chromium.org/2116623002

    Cr-Commit-Position: refs/branch-heads/2784@{#4}
    Cr-Branched-From: 642ae4ba1dc7b04a1d480d733cdced28de720e00-refs/heads/master@{#403038}

[modify] https://crrev.com/380808c5d58d14b23a3c4673e8c7bc6940ccaafc/content/gpu/gpu_child_thread.cc