Issue 904556

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Chrome, Mac
Pri: 2
Type: Task




Move BrowserChildProcessHost to UI thread

Project Member Reported by jam@chromium.org, Nov 12

Issue description

The browser-side representation of renderers, RenderProcessHost, lives on the UI thread.

All other child processes are represented by BrowserChildProcessHost, which lives on the IO thread. This is a legacy of windowed NPAPI plugins, where living on the IO thread avoided deadlocks. We haven't had those plugins for many years.

Making all child processes' objects live on the UI thread would simplify the codebase. As an example, there are 11 callers of BrowserChildProcessHostIterator() that need to iterate over all processes. All of them have code that gets data on the UI thread for renderers, then hops to the IO thread to get it for the other process types, then hops back to the UI thread. If everything lived on the UI thread, we'd avoid the thread hops and the multiple PostTasks.

I suspect this is too big to do in one CL, so we'd have to find some incremental way of making progress. Perhaps a command line switch that decides where the objects live, so that changes can land incrementally?

CCing content/owners and a few other folks in case anyone is interested or knows who might be.
 
When the renderer wants a channel to the GPU process, today this all happens without involving the UI thread. Hopping to the UI thread to connect to the GPU process would likely cause regressions (including at startup), and could actually cause soft deadlocks (in certain conditions we synchronously block the UI thread on the renderer, with a timeout, to avoid flashing on tab restore or resize).
How do you envision keeping that property (connecting to the GPU process without involving the UI thread) if BrowserChildProcessHost, which would need to (re-)start the GPU process (e.g. after a GPU crash), lives on the UI thread?

Thanks, I'm not as familiar with this specific case.

In the common case of no GPU crash, the GPU process would always be alive before renderers try to connect to it, right? So maybe we can get an estimate of the latency the UI thread hop would add by timing how long an IO->UI->IO hop takes today in the IPC message dispatcher (to be clear, we'd only time the hops, not delay the work).

How frequently do GPU processes crash per 1M page loads?
At startup it's possible that renderers try to connect before the GPU process has started. We'd want to handle this case without a hop to the UI thread (as the latency in those cases is 100s of ms to seconds).

Taking a step back, why do we want to potentially regress critical paths in production to change how BrowserChildProcessHostIterator works? It doesn't even appear to be used by any critical-path subsystem; the callers are things like the heap profiler, task manager, hung page/plugin detection, UMA, and logging.
> We'd want to handle this case without involving a hop to the UI thread (as the latency in those cases are 100s of ms to seconds).

Could you expand on this a bit? If the UI thread is hosed, Chrome is already in a pretty bad state as much of core functionality [profile, prefs, ...] requires a functional UI thread.

> Taking a step back, why do we want to potentially regress the critical paths in production to change how BrowserChildProcessHostIterator works? 

I think this would be a great code cleanup. The current code has unnecessary (?) complexity [I'll have to think carefully about the BrowserChildProcessHost/GPU case]. If we can reduce this complexity, that would be a small step towards a simpler codebase.

We often add complexity to Chrome. If we do not remove complexity when it becomes unnecessary, we slowly end up with a very-difficult-to-handle codebase. This is less of a problem for engineers who've been working on Chrome for a long time. It can be a very high burden for newer engineers.
To be clear, I'm not advocating taking any perf regression to do this. If we can find a way to handle the GPU case which you've pointed out, then we would get code simplification by removing the UI/IO split on how processes are launched/managed.

It could also be that we gate this on the UI scheduler work, as that would allow us to give highest priority to these messages from renderers so that they never experience high latency.
>> We'd want to handle this case without involving a hop to the UI thread (as the latency in those cases are 100s of ms to seconds).
> Could you expand on this a bit? If the UI thread is hosed, Chrome is already in a pretty bad state as much of core functionality [profile, prefs, ...] requires a functional UI thread.

I was talking specifically about startup, where the UI thread is constantly busy. On a cold start with a large session to restore, it can take that long (100s of ms to seconds) before the UI thread can be expected to be available. I don't think that should be a surprise to anyone (take a startup trace?).


> I think this would be a great code clean up. The current code has unnecessary (?) complexity [I'll have to think carefully about the BrowserChildProcessHost/GPU case]. If we can reduce this complexity, that would be a small step towards having a simpler codebase. 

I'm all for reducing complexity, but not at the cost of product excellence. In this particular case, because the IO thread still needs to be involved (beyond the startup case, there are the soft deadlock cases I mentioned above), I'm not convinced you won't just trade complexity in one area for more complexity in another. I'm happy to be proven wrong.
Some rough brainstorming: GpuProcessHost can live on the UI thread, but GpuHostImpl::gpu_service_ptr_ can be bound on the IO thread. That would allow mojo calls that are dispatched by RenderProcessHost::gpu_client_ to be serviced on the IO thread as today.

Undoing ~11 years of assumptions about which thread these objects live on is probably a lot of work. At the same time, I think everyone is on the same page that this can't have negative perf effects. So once someone is interested in this, we can sketch out a detailed design and make sure the concerns above are addressed.
