New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 779211 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

chrome on android should avoid launching gpu process when in the background

Project Member Reported by boliu@chromium.org, Oct 27 2017

Issue description

Main reason is that android can kill the gpu process to free memory, we don't want to then immediately recreate that process and basically go against that android.

Other reasons: hopefully, this means chrome browser stays alive longer in the background with only the browser process alive. And this means doesn't need to have a higher crash limit than desktop (still want reset though), if chrome can promise to never try to launch gpu process while in background.

Down side assuming this works is that it takes longer to resume things when app is brought to foreground again, but feels like the correct trade off here.


Now.. is this actually possible? I assume our own compositing things can be told to stop trying to recreate context when invisible (they kinda do this already, but details matter). But what about other things like media/webgl/whatever else I'm missing? Is it safe to have a blanket rule for android to not launch gpu process while in background?

Thoughts?
 

Comment 1 by danakj@chromium.org, Oct 27 2017

For compositor should work - when not visible scheduler won't make a context (unless a request is in flight already when becoming not visible.. but then we remove the frame sink and handle that case in compositorimpl already I think).

For media and stuff, for each one that makes a context, it would need some kind of setter to say "dont make contexts" "ok make them again now" and it would need additional states where it maybe used to assume it had a context and now has to defer work? I can't comment on how easy/hard that is without looking at case by case. Other option is to block the renderer when they try make a context.

Comment 2 by boliu@chromium.org, Oct 27 2017

For compositor, the "details matter" is the retry on transient failure; probably should stop that on background as well. Relatively minor I guess, since that's only if process is killed before establish channel I think.

Maybe it's not worth worrying about the other cases because they are "not common" :/

Comment 3 by danakj@chromium.org, Oct 27 2017

Ah, ok yeah in compositorimpl we'd have to check that on retries, you're right, but sounds straightforward. Good thought.

Comment 4 by piman@chromium.org, Oct 27 2017

Also, races where we get backgrounded but haven't had a chance to tell anyone yet (processes/threads). Do we need a "kBackgrounded" failure mode where e.g. the renderer could back off but retry "later" (do we need a signal for that)?
And lastly, there are places where we want the GpuFeatureInfo to know if features are enabled, where (I think) we don't care if we're backgrounded. The GpuFeatureInfo is gathered on the GPU process, so we need to launch it in that case. Or at least have a fallback for the backgrounded mode where we would return a cached GpuFeatureInfo (handwaving)?
Labels: -Pri-3 Pri-2
Status: Available (was: Untriaged)
Cc: vmi...@chromium.org
This would help bring browser crash rate down on android (https://bugs.chromium.org/p/chromium/issues/detail?id=777601#c6).

Comment 8 by boliu@chromium.org, Nov 2 2017

I do want to work on the easy cases soon-ish, ie stop these retries for establish channel failures when in background:
https://cs.chromium.org/chromium/src/content/browser/gpu/browser_gpu_channel_host_factory.cc?rcl=69f1dd00a0df187dcea7c2099eb4b59800810477&l=154
https://cs.chromium.org/chromium/src/content/browser/gpu/gpu_client.cc?rcl=69f1dd00a0df187dcea7c2099eb4b59800810477&l=58

I'll definitely assign this to myself when I start to avoid duplicating work.

Also re #2-3, CompositorImpl actually does the right thing on transient failure, so we are good there

Comment 9 by boliu@chromium.org, Nov 2 2017

Cc: ericrk@chromium.org
Owner: boliu@chromium.org
Status: Assigned (was: Available)
So here is what I plan to do, to cover common cases:

For BrowserGpuChannelHostFactory and GpuClient (the two links in #8), stop retrying establish channel and just report failure up if app is in the background, for android only.

For RenderWidgetCompositor (https://cs.chromium.org/chromium/src/content/renderer/gpu/render_widget_compositor.cc?rcl=d67c4f37d7c2bbfef518f7a4c6e74189943e4b32&l=1035), stop retrying when compositor is invisible, and try again after compositor becomes visible. I think it makes sense to do this for all platforms, not just for android.

WDYT?
> I think it makes sense to do this for all platforms, not just for android.

I agree.

> For BrowserGpuChannelHostFactory and GpuClient (the two links in #8), stop retrying establish channel and just report failure up if app is in the background, for android only.

For android only sounds okay, since it will always retry.. I'm a little on the fence about returning a failure in this API in different cases on different platforms, it relies on knowing how the caller is going to behave.. Or is the caller going to tell the factory/gpuclient to stop retrying, that would make sense?
> I'm a little on the fence about returning a failure in this API in different cases on different platforms, it relies on knowing how the caller is going to behave.. Or is the caller going to tell the factory/gpuclient to stop retrying, that would make sense?

For now, for biggest benefit for least amount of work, it's basically relying on caller to not retry either. At this point, we don't need to block gpu launch while in background. And fixing RenderWidgetCompositor is fixing the second most common caller I guess.

I think longer term, I like piman's idea of having kBackground failure mode, and also have browser signal to callers about whether it's currently considered background. But I guess unless we can ensure we can just block launching gpu altogether while in background for all cases, I'm not sure how much it's worth actually doing this
How do BrowserGpuChannelHostFactory and GpuClient know they are background?
content embedder has tell them. So similar to how the source of visibility change comes from content embedder as well. I was thinking through ContentBrowserClient but not sure if that entirely makes sense yet..
Oh is it the GpuProcessHost that is told about the visibity and returns a failure? What if it just held the request and satisfied it once visible?
Yeah, I think GpuProcessHost makes sense, but I haven't started writing code yet, so who knows..

> What if it just held the request and satisfied it once visible?

That blocks RenderThreadImpl::EstablishGpuChannelSync until visible (iirc?), which I don't think is ok
Oh, also, this "background" signal won't exactly match "visibility" signal at least in the renderer/blink sense, because blink one takes other things into account like whether there's media playing in that renderer and whatnot.

Plus "background" is a global signal, and "visibility" is per-tab/webcontents.
Right I was expecting renderer to work on browser visibility not RenderWidget/whatever. :) I had to think that one thru tho hehe.
> That blocks RenderThreadImpl::EstablishGpuChannelSync until visible (iirc?), which I don't think is ok

Shoot, I guess so. But also no code expects to retry if the channel failed. Like WebGL would just see it cant make a context and fail. Canvas would fall back to software mode, and such. Are these side effects things considered?
> But also no code expects to retry if the channel failed. Like WebGL would just see it cant make a context and fail. Canvas would fall back to software mode, and such. Are these side effects things considered?

Hmm, good point.

I don't have an answer right now, but that's clearly not ok. Guess even the simple cases aren't very simple :|

Simplest thing in my mind is just have these things pay attention to visibility. But that's getting awfully to close to just having the real background signal from browser, and then going through all callers.
"Simplest" thing might be to make context creation not synchronous api on renderthreadimpl, like for the compositor..
simplest conceptually I guess, but not actually easy to implement I guess? must be lots of assumptions built in already :p
Alternatively.. you could stop making contexts for compositors but still spin up gpu for other things. I guess that kinda kills the usefulness if there's any gpu canvas on the page when it gets lost or webgl.. so yeah. Lots of assumptions.
Right, I was definitely going to do the RenderWidgetCompositor change first, since that's clearly the easiest and correct thing to do.

But if we don't get rid of the retry in GpuClient, then I suspect the case of in m62, the a single client uploading thousands of crash reports won't go away. (I understand it'll just crash the browser on trunk now after your refactor, but that's not that much better...)
Project Member

Comment 26 by bugdroid1@chromium.org, Nov 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d5d38582a55f8adcf07d7fbdba510ff9f413cb56

commit d5d38582a55f8adcf07d7fbdba510ff9f413cb56
Author: Bo Liu <boliu@chromium.org>
Date: Fri Nov 10 18:30:47 2017

content: Stop FrameSink retry when invisible

If FrameSink creation fails after GPU EstablishChannel, then the layer
compositor will only retry FrameSink creation after it becomes visible
again. However if there is a failure before EstablishChannel, the retry
logic before this CL is conditional.

This is a problem on Android where there appear to be persistent failure
issues before EstablishChannel. Also it's nice to match behavior with
layer compositor as well on other platforms. So stop retrying on
EstablishChannel failure as well.

Bug: 779211
Change-Id: I9bd371b5686a0da329b6f090c9a61451fbd1134e
Reviewed-on: https://chromium-review.googlesource.com/762185
Reviewed-by: danakj <danakj@chromium.org>
Reviewed-by: enne <enne@chromium.org>
Commit-Queue: Bo <boliu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515608}
[modify] https://crrev.com/d5d38582a55f8adcf07d7fbdba510ff9f413cb56/content/renderer/gpu/render_widget_compositor.cc
[modify] https://crrev.com/d5d38582a55f8adcf07d7fbdba510ff9f413cb56/content/renderer/gpu/render_widget_compositor.h
[modify] https://crrev.com/d5d38582a55f8adcf07d7fbdba510ff9f413cb56/content/renderer/gpu/render_widget_compositor_unittest.cc

Comment 27 by boliu@chromium.org, Nov 11 2017

re canvas fallback to software. danakj: is that still true after these two CLs?
https://chromium-review.googlesource.com/c/chromium/src/+/742521
https://chromium-review.googlesource.com/c/chromium/src/+/751406

Haven't read them carefully, but from the description, it sounds like they are querying CompositingModeWatcher now. And since CompositingModeWatcher isn't really implemented on android, does that mean at least canvas won't fallback to software?

Can't hide context creation error from webgl, but maybe it's not a huge deal, since hopefully well behaved webgl won't try to create context while invisible..

Comment 28 by boliu@chromium.org, Nov 13 2017

So the canvas 2d fallback is probably this (on android anyway), not accelerated compositing being disabled: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/HTMLCanvasElement.cpp?rcl=2b7551c51c3459a00ecc4a6b89892f390e4f71e6&l=940

I'm still trying to workout how that gets triggered since I can't seem to get it to happen at run time.

Also accidentally came by kAccelerated2DCanvasGPUContextLost, so checked the graph on stable (not normalized over anything):
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=312a21ef02521e5a05cfc3db23f6baeb

There seem to be a big regression in m60. But significantly, the code introduced in m61 to retry establish channel failure (which caused crbug.com/772049, and being reverted in m63) appear to have no benefit to this metric..

Comment 29 by boliu@chromium.org, Nov 13 2017

Ok, reading the code, looks like this is the place where things will make a difference:
https://cs.chromium.org/chromium/src/content/renderer/renderer_blink_platform_impl.cc?rcl=c34b4e4003acd279e20402a8a1e49aec6a229d98&l=1157

So if a 2d canvas tries to create a context and fails *while app is in the background*, then it falls back to software. That... doesn't sound like a huge deal to me. User can't be opening pages and whatnot while in background, and canvas 2d doesn't have a context lost notification anyway, so this only matters when a page creates a new canvas 2d element while in the background. Feels like a reasonable trade off?


That place also applies to all other blink usage of gpu context as well. Again, hopefully webgl can deal with context loss already, and just fingers crossed it doesn't retry context creation unconditionally. But overall doesn't feel "that bad"?
Cc: junov@chromium.org
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/HTMLCanvasElement.cpp?rcl=ac04b481eabf80e2544200efe63de76b3690a49c&l=940 is where the check for null context causes fallback, yeah.

> So if a 2d canvas tries to create a context and fails *while app is in the
> background*, then it falls back to software. That... doesn't sound like a huge deal to me. User can't be opening pages and whatnot while in background,
> and canvas 2d doesn't have a context lost notification anyway, so this only
> matters when a page creates a new canvas 2d element while in the background.
> Feels like a reasonable trade off?

We should probably get some thoughts from canvas folks. +junov There may be some UMAs to watch to make sure things aren't accidentally terrible too.

Comment 31 by boliu@chromium.org, Nov 16 2017

Not much crash data from canary, but I'm not really convinced #26 moved GpuProcessHost::RecordProcessCrash by much.

I'll remove the retry in BrowserGpuChannelHostFactory next. There isn't any objections to stop retry while in background for browser clients (basically the browser and display compositor) right?

That should involve all the plumbing, so hopefully it's just flipping a condition to remove retry for renderer clients as well later
> I'll remove the retry in BrowserGpuChannelHostFactory next. There isn't any objections to stop retry while in background for browser clients (basically the browser and display compositor) right?

Seems fine, beware that trying to take a screenshot asks for a GLHelper which would spin up the process.

One thought, maybe we should have a mode where it just never retries and forces the caller to choose to, rather than plumbing in (surely over time increasing amounts of) data to make the decision to retry or not there?

Comment 33 by boliu@chromium.org, Nov 18 2017

> beware that trying to take a screenshot asks for a GLHelper which would spin up the process.

Android's version never establish channel itself, and relies on browser compositor having already done that: https://cs.chromium.org/chromium/src/content/browser/renderer_host/render_widget_host_view_android.cc?rcl=7feff280c104a821d6a552feefbe0b36ebb11828&l=193

I'm not sure if there is any code outside of that on android doing readbacks though. Hopefully not..

> One thought, maybe we should have a mode where it just never retries and forces the caller to choose to, rather than plumbing in (surely over time increasing amounts of) data to make the decision to retry or not there?

Yep, I think that's what I'll go for

Comment 34 by junov@chromium.org, Nov 20 2017

@#29: As long as we're not tearing-down the GPU process after an accelerated 2D canvas has been set-up, we're in the clear.  For what it's worth, there is a "hibernation" mechanism that swaps accelerated 2D canvases out of GPU memory when the page is not visible.  This is done during an idle period to avoid interfering with performance. For it to work correctly, the GPU context needs to be still available after the page with canvas in it has become invisible.  Otherwise, there will be data loss.
Project Member

Comment 35 by bugdroid1@chromium.org, Nov 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4ade6b1b214e4b87d26faf1e9cfcc0ff1b6246cd

commit 4ade6b1b214e4b87d26faf1e9cfcc0ff1b6246cd
Author: Bo Liu <boliu@chromium.org>
Date: Tue Nov 28 02:26:35 2017

gpu: Allow embedder to stop gpu process launch

Add a method to ContentBrowserClient. Note this controls gpu process
"launch" for in-process GPU as well. The method still allows using an
existing GPU process, so in-process case should be minimally impacted so
probably not worth carving out special conditions for it.

For now, only call this new API from BrowserGpuChannelHostFactory, which
controls GPU clients in the browser process. Clients in child process is
more risky, so holding going to do that part in a follow up CL instead.

ChromeContentBrowserClient on android implements the API by checking if
there are running or paused activities. Paused activity may still be
visible, so allow it to launch GPU in that case.

Bug: 779211
Change-Id: Ic48acc9f9ab6134371d7f193d9031e912a05d947
Reviewed-on: https://chromium-review.googlesource.com/783495
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Bo <boliu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#519529}
[modify] https://crrev.com/4ade6b1b214e4b87d26faf1e9cfcc0ff1b6246cd/chrome/browser/chrome_content_browser_client.cc
[modify] https://crrev.com/4ade6b1b214e4b87d26faf1e9cfcc0ff1b6246cd/chrome/browser/chrome_content_browser_client.h
[modify] https://crrev.com/4ade6b1b214e4b87d26faf1e9cfcc0ff1b6246cd/content/browser/gpu/browser_gpu_channel_host_factory.cc
[modify] https://crrev.com/4ade6b1b214e4b87d26faf1e9cfcc0ff1b6246cd/content/public/browser/content_browser_client.cc
[modify] https://crrev.com/4ade6b1b214e4b87d26faf1e9cfcc0ff1b6246cd/content/public/browser/content_browser_client.h

some discussion on the CL to remove retry for renderer clients:
https://chromium-review.googlesource.com/c/chromium/src/+/797617

My summary for that discussion:
Most renderer GPU clients do not retry when EstablishChannel fails, and the retry is there so that EstablishChannel never fails transiently, ie due gpu crash. Renderer compositor is the exception.


I think it's still valuable to remove that retry, even if it's only for the compositor for now. So I'm thinking adding an argument to EstablishChannel whether the client can handle transient failures, and maybe also have a return value on failure, if failure was transient or not.

Compositor can say it can handle transient failures, retry on transient failure, and wait for CompositingModeFallbackToSoftware on a fatal failure?

All other clients can just say they can't handle transient failure for now, and with no change in behavior.
So the point of previous changes were to remove retries for transient failures on the renderer side and instead do them on the browser side (avoids duplication, simpler code, fewer IPCs, overall faster). What would be the reason to move it back to the renderer?
client (compositor in particular) can use more information to decide if it wants to retry. for this bug, the big difference is that compositor won't retry if it's not visible.

I'm not sure which other clients can be made to behave like that too. Canvas 2d can't handle context loss, so not really applicable. Webgl is under js control, so not applicable either. So I think that just leaves media, which is a bunch of code I don't understand much, but I think probably worth investigating further.
It sounds like an optimisation for an edge case: the client requested a channel (was visible), and in the tiny lapse of time before it got the transient failure reply, it changed its mind (was made invisible)? Is that really worth the complexity, and additional cost (extra round trips / IPCs) for the normal cases?
In general the transient failure seems very much orthogonal to the client state... but if we want to go that route I'd rather we looked into being able to cancel requests... e.g. a successful establish might take a while (queued up behind long processing on the GPU process main thread), so if the client is no longer interested in the result, it might be worth cancelling early.
> In general the transient failure seems very much orthogonal to the client state...

Not on android. If app is in the background, then the GPU process is much more likely to be killed by the OS to free memory. I believe some of the media initialization things on android is also much more likely to fail, but I don't really have strong proof to back that up. I think crbug.com/772049#c79 shows there's some failures that's not so transient on android, resulting a single client uploading thousands of GPU crash reports.


Taking a step back, there were two reasons I filed this bug.

One was to follow up on crbug.com/772049, that we don't have infinite retry after m63. danakj solved that part for android when kDisableGpuProcessCrashLimit was removed. But that ends up being a browser crash, which is also bad. It's really hard to balance between GPU crashes from retrying forever, and client crashes from not retrying enough...

Two was outlined in the initial description here, that it would be really nice on android if we never tried launched gpu process while app is in the background.


I guess establish then cancel wouldn't be as useful for the second case.
If we want to avoid having a GPU process in some conditions, we first need to define the behavior for all the things that do depend on the GPU process in those conditions. Once we agree on that we can talk about how to implement it.
The status quo is that Chrome on Android requires a GPU to function and can't function without it (no software fallback), so terminating the browser when we can't have it is the right choice.


That is completely orthogonal to how to deal browser/renderer races on restarting the GPU process after a crash (what the GPU_HOST_INVALID retry in GpuClient is for).


Trying to combine the 2 leads to much unhappiness.
I don't mean actively kill the GPU when chrome is put into background. But if OS kills the GPU, then it would be nice if chrome doesn't try to launch it again until chrome is foreground.

I was thinking OS killing the GPU in terms of handling a transient failure, hence conflating the two..
What if we made the compositor do an async request for a gpu channel, instead of a sync one? Then the browser could just sit on that request if it's in the background and needs to try launch a gpu process?

Comment 44 by boliu@chromium.org, Dec 11 2017

> What if we made the compositor do an async request for a gpu channel, instead of a sync one? Then the browser could just sit on that request if it's in the background and needs to try launch a gpu process?

And a sync request from another client would force all pending requests through, regardless of background state? It's a little weird that async and sync requests behave differently. But moving to a fully async API, even partially, is probably a good thing.

I'm happy to implement that if piman approves :)

Comment 45 by piman@chromium.org, Dec 12 2017

Cc: sadrul@chromium.org
Moving the renderer to do an async request is probably good.
Then I think I would extend the request to have a bool allowing (or not) arbitrary large delays (letting the browser throttle when in background), to be explicit at the call site, and decoupling the synchronization logic (sync vs callback) from the delaying logic - it probably makes little sense right now for sync requests to be delayed, but after crbug.com/566273 it could make more sense (when on background threads). Note that the browser sees both sync and async requests identically at this point. Then clients can opt in piecemeal.
Cc: -junov@chromium.org

Sign in to add a comment