New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 676459 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug



Sign in to add a comment

Skia GPU memory usage causing oom kill

Project Member Reported by boliu@chromium.org, Dec 21 2016

Issue description

From internal b/33466635

Repro: load https://www.google.com/fbx?fbx=flash_cards on nexus 5x, and play the game. Happens for both chrome and webview. I was using 64bit build, which might repro easier, since it uses more memory in general.

In chrome, the gpu process gets oom killed. In webview, the app just dies, since there is no gpu process

Bisected to this skia CL from awhile ago: https://codereview.chromium.org/2361093002

Unfortunately already shipped to m55 stable. But let's get this fixed in m56.
 

Comment 1 by vmi...@chromium.org, Dec 21 2016

Cc: ericrk@chromium.org
The patch logic looks OK to me, but it increases kDefaultMaxUnusedFlushes from 64 to 1800.  We don't seem to change this default limit from Chromium code.

So, perhaps our purging is too conservative.  However I think if we're getting OOMs the issue isn't that we don't purge aggressively, but that our hard cache limit may be too high.

ericrk@ any thoughts?

Comment 2 by boliu@chromium.org, Dec 21 2016

Ohh, that could be it.. on 5x for webview, the foreground app gets killed when it uses around 600MB of memory I think. Testing setting kDefaultMaxUnusedFlushes back to 64 now

Comment 3 by boliu@chromium.org, Dec 21 2016

Yep, setting kDefaultMaxUnusedFlushes back to 64 fixes this for me on trunk.

> on 5x for webview, the foreground app gets killed when it uses around 600MB of memory I think.

Well, that was a useless statement, kDefaultMaxUnusedFlushes isn't in MBs.

Comment 4 by ericrk@chromium.org, Dec 21 2016

Hmm, yeah, this shouldn't cause a leak, but will cause us to purge caches much less aggressively. We had discussed exposing the number as a knob to CC, but hadn't gotten around to it yet. Apparently the previous (aggressive) limit was too fast for some canvas 2d content, which ended up thrashing a bit. bsalomon@ can likely provide details of what content was having problems.

We definitely want to tweak this down for non canvas contexts (such as the raster worker context).

Skia should still have a 96mb limit on cache, but maybe there's enough going on on the page in question that we end up just breaking 600mb with everything included. Will take a look in more detail now.

Comment 5 by boliu@chromium.org, Dec 22 2016

According to cs, setting maxUnusedFlushes is not exposed through the public skia headers, so this will require skia changes. So I guess bsalomon is still the right owner here?

Comment 6 by ericrk@chromium.org, Dec 22 2016

Yeah, this will require a Skia change.

Interestingly, I took a memory-infra trace during execution, and it appears that Skia is keeping ~135mb of GPU resources cached. This is well above the 96mb limit I thought we had, and additionally ~60% of the memory is purgeable (not actively in use)... I'll keep investigating, as we should really not be hitting >96mb of memory when we have things that could be purged.
I'll take a look into this. I'm WFH and don't have a N5 but I assume the high memory usage this will probably repro on any OS/device.
I was able to hit 155MB running this on Linux in N6 emulation at 150% zoom via dev tools.

The 155MB of stuff in GrResourceCache counts things that are budgeted as well as things that aren't. Non-budgeted items are typically Skia client-created textures wrapped in Skia objects via SkSurface or SkImage factory functions. Less commonly they are images or surfaces whose underlying texture was created by Skia but were created with SkBudgeted::kNo passed to the factory function.

The highest budgeted amount I saw was 99MB. We do sometimes go slightly over budget which then triggers a flush to free up resources. So this isn't too unexpected.

I suspect there is no accounting bug or leak. I was running in debug and we do a fair amount of validation with asserts in debug. Rather I think this is a case where dialing back the aggressive purging when under budgeted caused us to simply run out of memory.

I propose exposing the max unused flushes knob for now while working on an alternative that is not flush-centric since that is an odd statistic that was chosen only because it (very loosely) corresponds to frames.

Comment 9 by boliu@chromium.org, Dec 22 2016

Summary: Skia GPU memory usage causing oom kill (was: Skia GPU memory leak)
since this is not a leak..
Cc: junov@chromium.org
I'm guessing some of those non-budgeted items are canvas backing stores, which IIRC are SkImages wrapping non-budgeted (external?) textures.
That's correct. When running the game there are 8 layers.  It seems that we are using software path rasterization for clips. I'm trying to nail down why that is occurring. I think it may be the circular looking clip paths used for the dot animation, though I grabbed a SKP of that and outside of Chrome that layer isn't using sw path rasterization. It could vary from frame to frame and I captured an ok frame. Still looking...
The circles that make up the dot animation while playing are rendered using SVG as clip paths. One of the paths is determined to be ever so slightly concave. I didn't step into that code but the path is made up of cubics and I'm guessing at one of the on curve control points the tangents don't exactly match due to some precision loss at some stage of generating this content. It would be nice if these could be made to be actual SVG circles, but I suspect we lost that info in After Effects or while exporting to bodymovin.

Anyway, we wind up sw rasterizing and uploading this mask and our poor caching fills up the budget with masks with rasterized circles. I'm going to try update Skia to remove masks from the cache once they are no longer reachable (when the generating clip element is popped off the stack). I believe that will address the memory issue here. This particular instance will also be fixed when the aa tessellated path renderer is enabled, but it's worth fixing the underlying cache issue as well.
I have a CL under development to purge unreachable clip masks. With that CL applied when playing the game in Nexus 5X emulation at 150% in developer tools we use ~6MB of Skia's budget.
Project Member

Comment 15 by bugdroid1@chromium.org, Jan 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f848b67b1486e0aba0ddcfea7e492062ba07064f

commit f848b67b1486e0aba0ddcfea7e492062ba07064f
Author: skia-deps-roller <skia-deps-roller@chromium.org>
Date: Fri Jan 06 22:50:22 2017

Roll src/third_party/skia/ f55ea6a1d..83f532e9b (9 commits).

https://skia.googlesource.com/skia.git/+log/f55ea6a1deb2..83f532e9b5b8

$ git log f55ea6a1d..83f532e9b --date=short --no-merges --format='%ad %ae %s'
2017-01-06 mtklein Add a real SkXbyak bench, implement enough to run it.
2017-01-06 mtklein exclude src/opts/SkXbyak.cpp from g3 build
2017-01-06 kjlubick Add in Path fuzzer
2017-01-06 djsollen Add support for 64-bit devices when using gdb on Android
2017-01-06 bsalomon Purge clip masks when they are no longer findable.
2017-01-04 fmalita Avoid SkFixed overflow in decal bitmap procs
2017-01-06 bsalomon Remove arithmetic mode GrXP code.
2017-01-06 bsalomon Add support for tagging GrUniqueKeys with a debug string
2017-01-06 mtklein SkXbyak basics

BUG= 676459 , 675444 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel
TBR=csmartdalton@google.com

Review-Url: https://codereview.chromium.org/2617043004
Cr-Commit-Position: refs/heads/master@{#442083}

[modify] https://crrev.com/f848b67b1486e0aba0ddcfea7e492062ba07064f/DEPS

Status: Fixed (was: Assigned)
Labels: Merge-TBD
[Auto-generated comment by a script] We noticed that this issue is targeted for M-56; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-56 label, otherwise remove Merge-TBD label. Thanks.
Labels: -Merge-TBD Merge-Request-56
Project Member

Comment 19 by sheriffbot@chromium.org, Jan 17 2017

Labels: -Merge-Request-56 Merge-Review-56 Hotlist-Merge-Review
This bug requires manual review: DEPS changes referenced in bugdroid comments.
Please contact the milestone owner if you have questions.
Owners: amineer@(clank), cmasso@(bling), gkihumba@(cros), bustamante@(desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-56 Merge-Approved-56
OK, so our next WebView release is going to be shipped to stable users - so I'm a little wary of taking a ~60 LOC patch and sending it out with very little margin for error.  On the other hand, it sounds like when this does occur, it's a really bad regression.

So I'll defer to the Skia team here - how often do we expect this situation to occur in everyday usage?  I've only seen the complaint on this single game, and M55 has been live for a while now; is it possible other apps are dying and people just aren't reporting the cause?  Or do we expect this to be rare?

Additionally, how risky is this patch?  As I said, it's larger than I'd like to take at this stage - but if you know for certain it's unlikely to introduce regressions we can pick it up.

I'll merge approve this for M56 branch 2924, but I ask that we only merge if:

 - We believe that this is common on the web and other people are running into this, and they just haven't reported it OR
 - We believe the patch is super safe.

Conversely, if we think the patch is risky, we should not merge.

Hope that's clear enough but ping me if you have questions.
Labels: -Hotlist-Merge-Review -M-56 -Merge-Approved-56 M-57
I don't think this meets the criteria, removing the merge labels and changing the target milestone to 57.
M

Sign in to add a comment