Chrome leaks 40+GB of GPU memory.

Project Member Reported by erikc...@chromium.org, Apr 20 2017

Issue description

Symptoms:
1) Most/all apps get suspended. 
2) Nothing reflected in memory-infra or Activity Monitor [memory pressure = none]
3) vm_stat shows ~16 million compressed pages [~64GB]
4) Running "sudo vmmap -v -interleaved <pid>" on the GPU process shows that it has >40GB of memory in swap/compressed memory. This is spread across ~15,000 textures, averaging 3MB each.
5) ioclasscount shows that we aren't leaking IOSurfaces
6) Killing the GPU process fixes the problem.
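As a rough sanity check of the numbers in the symptom list, the following C++ sketch converts page and texture counts to bytes. It assumes a 4 KiB page size (typical for Intel Macs; `vm_stat` prints the actual size in its header) and reads "~16 million" as 16 × 2^20 pages and "3MB" as 3 MiB. These are assumptions, not values taken from the report.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size; vm_stat reports the real one in its header line.
constexpr std::uint64_t kPageSize = 4096;
constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kMiB = 1024 * kKiB;
constexpr std::uint64_t kGiB = 1024 * kMiB;

// Bytes held by `pages` compressed pages.
constexpr std::uint64_t CompressedBytes(std::uint64_t pages) {
  return pages * kPageSize;
}

// Total bytes for `count` textures averaging `avg_mib` MiB each.
constexpr std::uint64_t TextureBytes(std::uint64_t count, std::uint64_t avg_mib) {
  return count * avg_mib * kMiB;
}
```

Under these assumptions, 16 × 2^20 pages is exactly 64 GiB, and 15,000 textures of 3 MiB come to roughly 44 GiB, which is consistent with the ">40GB" that vmmap attributes to the GPU process.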

All of this suggests that we're leaking non-IOSurface GL textures. 

We first observed this on nduca's machine.
https://bugs.chromium.org/p/chromium/issues/detail?id=700928#c52

dmoisa@ confirmed these symptoms on their machine in an email thread.

Based on the symptoms, I suspect the leak was introduced [by me] in M52
https://codereview.chromium.org/1965253002

And fixed at https://bugs.chromium.org/p/chromium/issues/detail?id=692074#c12 [merged to M58]

I haven't confirmed this yet, as most of our relevant instrumentation doesn't exist in M57. The bug applies to pepper contexts that get resized. It's hard to imagine this happening 15,000 times via manual user interaction, so perhaps these users are hitting some site that happens to trigger this condition very often.


 
Cc: dcasta...@chromium.org sande...@chromium.org
+sandersd, dcastagna
I arrived a bit late to the party, but if the metric is temporary, can
you use the memory.experimental (or experimental.memory, I can't remember
off the top of my head) pattern that tok used?
Both nduca and dmoisa have a machine that's using an "Intel Iris Pro 1536 MB" GPU. 

In Issue 669775, we observed a driver-related massive GPU leak when using a certain path for GPU raster [which has since been turned off]. This leak seems to only occur on "Intel Iris Pro 1536 MB". I wonder if the driver for this GPU is just super buggy and we're hitting other leaks with it?

Comment 13 by bugdroid1@chromium.org, Apr 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5ee3686ce72a6f6f4e218984f80e230f88760115

commit 5ee3686ce72a6f6f4e218984f80e230f88760115
Author: erikchen <erikchen@chromium.org>
Date: Fri Apr 21 22:20:04 2017

Change name of recently introduced metric Memory.Gpu.PhysicalFootprint.MacOS.

The new name is Memory.Experimental.Gpu.PhysicalFootprint.MacOS, to reflect the
temporary nature of the histogram.

BUG= 713854 

Review-Url: https://codereview.chromium.org/2831273003
Cr-Commit-Position: refs/heads/master@{#466465}

[modify] https://crrev.com/5ee3686ce72a6f6f4e218984f80e230f88760115/chrome/browser/metrics/metrics_memory_details.cc
[modify] https://crrev.com/5ee3686ce72a6f6f4e218984f80e230f88760115/tools/metrics/histograms/histograms.xml

The UMA metric has landed on Canary.

https://uma.googleplex.com/p/chrome/histograms/?endDate=20170423&dayCount=1&histograms=Memory.Experimental.Gpu.PhysicalFootprint.MacOS&fixupData=true&showMax=true&filters=channel%2Ceq%2C1%2Cplatform%2Ceq%2CM%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

50th percentile: 168MB
75th percentile: 452MB
95th percentile: 1996MB
98th percentile: 3617MB
99th percentile: 5376MB

These numbers show that we likely still have a GPU memory leak. Note that PhysicalFootprint counts non-volatile IOKit pages, so it only includes in-use and/or leaked textures, not textures cached by the driver.

One of ccameron's suggestions was to stop using virtualized OpenGL contexts, as that makes it easier to leak textures from renderers that are no longer in use. We could throw together a Finch experiment and see if that makes a difference.

Comment 15 by bugdroid1@chromium.org, Apr 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3858a3320d4c95db2662b8e1b6e34c3bda979135

commit 3858a3320d4c95db2662b8e1b6e34c3bda979135
Author: erikchen <erikchen@chromium.org>
Date: Mon Apr 24 18:16:44 2017

Fix a compilation error with 10.11 SDK.

TASK_VM_INFO_REV1_COUNT is defined in the 10.12 SDK, but has a different name in
the 10.11 SDK (TASK_VM_INFO_COUNT). Instead of using it, just use
sizeof(ChromeTaskVMInfo) / sizeof(natural_t), which works for all SDK versions.

BUG= 713854 

Review-Url: https://codereview.chromium.org/2843463003
Cr-Commit-Position: refs/heads/master@{#466689}

[modify] https://crrev.com/3858a3320d4c95db2662b8e1b6e34c3bda979135/base/process/process_metrics_mac.cc
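The SDK-independent count used in the commit above can be illustrated without the macOS headers. In this sketch, `natural_t` is redeclared as a 32-bit unsigned integer (as in the Mach headers) and `FakeTaskVMInfo` is a hypothetical, cut-down stand-in for `ChromeTaskVMInfo`. Only the division itself mirrors the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Redeclared so the sketch compiles without the macOS SDK.
using natural_t = std::uint32_t;
using mach_msg_type_number_t = natural_t;

// Hypothetical stand-in for ChromeTaskVMInfo; the field selection is
// illustrative and much shorter than the real task_vm_info layout.
struct FakeTaskVMInfo {
  std::uint64_t virtual_size;
  std::int32_t region_count;
  std::int32_t page_size;
  std::uint64_t resident_size;
  std::uint64_t phys_footprint;
};

// task_info() takes the buffer size in units of natural_t, so the count is
// derived from the struct itself rather than from an SDK-specific macro.
template <typename T>
constexpr mach_msg_type_number_t InfoCount() {
  static_assert(sizeof(T) % sizeof(natural_t) == 0,
                "struct size must be a whole number of natural_t words");
  return static_cast<mach_msg_type_number_t>(sizeof(T) / sizeof(natural_t));
}
```

On macOS, a count computed this way would be passed (by pointer) as the size argument of `task_info()`, so the same expression works whichever SDK defines the struct.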

Labels: Merge-Request-58
Merge request for commit 5ee3686ce72a6f6f4e218984f80e230f88760115

Comment 17 by sheriffbot@chromium.org, Apr 24 2017

Labels: -Merge-Request-58 Merge-Review-58 Hotlist-Merge-Review
This bug requires manual review: We are only 0 days from stable.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-58 Merge-Review-59
Labels: -Merge-Review-59 Merge-Request-59

Comment 20 by bugdroid1@chromium.org, Apr 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/92efc4259724f8dcbc550cabfe57bc215343ebba

commit 92efc4259724f8dcbc550cabfe57bc215343ebba
Author: erikchen <erikchen@chromium.org>
Date: Tue Apr 25 00:28:12 2017

Stop emitting physical footprint on macOS 10.11.

Despite the fact that the kernel implements physical footprint calculations on
macOS 10.11, the syscall task_info(TASK_VM_INFO, ...) always returns 0. For more
details [10.11.5], see xnu-3248.50.21/osfmk/kern/task.c:3418.

BUG= 713854 

Review-Url: https://codereview.chromium.org/2838703003
Cr-Commit-Position: refs/heads/master@{#466836}

[modify] https://crrev.com/92efc4259724f8dcbc550cabfe57bc215343ebba/base/process/process_metrics_mac.cc
[modify] https://crrev.com/92efc4259724f8dcbc550cabfe57bc215343ebba/chrome/browser/metrics/metrics_memory_details.cc
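The version gate described in the commit above can be sketched as follows. The `MacOSVersion` type and helper names are hypothetical, and treating 10.12 and later as reliable is an assumption drawn from the commit's description of 10.11, not from Chrome's code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical version type; not a Chrome or macOS API.
struct MacOSVersion {
  int major;
  int minor;
};

// Per the commit message, task_info(TASK_VM_INFO, ...) reports 0 for the
// physical footprint on 10.11. This sketch assumes 10.12+ is trustworthy.
constexpr bool PhysicalFootprintIsReliable(MacOSVersion v) {
  return v.major > 10 || (v.major == 10 && v.minor >= 12);
}

// Returns the value to record, or nothing if the sample should be dropped.
std::optional<unsigned long long> SampleToRecord(MacOSVersion v,
                                                 unsigned long long footprint_bytes) {
  if (!PhysicalFootprintIsReliable(v)) return std::nullopt;
  return footprint_bytes;
}
```

Gating on the OS version keeps the always-zero 10.11 samples out of the histogram entirely, instead of letting them pile up in its lowest bucket.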


Comment 21 by sheriffbot@chromium.org, Apr 26 2017

Labels: -Merge-Request-59 Hotlist-Merge-Approved Merge-Approved-59
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: ericrk@chromium.org
I believe I can repro this on my personal machine. I haven't hit a 40GB leak, but I can leak a few hundred MB in a few minutes, and the pattern of the leak looks the same.

The bug appears to be machine specific (I can repro on my personal machine, but the same steps cause no issue on my work machine). It also appears site specific (although it seems that a good number of sites cause the repro, including inbox.google.com, which is what I'm using as my test case).

Repro steps:
1) Open ~10 windows (not tabs) w/ inbox.google.com
2) Close all windows (So chrome app is open, but no windows open)
3) Run vmmap - a number of tile-sized memory chunks will show up under IOKit. These allocations have virtual size, but no resident/dirty/swap size. They will look similar to:

REGION TYPE                      START - END             [ VSIZE  RSDNT  DIRTY   SWAP] PRT/MAX SHRMOD PURGE  
IOKit                  000000011b561000-000000011b831000 [ 2880K     0K     0K     0K] rw-/rw- SM=SHM PURGE=N

And overall IOKit memory will look like:

                                VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========                     ======= ========    =====  ======= ========   ======    =====  =======
IOKit                             86.8M    10.4M    10.4M       0K       0K    3256K    43.4M      150

In the case outlined here, we have ~35MB of memory which is counted in virtual size, but not in other categories.

4) Virtual size alone isn't necessarily a problem, *however* if we now allow the system to sleep (in my tests I needed to sleep for a few minutes), this memory bizarrely converts to Resident/Dirty.
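The pattern in step 3 (IOKit regions with a virtual size but zero resident, dirty and swapped size) can be picked out of `vmmap -v` output mechanically. This is a minimal C++ sketch that assumes the bracketed `[ VSIZE RSDNT DIRTY SWAP]` column layout shown above and K/M/G size suffixes; it has not been checked against every vmmap version.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Sizes in KiB from the bracketed [ VSIZE RSDNT DIRTY SWAP] columns.
struct RegionSizes {
  double vsize_kib = 0;
  double resident_kib = 0;
  double dirty_kib = 0;
  double swap_kib = 0;
};

// Converts a vmmap size token such as "2880K", "10.4M" or "1.2G" to KiB.
std::optional<double> ParseSizeKiB(const std::string& token) {
  if (token.size() < 2) return std::nullopt;
  double scale = 0;
  switch (token.back()) {
    case 'K': scale = 1; break;
    case 'M': scale = 1024; break;
    case 'G': scale = 1024.0 * 1024.0; break;
    default: return std::nullopt;
  }
  try {
    return std::stod(token.substr(0, token.size() - 1)) * scale;
  } catch (const std::exception&) {
    return std::nullopt;  // not a number before the suffix
  }
}

// Parses the four bracketed size columns of one `vmmap -v` region line.
std::optional<RegionSizes> ParseRegionLine(const std::string& line) {
  const auto open = line.find('[');
  if (open == std::string::npos) return std::nullopt;
  const auto close = line.find(']', open);
  if (close == std::string::npos) return std::nullopt;
  std::istringstream columns(line.substr(open + 1, close - open - 1));
  double kib[4];
  for (double& value : kib) {
    std::string token;
    if (!(columns >> token)) return std::nullopt;
    const auto parsed = ParseSizeKiB(token);
    if (!parsed) return std::nullopt;
    value = *parsed;
  }
  return RegionSizes{kib[0], kib[1], kib[2], kib[3]};
}

// True for an IOKit region with virtual size but nothing resident, dirty or
// swapped, i.e. the pattern described in step 3.
bool IsVirtualOnlyIOKitRegion(const std::string& line) {
  if (line.rfind("IOKit", 0) != 0) return false;  // region type is the prefix
  const auto sizes = ParseRegionLine(line);
  return sizes && sizes->vsize_kib > 0 && sizes->resident_kib == 0 &&
         sizes->dirty_kib == 0 && sizes->swap_kib == 0;
}
```

Running each line of a vmmap dump through `IsVirtualOnlyIOKitRegion` before and after sleep would show the regions moving from virtual-only to resident/dirty.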

I'm not sure if this is real memory, or some accounting bug, but I'll keep investigating (and see if we can find a way to address the leak in either case).
Also, so far, I believe we have seen this on Intel 8th Gen devices:

Iris Pro 1536MB (Guessing this is the 8th Gen version found in 2015 15" MBP)
HD graphics 5300 1536MB (8th gen, 2015 12" Macbook)

I've confirmed that this doesn't repro on:
HD Graphics 530 1536 MB (9th Gen, 2016 Macbook Pro 15")

Will try some other devices today. Let me know if this is seen on any additional GPUs, especially if non-8th-gen.

Comment 24 by kbr@chromium.org, Apr 26 2017

Cc: yang...@intel.com yunchao...@intel.com qiankun....@intel.com
CC'ing colleagues from Intel as this sounds like a probable leak in the graphics driver.

Thanks, I can repro the issue on Intel Iris Pro 1536 MB.

test1: vmmap with 10 windows open
test2: vmmap with all windows closed
test3: vmmap with all windows closed, after sleeping 3 minutes.
test4: vmmap with 10 windows open
test5: vmmap with all windows closed
test6: vmmap with all windows closed, after sleeping 3 minutes.

Pulling out the IOKit numbers for each vmmap, I'm seeing that even when there are no windows open, the GPU is leaking dirty memory. And then going to sleep leaks more dirty memory on top of that.

run     virtual  resident   dirty  swapped  volatile  nonvolatile  empty  region count
test1    778.8M    523.1M  523.1M       0K    111.3M       398.0M     0K          1120
test2    416.8M    192.3M  192.3M       0K    157.5M        20.9M     0K           692
test3    259.3M    255.8M  255.8M       0K        0K       241.9M     0K           212
test4    862.0M    754.0M  754.0M       0K    102.2M       636.5M     0K          1161
test5    532.6M    432.4M  432.4M       0K    159.2M       257.9M     0K           735
test6    355.3M    351.7M  351.7M       0K        0K       336.4M     0K           246
Tried the same thing on the same machine but with the discrete GPU [NVIDIA GeForce GT 750M 2048 MB]. Could not repro issue.
Attachments: test21 (326 KB), test22 (278 KB), test23 (191 KB), test24 (323 KB), test25 (268 KB), test26 (193 KB)
Cc: bsalomon@chromium.org
FYI, the allocations which are "leaked" appear to come from Skia (not tile resources). I wonder if we're deleting textures in a way which hits a driver bug (or undefined behavior) - such as leaving a texture bound to GL objects when it's deleted. Will dig into this a bit more.

+bsalomon@ in case he's seen anything like this before?
GPU vendor != 0x8086 [Intel]
UMA metric: Memory.Experimental.Gpu.PhysicalFootprint.MacOS

https://uma.googleplex.com/p/chrome/histograms/?endDate=20170425&dayCount=1&histograms=Memory.Experimental.Gpu.PhysicalFootprint.MacOS&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cgpu_v_id%2Cne%2C32902%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Count: 68,714
Mean: 370
95th percentile: 997MB
98th percentile: 1808MB
99th percentile: 2687MB

--------------------------------------------------------------------------------
GPU vendor == 0x8086 [Intel]
UMA metric: Memory.Experimental.Gpu.PhysicalFootprint.MacOS

https://uma.googleplex.com/p/chrome/histograms/?endDate=20170425&dayCount=1&histograms=Memory.Experimental.Gpu.PhysicalFootprint.MacOS&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cgpu_v_id%2Ceq%2C32902%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Count: 92,406
Mean: 927
95th percentile: 3994MB
98th percentile: 7237MB
99th percentile: 10757MB

--------------------------------------------------------------------------------
GPU vendor == 0x8086 [Intel]
GPU device = 0x0d26 [Intel Iris Pro 1536MB]
UMA metric: Memory.Experimental.Gpu.PhysicalFootprint.MacOS

https://uma.googleplex.com/p/chrome/histograms/?endDate=20170425&dayCount=1&histograms=Memory.Experimental.Gpu.PhysicalFootprint.MacOS&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cgpu_v_id%2Ceq%2C32902%2Cgpu_d_id%2Ceq%2C3366%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Count: 18,921
Mean: 1,477
95th percentile: 5936MB
98th percentile: 9742MB
99th percentile: 15988MB

--------------------------------------------------------------------------------
GPU vendor == 0x8086 [Intel]
GPU device = 0x191b [HD Graphics 530 1536 MB]
UMA metric: Memory.Experimental.Gpu.PhysicalFootprint.MacOS

https://uma.googleplex.com/p/chrome/histograms/?endDate=20170425&dayCount=1&histograms=Memory.Experimental.Gpu.PhysicalFootprint.MacOS&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cgpu_v_id%2Ceq%2C32902%2Cgpu_d_id%2Ceq%2C6427%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Count: 19
Count too small, no useful data. :(
--------------------------------------------------------------------------------
So...we're definitely seeing a massive leak on Intel devices. 
Although the 98th/99th percentiles for non-Intel devices also look pretty bad.
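The UMA links above filter on decimal IDs (`gpu_v_id`, `gpu_d_id`) while the comment text uses hex PCI IDs. This small sketch checks that the two agree for the vendor and devices quoted above; `IsIntelGpu` is a hypothetical helper, not a Chrome function.

```cpp
#include <cassert>
#include <cstdint>

// Hex PCI IDs from the comment, with the decimal URL filter values.
constexpr std::uint32_t kIntelVendorId = 0x8086;    // gpu_v_id=32902
constexpr std::uint32_t kIrisPro1536Id = 0x0d26;    // gpu_d_id=3366
constexpr std::uint32_t kHdGraphics530Id = 0x191b;  // gpu_d_id=6427

// Hypothetical helper matching the "GPU vendor == 0x8086" split above.
constexpr bool IsIntelGpu(std::uint32_t vendor_id) {
  return vendor_id == kIntelVendorId;
}
```

This makes it easier to confirm that each dashboard link is filtering on the GPU its heading names.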

Comment 30 by bugdroid1@chromium.org, Apr 26 2017

Labels: -merge-approved-59 merge-merged-3071
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cc50b61198ad671177ecc4d8fcab71310fd75739

commit cc50b61198ad671177ecc4d8fcab71310fd75739
Author: erikchen <erikchen@chromium.org>
Date: Wed Apr 26 23:23:09 2017

[Merge to 3071] Add the UMA metric Memory.Gpu.PhysicalFootprint.MacOS

> This is a temporary metric added to debug  Issue 713854 . The existing metrics
> don't measure OpenGL memory usage.

BUG= 713854 

Review-Url: https://codereview.chromium.org/2832933003
Cr-Commit-Position: refs/heads/master@{#466405}
(cherry picked from commit e6da7858a3ffa600b6937f965dca0d685283cd68)

Review-Url: https://codereview.chromium.org/2845793002 .
Cr-Commit-Position: refs/branch-heads/3071@{#240}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/cc50b61198ad671177ecc4d8fcab71310fd75739/base/process/process_metrics.h
[modify] https://crrev.com/cc50b61198ad671177ecc4d8fcab71310fd75739/base/process/process_metrics_mac.cc
[modify] https://crrev.com/cc50b61198ad671177ecc4d8fcab71310fd75739/chrome/browser/memory_details.cc
[modify] https://crrev.com/cc50b61198ad671177ecc4d8fcab71310fd75739/chrome/browser/memory_details.h
[modify] https://crrev.com/cc50b61198ad671177ecc4d8fcab71310fd75739/chrome/browser/memory_details_mac.cc
[modify] https://crrev.com/cc50b61198ad671177ecc4d8fcab71310fd75739/chrome/browser/metrics/metrics_memory_details.cc
[modify] https://crrev.com/cc50b61198ad671177ecc4d8fcab71310fd75739/tools/metrics/histograms/histograms.xml


Comment 31 by bugdroid1@chromium.org, Apr 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/36f8ae8c0648d6e8d53549e62770eb5b04128e2a

commit 36f8ae8c0648d6e8d53549e62770eb5b04128e2a
Author: erikchen <erikchen@chromium.org>
Date: Thu Apr 27 00:06:27 2017

[Merge to 3071] Change name of recently introduced metric Memory.Gpu.PhysicalFootprint.MacOS.

> The new name is Memory.Experimental.Gpu.PhysicalFootprint.MacOS, to reflect the
> temporary nature of the histogram.

BUG= 713854 

Review-Url: https://codereview.chromium.org/2831273003
Cr-Commit-Position: refs/heads/master@{#466465}
(cherry picked from commit 5ee3686ce72a6f6f4e218984f80e230f88760115)

Review-Url: https://codereview.chromium.org/2839363002 .
Cr-Commit-Position: refs/branch-heads/3071@{#244}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/36f8ae8c0648d6e8d53549e62770eb5b04128e2a/chrome/browser/metrics/metrics_memory_details.cc
[modify] https://crrev.com/36f8ae8c0648d6e8d53549e62770eb5b04128e2a/tools/metrics/histograms/histograms.xml


Comment 32 by bugdroid1@chromium.org, Apr 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4651b8f77e96f846bcdbf0a39b8a535d82ea7a19

commit 4651b8f77e96f846bcdbf0a39b8a535d82ea7a19
Author: erikchen <erikchen@chromium.org>
Date: Thu Apr 27 00:14:03 2017

[Merge to 3071] Stop emitting physical footprint on macOS 10.11.

> Despite the fact that the kernel implements physical footprint calculations on
> macOS 10.11, the syscall task_info(TASK_VM_INFO, ...) always returns 0. For more
> details [10.11.5], see xnu-3248.50.21/osfmk/kern/task.c:3418.

BUG= 713854 

Review-Url: https://codereview.chromium.org/2838703003
Cr-Commit-Position: refs/heads/master@{#466836}
(cherry picked from commit 92efc4259724f8dcbc550cabfe57bc215343ebba)

Review-Url: https://codereview.chromium.org/2847593002 .
Cr-Commit-Position: refs/branch-heads/3071@{#245}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/4651b8f77e96f846bcdbf0a39b8a535d82ea7a19/base/process/process_metrics_mac.cc
[modify] https://crrev.com/4651b8f77e96f846bcdbf0a39b8a535d82ea7a19/chrome/browser/metrics/metrics_memory_details.cc

Attached is a small page that, if left open, will leak about 5MB every 2 seconds. Note that you can't speed up the animation to leak more, as the animation rate interacts with our caching mechanisms and a faster animation will avoid new allocations.

Also note that until the system sleeps, the "leaked" memory has no dirty/resident/swapped size. As soon as the system sleeps, it ends up becoming dirty/resident/swapped.
Attachment: leak.html (379 bytes)
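For a sense of scale, the rate reported for the attached page can be projected forward. This sketch assumes the leak is linear and that multiple open copies of the page leak independently; neither is established in the thread, and the caching interaction noted above means the real rate can differ.

```cpp
#include <cassert>

// Rate quoted above: ~5MB every 2 seconds per open page (assumed steady).
constexpr double kLeakMBPerInterval = 5.0;
constexpr double kIntervalSeconds = 2.0;

// MB leaked after `seconds`, assuming linear, independent per-page leaks.
constexpr double ProjectedLeakMB(double seconds, int open_pages = 1) {
  return open_pages * (seconds / kIntervalSeconds) * kLeakMBPerInterval;
}

// Seconds until `target_mb` has leaked, under the same linear assumption.
constexpr double SecondsToLeak(double target_mb, int open_pages = 1) {
  return target_mb / (open_pages * kLeakMBPerInterval / kIntervalSeconds);
}
```

At 5MB per 2 seconds, a single open page would leak about 9,000MB per hour under this assumption.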
Given that the leak only manifests when the system sleeps, I wonder if this explains some of the performance issues we've been having when users try to wake up their machine.
Using the private_memory_footprint program [looking at the phys_footprint output], we see that the app's phys_footprint is continuously going up while on this test page.

I bet that when we wake from sleep, the OS is moving the pages from the inactive queue into compressed memory, which is what allows them to get correctly accounted for in vmmap.
Attachments: private_memory_footprint (13.6 KB), private_memory_footprint.cc (4.0 KB)
Two more observations:

Looking at xnu-3789.1.32/osfmk/vm/vm_compressor.c:2047, we see that there is some interesting hibernation-related behavior. Specifically, xnu waits 120s for compaction to finish pre-hibernation, which might also explain ericrk's observation that you have to wait several minutes for the increase in swapped memory to occur.

Also, I ran this for a while in 30 different windows. Eventually, at ~71GB leaked, my system ran out of memory and I got the "Force Quit Applications" dialog. I never put my machine to sleep. 
I've found the allocation which is leaking: https://cs.chromium.org/chromium/src/third_party/skia/src/gpu/gl/GrGLGpu.cpp?rcl=b8a1392b021f480e292d66576e3da5198480845c&l=1796

Still need to experiment with whether there's a way to clean up the renderbuffers in a way that prevents the leak.

Eric, do you think this is a case where Skia isn't deleting the renderbuffer, or where the driver isn't freeing the underlying memory when Skia deletes it?
I think this is a driver issue - Skia is deleting the renderbuffer.

I've put together a small sample which shows the leak (attached). If you run the sample and then run "vmmap -v {PID}", you'll see the leak: 2000 entries like:
IOKit                  0000000135ce1000-0000000135d39000 [  352K     0K     0K     0K] rw-/rw- SM=SHM PURGE=N

I've tried a few things, and nothing seems to work around the leak. Let me know if you have any ideas. The line which causes the leak is commented in the sample.

If anyone from Intel has any ideas on a workaround, it would be greatly appreciated!
Attachment: opengl_stencil_leak.mm (2.1 KB)
Looks like the test app may switch to the discrete GPU (which avoids the bug). I'm figuring out how to update the sample to avoid this, but until then it may be useful to force the integrated GPU via Quartz Debug.

Comment 41 by kbr@chromium.org, Apr 27 2017

Eric, awesome test case. We should file a Radar including it.

Include the pixel format attribute kCGLPFAAllowOfflineRenderers -- see https://cs.chromium.org/chromium/src/ui/gl/gl_context_cgl.cc?type=cs&q=kCGLPFAAllowOfflineRenderers+package:%5Echromium$&l=50 -- and you'll stay on the integrated GPU.

kbr: I tried kCGLPFAAllowOfflineRenderers with ericrk's test application but it still switched me to discrete gpu.

Comment 43 by kbr@chromium.org, Apr 27 2017

Ah shoot. I forgot there's an OS-level whitelist for whether that attribute actually allows you to stay on the integrated GPU. (Chrome's on the whitelist.)

I think it may still be possible to enumerate the renderers on the system and try to find one that targets the integrated GPU, and then set kCGLPFARendererID, too?

Cc: jie.a.c...@intel.com brandon....@intel.com
Adding more Intel folks. It would be better to file a Radar and attach the native sample code that reproduces this issue.
Will clean up the sample and file a Radar today. Thanks for the suggestions re. integrated/discrete, will try enumerating renderers.
Here's an updated sample which should stay on the integrated GPU. Let me know if you see issues with this triggering the discrete GPU.
Attachment: opengl_stencil_leak.mm (2.2 KB)
Filed Radar 31895333 to track the issue. 

Comment 48 by kbr@chromium.org, Apr 28 2017

Good work Eric on the sample, and digging up kCGLPFASupportsAutomaticGraphicsSwitching. We should start using that in Chrome.

Here are some options to mitigate the leak:

A) Add a workaround to Skia which avoids using stencils:
Currently, we use stencils for some non-AA masks/clipping, and for some path rasterization. We could move path rasterization to use other approaches (we already have a number of fallbacks). We have a non-stencil AA mask path, we could probably use this for non-AA masks as well.

B) Use real GL contexts (rather than virtualized ones):
This would mean that GL contexts would get cleaned up more regularly as renderers were created/destroyed, preventing leaks in these contexts from building up over time.

C) Cause Skia to never purge stencil resources on problem systems:
Currently, the leak is made worse by the way the Skia cache purges unused resources after a short duration. This means we may allocate/release a stencil a number of times, rather than just keeping it cached. While this makes sense if the stencil is actually freed, in this case we are just making the leak worse.

I think that either option (A), or a combination of options (B) and (C) would work. (B) is probably good to investigate anyway, as it also mitigates any other leaks we haven't found.
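Option (C) is easier to see with a toy model. The sketch below assumes a single stencil buffer, a fixed idle timeout before the cache purges it, and a driver that never frees purged buffers; the timeout and use times are made-up parameters, not Skia's actual purge policy or API.

```cpp
#include <cassert>
#include <vector>

// Outcome of replaying a sequence of stencil uses against the toy cache.
struct CacheModelResult {
  int allocations = 0;
  int leaked_buffers = 0;  // released by the cache but kept by the driver
};

// Replays uses at `use_times_s`; an idle gap longer than `purge_after_s`
// purges the cached buffer unless `never_purge` is set (option C).
CacheModelResult SimulateStencilCache(const std::vector<double>& use_times_s,
                                      double purge_after_s, bool never_purge) {
  CacheModelResult r;
  bool cached = false;
  double last_use = 0;
  for (double t : use_times_s) {
    if (cached && !never_purge && t - last_use > purge_after_s) {
      cached = false;      // the cache releases the buffer...
      ++r.leaked_buffers;  // ...but the leaky driver never frees it
    }
    if (!cached) {
      ++r.allocations;
      cached = true;
    }
    last_use = t;
  }
  return r;
}
```

With uses every 10s and a 5s purge timeout, five uses cause five allocations and four leaked buffers; with purging disabled, the same sequence allocates once and leaks nothing.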
Cc: rkap...@google.com rsesek@chromium.org dpranke@chromium.org
Issue 712718 has been merged into this issue.
Intel HD Graphics 6000 also experiences the same bug.

Comment 52 by yang...@intel.com, May 2 2017

We already brought this to our MacOS driver team. 
Labels: -Pri-2 ReleaseBlock-Stable Pri-1
Issue 703075 has been merged into this issue.
Status: Started (was: Assigned)
I'm working on option (A) from comment #49 - adding a path to Skia that will avoid all stencil buffers. Luckily alternate paths are available for all cases we currently use these, so the change should be somewhat small. Am hoping to have this out for review ~today.
I'm trying to put together a somewhat exhaustive list of impacted GPUs for the workaround.

If you've experienced this bug, please add an entry to https://docs.google.com/spreadsheets/d/1ItijZX3RYs4MRTUybFQZ-l3sPARSSUDktmTh69irxyQ/edit?usp=sharing

Ideally, confirm the issue on your system by doing the following:

1) build/run the test app from comment 46.
2) While the test app is still running, run "vmmap -v {PID OF TEST APP} | grep IOKit | grep =N" in a new terminal. 

If the leak reproduces, the command will output ~2000 entries similar to:
IOKit                  0000000135ce1000-0000000135d39000 [  352K     0K     0K     0K] rw-/rw- SM=SHM PURGE=N


If the command just outputs a few entries (10s of IOKit entries or fewer), your system does not reproduce the issue.
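The manual check above amounts to counting matching lines. As a hedged C++ sketch, assuming the vmmap output has been captured into a string, the grep pipeline and the "~2000 vs. tens of entries" rule of thumb look like this. The 1000-line threshold is an arbitrary choice between those two figures, not a number from the thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Mirrors `vmmap -v <pid> | grep IOKit | grep =N`. Like the grep, it also
// counts lines where "=N" appears somewhere other than "PURGE=N".
int CountNonPurgeableIOKitLines(const std::string& vmmap_output) {
  std::istringstream in(vmmap_output);
  std::string line;
  int count = 0;
  while (std::getline(in, line)) {
    if (line.find("IOKit") != std::string::npos &&
        line.find("=N") != std::string::npos) {
      ++count;
    }
  }
  return count;
}

// Hypothetical cut-off between "~2000 entries" (leak) and "10s" (no leak).
constexpr int kLeakThreshold = 1000;

bool LooksLikeStencilLeak(const std::string& vmmap_output) {
  return CountNonPurgeableIOKitLines(vmmap_output) >= kLeakThreshold;
}
```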

Thanks!
Here's a pre-built binary for the test app, if your system isn't set up to build it.
Attachment: opengl_stencil_leak (9.5 KB)
Here's an updated binary which should just tell the user whether their system experiences the leak. Run this instead of following the instructions in #56.
Attachment: opengl_stencil_leak_tester (28.9 KB)
Cc: piman@chromium.org
piman points out that if we're leaking stencil buffers, this will also affect WebGL.
True, but I don't think there's any reasonable way to avoid this in WebGL (short of disabling it). If I recall correctly, WebGL uses non-virtual contexts on mac, so the leak will be contained to the lifetime of the WebGL app - which should be much better than the current "leaks forever" behavior of other contexts.

Although, I vaguely recall a conversation about having WebGL no longer trigger the discrete GPU (which is what we use as a signal to not virtualize on Mac - see https://cs.chromium.org/chromium/src/gpu/ipc/service/gpu_command_buffer_stub.cc?rcl=9b0ef8da376582de0ba4d3aaa223bd78c3ab2f19&l=597).

kbr@, do you know what the current status of this is? 
@#61 Have we confirmed that destroying the context frees that leaked memory? In particular, since we put all contexts in the same share group (so that we can share textures using mailboxes), leaked GL resources (e.g. if we were to leak a GL texture or a GL renderbuffer) would not be freed when destroying the context, until we destroy all contexts. Not sure where the leak comes from so it's not clear what the memory is tracked by (assuming it's "only" a logical leak).


Not sure what you tried for a workaround, throwing some ideas out there:
1- explicitly resetting the stencil attachment before destroying either the renderbuffer or the framebuffer 
2- before destroying it, changing the framebuffer to use a packed depth_stencil attachment rather than a pure stencil one
3- using a stencil texture rather than a stencil renderbuffer
4- always using a packed depth_stencil attachment rather than pure stencil one

I think those are all things we could reasonably implement as a workaround in the command buffer implementation.


@Intel folks: any insight into what causes the problem and/or a workaround would be very, very useful, as a driver fix may take a while to trickle down to all users. We'll likely need something in Chrome in the meantime, even if a driver fix is on the way; the symptoms are bad enough.
Good point re. share groups - I checked and unfortunately the leak persists until all contexts in a share group are deleted.
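The share-group behaviour confirmed above can be modelled with reference counting: each context holds a reference to its group, and anything leaked into the group is only reclaimed when the last context goes away. This C++ sketch uses hypothetical `ShareGroup`/`Context` types, not Chrome's GL classes, and treats the leak purely as bookkeeping.

```cpp
#include <cassert>
#include <memory>

// Hypothetical share group: leaked bytes are returned only on destruction.
struct ShareGroup {
  long long leaked_bytes = 0;
  long long* released_counter = nullptr;  // records bytes freed on teardown
  ~ShareGroup() {
    if (released_counter) *released_counter += leaked_bytes;
  }
};

// Hypothetical context: keeps its share group alive while it exists.
struct Context {
  std::shared_ptr<ShareGroup> group;
  void LeakStencil(long long bytes) { group->leaked_bytes += bytes; }
};
```

Destroying one context leaves the group (and its leaked memory) alive as long as any other context references it, which is why virtualized contexts sharing one group let the leak accumulate.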

FWIW, the leak happens after you attach the stencil buffer with glFramebufferTexture or glFramebufferRenderbuffer, not just from allocating it. I tried workarounds (1), (3), and (4) with no luck; I'll give (2) a shot.

Comment 64 by yang...@intel.com, May 8 2017

Pinged our MacOS driver team again via internal mail, and hope they can respond here soon. 

Comment 65 by bugdroid1@chromium.org, May 9 2017

The following revision refers to this bug:
  https://skia.googlesource.com/skia/+/5c77975e4c00e18e644c72b56f369858acd11b15

commit 5c77975e4c00e18e644c72b56f369858acd11b15
Author: Eric Karl <ericrk@chromium.org>
Date: Tue May 09 17:41:25 2017

Add flag to avoid stencil buffers in Skia

Certain systems experience a leak in the GL driver associated with
stencil buffers. Attempts to avoid the leak (while still using stencil
buffers) dind't succeed. This patch adds a GrContextOption
fAvoidStencilBuffers. This disables certain path rendering modes, as
well as stencil based masking/clipping.

Bug:  713854 
Change-Id: Ifa6c0f2bd5ee395547bda9165d6c79d197ae8b8b
Reviewed-on: https://skia-review.googlesource.com/15253
Commit-Queue: Eric Karl <ericrk@chromium.org>
Reviewed-by: Eric Karl <ericrk@chromium.org>
Reviewed-by: Brian Salomon <bsalomon@google.com>

[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tests/ResourceCacheTest.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/ops/GrMSAAPathRenderer.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/ops/GrDefaultPathRenderer.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tools/gpu/GrContextFactory.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/ops/GrAAHairLinePathRenderer.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tools/flags/SkCommonFlagsConfig.h
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/GrReducedClip.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tools/gpu/GrContextFactory.h
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/gl/GrGLCaps.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/ops/GrStencilAndCoverPathRenderer.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/ops/GrAAConvexPathRenderer.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tests/GLProgramsTest.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tests/GpuSampleLocationsTest.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/GrRenderTargetContext.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/GrCaps.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/GrClipStackClip.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/gl/GrGLTextureRenderTarget.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/GrSoftwarePathRenderer.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/GrPathRenderer.h
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/GrGpu.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tests/TestConfigParsing.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/include/gpu/GrContextOptions.h
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tests/SurfaceTest.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/gl/GrGLCaps.h
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/ops/GrSmallPathRenderer.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/src/gpu/gl/GrGLRenderTarget.cpp
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/include/gpu/GrCaps.h
[modify] https://crrev.com/5c77975e4c00e18e644c72b56f369858acd11b15/tools/flags/SkCommonFlagsConfig.cpp


Comment 66 by bugdroid1@chromium.org, May 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4f3b335015f66c266bc1ed11f7b22761907df6f4

commit 4f3b335015f66c266bc1ed11f7b22761907df6f4
Author: ericrk <ericrk@chromium.org>
Date: Wed May 10 21:17:44 2017

Add workaround for Mac stencil buffer leak

This change adds a new workaround, "avoid_stencil_buffers", which is
enabled on certain Macs with leaky drivers. This workaround currently
only forwards to Skia, enabling a mode where stencil buffers are
avoided.

BUG= 713854 
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2866353002
Cr-Commit-Position: refs/heads/master@{#470712}

[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/cc/output/in_process_context_provider.cc
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/cc/test/test_in_process_context_provider.cc
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/command_buffer/common/capabilities.h
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/command_buffer/service/gles2_cmd_decoder.cc
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/config/gpu_driver_bug_list.json
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/config/gpu_driver_bug_workaround_type.h
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/ipc/common/gpu_command_buffer_traits_multi.h
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/skia_bindings/BUILD.gn
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/skia_bindings/grcontext_for_gles2_interface.cc
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/gpu/skia_bindings/grcontext_for_gles2_interface.h
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/services/ui/public/cpp/gpu/context_provider_command_buffer.cc
[modify] https://crrev.com/4f3b335015f66c266bc1ed11f7b22761907df6f4/ui/compositor/test/in_process_context_provider.cc
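According to the commit message, the Chrome side of the workaround does nothing except forward "avoid_stencil_buffers" to Skia. The Python sketch below models that forwarding. The class and function names are hypothetical stand-ins. Only the workaround name and Skia's `fAvoidStencilBuffers` option come from the commits on this bug.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SkiaContextOptions:
    """Hypothetical stand-in for Skia's GrContextOptions (one field only)."""

    # Mirrors the GrContextOptions::fAvoidStencilBuffers flag added in Skia.
    avoid_stencil_buffers: bool = False


def skia_options_for(workarounds: Iterable[str]) -> SkiaContextOptions:
    """Translate enabled GPU driver workarounds into Skia context options.

    Hypothetical helper: per the commit, "avoid_stencil_buffers" currently
    only forwards to Skia, so it is the only workaround mapped here.
    """
    opts = SkiaContextOptions()
    if "avoid_stencil_buffers" in workarounds:
        opts.avoid_stencil_buffers = True
    return opts


print(skia_options_for({"avoid_stencil_buffers"}))
print(skia_options_for(set()))
```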

Now that a workaround has landed, we should watch Memory.Experimental.Gpu.PhysicalFootprint.MacOS to ensure that it reflects the expected drop. If so, we should merge #65 and #66 to M58.
Labels: -Hotlist-Merge-Review -Hotlist-Merge-Approved -merge-merged-3071 Merge-Request-58
The workaround appears to be greatly lowering memory usage, especially at high percentiles.

We still see some high-usage reports. This may be due to WebGL content (which can't be easily shielded from this leak).

Either way, the improvement is large enough that we should merge the existing fix to beta while we continue investigating.

Requesting merge for Skia change from #65 and Chrome change from #66.
Here's a graph of memory usage that shows the improvement. Because the fix has only been in for a few days, the data is still pretty noisy:
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=81e32d50b7e47e6dced87760f3267634
Labels: -Merge-Request-58 Merge-Request-59

Comment 71 by sheriffbot@chromium.org, May 15 2017

Labels: -Merge-Request-59 Hotlist-Merge-Approved Merge-Approved-59
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Please merge your change to M59 branch 3071 before 4:00 PM PT tomorrow, Tuesday (05/16), so we can pick it up for this week's beta release. Thank you.

Comment 73 by bugdroid1@chromium.org, May 16 2017

Labels: merge-merged-m59
The following revision refers to this bug:
  https://skia.googlesource.com/skia/+/cc13419addb1ca867a992f96ff4455e02cb829be

commit cc13419addb1ca867a992f96ff4455e02cb829be
Author: Eric Karl <ericrk@google.com>
Date: Tue May 16 19:04:24 2017

Add flag to avoid stencil buffers in Skia

Certain systems experience a leak in the GL driver associated with
stencil buffers. Attempts to avoid the leak (while still using stencil
buffers) didn't succeed. This patch adds a GrContextOption
fAvoidStencilBuffers. This disables certain path rendering modes, as
well as stencil based masking/clipping.
NOTREECHECKS=true
NOTRY=true
NOPRESUBMIT=true
Bug:  713854 
Change-Id: Ifa6c0f2bd5ee395547bda9165d6c79d197ae8b8b
Reviewed-On: https://skia-review.googlesource.com/15253
Commit-Queue: Eric Karl <ericrk@chromium.org>
Reviewed-By: Eric Karl <ericrk@chromium.org>
Reviewed-By: Brian Salomon <bsalomon@google.com>
Reviewed-on: https://skia-review.googlesource.com/17081
Commit-Queue: Eric Karl <ericrk@google.com>

[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tests/ResourceCacheTest.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/ops/GrMSAAPathRenderer.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/ops/GrDefaultPathRenderer.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tools/gpu/GrContextFactory.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/ops/GrAAHairLinePathRenderer.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tools/flags/SkCommonFlagsConfig.h
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/GrReducedClip.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tools/gpu/GrContextFactory.h
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/gl/GrGLCaps.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/ops/GrStencilAndCoverPathRenderer.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/ops/GrAAConvexPathRenderer.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tests/GLProgramsTest.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tests/GpuSampleLocationsTest.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/GrRenderTargetContext.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/GrCaps.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/GrClipStackClip.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/gl/GrGLTextureRenderTarget.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/GrSoftwarePathRenderer.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/GrPathRenderer.h
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/GrGpu.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tests/TestConfigParsing.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/include/gpu/GrContextOptions.h
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tests/SurfaceTest.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/gl/GrGLCaps.h
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/ops/GrSmallPathRenderer.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/src/gpu/gl/GrGLRenderTarget.cpp
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/include/gpu/GrCaps.h
[modify] https://crrev.com/cc13419addb1ca867a992f96ff4455e02cb829be/tools/flags/SkCommonFlagsConfig.cpp
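The Skia commit says fAvoidStencilBuffers "disables certain path rendering modes, as well as stencil based masking/clipping." The sketch below shows one way to picture this, assuming a priority-ordered list of path renderers in which stencil-based entries are skipped when the option is set. The `uses_stencil` flag and the two-entry chain are hypothetical. The sketch does not say which of Skia's renderers actually depend on stencil.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PathRenderer:
    name: str
    uses_stencil: bool  # hypothetical flag, not Skia's actual API


def pick_renderer(chain: Sequence[PathRenderer],
                  avoid_stencil_buffers: bool) -> PathRenderer:
    """Return the first allowed renderer in priority order.

    When avoid_stencil_buffers is set, renderers that need a stencil
    buffer are skipped so no stencil attachment is ever allocated.
    """
    for renderer in chain:
        if avoid_stencil_buffers and renderer.uses_stencil:
            continue
        return renderer
    raise LookupError("no path renderer available")


# Hypothetical chain: a stencil-based renderer first, then a mask fallback.
CHAIN = [
    PathRenderer("stencil_based", uses_stencil=True),
    PathRenderer("software_mask", uses_stencil=False),
]

print(pick_renderer(CHAIN, avoid_stencil_buffers=False).name)
print(pick_renderer(CHAIN, avoid_stencil_buffers=True).name)
```

The trade-off in this model is that the option falls back to non-stencil rendering, which may be slower. That matches the commit's description of the option as disabling certain rendering modes, not fixing the leak.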

Comment 74 by bugdroid1@chromium.org, May 16 2017

Labels: -merge-approved-59 merge-merged-3071
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/84f3edbaa712fc88efa0ae9a26a3953abb99b752

commit 84f3edbaa712fc88efa0ae9a26a3953abb99b752
Author: Eric Karl <ericrk@google.com>
Date: Tue May 16 21:50:58 2017

Add workaround for Mac stencil buffer leak

This change adds a new workaround, "avoid_stencil_buffers", which is
enabled on certain Macs with leaky drivers. This workaround currently
only forwards to Skia, enabling a mode where stencil buffers are
avoided.

BUG= 713854 
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2866353002
Cr-Original-Commit-Position: refs/heads/master@{#470712}
Review-Url: https://codereview.chromium.org/2891563002 .
Cr-Commit-Position: refs/branch-heads/3071@{#589}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/cc/output/in_process_context_provider.cc
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/cc/test/test_in_process_context_provider.cc
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/command_buffer/common/capabilities.h
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/command_buffer/service/gles2_cmd_decoder.cc
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/config/gpu_driver_bug_list.json
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/config/gpu_driver_bug_workaround_type.h
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/ipc/common/gpu_command_buffer_traits_multi.h
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/skia_bindings/BUILD.gn
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/skia_bindings/grcontext_for_gles2_interface.cc
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/gpu/skia_bindings/grcontext_for_gles2_interface.h
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/services/ui/public/cpp/gpu/context_provider_command_buffer.cc
[modify] https://crrev.com/84f3edbaa712fc88efa0ae9a26a3953abb99b752/ui/compositor/test/in_process_context_provider.cc

Comment 75 by bugdroid1@chromium.org, May 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f2691533640127689f7dd59f3b322c38c5f83af0

commit f2691533640127689f7dd59f3b322c38c5f83af0
Author: Eric Karl <ericrk@google.com>
Date: Wed May 17 01:14:13 2017

Add missing include to fix build.

This was missed due to a merge conflict when merging
84f3edbaa712fc88efa0ae9a26a3953abb99b752

TBR=vmiura@chromium.org
BUG= 713854 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2885883003 .
Cr-Commit-Position: refs/branch-heads/3071@{#592}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/f2691533640127689f7dd59f3b322c38c5f83af0/gpu/skia_bindings/grcontext_for_gles2_interface.cc

Comment 76 by yang...@intel.com, May 18 2017

Our driver team told me this bug has been fixed by Apple; FYI in case not all of you were aware.
Fantastic! Thanks for the update. Do you know the driver or OS version that will contain the fix?

Comment 78 by yang...@intel.com, May 18 2017

The Radar system isn't accessible to us at all. I think you filed the radar, so you can ping Apple there for details.
Cc: cblume@chromium.org
Reminder that the M59 Stable launch is coming soon (less than 2 weeks)! Your bug is labelled as Stable ReleaseBlock; please make sure to land the fix and get it merged into the release branch ASAP so it gets enough baking time in Beta (before Stable promotion). Thank you!
ericrk@ - is this just waiting on an external fix? Can we remove the ReleaseBlock-Stable label for this or is this truly a RB? M59 Stable is being targeted for this week.  
Labels: -ReleaseBlock-Stable
Status: ExternalDependency (was: Started)
This is worked around; just keeping this open to track the external fix. Removing labels.

Comment 83 by bak...@gmail.com, Jun 22 2017

Added entries to https://docs.google.com/spreadsheets/d/1ItijZX3RYs4MRTUybFQZ-l3sPARSSUDktmTh69irxyQ/edit?usp=sharing
for the following configuration, which is showing GPU leaks:
Mac mini (Late 2014)	10.12.5	VENDOR=0x0000, DEVICE=0x0000	10.25.13

Also experiencing leaks after upgrading Chrome to 59.

Comment 84 by m...@zagom.net, Jul 5 2017

I see this behavior on a large proportion of Mac laptops used by developers. I also very commonly see these virtual memory leaks end up causing macOS WindowServer to show a black screen and never recover (even after killing the Chrome GPU process, etc.). It's very difficult to explain to users how virtual memory works, how to monitor it, why they might need to quit Chrome because of a leak, how to enable functionality that marks tabs inactive, etc.

Why isn't this a RB contender? 

I've escalated this to Apple enterprise to track status of a driver fix, and I can't comment on the exact feedback from that ticket on when this might be fixed, but I can say it won't be fixed soon and generally they do not fix video drivers for older OS versions so there will still be a large number of affected users even if a fix ships for 10.12.x/10.13.x.
What Chrome/macOS versions are you seeing this on? We've merged a workaround that should deal with the brunt of this issue to M59.
We're still seeing this issue on 59.0.3171.115

https://bugs.chromium.org/p/chromium/issues/detail?id=700928#c127

Attaching vmmap output from the GPU process, showing 40GB of swapped memory.
vmmap_12640.txt
Status: Assigned (was: ExternalDependency)
ericrk: Can you investigate?

Re-opening bug.
If we really can't come up with a better solution, I'd even be okay with a short-term "restart GPU process when it hits X GB memory" policy. The current behavior is untenable.
UMA stats for Canary channel, macOS 10.12, vendor = 0x8086 [Intel], device = 0x0d26 [Iris Pro 1536MB], date = 7/11

https://uma.googleplex.com/p/chrome/histograms/?endDate=20170711&dayCount=1&histograms=Memory.Experimental.Gpu2.CommandBuffer%2CMemory.Experimental.Gpu2.PrivateMemoryFootprint&fixupData=true&showMax=true&analysis=0.5%200.99&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cosflavor%2Ceq%2CMacOS10.12%2Cgpu_v_id%2Ceq%2C32902%2Cgpu_d_id%2Ceq%2C3366%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
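Anyone reproducing this query should note that the UMA filter in the URL above encodes the GPU IDs in decimal (gpu_v_id=32902, gpu_d_id=3366), while the previous line gives them in hex. The quick check below shows they are the same IDs. The helper name is just for illustration.

```python
# Decimal IDs copied from the gpu_v_id / gpu_d_id filters in the URL above.
GPU_VENDOR_ID_DEC = 32902
GPU_DEVICE_ID_DEC = 3366


def as_pci_hex(value: int) -> str:
    """Format an integer as a 4-digit, 0x-prefixed PCI ID."""
    return f"0x{value:04x}"


print(as_pci_hex(GPU_VENDOR_ID_DEC))  # 0x8086 (Intel)
print(as_pci_hex(GPU_DEVICE_ID_DEC))  # 0x0d26 (Iris Pro 1536MB)
```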

50th percentile:
CommandBuffer: 252
PrivateMemoryFootprint: 541

99th percentile:
CommandBuffer: 6275
PrivateMemoryFootprint: 10302

When I restrict to version = 61.0.3153.0 [released 7/10, so maximum uptime = 24 hours]:
50th percentile:
CommandBuffer: 298
PrivateMemoryFootprint: 499.6

99th percentile:
CommandBuffer: 6391
PrivateMemoryFootprint: 8544

So at the 99th percentile, we're probably leaking ~2GB a day. This seems about right, given that primiano@ managed to leak 40GB over 2 days [see launch time and current date in vmmap in c#86].
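The "~2GB a day" estimate comes from subtracting CommandBuffer (memory we track) from PrivateMemoryFootprint (what the OS reports) for the build with at most 24 hours of uptime. The snippet below repeats that subtraction for every percentile quoted above. It assumes the histogram values are in MB, which the comment does not state.

```python
# (CommandBuffer, PrivateMemoryFootprint) pairs quoted above; units assumed MB.
STATS = {
    "all versions": {"p50": (252, 541), "p99": (6275, 10302)},
    "61.0.3153.0 (<=24h uptime)": {"p50": (298, 499.6), "p99": (6391, 8544)},
}


def untracked_mb(command_buffer: float, footprint: float) -> float:
    """Footprint of the GPU process not accounted for by CommandBuffer."""
    return footprint - command_buffer


for label, percentiles in STATS.items():
    for pct, (cb, pmf) in percentiles.items():
        print(f"{label} {pct}: {untracked_mb(cb, pmf):.1f} MB untracked")
```

For 61.0.3153.0 this gives 2153 MB at p99. That is the roughly 2GB accumulated within at most one day that the comment refers to, versus 4027 MB across all versions.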

Comment 90 by piman@chromium.org, Jul 12 2017

"CommandBuffer" is memory as we count it, right? 6GB would seem unrelated to the driver leak.
Yes, the difference between CommandBuffer and PrivateMemoryFootprint is very likely pure leak [so in this case 2.2GB, which will grow over time]. 

CommandBuffer itself is also high, but is *less* likely to grow over time? Maybe there's a CommandBuffer leak as well.

Comment 92 by piman@chromium.org, Jul 12 2017

I wouldn't assume the difference is necessarily a leak. Our accounting is approximate. We have no way of counting various padding, shadow buffers, or other metadata associated with textures and rendering surfaces (e.g. hierarchical Z) that are purely internal to the driver, and we don't know anything about them.
To piman's point, I set GPU vendor id != 0x8086, and saw:

99th percentile:
CommandBuffer 6958
PrivateMemoryFootprint 9036

So it seems likely the difference is caused by uncounted-for-memory. That being said, this is still a huge amount of memory for the GPU process to be using.
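To make the comparison explicit, the snippet below computes the p99 gap (PrivateMemoryFootprint minus CommandBuffer) for each query quoted in this thread. It assumes the units are MB. The comment doesn't say whether the 61.0.3153.0 version filter was still applied for the non-Intel query, so the pairing is approximate.

```python
# p99 (CommandBuffer, PrivateMemoryFootprint) values quoted in this thread.
P99_MB = {
    "Intel Iris Pro, all versions": (6275, 10302),
    "Intel Iris Pro, 61.0.3153.0 only": (6391, 8544),
    "vendor id != 0x8086": (6958, 9036),
}


def p99_gap(command_buffer: float, footprint: float) -> float:
    """Memory the GPU process uses beyond what CommandBuffer accounts for."""
    return footprint - command_buffer


for label, (cb, pmf) in P99_MB.items():
    print(f"{label}: {p99_gap(cb, pmf):.0f} MB")
```

The non-Intel gap (2078) is close to the 24-hour Intel gap (2153). That is consistent with the conclusion that much of the gap is uncounted driver memory, not a leak specific to Intel.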
I've tried a number of sites, and I can't seem to get the leak to trigger without WebGL. There may still be a Skia/GPU raster leak, but I haven't found a site that triggers one.

For WebGL, I can trigger the leak pretty reliably by navigating to maps.google.com. Note that I need to be logged in (with my personal, non-corp account) for the leak to occur; I'm not sure whether my account triggers a different rendering mode or whether some part of the logged-in UI is different.

For each logged-in maps window I open, I seem to leak 20-30MB. If I increase my screen resolution to the maximum my mac supports, I leak 65-85MB. This seems like it could pile up quickly.
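As a rough sense of scale for "pile up quickly", the snippet below estimates how many logged-in Maps windows it would take to reach the ~40GB seen in the vmmap dumps. It assumes the leak per window is constant and that 40GB means 40 × 1024 MB. Both are simplifications.

```python
import math


def windows_to_reach(target_mb: float, per_window_mb: float) -> int:
    """Windows needed for a fixed per-window leak to reach target_mb."""
    return math.ceil(target_mb / per_window_mb)


TARGET_MB = 40 * 1024  # ~40GB, as seen in the vmmap dumps on this bug

# Per-window ranges quoted above: 20-30 MB normally, 65-85 MB at max resolution.
for per_window in (20, 30, 65, 85):
    print(f"{per_window} MB/window -> {windows_to_reach(TARGET_MB, per_window)} windows")
```

At maximum resolution that comes to a few hundred window opens (482 at 85 MB each), which seems reachable over several days of normal Maps use.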

erikchen's comment in #88 seems like a reasonable approach: we could minimize the damage by killing the GPU process once it crosses a memory threshold while no WebGL windows are open, and only killing it while WebGL is in use if it crosses a higher threshold.
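Here is a minimal sketch of the two-threshold policy being discussed. The class, method, and threshold values are hypothetical; nobody on the thread proposed specific numbers. The WebGL limit is higher, presumably because killing the GPU process makes pages lose their WebGL contexts.

```python
from dataclasses import dataclass


@dataclass
class GpuRestartPolicy:
    """Hypothetical policy: restart the GPU process above a footprint limit."""

    threshold_mb: float        # used when no WebGL content is open
    webgl_threshold_mb: float  # higher limit while WebGL is in use

    def should_restart(self, footprint_mb: float, webgl_active: bool) -> bool:
        limit = self.webgl_threshold_mb if webgl_active else self.threshold_mb
        return footprint_mb >= limit


# Example thresholds chosen only for illustration.
policy = GpuRestartPolicy(threshold_mb=8 * 1024, webgl_threshold_mb=16 * 1024)
print(policy.should_restart(10_000, webgl_active=False))
print(policy.should_restart(10_000, webgl_active=True))
```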

primiano@, are there any WebGL sites you use heavily, or have been using heavily the last few days when you encountered this leak?
Blockedon: 741854
Added a blocking bug - kbr@ will take a look at working around this issue in WebGL. We should re-evaluate this bug once that blocker is fixed, to make sure there are no lingering instances of the leak.

Comment 97 by kbr@chromium.org, Jul 14 2017

For the record: per the opengl_stencil_leak_tester from #59, this leak is fixed in the current 10.13 Beta (17A291j).

(Could the sources of that version be added to this bug?)

Comment 98 by bugdroid1@chromium.org, Jul 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/af62f4da307a373b8c74fd986b20d48fc25f7775

commit af62f4da307a373b8c74fd986b20d48fc25f7775
Author: Eric Karl <ericrk@chromium.org>
Date: Wed Jul 19 21:39:18 2017

Expand stencil buffer workaround to apply to non-active GPU

We avoid stencil buffers on certain Intel GPUs with a known leak. From
looking at UMA, it appears that this leak may apply even when the GPU
isn't the active one (on a multi-GPU system). Expanding the workaround
to handle this case as well.

Bug:  713854 
Change-Id: I880c83fcf78b617506b3f8816dedf13fe133b2c1
Reviewed-on: https://chromium-review.googlesource.com/578376
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Eric Karl <ericrk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#487997}
[modify] https://crrev.com/af62f4da307a373b8c74fd986b20d48fc25f7775/gpu/config/gpu_driver_bug_list.json
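The change above makes the workaround apply when an affected Intel GPU is present but not active (e.g. on a dual-GPU MacBook Pro). The sketch below models that matching difference. It is not the actual gpu_driver_bug_list.json schema or matching code, and the AMD device ID in the example is hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, Sequence


@dataclass(frozen=True)
class Gpu:
    vendor_id: int
    device_id: int
    active: bool


def entry_matches(gpus: Sequence[Gpu], vendor_id: int,
                  device_ids: AbstractSet[int], active_only: bool) -> bool:
    """Return True if a (hypothetical) driver-bug entry applies.

    active_only=True models the original behaviour (only the active GPU is
    checked); active_only=False models the expanded workaround, which also
    fires when an affected GPU is present but inactive.
    """
    candidates = [g for g in gpus if g.active] if active_only else list(gpus)
    return any(g.vendor_id == vendor_id and g.device_id in device_ids
               for g in candidates)


# Dual-GPU machine: inactive Intel Iris Pro, active AMD GPU (device ID made up).
machine = [Gpu(0x8086, 0x0D26, active=False), Gpu(0x1002, 0x6821, active=True)]
print(entry_matches(machine, 0x8086, {0x0D26}, active_only=True))
print(entry_matches(machine, 0x8086, {0x0D26}, active_only=False))
```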

Comment 99 by kbr@chromium.org, Jul 25 2017

Blockedon: 748148

Comment 100 by bugdroid1@chromium.org, Jul 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/46f405d5f7b61a985de223e2fde5b548486531eb

commit 46f405d5f7b61a985de223e2fde5b548486531eb
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Jul 25 08:03:00 2017

Revert "Expand stencil buffer workaround to apply to non-active GPU"

This reverts commit af62f4da307a373b8c74fd986b20d48fc25f7775.

Reason for revert: speculative revert for rendering breakage on Canary on AMD based MacBook Pros:  http://crbug.com/748148  .

Original change's description:
> Expand stencil buffer workaround to apply to non-active GPU
> 
> We avoid stencil buffers on certain Intel GPUs with a known leak. From
> looking at UMA, it appears that this leak may apply even when the GPU
> isn't the active one (on a multi-GPU system). Expanding the workaround
> to handle this case as well.
> 
> Bug:  713854 
> Change-Id: I880c83fcf78b617506b3f8816dedf13fe133b2c1
> Reviewed-on: https://chromium-review.googlesource.com/578376
> Reviewed-by: Kenneth Russell <kbr@chromium.org>
> Reviewed-by: Zhenyao Mo <zmo@chromium.org>
> Commit-Queue: Eric Karl <ericrk@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#487997}

TBR=zmo@chromium.org,kbr@chromium.org,ericrk@chromium.org
NOTRY=true

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  713854 
Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I167636eb78aa6e5504635a9d0db3ab5d07359531
Reviewed-on: https://chromium-review.googlesource.com/584127
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#489252}
[modify] https://crrev.com/46f405d5f7b61a985de223e2fde5b548486531eb/gpu/config/gpu_driver_bug_list.json

Blockedon: 749438

Comment 102 by bugdroid1@chromium.org, Aug 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d676a966d41ac183b4b87ae494c36f1744b048f2

commit d676a966d41ac183b4b87ae494c36f1744b048f2
Author: Victor Miura <vmiura@chromium.org>
Date: Wed Aug 02 18:25:28 2017

Disable MSAA for slow paths if avoid_stencil_buffers workaround is set.

avoid_stencil_buffers blocks MSAA support in Skia.  If the compositor requests
MSAA surfaces, rendering will fail.

R=ericrk@chromium.org
BUG= 749438 , 713854 

Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I887888845a2c1ee18dbccff6e96aa6cd1eac3df2
Reviewed-on: https://chromium-review.googlesource.com/597321
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Victor Miura <vmiura@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491446}
[modify] https://crrev.com/d676a966d41ac183b4b87ae494c36f1744b048f2/cc/test/test_web_graphics_context_3d.h
[modify] https://crrev.com/d676a966d41ac183b4b87ae494c36f1744b048f2/cc/trees/layer_tree_host_impl.cc
[modify] https://crrev.com/d676a966d41ac183b4b87ae494c36f1744b048f2/cc/trees/layer_tree_host_impl_unittest.cc
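Because avoid_stencil_buffers blocks MSAA support in Skia, the compositor must stop requesting MSAA surfaces for slow paths when the workaround is set. The function below is a hypothetical sketch of that decision. The real change is in cc/trees/layer_tree_host_impl.cc, and its code is not reproduced here.

```python
def msaa_sample_count(requested: int, avoid_stencil_buffers: bool) -> int:
    """Pick the MSAA sample count for slow-path compositing surfaces.

    Hypothetical helper: with avoid_stencil_buffers set, Skia cannot provide
    MSAA, so requesting it would make rendering fail; fall back to 0 samples
    (no MSAA) instead.
    """
    if avoid_stencil_buffers:
        return 0
    return requested


print(msaa_sample_count(4, avoid_stencil_buffers=False))
print(msaa_sample_count(4, avoid_stencil_buffers=True))
```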

Comment 103 by bugdroid1@chromium.org, Aug 2 2017

Labels: merge-merged-3163
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4fc2252e12036209627c848f18352acb0932f6a0

commit 4fc2252e12036209627c848f18352acb0932f6a0
Author: Victor Miura <vmiura@chromium.org>
Date: Wed Aug 02 18:41:52 2017

Disable MSAA for slow paths if avoid_stencil_buffers workaround is set.

avoid_stencil_buffers blocks MSAA support in Skia.  If the compositor requests
MSAA surfaces, rendering will fail.

R=ericrk@chromium.org
TBR=vmiura@chromium.org
BUG= 749438 , 713854 

(cherry picked from commit d676a966d41ac183b4b87ae494c36f1744b048f2)

Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I887888845a2c1ee18dbccff6e96aa6cd1eac3df2
Reviewed-on: https://chromium-review.googlesource.com/597321
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Victor Miura <vmiura@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#491446}
Reviewed-on: https://chromium-review.googlesource.com/598630
Reviewed-by: Victor Miura <vmiura@chromium.org>
Cr-Commit-Position: refs/branch-heads/3163@{#242}
Cr-Branched-From: ff259bab28b35d242e10186cd63af7ed404fae0d-refs/heads/master@{#488528}
[modify] https://crrev.com/4fc2252e12036209627c848f18352acb0932f6a0/cc/test/test_web_graphics_context_3d.h
[modify] https://crrev.com/4fc2252e12036209627c848f18352acb0932f6a0/cc/trees/layer_tree_host_impl.cc
[modify] https://crrev.com/4fc2252e12036209627c848f18352acb0932f6a0/cc/trees/layer_tree_host_impl_unittest.cc

Comment 104 by bugdroid1@chromium.org, Aug 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6782ba2e36250c105b86654487e3827155afc8b1

commit 6782ba2e36250c105b86654487e3827155afc8b1
Author: Eric Karl <ericrk@chromium.org>
Date: Thu Aug 03 00:31:00 2017

Re-land Expand stencil buffer workaround to apply to non-active GPU

We avoid stencil buffers on certain Intel GPUs with a known leak. From
looking at UMA, it appears that this leak may apply even when the GPU
isn't the active one (on a multi-GPU system). Expanding the workaround
to handle this case as well.

R=vmiura

Bug:  713854 
Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I32eb73416eae4740635f50922b0bd673e75b9af0
Reviewed-on: https://chromium-review.googlesource.com/599123
Reviewed-by: Victor Miura <vmiura@chromium.org>
Commit-Queue: Eric Karl <ericrk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491568}
[modify] https://crrev.com/6782ba2e36250c105b86654487e3827155afc8b1/gpu/config/gpu_driver_bug_list.json

Issue 708649 has been merged into this issue.
I'm at 53GB right now.

MacOS 10.12.6
Chrome Version 60.0.3112.113 (Official Build) (64-bit)
Intel Iris Pro 1536 MB

vm_stat:
 https://paste.googleplex.com/5559179787894784?raw
about:gpu:
   https://paste.googleplex.com/6493082040139776?raw
vmmap -v -interleaved 31938:
  See attachment
for_erik_chen.txt
I don't believe this fix was merged to M60, only M61 and M62.
Status: Fixed (was: Assigned)
Yup, this should be addressed in M61. Please let me know if you continue seeing issues after upgrading.