
Issue 700928

Starred by 15 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug

Blocking:
issue 717919




JSON.parse() leaks wired memory

Project Member Reported by a...@chromium.org, Mar 13 2017

Issue description

I left my Mac on Friday running a dev channel Chrome, 58.0.3029.14. I arrived this morning to find an out-of-memory box.

I force-quit a few small apps, enough to get Activity Monitor running. Of my 32 GB of memory, 25 GB was wired.

So I force-quit Chrome. The wired memory use dropped immediately to 3.5 GB, and after about a minute settled to 2.6 GB.

This isn't limited to beefy Macs. My MacBook at home has 8 GB of memory, and I've seen it start thrashing to swap, and I've seen 4 GB of wired memory. I quit Chrome and it dropped to 2 GB instantly.

I'm not up to speed with the current crop of Mac issues. Do we know about this one?
 
IMG_2658.JPG (6.1 MB)
IMG_2659.JPG (7.5 MB)
IMG_2660.JPG (6.1 MB)
IMG_2662.JPG (7.5 MB)
[Comments 1-39 omitted; showing comments 40-139 of 139]

Comment 40 by a...@chromium.org, Mar 29 2017

I just had this happen to my personal machine. 8GB memory, it started going south with 4.5GB wired memory. I started killing processes one at a time in the Chrome Task Manager.

I killed all the tasks whose value in the "Memory" column was more than 100MB. The wired memory didn't budge. So I collected up a bunch of 80MB processes and killed them all at once; the wired memory dropped.

So the wired memory isn't accounted for in the "Memory" column in the Task Manager.

Comment 41 by a...@chromium.org, Mar 29 2017

Also, I tried killing the GPU process before the batch of 80MB processes. That didn't help. I don't think it's the GPU process. 

Comment 42 by bugdroid1@chromium.org, Mar 30 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a44d440a7e5057f2ad1637f26802e522eeb51ab

commit 6a44d440a7e5057f2ad1637f26802e522eeb51ab
Author: erikchen <erikchen@chromium.org>
Date: Thu Mar 30 23:23:19 2017

Add new wired memory metric to memory-infra dumps.

On macOS, wired memory cannot be evicted from physical memory [swapped, purged,
compressed]. We are observing a reproducible instance where Chrome appears to be
responsible for massive amounts of wired memory. It's unclear whether this wired
memory is being allocated by the kernel for resources associated with the task,
or by the task itself. This CL will help us distinguish between these cases.

BUG= 700928 

Review-Url: https://codereview.chromium.org/2782503002
Cr-Commit-Position: refs/heads/master@{#460924}

[modify] https://crrev.com/6a44d440a7e5057f2ad1637f26802e522eeb51ab/base/process/process_metrics.h
[modify] https://crrev.com/6a44d440a7e5057f2ad1637f26802e522eeb51ab/base/process/process_metrics_mac.cc
[modify] https://crrev.com/6a44d440a7e5057f2ad1637f26802e522eeb51ab/base/process/process_metrics_unittest.cc
[modify] https://crrev.com/6a44d440a7e5057f2ad1637f26802e522eeb51ab/chrome/browser/task_manager/sampling/task_group_sampler.cc
[modify] https://crrev.com/6a44d440a7e5057f2ad1637f26802e522eeb51ab/components/tracing/common/process_metrics_memory_dump_provider.cc


Comment 43 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b8e248e67b573b9192610580db1382b171fc5f4f

commit b8e248e67b573b9192610580db1382b171fc5f4f
Author: stgao <stgao@chromium.org>
Date: Fri Mar 31 01:12:53 2017

Revert of Add new wired memory metric to memory-infra dumps. (patchset #7 id:120001 of https://codereview.chromium.org/2782503002/ )

Reason for revert:
New test SystemMetricsTest.LockedBytes failed on Waterfall

https://findit-for-me.appspot.com/waterfall/failure?url=https://build.chromium.org/p/chromium.memory/builders/Mac%20ASan%2064%20Tests%20%281%29/builds/28540

Original issue's description:
> Add new wired memory metric to memory-infra dumps.
>
> On macOS, wired memory cannot be evicted from physical memory [swapped, purged,
> compressed]. We are observing a reproducible instance where Chrome appears to be
> responsible for massive amounts of wired memory. It's unclear whether this wired
> memory is being allocated by the kernel for resources associated with the task,
> or by the task itself. This CL will help us distinguish between these cases.
>
> BUG= 700928 
>
> Review-Url: https://codereview.chromium.org/2782503002
> Cr-Commit-Position: refs/heads/master@{#460924}
> Committed: https://chromium.googlesource.com/chromium/src/+/6a44d440a7e5057f2ad1637f26802e522eeb51ab

TBR=primiano@chromium.org,mark@chromium.org,thestig@chromium.org,erikchen@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 700928 

Review-Url: https://codereview.chromium.org/2786313002
Cr-Commit-Position: refs/heads/master@{#460972}

[modify] https://crrev.com/b8e248e67b573b9192610580db1382b171fc5f4f/base/process/process_metrics.h
[modify] https://crrev.com/b8e248e67b573b9192610580db1382b171fc5f4f/base/process/process_metrics_mac.cc
[modify] https://crrev.com/b8e248e67b573b9192610580db1382b171fc5f4f/base/process/process_metrics_unittest.cc
[modify] https://crrev.com/b8e248e67b573b9192610580db1382b171fc5f4f/chrome/browser/task_manager/sampling/task_group_sampler.cc
[modify] https://crrev.com/b8e248e67b573b9192610580db1382b171fc5f4f/components/tracing/common/process_metrics_memory_dump_provider.cc


Comment 44 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/863e4742a37cc39631f68b80408d1f32b408ef54

commit 863e4742a37cc39631f68b80408d1f32b408ef54
Author: erikchen <erikchen@chromium.org>
Date: Fri Mar 31 19:57:43 2017

Add new wired memory metric to memory-infra dumps.

On macOS, wired memory cannot be evicted from physical memory [swapped, purged,
compressed]. We are observing a reproducible instance where Chrome appears to be
responsible for massive amounts of wired memory. It's unclear whether this wired
memory is being allocated by the kernel for resources associated with the task,
or by the task itself. This CL will help us distinguish between these cases.

BUG= 700928 

Review-Url: https://codereview.chromium.org/2782503002
Cr-Original-Commit-Position: refs/heads/master@{#460924}
Committed: https://chromium.googlesource.com/chromium/src/+/6a44d440a7e5057f2ad1637f26802e522eeb51ab
Review-Url: https://codereview.chromium.org/2782503002
Cr-Commit-Position: refs/heads/master@{#461192}

[modify] https://crrev.com/863e4742a37cc39631f68b80408d1f32b408ef54/base/process/process_metrics.h
[modify] https://crrev.com/863e4742a37cc39631f68b80408d1f32b408ef54/base/process/process_metrics_mac.cc
[modify] https://crrev.com/863e4742a37cc39631f68b80408d1f32b408ef54/base/process/process_metrics_unittest.cc
[modify] https://crrev.com/863e4742a37cc39631f68b80408d1f32b408ef54/chrome/browser/task_manager/sampling/task_group_sampler.cc
[modify] https://crrev.com/863e4742a37cc39631f68b80408d1f32b408ef54/components/tracing/common/process_metrics_memory_dump_provider.cc

Comment 45 by a...@chromium.org, Apr 3 2017

Random data point. Home laptop was at 3.6G wired. I killed tabs, and I hit one tab, a Hangouts tab that was in the Task Manager as taking 282M, that when killed caused wired memory to drop by a gig.

Dunno what to make of that.

There are some Hangouts issues currently being investigated:
https://bugs.chromium.org/p/chromium/issues/detail?id=702011

But none make me immediately think of wired memory - thanks for being so observant. Which column was showing 282M?

Comment 47 by a...@chromium.org, Apr 3 2017

The column that just says "Memory".

Your memory-infra CL is in 59.0.3059.0. As soon as it hits dev, I'll start grabbing those dumps.

Thanks - I'm very curious to see if we'll get any numbers at all. There are two ways memory can be wired [user & kernel]. We don't have much insight into the latter, and the logging I added will only report the former. This will at least help us narrow down the potential sources of the leak.
Cc: erikc...@chromium.org
Owner: ericrk@chromium.org
Last night I tried to see if I could reproduce the problem. I don't believe that wired means resident - I think it has the same meaning as on other *NIXes, which is RAM that cannot be paged out. In my studying up on virtual memory management this week I learned that IOKit creates wired memory.

I followed these steps:

1. Launch Chrome with empty user-data-dir, no extensions
2. Use vmmap to inspect the GPU process
3. Create a new window
4. Search for shutterstock
5. Click the shutterstock.com link
6. On the shutterstock.com page, type nature and press return
7. Scroll the page to the bottom
8. Close the window
9. Use vmmap to inspect the GPU process
10. Repeat steps 3-9 (using a different search term on shutterstock each time)
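Steps 2 and 9 can be scripted; here's a minimal sketch that pulls the IOKit row out of a saved vmmap region-summary table (the awk field positions assume the four-column summary layout shown below, and `iokit_summary` is an invented helper name, not part of vmmap):

```shell
# Sketch: extract the IOKit row from vmmap's region-summary table.
# Assumed field layout: TYPE, VIRTUAL, RESIDENT, COUNT.
# iokit_summary is an invented name for illustration only.
iokit_summary() {
  awk '$1 == "IOKit" { printf "virtual=%s resident=%s regions=%s\n", $2, $3, $4 }'
}

# Sample row copied from the progression below:
printf 'IOKit                             35.7M    31.3M       33\n' | iokit_summary
# prints "virtual=35.7M resident=31.3M regions=33"
```

Running `vmmap <gpu-pid> | iokit_summary` after each iteration would make the region-count growth easy to log.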

I noticed the following kind of progression for the IOKit regions:

~59.0.3065.0 (462703)
                                VIRTUAL RESIDENT   REGION 
REGION TYPE                        SIZE     SIZE    COUNT (non-coalesced) 
===========                     ======= ========  ======= 
IOKit                             35.7M    31.3M       33 
IOKit                            193.2M   183.5M       89 
IOKit                            227.9M   218.2M      117 
IOKit                            230.4M   220.6M      119 
IOKit                            240.1M   230.3M      129 

If these were cached memory buffers I would expect them to be reused to render subsequent windows. Instead, the region count grows, as does the resident size.

Looking at a much earlier version of Chrome I saw a different progression:

52.0.2743.91 (394942)
                                VIRTUAL RESIDENT   REGION 
REGION TYPE                        SIZE     SIZE    COUNT (non-coalesced) 
===========                     ======= ========  ======= 
IOKit                             18.9M    16.8M       23***
IOKit                             37.9M    26.0M       28 
IOKit                             37.3M    26.3M       37 
IOKit                             74.0M    57.8M       48 
IOKit                             86.4M    58.7M       51 
IOKit                             93.0M    63.6M       52 

*** When I collected the data for 52.0.2743.91 I think I didn't run vmmap right after startup and missed the initial state. But rerunning 52.0.2743.91 I get numbers on this order every time.

The region count and resident size still grow in M52, but not as dramatically.

vmmap shows a typical IOKit region as:

IOKit                  00000001064ed000-00000001065ed000 [ 1024K  1024K] rw-/rw- SM=SHM  

so these are shared regions. I ran my same test while inspecting the browser process and window server but did not see a similar rise in IOKit region memory in those processes. I tried doing the same with the kernel but vmmap gave me errors.

Using "low resident and low region count at start" as good, and "higher resident and region count at start" as bad, I ran a bisect from 52.0.2743.91 to 59.0.3065.0. I got the following change list:

https://chromium.googlesource.com/chromium/src/+log/40c05d919958da304af825c140ee76a8e386c94b..7bdc0a395cbdfe329aaab71579e769fa0a6c6fa8

and the cl that stands out is

f91789a Enable GPU Raster for Mac waterfall by ericrk ยท 11 months ago

I also confirmed that IOKit memory does not grow dramatically in M59 when I disable GPU raster. Given that region counts grew before GPU raster was turned on, I wonder if GPU raster is tickling an existing bug rather than introducing a new one. The fact that these are IOKit regions, and that this is the GPU process, makes me wonder if these are something like DMA regions used when communicating with the GPU.

Assigning to ericrk@ for his thoughts.

 
Thanks for the thorough investigation! I took your basic approach and tried to dig into this memory a bit more.

I did a bit of reading, and it sounds like IOKit memory does get allocated for various OpenGL commands (iOS article here: https://welcome.totheinter.net/2014/05/31/tracking-iokit-dirty-memory-in-ios-using-instruments/, but it seems to apply to OSX as well). Given this, it would make sense that GPU raster allocates a bit more. The question is why this number never seems to go down, even when GPU raster deletes GL objects.

After a bit more reading, it sounds like this memory is kept around speculatively by IOKit / the driver, in order to avoid expensive allocations if memory is needed again. However it also sounds like this memory should be purgeable on memory pressure. To try to better understand the memory usage, I ran an experiment similar to yours above, but also inserted some memory pressure signals and tried backgrounding and closing the shutterstock tab. See the results for GPU and SW below:

GPU:
-----------------------------------------------------------------------------------------------------------------------

                                VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========                     ======= ========    =====  ======= ========   ======    =====  ======= 
<< Load Chrome >>
IOKit                             28.7M    24.0M    24.0M       0K    8704K    13.5M       0K       73 
<< Load about:gpu >>
IOKit                             51.6M    46.4M    46.4M       0K    9120K    35.0M       0K       85 
<< Load shutterstock.com >>
IOKit                            102.6M    97.9M    97.9M       0K    9888K    85.2M       0K      112
<< search puppies >>
IOKit                            127.8M   114.8M   114.8M       0K    1440K   109.6M       0K      137 
<< search cats >>
IOKit                            149.4M   139.3M   139.3M       0K    19.8M   115.7M       0K      151 
<< search dogs >>
IOKit                            148.8M   138.9M   138.9M       0K    19.8M   115.3M       0K      163 
<< search shi tzu >>
IOKit                            165.8M   155.8M   155.8M       0K    19.8M   132.2M       0K      186 
<< send memory pressure signal >>
IOKit                            165.7M   124.0M   124.0M       0K       0K   120.2M    31.8M      182 
<< background the shutterstock tab (open new tab page) >>
IOKit                            179.0M   138.2M   138.2M       0K    58.7M    75.5M    23.8M      200 
<< send memory pressure signal >>
IOKit                            179.0M    79.5M    79.5M       0K       0K    75.5M    82.5M      200 
<< close shutterstock tab >>
IOKit                            179.0M    79.5M    79.5M       0K       0K    75.5M    82.5M      193 
<< send memory pressure signal >>
IOKit                            179.0M    79.5M    79.5M       0K       0K    75.5M    82.5M      193 


SW:
-----------------------------------------------------------------------------------------------------------------------

                                VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION 
REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced) 
===========                     ======= ========    =====  ======= ========   ======    =====  =======
<< Load Chrome >>
IOKit                             17.2M    12.5M    12.5M       0K     256K    10.5M       0K       72 
<< Load about:gpu >>
IOKit                             29.4M    24.7M    24.7M       0K     256K    22.8M       0K       87 
<< Load shutterstock.com >>
IOKit                             57.1M    52.4M    52.4M       0K     256K    50.4M       0K      120 
<< search puppies >>
IOKit                             72.3M    62.3M    62.3M       0K    1024K    58.6M       0K      104 
<< search cats >>
IOKit                             85.3M    75.3M    75.3M       0K    1024K    71.6M       0K      117 
<< search dogs >>
IOKit                             93.3M    83.3M    83.3M       0K    1024K    79.6M       0K      125 
<< search shi tzu >>
IOKit                             73.5M    63.5M    63.5M       0K    1024K    59.8M       0K      106 
<< send memory pressure signal >>
IOKit                             73.5M    62.5M    62.5M       0K       0K    59.8M    1024K      106 
<< background the shutterstock tab (open new tab page) >>
IOKit                             93.8M    75.8M    75.8M       0K       0K    72.0M    1024K      118 
<< send memory pressure signal >>
IOKit                             93.8M    75.8M    75.8M       0K       0K    72.0M    1024K      118 
<< close shutterstock tab >>
IOKit                             95.5M    77.5M    77.5M       0K       0K    73.8M    1024K      111 
<< send memory pressure signal >>
IOKit                             95.5M    77.5M    77.5M       0K       0K    73.8M    1024K      111 


Results:
-----------------------------------------------------------------------------------------------------------------------

Looking at these results, it appears that GPU and SW actually have similar amounts of memory in-use (non-purgeable).
After backgrounding the shutterstock tab and sending a memory pressure signal, SW raster has 75.8 MB resident and GPU
raster has 79.5 MB, which seems more or less comparable.

Note that in this case, GPU raster shows a larger virtual size, which is accounted for by the 82.5MB "EMPTY SIZE" in
the table above. From Apple's page here: https://developer.apple.com/library/content/documentation/Performance/Conceptual/ManagingMemory/Articles/VMPages.html,
"Empty (NUL) sharing implies that the page does not really exist in physical memory."

Given this, I'm not sure that this is the source of our memory leak or memory pressure (also given that avi@ killed
the GPU proc in #41, and that didn't help). If we encounter this again, it might be useful to try "vmmap -v" to make sure
the "EMPTY SIZE" lines up with what I expect (mostly empty).
Owner: erikc...@chromium.org
Rats! I thought I had found something. Thank you ericrk@ for taking a look.

Besides killing the GPU process not having an effect on avi@'s wired memory, looking back over the comments I think you also saw that he was running with GPU raster disabled, which is another clue that GPU raster should not be involved.

 
Cc: primiano@chromium.org nduca@chromium.org
nduca had the same symptoms on his machine and I was able to do some live debugging. His issue was caused by a *massive* leak in the GPU process (~50GB), which is invisible to Activity Monitor and chrome://tracing.

I suspect this was caused by a bug fixed in M58, but could not verify. https://bugs.chromium.org/p/chromium/issues/detail?id=692074#c12

Here are my observations:
1) The GPU process is leaking ~15000 non-IOSurface textures. averaging ~3.5MB each. These textures are never being used, and are all compressed and/or swapped. 
2) Killing the GPU process releases ~56GB of compressed memory, which only occupies ~400MB of physical memory.
3) This entire time, while the system is unusable, the activity monitor shows green across the board. The global numbers look fine in chrome://tracing and Activity Monitor. The only place to easily observe the problems is vm_stat, whose row "pages stored in compressor" shows the leak.

avi: When/if you experience this problem again, please: 
1) run vm_stat. 
2) Use "sudo vmmap -v -interleaved <pid>" on the GPU process.
3) Kill the gpu process. Repeat 1-2.

The consistent memory metrics we're working on would have displayed the accurate memory usage of the GPU process (50GB+). 
nduca_gpu_leak_debugging_archive.zip (6.9 MB)
Cc: ojan@chromium.org
A bunch of questions to improve the instrumentation:

> The global numbers look fine in chrome://tracing and Activity Monitor.
For cases like this, isn't there any signal showing up in the mmaps instrumentation that you (erikchen) recently added to Mac OSX? It's not fully clear to me whether those 56 GB would have showed up in the "swapped" column, but at the very least I would have expected them to show up in the "virtual size" column. 56 GB of virtual memory should be a red flag, no? Is there something we can improve on this side?
I took a look at the attachment, but unfortunately nat's trace comes from a Chrome version older than the one where erikchen introduced mmap dumps. avi's trace from #25 does have the mmaps dump, but the GPU process there shows *only* 1 GB of virtual memory.

> The GPU process is leaking ~15000 non-IOSurface textures
Is the GPU process tracking them (so it could have exposed them in its MDP) or does it leak and lose track of them? 


Perhaps unrelated, but avi's trace is IMHO full of red flags. I feel I don't have enough hands and enough hair to handle this trace :/
Perhaps not all of them are responsible for knocking out his machine, but IMHO a lot of them are really worrying, in particular:

- I did count ~59 renderer processes, adding up (without counting any memory that we do not currently track, as discussed above) to a reported total of:
private dirty: 9.4 GB, total resident: 16.5 GB. O_o
Why are we keeping 59 processes alive? Why are they not being discarded? Please don't tell me that the answer here is: "avi@ likes to keep a lot of tabs open, he chooses to be an outlier w.r.t. the median number of open tabs, so it's WAI that we use all those GB".

- The Google+ tab itself is using 2.9 GB of memory: 1 GB in oilpan, 1.2 GB in v8, 1 GB in remaining malloc (sigh), 300 MB in the compositor, and 280 MB from partition alloc. Did we reach out to G+? Either there is something wrong there, or at the very least I hope that avi@ is having an extreme amount of social fun, given the amount of memory his machine is spending on that tab alone.
Also, should we (Chrome) really allow a tab to sit there eating 2.9 GB?  (+ojan, this seems an excellent case for the intervention / new UX we were discussing)

- The monorail tab (pid 44165) is using 1.5 GB: 884 MB in oilpan itself, 264 MB in v8, 667 MB in malloc (sigh again). Same as above: why does it take 1.5 GB to show a bug? It doesn't seem to me that monorail has such an insanely complex DOM. IMHO either we (Chrome) or they (monorail) are having a Whoops moment here.

- Ditto for the other monorail tabs (pid 71313, 71396) which are using *only* 663 MB and 483 MB respectively.

- I could make similar considerations for the other tabs, but it's the same story. It takes 295 MB (pid 71405) for a tab showing "git blame" from gitiles, which is essentially a text file. Really, I can run a full modern operating system in 295 MB.


Cc: fmea...@chromium.org

Comment 55 by w...@chromium.org, Apr 19 2017

FWIW I notice that  issue 707544  (HTMLCollection leak in BlinkGC) didn't land on M58 until 58.0.3029.72, whereas Avi reports this from 58.0.3029.14.  Do we have any traces from .72+?
I looked at the trace in comment #25.

Something is clearly wrong here, but I think  issue 707544  (combined with some kind of extension for monorail) can explain the high memory usage.

We should first see how wez's fix changes things.

FYI
The Google+ process (pid 43867) also contains WhatsApp, iCloud, Inbox, Facebook.
pid 44165 is Gmail not monorail.
pid 71313, 71396, 71405 are mostly monorail

Comment 57 by a...@chromium.org, Apr 24 2017

About comment 56:

That process that contains a zillion webpages is my personal profile. Apparently, I hit the process limit with my work profile, and therefore every site loaded in my personal profile ends up in the same process.

About comment 53:

Thank you for your concern about the general amount of memory used by my Chrome, however, as a user, your shock is surprising to me. The amount of memory used by tabs on my Chrome is insane all the time and has been for years. While I don't want to take credit for getting people to notice, if you are surprised enough for action to be taken I'll accept it.

FYI I'm back at work, and the memory map was so screwed up by running Chrome that even after quitting Chrome my Mac ground to a halt and needed a reboot. I'm running Chrome 59.0.3071.15, and will be taking memory-infra traces to see if the issue is really fixed.
Labels: Needs-Investigation

Comment 59 by a...@chromium.org, Apr 28 2017

I just experienced this again with my personal laptop. I woke up from sleep, the wired memory use went insane, and it locked up until I force quit it. (I'm personally convinced that the wake from sleep problem and this one are related.)

I haven't been able to memory infra dump this yet on my work machine as the version of Chrome that it's running is buggy, but certainly other people have experienced this.

Erik, you put in the instrumentation. Have you also tried taking tracing dumps on your long-running instances of Chrome?
Screen Shot 2017-04-28 at 10.24.49 AM.png (132 KB)

Comment 60 Deleted

Comment 61 by a...@chromium.org, Apr 28 2017

(Deleted a comment of mine that was not appropriate or helpful.)

FYI my laptop's GPU is Intel HD Graphics 4000, and my work Mac's GPU (I believe) is a Radeon HD 5770.
avi@: When you get a chance, can you try following these instructions on your laptop:
https://bugs.chromium.org/p/chromium/issues/detail?id=713854#c39

[Hopefully it's a single GPU machine]

My work Mac is also an ATI Radeon HD 5770 1024 MB, and I've confirmed that it doesn't have the same driver bug. When I've taken traces locally, I haven't been able to see any examples of locked_memory != 0.0MB, which is a shame. This is very likely a leak of wired memory in the kernel, which will be much harder to track down...

Comment 63 by a...@chromium.org, Apr 29 2017

vmmap uploaded.

It's a MacBookPro10,2, which is afaik single-gpu.
openglvmmap.txt (132 KB)
Looks like you're not having the same GPU issue [although I suspect there are others based on UMA stats]. 

I currently think that the most likely possibility is:
  1) More GPU memory leaks, of different varieties. This would be the simplest explanation for crazy wired memory stats.

The easiest way to verify would be: If you run into this problem, try to kill some non-chrome processes and then run "sudo vmmap -v -interleaved" on the GPU process. Then try killing the GPU process and see how wired memory reacts.

Looking at c#26, killing 71388 dropped wired memory by 4GB!! Looking at the trace in c#25, 71388 is a renderer responsible for multiple tabs [mostly crbug/rietveld, it looks like]. More interestingly, the total size of its virtual address space is only 1.5GB. Something very suspicious is happening.
I tried killing a bunch of tabs on my work machine.

I noticed the following: Killing tabs typically drops wired memory ~40MB or so. I killed one of my windows w/ 35 tabs. Wired memory dropped by about 1GB. I then session restored the window, and cycled through all tabs to get them to load. Wired memory hardly budged at all.

This was done on a machine with no memory pressure [<100MB swap + compressed usage].

Comment 66 by a...@chromium.org, Apr 29 2017

Comment 65 sounds very similar to what I'm seeing.
Blocking: 717919
More observations:

I used the following 1-liner to observe changes to wired memory:
"""
TEMP=~/test14; P=221; sudo vmmap -v -interleaved $P > $TEMP; vm_stat >> $TEMP; sudo kill -9 $P; vm_stat >> $TEMP; sleep 5; vm_stat >> $TEMP
"""

The sleep 5 is necessary - the second vm_stat records minimal change, but the third vm_stat shows a much larger change [frequently ~200MB]. This once again suggests that something out-of-process is holding on to the wired memory, and is releasing it after the process is killed. I'm trying to figure out what that process is. [If it's a non-kernel process, killing it should show a massive reclamation of wired memory].
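The per-kill delta can also be computed mechanically from two saved vm_stat snapshots; a minimal sketch, assuming 4096-byte pages as vm_stat reports (the helper name `wired_pages` and the sample counts are made up for illustration):

```shell
# Sketch: diff "Pages wired down" across two saved vm_stat outputs.
# wired_pages is an invented helper; the sample counts below are made up.
wired_pages() {
  printf '%s\n' "$1" | awk -F': *' '/Pages wired down/ { gsub(/\./, "", $2); print $2 }'
}

before=$(wired_pages 'Pages wired down:    4486812.')
after=$(wired_pages 'Pages wired down:     4435612.')
# 4096-byte pages -> MB
echo "freed $(( (before - after) * 4096 / 1048576 )) MB"
# prints "freed 200 MB"
```

Feeding it the vm_stat output captured before the kill and after the sleep would quantify the delayed reclamation directly.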
This script kills all processes that belong to the current user that are not Chrome, or required to run the script itself. I accidentally killed Chrome while developing the script, so I can no longer test immediately. I intend to try it out once leaked wired memory has built up again.
kill_all_processes.sh (833 bytes)
Although... if another process *were* responsible, I'd expect to see elevated memory stats for it in Activity Monitor. We're not even seeing that for the kernel, so I still feel like I'm digging around in the dark.

Attaching the outputs from a bunch of processes killed via the command in c#68.
test14 (691 KB)
test13 (879 KB)
test12 (343 KB)
test11 (3.4 MB)
test10 (307 KB)

Comment 71 by a...@chromium.org, May 10 2017

Do we have a contact at Apple to ask for advice on debugging here?
Cc: pinkerton@chromium.org

Comment 73 by a...@chromium.org, May 10 2017

The result of running c69's script on my Mac.
This updated script is like the previous one, but also excludes fish, the shell that avi uses.
kill_all_processes2.sh (848 bytes)

Comment 75 by a...@chromium.org, May 17 2017

My wired memory made it up to about 10GB, so I ran the new kill_all_processes2 script. It successfully killed lots of stuff, leaving Chrome alive. I relaunched Activity Monitor. Wired memory is still about 10GB.

Comment 76 by a...@chromium.org, May 30 2017

OK; it's been a while, but my wired memory is up to 16GB.

Here is vm_stat:

Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                               12763.
Pages active:                           1980331.
Pages inactive:                         1836186.
Pages speculative:                       101809.
Pages throttled:                              0.
Pages wired down:                       4392090.
Pages purgeable:                           6534.
"Translation faults":                1767158508.
Pages copy-on-write:                   44926777.
Pages zero filled:                   1216236202.
Pages reactivated:                     23099218.
Pages purged:                            396098.
File-backed pages:                      1117941.
Anonymous pages:                        2800385.
Pages stored in compressor:             5078631.
Pages occupied by compressor:             60526.
Decompressions:                        26801180.
Compressions:                          35590160.
Pageins:                              137578539.
Pageouts:                                371203.
Swapins:                               18702104.
Swapouts:                              21342791.
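For scale, the wired count above converts to whole gigabytes as sketched here, using vm_stat's 4096-byte page size (`pages_to_gb` is an invented helper name):

```shell
# Sketch: convert a vm_stat "Pages wired down" count to whole GB.
# vm_stat reports 4096-byte pages; pages_to_gb is an invented name.
pages_to_gb() { echo $(( $1 * 4096 / 1073741824 )); }

echo "wired: $(pages_to_gb 4392090) GB"
# prints "wired: 16 GB"
```

That matches the ~16 GB figure reported above.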

And here are some screenshots and the trace.

Previously, the trace wasn't run for long enough. If that's the case here, let me know and I'll re-run it.
Screen Shot 2017-05-30 at 1.41.12 PM.png (346 KB)
Screen Shot 2017-05-30 at 1.41.21 PM.png (398 KB)
trace_20170530.json.gz (35.3 KB)

Comment 77 by a...@chromium.org, May 30 2017

Re the compressed memory figure.

~/Downloads> vm_stat
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                               84023.
Pages active:                           1924719.
Pages inactive:                         1692772.
Pages speculative:                        32144.
Pages throttled:                              0.
Pages wired down:                       4486812.
Pages purgeable:                           4477.
"Translation faults":                1807132791.
Pages copy-on-write:                   45306421.
Pages zero filled:                   1242842370.
Pages reactivated:                     26261401.
Pages purged:                            397514.
File-backed pages:                       681060.
Anonymous pages:                        2968575.
Pages stored in compressor:             5037385.
Pages occupied by compressor:            161312.
Decompressions:                        30151222.
Compressions:                          39645480.
Pageins:                              140322792.
Pageouts:                                416726.
Swapins:                               25988785.
Swapouts:                              28605504.
~/Downloads> ps aux | grep Chrome | grep type=gpu
avi               2300   0.0  0.2  3600048  63336   ??  S    Fri11AM  11:40.40 /Applications/Google Chrome.app/Contents/Versions/60.0.3107.4/Google Chrome Helper.app/Contents/MacOS/Google Chrome Helper --type=gpu-process --field-trial-handle=5234089567716675229,1291062884208880532,131072 --metrics-client-id=B1318A64-865C-8727-BA0D-A46628E39BD2 --supports-dual-gpus=false --gpu-driver-bug-workarounds=1,10,24,27,37,50,66,68,69,71,77,79,87,88,92,94,95 --disable-gl-extensions=GL_KHR_blend_equation_advanced GL_KHR_blend_equation_advanced_coherent --gpu-vendor-id=0x1002 --gpu-device-id=0x68b8 --gpu-driver-vendor --gpu-driver-version --gpu-driver-date --gpu-active-vendor-id=0x1002 --gpu-active-device-id=0x68b8 --metrics-client-id=B1318A64-865C-8727-BA0D-A46628E39BD2 --service-request-channel-token=BB11C1E34D41FB139C4CE584F5D36F4B
~/Downloads> kill -9 2300
~/Downloads> vm_stat
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                               37905.
Pages active:                           1806055.
Pages inactive:                         1799377.
Pages speculative:                         2136.
Pages throttled:                              0.
Pages wired down:                       4488650.
Pages purgeable:                           4508.
"Translation faults":                1807429230.
Pages copy-on-write:                   45320977.
Pages zero filled:                   1243073171.
Pages reactivated:                     26262214.
Pages purged:                            397518.
File-backed pages:                       622663.
Anonymous pages:                        2984905.
Pages stored in compressor:             4970411.
Pages occupied by compressor:            247959.
Decompressions:                        30160041.
Compressions:                          39645481.
Pageins:                              140323199.
Pageouts:                                417044.
Swapins:                               26113390.
Swapouts:                              28633514.
~/Downloads> 

Comment 78 by a...@chromium.org, May 30 2017

Screen Shot 2017-05-30 at 2.23.32 PM.png
172 KB View Download

Comment 79 by a...@chromium.org, May 30 2017

2311kill.txt
949 KB View Download
2342kill.txt
632 KB View Download
2295browserprocessmap.txt
1.0 MB View Download

Comment 80 by a...@chromium.org, May 30 2017

BTW, after the kill of 2311, the wired memory dropped from 17.18G to 16.95G. After the kill of 2342, it dropped to 16.77G.
Thanks! Starting to get some ideas...

1) First, let's try to grab a memory-infra trace. The last one didn't run for long enough and has no data.

2) Let's upload the data from chrome://histograms.

3) The attached script kills all renderers, and grabs vm_stat and vmmap of each before doing so.

Theories:
Based on the two vmmaps attached in c#79, and observations from running the script on my machine... I wonder if we're loading font files incorrectly, causing them to count as wired memory and not be de-duped between renderers...
kill_all_renderers.sh
1.0 KB View Download

Comment 82 by a...@chromium.org, May 30 2017

Here are the first two. I let the memory grab run for 5 minutes; hope that's enough.
trace_20170530_2.json.gz
68.3 KB Download
histograms.txt
3.6 MB View Download

Comment 83 by a...@chromium.org, May 30 2017

Here is an archive of all the test_* files.

My wired memory dropped from 15G to about 2.3G.
Archive.zip
2.4 MB Download
Looking at c#83, the system releases ~3.7M wired pages over the course of the script. 2.4M are released after killing the second renderer [93071]. Unfortunately, the trace doesn't seem to be working correctly [no memory-infra dumps], so we don't know what this renderer was doing. Looking at the two renderers killed up to that point, 2337 and 93071, both have virtual memory sizes of 1.9GB. And per chrome://histograms, no renderer, browser, or GPU process has a footprint over 2GB. So somehow renderer 93071 was holding on to ~10GB of wired pages. The most likely explanation is that the renderer is leaking a massive number of kernel-side resources.

This also explains why my newly added wired-memory metrics show no wired memory in use in our processes. [There are very few code paths that trigger wired memory in user-space in our code, and they are almost never hit]. 

Next idea to try: Check the number of Mach ports each process has open and see if there is a correlation between massive number of Mach ports, and amount of wired memory freed.
All renderers that were holding more than 100k wired pages:

pid 2351, 456K pages
pid 2356, 151K pages
pid 96629: 151K pages
pid 93074: 125K pages
pid 93071: 2435K pages

Weirdly enough...none of these pids show up in the Chrome task manager screen shot in c#76! Are there renderers that we intentionally don't show in the task manager?

I confirmed that the other pids show up in the Archive in c#83.
Oh. I bet avi@ sorted the screenshot in c#76 by most Memory, but didn't include all renderers [don't see any extensions]. It's very suspicious that the 5 problem processes aren't on the list of highest memory consumers. :)
Process 93071 was started 2 hours before the vmmap was taken, which means it wasn't around when the screenshot in c#76 was taken. This suggests it's unlikely to be the source of the 10GB wired memory leak [unless the leak can magically teleport between processes!]. The only other process that had been killed up to that point is 2337, which hosts a Google Doc. That process had been around for 4 days and is a much more likely candidate. Furthermore, it was spinning at 106% CPU, which is kind of amazing and shocking at the same time.

This means that we need to wait several seconds after killing processes for vm_stat to give an accurate result.

Comment 88 by a...@chromium.org, May 31 2017

Correct. The screenshot was sorted by memory, but included maybe a third of the renderers. Since it doesn't correlate with memory usage, next time I'll be sure to grab all of the renderer lines in the list.
kill_all_renderers.sh
1.0 KB View Download
avi@ - I keep meaning to check in with you on this bug. Have you experienced it recently? I'm guessing maybe not?

Comment 91 by a...@chromium.org, Jun 22 2017

This is still very much a problem. I was out on personal time last week, and when I returned my wired usage was 16GB. I ran the script but had forgotten to grab a render process list, so I trashed the files. Chrome has crept back up to 6GB wired, which is 4GB more wired than when it started. Erik, if you want, I can do the kill-all-renderers thing again.
OK, good to know. I think it would be helpful if we could narrow the problem down to a particular site. erikchen@ looked at, I think, your last data upload and concluded it was a Docs site.
Thanks for following up, can you run the script again at 10GB? I updated a new script in c#89 with a built-in 5s sleep.

Comment 94 by a...@chromium.org, Jun 22 2017

Will do. It will likely be a few more days before I hit 10G.

Comment 95 by a...@chromium.org, Jun 29 2017

Sorry for the delay; I made the mistake of visiting an abusive website with my main Chrome. Ouch, forced to restart it.

But I hit 10G today of wired memory, so I took pictures of my task manager and ran the script. Attached are the results.

It dropped to 2.5G wired, and it was a sudden plunge, like it was a particular renderer that was doing something that really had an impact.
Archive.zip
3.1 MB Download

Comment 96 by a...@chromium.org, Jun 29 2017

Looking at the log, I see render processes with 1-5000 wired pages released when they were killed, and one render process, pid 12975, that released 1,706,048 pages (6.8G) when it was killed.

pid 12975 held a Rietveld review page, and... a GOMA proxy status page.

That kinda makes sense. My GOMA proxy page changes more than any other page in my Chrome. If there's a leak in layout or rendering (a wired memory leak with fonts, say, or some IOSurface edge case in compositing), it would make sense for it to hit there, given the sheer amount of churning that page does when I compile.

When I look at the vmmap file from that render process, it doesn't appear any different from the vmmap files of other processes, so I doubt the wired memory is held in-process. It's probably in some other process and merely associated with it.
avi@: Thanks for following up - agree with your assessment. Can you try opening 20 tabs of goma status, and then running some compiles?

Comment 98 by a...@chromium.org, Jun 29 2017

I did a 40,000 file clean build with 20 GOMA tabs open, and the wired memory worked its way up from about 3.5G to 4.5G.

I wonder if we can build a repro based on the GOMA proxy page's code.

Comment 99 by a...@chromium.org, Jun 29 2017

Actually, I just killed all the random renderer processes that the GOMA proxy pages ended up in, and I'm at 2.9G, so it grew from 2.9G to 4.5G.

We definitely should work on getting a repro from the GOMA page.
I have a laptop where I see this issue to a lesser extent. I opened 4 windows with GOMA status page. I then did several full compiles of the chrome repository. Afterwards, I killed the renderers 1 at a time. Each time, wired memory dropped by about 200M.

But killing the renderers also frees all the GPU resources associated with the foreground renderer... so this isn't necessarily a fair comparison. I navigated each of those renderers to google.com; wired memory increased by ~200M total.

Comment 101 by a...@chromium.org, Jun 29 2017

Here is an attempt at repro-ing but it doesn't seem to repro very well, if at all.

I don't know what's happening on your laptop, but you can read your script's result as well as I can. Something is up here.

I'll ask again: can we get some support from Apple here? This is something we have to track down.
700928repro.html
4.2 KB View Download
re: Getting help - you're welcome to file a radar. Historically, I haven't seen much help from Apple without repro steps.

I've attached two scripts that attempt to isolate the network behavior of the goma status page. To run them, launch browser with "--disable-web-security" and "--user-data-dir", e.g.:

/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --disable-web-security --user-data-dir=/tmp/ah982

goma_test2.html tries to faithfully mimic the goma script, but with no delay between XHR requests. 
goma_test.html leaks 100000 XHR objects. 

I have not yet managed to repro the issue on my machine [with any degree of certainty].
goma_test2.html
1005 bytes View Download
goma_test.html
1.1 KB View Download
Still trying and failing to repro. Starting to doubt my sanity.

avi@: Can you turn on the #ports column in Activity Monitor? The next time this happens, that will be a good sanity check.

avi@, in #98, did you have them in separate windows or the same window? 

I'm running the following test overnight: 1 backgrounded goma tab, 1 foregrounded goma tab. 20 full compiles of all targets in the chrome repository. 


Comment 104 by a...@chromium.org, Jun 30 2017

Re #ports in Activity Monitor, ACK will do.

Re c98, I had 19 tabs in one window but the one original pinned GOMA tab in a different window where I usually keep it.
Cc: dskiba@chromium.org
Over the course of 20 full compiles, number of wired pages grew by 108K

Killing foreground goma page: Released 88K wired pages
Killing background goma page: Released 102K wired pages

Before killing these tabs, I took a memory dump. I looked for identical allocations [by stacktrace] with > 1000 entries. Nothing interesting stood out. 

I had 20 tabs running the html demo in c#101 [19 of them backgrounded]. Killing all the renderers released 50K wired pages.
Next test. Open 40 tabs to localhost:8088. [Observe that wired memory grows by about 1k per tab].

Run a full compile of the chrome repository. Notice that wired memory grows by 360K pages. Kill all 40 tabs. Wired memory drops by 360K pages.
Next test results:
I modified goma_test2.html to have a 1-second delay between requests [just like the goma status page]. Ran the same test with Chrome + new profile, and Chrome + standard profile - no repro.

I then tried running localhost:8088 [40 tabs] with a new profile [still has some corp extensions]. Saw increase of 400K wired pages over a full chrome compile. Killing the foreground tab process had minimal effect. Killing the remaining 39 tabs reduced wired pages by 500K. 

Repeated this procedure with 40 tabs, new profile, and --disable-extensions. Same result - increase by ~400K, decrease by ~450K.
> Saw increase of 400K wired pages over a full chrome compile.

To be clear (for me), are you saying 400,000 wired pages (so total memory of 400,000 * 8k per page, or whatever the page size is on the Mac), or 400Kb of wired pages?
400K wired pages, so 400K * 4K = 1.6GB of wired memory.


Summary: JSON.parse() leaks wired memory (was: Chrome leaks wired memory)
I have a minimized repro. I've confirmed that it repros on two machines.

Repro instructions:

1) Launch Chrome with "--disable-web-security" and a custom "--user-data-dir". I also use "--disable-extensions" just to reduce potential sources of confusion. e.g.

"""
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --disable-web-security --user-data-dir=/tmp/982h3 --disable-extensions
"""

2) Unzip goma_test_3.zip. Update the "url" var in goma_test3.html to point to the extracted location of data.json.

3) Navigate to goma_test3.html.
4) Use """vm_stat | grep wired""" to observe the leak.
5) [optional] Open the html page in 40 tabs to observe the leak more quickly. [e.g. 100MB/s]. The fastest way to do this:
  "cmd + L" to select the omnibox
  hold down "cmd + return" to open 10 tabs per second.

Observations:
* The wired memory that is leaked is *not* reflected in vmmap of the renderer or browser process. It only lives in the kernel.

* Navigating the renderer to about:blank and performing a GC does not cause the wired memory to be reclaimed. Only killing the renderer seems to accomplish this.
goma_test_3.zip
101 KB Download

Comment 112 by a...@chromium.org, Jun 30 2017

FYI, for those watching, the "--disable-web-security" is so that the test .html page can load from the filesystem. This wired memory leak is quite observable on the real web without that flag.
The best "real world" repro we've found right now is the goma status page. To repro:

1) Make sure machine is running goma.
2) Open 40 goma status pages.
3) Build all targets in the Chrome repository.
4) Observe massive spike in wired memory.

Btw, the repro in c#111 does the following:
Perform a *single* XMLHttpRequest to fetch a 50MB json string.
Repeatedly perform JSON.parse() on this string.
Components: Blink>JavaScript
I used a single renderer and some quick logging to determine the impact of the leak:

We leak roughly 160KB of wired memory each time JSON.parse() is called on a 5.8MB string.

We leak roughly 30KB of wired memory each time JSON.parse() is called on a 767KB string.

We leak roughly 4KB of wired memory each time JSON.parse() is called on a 102KB string.
Cc: danno@chromium.org hpayer@chromium.org u...@chromium.org
Owner: ----
Status: Untriaged (was: Assigned)
v8 team: Please help investigate further.


This v8 patch is intended to be patched onto commit: bf37b4eb1f3ff98ce3340c443998a9c15c2f6315.

This patch makes a couple of changes:
  * Makes VirtualMemory::CommitRegion, VirtualMemory::UncommitRegion, VirtualMemory::ReleasePartialRegion, and OS::Free into no-ops.
  * All allocation/deallocations are now done by ReserveRegionInternal/VirtualMemory::ReleaseRegion respectively
  * Allocation/deallocations can easily switch between malloc and mmap.

Observations: 

1) Switching allocations to use malloc makes the leak fully go away. 5000 calls of JSON.parse on a 767KB string has no effect on wired pages [modulo noise].
2) Switching allocations to use mmap makes the leak appear. 1000 calls of JSON.parse on a 767KB string leaks ~10 wired pages [40KB] per call of JSON.parse.
3) Switching the mmap implementation to use NULL in place of OS::GetRandomMmapAddr() reduces wired page leak to ~2 pages [8KB] per call of JSON.parse.

4) I tried playing with other mmap flags [e.g. MAP_NOCACHE, VM_FLAGS_PURGEABLE] but they didn't have any effect on the leak.
5) I originally thought that the issue is related to calling munmap() multiple times on different subregions of an mmaped() region, but I wasn't able to prove this in a demo C app or v8.

Aside: I don't understand the implementation of commit/uncommit. I'm not sure why we're calling mmap instead of mprotect.
v8.diff
5.6 KB Download
There was a slight error in the implementation of v8.diff in c#115: the saved size in VirtualMemory::VirtualMemory was incorrectly using the modified size. I've attached a new patch that fixes this bug, has other cleanups, and tests more configurations.

Allocator choices:
  * malloc
  * mach_vm_allocate
  * mmap

reuse_allocated_regions:
  * when true, regions are never deallocated. Instead they're placed in a container and reused.

use_fixed_region:
  * when true, passes OS::GetRandomMmapAddr() as a hint to mach_vm_allocate and mmap when allocating.

Observations:
1) malloc never leaks wired memory. Its internal implementation uses a cache, so that might be why.
2) Wired memory does not leak over time if reuse_allocated_regions = true [with any allocator]. This is expected, and is a sanity check.
3) With both vm_allocate and mmap, the wired memory leak occurs *if and only if* use_fixed_region = true.

v8_2.diff
8.0 KB Download
OS::GetRandomMmapAddr() returns a random address between 0 and 0x3ffffffff000. I've attached a vmmap of the renderer's address space with use_fixed_region = true. The vm_allocate regions are sprayed across that entire virtual address space. This causes the page table to bloat [explaining the increase in wired memory that is only reclaimed when the process is killed].

Changing the implementation of OS::GetRandomMmapAddr to return 0x0000fffff000 instead of 0x3ffffffff000 fixes the problem using my patch. Testing now with a clean v8 tree.
err, typo, changing OS::GetRandomMmapAddr to use 0x0000fffff000 instead of 0x3ffffffff000 as a bitwise AND fixes the issue.
Labels: -Needs-Investigation ReleaseBlock-Stable M-60
Owner: erikc...@chromium.org
Status: Started (was: Untriaged)
CL: https://chromium-review.googlesource.com/c/557958/

This bug was introduced in 2011 [ http://codereview.chromium.org/8115014 ] to improve ASLR: https://bugs.chromium.org/p/v8/issues/detail?id=1749. 

I've filed a v8 bug to investigate if this causes an issue on other platforms:
https://bugs.chromium.org/p/v8/issues/detail?id=6555
Thank you, Erik. Great investigation!

Do you know if this behavior on Darwin (not freeing page table entries) is WAI or a bug?

As I replied on the CL, I'll prepare a CL that randomizes base address of each space in the heap. This should fix this issue without regressing security.


CL that implements the proposal from comment 121: https://chromium-review.googlesource.com/c/558876/
It's not clear if this is WAI from the perspective of the Darwin Kernel. I filed radar 33162362.
Copying info from radar.

"""
Steps to Reproduce:
1. Compile the test program. e.g. """clang++ wired_memory_leak.cc"""
2. Run """vm_stat | grep wired""" to count the number of wired pages on the system.
3. Run the program and wait for it to sleep. e.g. """./a.out"""
4. Run """vm_stat | grep wired""" to count the number of wired pages on the system.
5. Run """sudo vmmap -v -interleaved <pid>""" to count all virtual memory regions in the test application.

Expected Results:
The number of wired pages at step (4) should be close to the number of wired pages at step (2).

Observed Results:
The number of wired pages at step (4) increases by 500,000, which equals the number of allocated [and deallocated] regions.
"""
wired_memory_leak.cc
1.2 KB View Download
Here's a better example that doesn't use rand() and shows that PDEs are leaked. This means that for every 4GB region we allow v8 to randomly allocate in, we will eventually leak ~1024 wired pages [4MB of memory].
wired_memory_leak.cc
1.4 KB View Download

Comment 126 by a...@chromium.org, Jul 10 2017

Re the page map leak, out of curiosity, can you share the link to the kernel source file that implements it?
I just got into a similar weird state.
MacBook Pro Retina, 15-inch, mid-2014. I got the popup "Your system has run out of application memory", and even after killing one of the two Chromes (Canary and Chrome) the popup stays there.
If I open Activity Monitor, it says that:
- there is low memory pressure (green, ~30% filled)
- 4.78 GB of wired memory
- ~5 GB of unused memory (still, it claims I'm out of memory)

In case it's useful, I ran vmmap -interleaved -v over all the processes on my machine and grabbed `ps aux` output.

One thing that jumps out at me is that it took >2 minutes to generate vmmap_257.txt (257 is uplink_proxy), which smells a bit suspicious. It has a bazillion 128MB virtual allocations. I tried to kill it, but the wired memory reported by the task manager didn't go down.

Attachments:
https://drive.google.com/open?id=0B1oJWfrTOXwxYnJ3N0s3ams3V28
c#127 is  Issue 713854 .
Project Member

Comment 129 by bugdroid1@chromium.org, Jul 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/0640cbf378bb569ce99a45095acd0dec2518720d

commit 0640cbf378bb569ce99a45095acd0dec2518720d
Author: Ulan Degenbaev <ulan@chromium.org>
Date: Fri Jul 14 07:15:40 2017

[heap] Rework ASLR for base::Platform::VirtualMemory

Currently every VirtualMemory allocation on 64-bit systems
uses a random 46-bit address hint for ASLR.

This leads to wired page leak on MacOS discovered by Erik Chen (see
 crbug.com/700928  and https://chromium-review.googlesource.com/c/557958/):
"The Darwin kernel [as of macOS 10.12.5] does not clean up page directory
entries [PDE] created from mmap or mach_vm_allocate, even after
the region is destroyed. Using a virtual address space that is too large
causes a leak of about 1 wired [can never be paged out] page per call to
mmap(). The page is only reclaimed when the process is killed."

This patch changes VirtualMemory to accept the hint parameter explicitly.

On MacOS the hints are confined to 4GB contiguous region. Algorithm:
- On startup, set heap.mmap_region_base_ to a random address.
- For each mmap use heap.mmap_region_base_ + (random_offset % (4*GB)).

BUG= chromium:700928 

Cq-Include-Trybots: master.tryserver.chromium.linux:linux_chromium_rel_ng
Change-Id: I2ae6a024e02fbe63f940105d7920b57c19abacc6
Reviewed-on: https://chromium-review.googlesource.com/558876
Commit-Queue: Ulan Degenbaev <ulan@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/heads/master@{#46656}
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/api.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-aix.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-cygwin.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-freebsd.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-fuchsia.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-linux.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-macos.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-openbsd.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-posix.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-qnx.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-solaris.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform-win32.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/base/platform/platform.h
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/d8.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/heap/heap.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/heap/heap.h
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/heap/sequential-marking-deque.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/heap/spaces.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/heap/spaces.h
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/heap/store-buffer.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/src/x64/codegen-x64.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/test/cctest/test-platform-linux.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/test/cctest/test-platform-win32.cc
[modify] https://crrev.com/0640cbf378bb569ce99a45095acd0dec2518720d/test/unittests/heap/heap-unittest.cc

Labels: Merge-Request-60
Status: Fixed (was: Started)
Project Member

Comment 131 by sheriffbot@chromium.org, Jul 14 2017

Labels: -Merge-Request-60 Hotlist-Merge-Review Merge-Review-60
This bug requires manual review: We are only 10 days from stable.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-60 Merge-Rejected-60
hi Erik - thanks for the fix! I'm a bit worried since we are only 10 days away from Stable, and this seems to be a fairly large change. And this issue has been around since March. My recommendation is to wait until M61. We are trying to reduce any risk for M60 and also stabilize the branch. I'm rejecting this merge for now, but please re-apply Merge-Request label if you think this should be included for M60 and if you think this is a safe merge. 
Labels: Merge-Request-60
Reapplying Merge-Request.

This issue has actually been around for 6 years, but it's also a massive memory leak [1GB+] of kernel wired memory. Let's let the v8 team chime in about the potential danger of the change?
Project Member

Comment 134 by sheriffbot@chromium.org, Jul 15 2017

Labels: -Merge-Request-60 Merge-Review-60
This bug requires manual review: We are only 9 days from stable.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 135 by u...@chromium.org, Jul 17 2017

The change should be safe. Canary with the change (61.0.3159.0) looks good so far.


If this is a bug users have lived with for six years, there's no rush to get it fixed in a branch going to stable in a handful of days. I don't think waiting six more weeks is the end of the world.

Labels: -Merge-Review-60
Great, thanks for more info. As much as I'd like to see this get fixed earlier, it'll be safer to have this go through full beta cycles for M61. Since we're only a week away from stable and this has been around for 6 years, my recommendation is to wait until M61.

Comment 138 by a...@chromium.org, Jul 18 2017

FYI this landed in v8 6.1.485, which rolled in at 61.0.3158.0. I'll verify once it hits dev channel.
Issue 707962 has been merged into this issue.