New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 873313 link

Starred by 14 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 20
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-08-20
OS: Windows , Mac , Fuchsia
Pri: 1
Type: Bug

Blocking:
issue 867468



Sign in to add a comment

Commit charge increasing over time.

Project Member Reported by sebmarchand@chromium.org, Aug 10

Issue description

I've realized that the commit charge is usually quite high on my machine and it seems to be increasing over time. When I've left yesterday it was at ~30GB and this morning it was at 60GB (I had 131 tabs opened: 113 discarded, 7 frozen, 11 loaded). Earlier this week I had the same issue but this time my commit charge was at 180GB, and at this point Windows started killing applications.

Before restarting Chrome I've killed a bunch of chrome.exe processes manually, I've noted that there's a single renderer process that caused a drop of 10GB in committed memory after being killed (that or the "Committed field in task manager is really slow to update, I was waiting 2-3 secs after killing each process), and killing the browser "freed" 20GB of commit space.

There's no single process that show with a really high commit value (see attached screenshot)

 
overcommit_commit_per_process.png
84.8 KB View Download
Owner: sebmarchand@chromium.org
Assigning to myself for now, I'll collect more data on this.
This might be related to issue 867468
More data point on this:

When I've left the office on Friday the commit charge was at ~30GB on my machine, it's now Monday morning and it's at 64.6GB, "only" 21.6GB is considered as "In Use" (see screenshot attached). 

I've exported some data from Resource Monitor, this lists the commit, WS and other memory related metrics from all the running processes, see https://docs.google.com/spreadsheets/d/1i8NIx1E9Z4YjUgHJcOElKyiKm35uds1UNXghTQ76Yks/edit?usp=sharing (Google only, I haven't sanity checked the list of processes)

It's interesting that the sum of the commit for all the processes is only at 22GB , there's 44 GB of commit charge that aren't accounted for here. I've also captured a memory snapshot of the browser process with VMMap and the commit seems higher than the one shown in Resource Monitor (1.4GB vs 860MB), so I'm not entirely sure that I can fully trust Resource Monitor... This VMMap trace is available here: https://drive.google.com/file/d/1Yi6PIeAVa46UExpTb_QT0ZT6OcrdQ9K_/view?usp=sharing
overcommit_task_manager.png
30.4 KB View Download
Status: Assigned (was: Untriaged)
More data:
- There's only 90 zombie processes running, this seems reasonable.
- Handle count for Chrome's browser process is 6600, this doesn't seem too high.
- Rammap tells me that I've 32GB of mapped files, this seems suspicious. I've attached a screenshot to this comment, and the rammap trace is available here: https://drive.google.com/file/d/1JHUg4V6B2r3ld9oAHVuvwV6i-DLfuwj2/view?usp=sharing

I've also attached a screenshot of the VMMap dump mentioned in the previous comment for convenience. 
vmmap_overcommit.png
68.4 KB View Download
rammap_overcommit.png
27.2 KB View Download
Correction, handle count for the browser is at 20000 (I was looking at the handle count for another instance of Chrome in the previous comment), and a lot of these handles are memory mapped files. Seems like a Mojo issue?
I just took a look over Seb's shoulder, and it looks like the browser now has about 20k handles, most of them to (anonymous) shared sections. I'm guessing we're leaking shmem handles somehow.
Out of the 17000 handles listed in the spreadsheet from comment 8, ~13000 are handles to a shared memory object owned by Chrome (Name contains "CrSharedMem")
Good find!

Aside 1: I think our accounting for shared memory is incorrect: https://bugs.chromium.org/p/chromium/issues/detail?id=872939 which is part of the problem.

Aside 2: There's a recent major refactor of shared memory, see:
https://bugs.chromium.org/p/chromium/issues/detail?id=795291

relevant changes in commit range:
https://chromium-review.googlesource.com/1127948
https://chromium-review.googlesource.com/985981
Here's a chrome://memory-internals memory dump.
trace_with_heap_dump.json.gz
223 KB Download
Cc: erikc...@chromium.org alexilin@chromium.org
I'm seeing very few shared memory regions in the memory dump, which suggests either:

1) The accounting is super broken and is straight-up missing shared memory regions.
2) We're dead-leaking shared memory regions.

I think (2) is more likely. +alexilin
I've just noticed that the process for my Gmail tab is also using ~23k handles, most of them are handles to a shared memory section. So this issue affect both the renderers and the browser process (at least, the GPU process handle count is relatively low)
Blocking: 867468
Marking this issue as a blocker for 867468 which is about a OOM spike. It's very likely that the OOM spike is caused by this issue.
Cc: roc...@chromium.org dcheng@chromium.org
Labels: -Pri-3 ReleaseBlock-Stable Pri-1
I think that this should be RBS, as this is blocking a RBS issue.

+dcheng@ and rockot@ per dcheng's request.
Project Member

Comment 16 by sheriffbot@chromium.org, Aug 14

This issue is marked as a release blocker with no milestone associated. Please add an appropriate milestone.

All release blocking issues should have milestones associated to it, so that the issue can tracked and the fixes can be pushed promptly.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: M-69
Cc: mattcary@chromium.org
It looks indeed like we may leak shared memory regions. I'm surprised though why we've started to see this only now. The new shared memory classes have been widely used since M68 and mojo uses them internally since then. It may be that the usage of the new API significantly increased recently.

Chrome doesn't count shared memory regions that aren't currently mapped into the address space. It could be that these regions still contribute to the commit charge. It explains why we would not see leaked regions in a memory dump.

sebmarchand@, could you check whether the number of handles decreases after reverting https://crrev.com/c/1127948?
Sure, I'll revert this in a local build and see if it fix/improve the issue.
I haven't been able to revert yet (there was some conflicts) so instead I've tried to manually bisect this by grabbing the first official build with this commit (69.0.3491.0) and the one preceding it (69.0.3489.0). Then I've started these 2 builds, each with a new user data dir and opened a Gmail tab in each of them.

After 10 min of usage the handle counts are:
- 69.0.3489.0: 314
- 69.0.3491.0: 522 , at least 220 of these handles contain "CrSharedMem" in their name.

So https://crrev.com/c/1127948, or something else in the 69.0.3489.0..69.0.3491.0 range seems to be causing the handle count to spike. 
This keeps increasing, I'm at 313 VS 573 now. Note that the handle count for the browser process seems stable, so this might not be the only issue (the handle count for my Canary is at ~20k, it has been running for 2 days).

I'll keep these 2 builds running overnight to see if it keeps increasing.
I found a repro for this, just start a build that has this commit, open 2 tabs (e.g. cnn.com and a NTP) and keep Ctrl+Tab pressed (to quickly cycle through these 2 tabs). After 1 min of cycling I had one tab using 8GB of commit space (it's hard to measure this because you don't get this value directly in process explorer, instead you just need to kill the tab and look at the drop in commit charge).
Cc: piman@chromium.org
Cc: -alexilin@chromium.org sebmarchand@chromium.org
Owner: alexilin@chromium.org
Reassigning to Alex. Lmk if I can help.
Status: Started (was: Assigned)
I think we found the cause of the leak. The fix is on the way: https://crrev.com/c/1175918/7
Project Member

Comment 26 by bugdroid1@chromium.org, Aug 15

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6

commit 6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6
Author: Alexandr Ilin <alexilin@chromium.org>
Date: Wed Aug 15 23:08:54 2018

base: Fix shared memory handle leak in IPC serialization

PlatformSharedMemoryRegion leaked handles after serializing into the IPC message
on Windows, Mac and Fuchsia. Serialization code incorrectly assumed that
HandleWin, MachPortMac and HandleFuchsia classes take ownership over the handle.

What really happens is that HandleFoo is passed to WriteParam(), which causes
ParamTraits<HandleFoo>::Write() to be invoked, before it returns, creating the
HandleAttachmentFoo, which will duplicate the handle.

By the time we return from WriteParam() the handle should have been duplicated
into the serialized message, so we can safely let the PSMR close the handle when
it goes out of scope.

This CL makes the PlatformSharedMemoryRegion serialization repeat the logic of
the base::SharedMemoryHandle serialization as if the OwnershipPassesToIPC() flag
is always set to true.

Bug:  873313 ,  873668 
Change-Id: I4e98ff9483a48970ebcb2fcf6b150f447af82794
Reviewed-on: https://chromium-review.googlesource.com/1175918
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#583434}
[modify] https://crrev.com/6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6/ipc/ipc_message_utils.cc

Cc: gov...@chromium.org
Components: Internals>Core
Labels: OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac
Note that the diagnosed leak affected all platforms that use Chrome IPC.
Cc: w...@chromium.org benmason@chromium.org
NextAction: 2018-08-16
Pls update bug with canary result tomorrow and request a merge to M69 when it is safe to merge. Thank you.
Labels: -OS-Linux -OS-Android -OS-Chrome
Re c#27: the diagnosed leak didn't affect Linux, Android and Chrome OS.
The NextAction date has arrived: 2018-08-16
How is the change listed at #26 looking in canary? 
Looking at https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name+LIKE+%27%5BOut+of+Memory%5D%25%27+AND+product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27 it looks like the OOM rate is lower, but it might be because there's not so many users with this build yet, so we should probably wait to get more data. 

I haven't seen any crash that seems to be caused by Alex's CL and it clearly fix some issues so I think that it should be merged, but we can maybe wait for a few hours to get more crash reports?
NextAction: 2018-08-17
The change listed at #26 wasn't released in canary yet, according to https://storage.googleapis.com/chromium-find-releases-static/654.html#6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6. 

AFAIK we don't have a metric that directly shows the commit charge or the number of opened handles. We can expect that the number of OOM crashes will go down, though.

We don't have full-day data for Aug 16 so it's incorrect to compare it with previous days.
NextAction: 2018-08-20
Yeah, change is not in canary yet. Lets wait until Monday morning to get more canary coverage. Pls update the bug on Monday morning with canary result.
Re #34: Since there is no possibility of rolling this build out to stable, I would recommend that we roll a fresh Canary immediately, review numbers tomorrow, and merge then if they look no worse.

Note that the code that was patched is definitely incorrect - the patch restores the behaviour of the original implementation, which was correct - it's not a speculative fix. There is some small risk that calling-code has other bugs causing reliance on the broken, leaky behaviour, but given that this is how the old code worked, again we suspect not.
Project Member

Comment 36 by bugdroid1@chromium.org, Aug 16

Labels: merge-merged-3524
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9ba9f73180a591eba65b70fa708bdac3786cead9

commit 9ba9f73180a591eba65b70fa708bdac3786cead9
Author: Alexandr Ilin <alexilin@chromium.org>
Date: Thu Aug 16 17:35:10 2018

base: Fix shared memory handle leak in IPC serialization

PlatformSharedMemoryRegion leaked handles after serializing into the IPC message
on Windows, Mac and Fuchsia. Serialization code incorrectly assumed that
HandleWin, MachPortMac and HandleFuchsia classes take ownership over the handle.

What really happens is that HandleFoo is passed to WriteParam(), which causes
ParamTraits<HandleFoo>::Write() to be invoked, before it returns, creating the
HandleAttachmentFoo, which will duplicate the handle.

By the time we return from WriteParam() the handle should have been duplicated
into the serialized message, so we can safely let the PSMR close the handle when
it goes out of scope.

This CL makes the PlatformSharedMemoryRegion serialization repeat the logic of
the base::SharedMemoryHandle serialization as if the OwnershipPassesToIPC() flag
is always set to true.

TBR: abdulsyed
Bug:  873313 ,  873668 
Change-Id: I4e98ff9483a48970ebcb2fcf6b150f447af82794
Reviewed-on: https://chromium-review.googlesource.com/1175918
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#583434}(cherry picked from commit 6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6)
Reviewed-on: https://chromium-review.googlesource.com/1178261
Cr-Commit-Position: refs/branch-heads/3524@{#5}
Cr-Branched-From: 4c984bc7664755d701c75b49b795678e164551e3-refs/heads/master@{#583420}
[modify] https://crrev.com/9ba9f73180a591eba65b70fa708bdac3786cead9/ipc/ipc_message_utils.cc

We have <24hrs of data at this point, but if we look at e.g. the blink::PageMemoryRegion::Allocate signature then crash rates look like:

70.0.3523.0 - 69
70.0.3524.0 - 38 (build was live for ~half the day, prior to .3 rev, subject to whether users restarted it)
70.0.3524.3 - 1 (the report shows 'system commit remaining' of >4GB, and cpu-architecture is x86 -> address-space exhaustion).
70.0.3525.0 - 0

So things look as-expected so far; suggest reviewing later today to confirm report #s in 3525.
The NextAction date has arrived: 2018-08-20
Project Member

Comment 40 by sheriffbot@chromium.org, Aug 20

Labels: -Merge-Request-69 Merge-Review-69 Hotlist-Merge-Review
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-69 Merge-Approved-69
Approving merge to M69 branch 3497 based on comment #37, #39 and per offline chat with wez@. Pls merge ASAP and mark bug as fixed if nothing else is pending. Thank you.
Taking care of the merge now.
Thank you wez@.
Project Member

Comment 44 by bugdroid1@chromium.org, Aug 20

Labels: -merge-approved-69 merge-merged-3497
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/579a09af56a7648c5380bc126e0c42c45cfa6cfd

commit 579a09af56a7648c5380bc126e0c42c45cfa6cfd
Author: Alexandr Ilin <alexilin@chromium.org>
Date: Mon Aug 20 17:53:48 2018

base: Fix shared memory handle leak in IPC serialization

PlatformSharedMemoryRegion leaked handles after serializing into the IPC message
on Windows, Mac and Fuchsia. Serialization code incorrectly assumed that
HandleWin, MachPortMac and HandleFuchsia classes take ownership over the handle.

What really happens is that HandleFoo is passed to WriteParam(), which causes
ParamTraits<HandleFoo>::Write() to be invoked, before it returns, creating the
HandleAttachmentFoo, which will duplicate the handle.

By the time we return from WriteParam() the handle should have been duplicated
into the serialized message, so we can safely let the PSMR close the handle when
it goes out of scope.

This CL makes the PlatformSharedMemoryRegion serialization repeat the logic of
the base::SharedMemoryHandle serialization as if the OwnershipPassesToIPC() flag
is always set to true.

TBR: govind
Bug:  873313 ,  873668 
Change-Id: I4e98ff9483a48970ebcb2fcf6b150f447af82794
Reviewed-on: https://chromium-review.googlesource.com/1175918
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#583434}(cherry picked from commit 6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6)
Reviewed-on: https://chromium-review.googlesource.com/1181701
Cr-Commit-Position: refs/branch-heads/3497@{#717}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/579a09af56a7648c5380bc126e0c42c45cfa6cfd/ipc/ipc_message_utils.cc

Status: Fixed (was: Started)
Cc: haraken@chromium.org
 Issue 869871  has been merged into this issue.

Sign in to add a comment