Commit charge increasing over time. |
|||||||||||||||||||||
Issue descriptionI've realized that the commit charge is usually quite high on my machine and it seems to be increasing over time. When I've left yesterday it was at ~30GB and this morning it was at 60GB (I had 131 tabs opened: 113 discarded, 7 frozen, 11 loaded). Earlier this week I had the same issue but this time my commit charge was at 180GB, and at this point Windows started killing applications. Before restarting Chrome I've killed a bunch of chrome.exe processes manually, I've noted that there's a single renderer process that caused a drop of 10GB in committed memory after being killed (that or the "Committed field in task manager is really slow to update, I was waiting 2-3 secs after killing each process), and killing the browser "freed" 20GB of commit space. There's no single process that show with a really high commit value (see attached screenshot)
,
Aug 10
This might be related to issue 867468
,
Aug 11
,
Aug 13
More data point on this: When I've left the office on Friday the commit charge was at ~30GB on my machine, it's now Monday morning and it's at 64.6GB, "only" 21.6GB is considered as "In Use" (see screenshot attached). I've exported some data from Resource Monitor, this lists the commit, WS and other memory related metrics from all the running processes, see https://docs.google.com/spreadsheets/d/1i8NIx1E9Z4YjUgHJcOElKyiKm35uds1UNXghTQ76Yks/edit?usp=sharing (Google only, I haven't sanity checked the list of processes) It's interesting that the sum of the commit for all the processes is only at 22GB , there's 44 GB of commit charge that aren't accounted for here. I've also captured a memory snapshot of the browser process with VMMap and the commit seems higher than the one shown in Resource Monitor (1.4GB vs 860MB), so I'm not entirely sure that I can fully trust Resource Monitor... This VMMap trace is available here: https://drive.google.com/file/d/1Yi6PIeAVa46UExpTb_QT0ZT6OcrdQ9K_/view?usp=sharing
,
Aug 13
More data: - There's only 90 zombie processes running, this seems reasonable. - Handle count for Chrome's browser process is 6600, this doesn't seem too high. - Rammap tells me that I've 32GB of mapped files, this seems suspicious. I've attached a screenshot to this comment, and the rammap trace is available here: https://drive.google.com/file/d/1JHUg4V6B2r3ld9oAHVuvwV6i-DLfuwj2/view?usp=sharing I've also attached a screenshot of the VMMap dump mentioned in the previous comment for convenience.
,
Aug 13
Correction, handle count for the browser is at 20000 (I was looking at the handle count for another instance of Chrome in the previous comment), and a lot of these handles are memory mapped files. Seems like a Mojo issue?
,
Aug 13
I just took a look over Seb's shoulder, and it looks like the browser now has about 20k handles, most of them to (anonymous) shared sections. I'm guessing we're leaking shmem handles somehow.
,
Aug 13
Here's the list of handles for Chrome's browser process: https://docs.google.com/spreadsheets/d/10irhdjLTRJm0YlTXhQpuLkbbVaO_H69KPVXBoXJjZXQ/edit?usp=sharing
,
Aug 13
Out of the 17000 handles listed in the spreadsheet from comment 8, ~13000 are handles to a shared memory object owned by Chrome (Name contains "CrSharedMem")
,
Aug 13
Good find! Aside 1: I think our accounting for shared memory is incorrect: https://bugs.chromium.org/p/chromium/issues/detail?id=872939 which is part of the problem. Aside 2: There's a recent major refactor of shared memory, see: https://bugs.chromium.org/p/chromium/issues/detail?id=795291 relevant changes in commit range: https://chromium-review.googlesource.com/1127948 https://chromium-review.googlesource.com/985981
,
Aug 13
Here's a chrome://memory-internals memory dump.
,
Aug 13
I'm seeing very few shared memory regions in the memory dump, which suggests either: 1) The accounting is super broken and is straight-up missing shared memory regions. 2) We're dead-leaking shared memory regions. I think (2) is more likely. +alexilin
,
Aug 13
I've just noticed that the process for my Gmail tab is also using ~23k handles, most of them are handles to a shared memory section. So this issue affect both the renderers and the browser process (at least, the GPU process handle count is relatively low)
,
Aug 13
Marking this issue as a blocker for 867468 which is about a OOM spike. It's very likely that the OOM spike is caused by this issue.
,
Aug 13
I think that this should be RBS, as this is blocking a RBS issue. +dcheng@ and rockot@ per dcheng's request.
,
Aug 14
This issue is marked as a release blocker with no milestone associated. Please add an appropriate milestone. All release blocking issues should have milestones associated to it, so that the issue can tracked and the fixes can be pushed promptly. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 14
,
Aug 14
It looks indeed like we may leak shared memory regions. I'm surprised though why we've started to see this only now. The new shared memory classes have been widely used since M68 and mojo uses them internally since then. It may be that the usage of the new API significantly increased recently. Chrome doesn't count shared memory regions that aren't currently mapped into the address space. It could be that these regions still contribute to the commit charge. It explains why we would not see leaked regions in a memory dump. sebmarchand@, could you check whether the number of handles decreases after reverting https://crrev.com/c/1127948?
,
Aug 14
Sure, I'll revert this in a local build and see if it fix/improve the issue.
,
Aug 14
I haven't been able to revert yet (there was some conflicts) so instead I've tried to manually bisect this by grabbing the first official build with this commit (69.0.3491.0) and the one preceding it (69.0.3489.0). Then I've started these 2 builds, each with a new user data dir and opened a Gmail tab in each of them. After 10 min of usage the handle counts are: - 69.0.3489.0: 314 - 69.0.3491.0: 522 , at least 220 of these handles contain "CrSharedMem" in their name. So https://crrev.com/c/1127948, or something else in the 69.0.3489.0..69.0.3491.0 range seems to be causing the handle count to spike.
,
Aug 14
This keeps increasing, I'm at 313 VS 573 now. Note that the handle count for the browser process seems stable, so this might not be the only issue (the handle count for my Canary is at ~20k, it has been running for 2 days). I'll keep these 2 builds running overnight to see if it keeps increasing.
,
Aug 14
I found a repro for this, just start a build that has this commit, open 2 tabs (e.g. cnn.com and a NTP) and keep Ctrl+Tab pressed (to quickly cycle through these 2 tabs). After 1 min of cycling I had one tab using 8GB of commit space (it's hard to measure this because you don't get this value directly in process explorer, instead you just need to kill the tab and look at the drop in commit charge).
,
Aug 14
,
Aug 14
Reassigning to Alex. Lmk if I can help.
,
Aug 15
I think we found the cause of the leak. The fix is on the way: https://crrev.com/c/1175918/7
,
Aug 15
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6 commit 6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6 Author: Alexandr Ilin <alexilin@chromium.org> Date: Wed Aug 15 23:08:54 2018 base: Fix shared memory handle leak in IPC serialization PlatformSharedMemoryRegion leaked handles after serializing into the IPC message on Windows, Mac and Fuchsia. Serialization code incorrectly assumed that HandleWin, MachPortMac and HandleFuchsia classes take ownership over the handle. What really happens is that HandleFoo is passed to WriteParam(), which causes ParamTraits<HandleFoo>::Write() to be invoked, before it returns, creating the HandleAttachmentFoo, which will duplicate the handle. By the time we return from WriteParam() the handle should have been duplicated into the serialized message, so we can safely let the PSMR close the handle when it goes out of scope. This CL makes the PlatformSharedMemoryRegion serialization repeat the logic of the base::SharedMemoryHandle serialization as if the OwnershipPassesToIPC() flag is always set to true. Bug: 873313 , 873668 Change-Id: I4e98ff9483a48970ebcb2fcf6b150f447af82794 Reviewed-on: https://chromium-review.googlesource.com/1175918 Commit-Queue: Wez <wez@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Ken Rockot <rockot@chromium.org> Cr-Commit-Position: refs/heads/master@{#583434} [modify] https://crrev.com/6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6/ipc/ipc_message_utils.cc
,
Aug 16
Note that the diagnosed leak affected all platforms that use Chrome IPC.
,
Aug 16
Pls update bug with canary result tomorrow and request a merge to M69 when it is safe to merge. Thank you.
,
Aug 16
Re c#27: the diagnosed leak didn't affect Linux, Android and Chrome OS.
,
Aug 16
The NextAction date has arrived: 2018-08-16
,
Aug 16
How is the change listed at #26 looking in canary?
,
Aug 16
Looking at https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name+LIKE+%27%5BOut+of+Memory%5D%25%27+AND+product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27 it looks like the OOM rate is lower, but it might be because there's not so many users with this build yet, so we should probably wait to get more data. I haven't seen any crash that seems to be caused by Alex's CL and it clearly fix some issues so I think that it should be merged, but we can maybe wait for a few hours to get more crash reports?
,
Aug 16
The change listed at #26 wasn't released in canary yet, according to https://storage.googleapis.com/chromium-find-releases-static/654.html#6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6. AFAIK we don't have a metric that directly shows the commit charge or the number of opened handles. We can expect that the number of OOM crashes will go down, though. We don't have full-day data for Aug 16 so it's incorrect to compare it with previous days.
,
Aug 16
Yeah, change is not in canary yet. Lets wait until Monday morning to get more canary coverage. Pls update the bug on Monday morning with canary result.
,
Aug 16
Re #34: Since there is no possibility of rolling this build out to stable, I would recommend that we roll a fresh Canary immediately, review numbers tomorrow, and merge then if they look no worse. Note that the code that was patched is definitely incorrect - the patch restores the behaviour of the original implementation, which was correct - it's not a speculative fix. There is some small risk that calling-code has other bugs causing reliance on the broken, leaky behaviour, but given that this is how the old code worked, again we suspect not.
,
Aug 16
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9ba9f73180a591eba65b70fa708bdac3786cead9 commit 9ba9f73180a591eba65b70fa708bdac3786cead9 Author: Alexandr Ilin <alexilin@chromium.org> Date: Thu Aug 16 17:35:10 2018 base: Fix shared memory handle leak in IPC serialization PlatformSharedMemoryRegion leaked handles after serializing into the IPC message on Windows, Mac and Fuchsia. Serialization code incorrectly assumed that HandleWin, MachPortMac and HandleFuchsia classes take ownership over the handle. What really happens is that HandleFoo is passed to WriteParam(), which causes ParamTraits<HandleFoo>::Write() to be invoked, before it returns, creating the HandleAttachmentFoo, which will duplicate the handle. By the time we return from WriteParam() the handle should have been duplicated into the serialized message, so we can safely let the PSMR close the handle when it goes out of scope. This CL makes the PlatformSharedMemoryRegion serialization repeat the logic of the base::SharedMemoryHandle serialization as if the OwnershipPassesToIPC() flag is always set to true. TBR: abdulsyed Bug: 873313 , 873668 Change-Id: I4e98ff9483a48970ebcb2fcf6b150f447af82794 Reviewed-on: https://chromium-review.googlesource.com/1175918 Commit-Queue: Wez <wez@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Ken Rockot <rockot@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#583434}(cherry picked from commit 6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6) Reviewed-on: https://chromium-review.googlesource.com/1178261 Cr-Commit-Position: refs/branch-heads/3524@{#5} Cr-Branched-From: 4c984bc7664755d701c75b49b795678e164551e3-refs/heads/master@{#583420} [modify] https://crrev.com/9ba9f73180a591eba65b70fa708bdac3786cead9/ipc/ipc_message_utils.cc
,
Aug 17
We have <24hrs of data at this point, but if we look at e.g. the blink::PageMemoryRegion::Allocate signature then crash rates look like: 70.0.3523.0 - 69 70.0.3524.0 - 38 (build was live for ~half the day, prior to .3 rev, subject to whether users restarted it) 70.0.3524.3 - 1 (the report shows 'system commit remaining' of >4GB, and cpu-architecture is x86 -> address-space exhaustion). 70.0.3525.0 - 0 So things look as-expected so far; suggest reviewing later today to confirm report #s in 3525.
,
Aug 20
The NextAction date has arrived: 2018-08-20
,
Aug 20
The number of OOM crashes went back to normal after 70.0.3525.0: https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name+LIKE+%27%5BOut+of+Memory%5D%25%27+AND+product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27 Requesting merge to M69.
,
Aug 20
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 20
Approving merge to M69 branch 3497 based on comment #37, #39 and per offline chat with wez@. Pls merge ASAP and mark bug as fixed if nothing else is pending. Thank you.
,
Aug 20
Taking care of the merge now.
,
Aug 20
Thank you wez@.
,
Aug 20
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/579a09af56a7648c5380bc126e0c42c45cfa6cfd commit 579a09af56a7648c5380bc126e0c42c45cfa6cfd Author: Alexandr Ilin <alexilin@chromium.org> Date: Mon Aug 20 17:53:48 2018 base: Fix shared memory handle leak in IPC serialization PlatformSharedMemoryRegion leaked handles after serializing into the IPC message on Windows, Mac and Fuchsia. Serialization code incorrectly assumed that HandleWin, MachPortMac and HandleFuchsia classes take ownership over the handle. What really happens is that HandleFoo is passed to WriteParam(), which causes ParamTraits<HandleFoo>::Write() to be invoked, before it returns, creating the HandleAttachmentFoo, which will duplicate the handle. By the time we return from WriteParam() the handle should have been duplicated into the serialized message, so we can safely let the PSMR close the handle when it goes out of scope. This CL makes the PlatformSharedMemoryRegion serialization repeat the logic of the base::SharedMemoryHandle serialization as if the OwnershipPassesToIPC() flag is always set to true. TBR: govind Bug: 873313 , 873668 Change-Id: I4e98ff9483a48970ebcb2fcf6b150f447af82794 Reviewed-on: https://chromium-review.googlesource.com/1175918 Commit-Queue: Wez <wez@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Ken Rockot <rockot@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#583434}(cherry picked from commit 6544b8a5f6cd7a060f1293d9683b0d8f88ba1cd6) Reviewed-on: https://chromium-review.googlesource.com/1181701 Cr-Commit-Position: refs/branch-heads/3497@{#717} Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753} [modify] https://crrev.com/579a09af56a7648c5380bc126e0c42c45cfa6cfd/ipc/ipc_message_utils.cc
,
Aug 20
,
Aug 24
|
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by sebmarchand@chromium.org
, Aug 10