Update PartitionAlloc to only MEM_RESERVE the SuperPage until an actual allocation is made.
Issue description

The current calculation uses committed memory for each process. It has the following drawbacks:

* Committed memory overcounts the private memory footprint of allocators, including PartitionAlloc, HeapAlloc, and the V8 allocator. These allocators commit regions of a fixed size (2 MB for PartitionAlloc, 500 KB for V8, 128 KB for HeapAlloc), but actual usage of a region depends on allocation patterns. This has been observed to overestimate the private memory footprint of renderer processes by up to 20 MB. The bulk of this comes from PartitionAlloc: each new superpage consumes 2 MB of committed memory even if there is minimal usage of the superpage.

* GPU drivers will allocate large committed regions with lower resident-memory usage. A difference of at least 50 MB between committed and resident memory has been observed. This causes the private memory footprint of the GPU process to be non-trivially over-estimated.

Unfortunately, there is no way to use system APIs to distinguish committed memory that has never been touched from committed memory that has been touched and then compressed/swapped, so there is no generic solution to this problem. Specifically, for PartitionAlloc, we could update it to keep track of:

1) The total size of all committed regions.
2) The total size of all blocks that have never been touched (basically, keep a high-water mark for the highest block).

A better private memory footprint estimate would then be [committed - (1) + (2)].
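To make the overcounting concrete, here is a minimal standalone sketch (plain Win32, not Chromium code, and not the proposed bookkeeping itself) that commits a 2 MB superpage-sized region, touches a single 4 KB page, and compares the process commit charge against the working set; the constants simply mirror the sizes mentioned above.

```cpp
// Standalone illustration: commit a 2 MB superpage-sized region, touch a
// single 4 KB page, then compare the process commit charge with its
// working set. Link against psapi.lib.
#include <windows.h>
#include <psapi.h>
#include <cstdio>

int main() {
  const size_t kSuperPageSize = 2 * 1024 * 1024;  // Mirrors PA's 2 MB unit.
  char* region = static_cast<char*>(VirtualAlloc(
      nullptr, kSuperPageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
  if (!region)
    return 1;

  region[0] = 1;  // Fault in just one 4 KB page of the 2 MB region.

  PROCESS_MEMORY_COUNTERS_EX pmc = {};
  GetProcessMemoryInfo(GetCurrentProcess(),
                       reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc),
                       sizeof(pmc));
  // PrivateUsage (commit charge) includes the full 2 MB; WorkingSetSize
  // only grows by roughly one page.
  printf("commit charge: %zu bytes, working set: %zu bytes\n",
         static_cast<size_t>(pmc.PrivateUsage),
         static_cast<size_t>(pmc.WorkingSetSize));

  VirtualFree(region, 0, MEM_RELEASE);
  return 0;
}
```

The gap between those two numbers is exactly what the proposed (1)/(2) bookkeeping is meant to correct for in the footprint estimate.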
Sep 19 2017
Fixing title to match actual work.
Sep 20 2017
It is probably worthwhile to write a quick test program that reserves some address space and then walks through it, committing and touching pages (touching the pages is critical). The test could try committing all the pages ahead of time, committing them one at a time, committing 64 KB at a time, etc., and then time each strategy (with the machine set to high-performance mode, or running for several seconds so the timings stabilize, and then reporting the median or maybe the fastest times). It would be good to confirm the overhead and use that to tweak the minimum number of pages that we commit at a time.
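A rough sketch of such a benchmark, assuming plain Win32 VirtualAlloc/VirtualFree and arbitrary illustrative sizes (a 64 MB region and a few chunk granularities), might look like this; a real version would pin to high-performance mode and report medians over many runs, as suggested above.

```cpp
// Rough benchmark sketch: reserve a region, then commit-and-touch it at
// several granularities and time each strategy. Region and chunk sizes
// are arbitrary illustrative choices.
#include <windows.h>
#include <chrono>
#include <cstdio>

static double CommitAndTouchMs(size_t region_size, size_t chunk_size) {
  char* base = static_cast<char*>(
      VirtualAlloc(nullptr, region_size, MEM_RESERVE, PAGE_NOACCESS));
  if (!base)
    return -1.0;

  auto start = std::chrono::steady_clock::now();
  for (size_t offset = 0; offset < region_size; offset += chunk_size) {
    // Commit one chunk, then touch every 4 KB page in it so the page-fault
    // cost is included, not just the VirtualAlloc call itself.
    VirtualAlloc(base + offset, chunk_size, MEM_COMMIT, PAGE_READWRITE);
    for (size_t page = 0; page < chunk_size; page += 4096)
      base[offset + page] = 1;
  }
  auto end = std::chrono::steady_clock::now();

  VirtualFree(base, 0, MEM_RELEASE);
  return std::chrono::duration<double, std::milli>(end - start).count();
}

int main() {
  const size_t kRegionSize = 64 * 1024 * 1024;  // 64 MB of address space.
  const size_t kChunkSizes[] = {4 * 1024, 64 * 1024, 2 * 1024 * 1024,
                                kRegionSize};
  for (size_t chunk : kChunkSizes) {
    // A real benchmark would repeat each case many times and report medians.
    printf("chunk %7zu kB: %.2f ms\n", chunk / 1024,
           CommitAndTouchMs(kRegionSize, chunk));
  }
  return 0;
}
```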
Oct 4 2017
Deleted comment #5, as I had just read the code wrong and wrote up a confusingly incorrect summary.
Oct 12 2017
Okay, I read through this (a lot) more. Thanks to palmer@ for helping me work through the math and the weave of functions. Here's the current understanding.

The PartitionAlloc code actually *does* do commit/decommit in steady state after a set of PartitionPages has been assigned to a PartitionBucket. Thus, I don't think the microbenchmark is necessary, as this doesn't affect the hot path or do significantly more work than steady state.

The problem we are having is with the initial allocation of the SuperPage (which arrives as 2020 kB of committed memory after the guard pages get decommitted; see https://chromium.googlesource.com/chromium/src.git/+/master/base/allocator/partition_allocator/partition_alloc.cc#388), followed by fragmentation of the SuperPage, because the upper layers of the PartitionAlloc code assume that "not faulted" == decommitted, which is false on Windows.

You can create a pathological case as follows. PartitionAlloc has multiple bucket sizes that quantize your allocation size (which, in debug, is padded by 32 bytes for cookie guards). The largest bucket size (despite the documentation) is 0xf0000 = 983040 bytes. Buckets get their space from a "SuperPage", which is 2 MB, of which 32 kB is guard pages and metadata. This means one SuperPage has an effective 2064384 bytes of data available. Even though SuperPages are said to be part of an "Extent", the PA logic doesn't merge them, so it's not really a contiguous region, due to the metadata location as well as the guard regions (this is a terminology mistake... the SuperPage *is* the extent currently. Oops.)

Thus, if we want to break PA via fragmentation, we only need to force it to allocate new SuperPages inefficiently. One SuperPage can hold (2064384 / 983040) = 2.1 allocations of the largest bucket, and 0.1 * 983040 = 98304 bytes = 96 kB. So if you allocate (983040 - 32 - 1) bytes followed by 97*1024 bytes, and touch those pages, in a loop, you end up with a series of SuperPages, each with 2020 kB of committed memory but only 1064 kB touched, where the 956 kB left over is NEVER usable again. This is about 47% wastage in the pathological case. (A sketch of this allocation pattern is below.)
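For illustration, here is a sketch of that pathological pattern written against the generic PartitionAlloc API roughly as it existed at the time; the entry points used (base::PartitionAllocatorGeneric, PartitionAllocGeneric) and their signatures are recalled from memory and may not match the tree exactly, so treat this as an approximation rather than a drop-in test.

```cpp
// Sketch of the pathological allocation pattern described above. The
// PartitionAlloc names used here are recalled from the 2017-era API and
// may not match the current tree exactly.
#include <cstring>

#include "base/allocator/partition_allocator/partition_alloc.h"

void FragmentSuperPages(int iterations) {
  base::PartitionAllocatorGeneric allocator;
  allocator.init();

  const size_t kLargestBucket = 0xf0000;  // 983040 bytes.
  for (int i = 0; i < iterations; ++i) {
    // Just under the largest bucket size (minus the 32-byte debug cookie).
    void* big = PartitionAllocGeneric(allocator.root(),
                                      kLargestBucket - 32 - 1, "frag-test");
    // ~97 kB follow-up allocation.
    void* small = PartitionAllocGeneric(allocator.root(), 97 * 1024,
                                        "frag-test");
    // After the big + small pair lands in a SuperPage, slightly less than
    // 983040 bytes remain free in it, so the next big request cannot fit
    // and forces a brand-new SuperPage, stranding ~956 kB of
    // committed-but-untouched memory per SuperPage.
    memset(big, 1, kLargestBucket - 32 - 1);    // Fault in the used pages.
    memset(small, 1, 97 * 1024);
  }
}
```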
Oct 12 2017
Here's a test case to show the problem: https://chromium-review.googlesource.com/#/c/chromium/src/+/714367

I'm unclear what the impact is in production. I will spend a little bit of time looking at stats, but then probably just move on to trying to fix it. There are two fixes required:

(a) Allocate the SuperPage as reserved only, and commit as needed before assigning to a bucket.

(b) See if the "extent" concept should actually merge sequential SuperPages to *actually* be an extent. This would remove the fragmentation.

(a) is medium-sized work, because the PageAllocator API abstraction needs fixing, and a bunch of the comments and stats gathering have the mental model wrong and might themselves be wrong; we need to detangle that to understand. (b) is large-sized work; it might end up churning 30%+ of the logic, since it would change all the metadata lookups and assumptions. Currently focusing only on (a); a rough sketch of the idea is below.
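To make (a) concrete, here is a minimal concept sketch using raw Win32 calls rather than the actual PageAllocator abstraction; the helper names and per-span commit hook are illustrative only, not the real interface.

```cpp
// Concept sketch for fix (a): the SuperPage is only *reserved* up front,
// and spans are committed lazily right before a bucket hands them out.
// Names and structure are illustrative, not PartitionAlloc's real API.
#include <windows.h>
#include <cstddef>

constexpr size_t kSuperPageSize = 2 * 1024 * 1024;

// Reserve address space for a SuperPage without charging any commit.
char* ReserveSuperPage() {
  return static_cast<char*>(
      VirtualAlloc(nullptr, kSuperPageSize, MEM_RESERVE, PAGE_NOACCESS));
}

// Commit just the span a bucket is about to use; the rest of the SuperPage
// continues to cost only address space, not pagefile.
bool CommitSpan(void* span_start, size_t span_size) {
  return VirtualAlloc(span_start, span_size, MEM_COMMIT, PAGE_READWRITE) !=
         nullptr;
}
```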
Oct 12 2017
Adding palmer and haraken for awareness as PartitionAlloc OWNERS.
Oct 13 2017
Your plan sounds reasonable to me. (a) makes sense.
Oct 13 2017
I concur.
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b43627f02312b1045039c35e68cf5ca17ff4ec86

commit b43627f02312b1045039c35e68cf5ca17ff4ec86
Author: Albert J. Wong <ajwong@chromium.org>
Date: Fri Oct 20 22:53:11 2017

Only commit used regions of a SuperPage.

Previously, PartitionAlloc would allocate a SuperPage (2MB) as committed even before the regions inside it were given out to PartitionBuckets for actual use. This was wasteful, particularly in the case of SuperPage fragmentation, because these committed bytes would be leaked by the heap itself.

Because these pages are never faulted in, this had zero impact on the amount of RAM being used. But on Windows, due to its committed-memory model, pagefile space was still reserved, causing unnecessary pagefile growth in the best case. In the worst case, for systems with a restricted pagefile size (either due to Windows's intrinsic limit on the maximum pagefile size being a small multiple of physical RAM, or adverse system policies in managed Windows deployments), this memory commitment could adversely affect the ability of other programs to launch.

This CL is a patch that fixes the problem, though longer term it would be cleaner to restructure some of the PageAllocator APIs.

Bug: 765406
Change-Id: I7b87466353b1e22735eb55e9a2af325f439d412e
Reviewed-on: https://chromium-review.googlesource.com/717942
Reviewed-by: Chris Palmer <palmer@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Commit-Queue: Chris Palmer <palmer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#510596}

[modify] https://crrev.com/b43627f02312b1045039c35e68cf5ca17ff4ec86/base/allocator/partition_allocator/PartitionAlloc.md
[modify] https://crrev.com/b43627f02312b1045039c35e68cf5ca17ff4ec86/base/allocator/partition_allocator/partition_alloc.cc
[modify] https://crrev.com/b43627f02312b1045039c35e68cf5ca17ff4ec86/base/allocator/partition_allocator/partition_alloc.h
Nov 21 2017
The CL has landed and is sticking. We should no longer be leaking commit in this way.
Comment 1 by erikc...@chromium.org, Sep 15 2017
Owner: ajwong@chromium.org
Status: Assigned (was: Untriaged)