New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 765406 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug
Hotlist-MemoryInfra

Blocked on:
issue 766882



Sign in to add a comment

Update PartitionAlloc to only MEM_RESERVE the SuperPage until an actually allocation is made.

Project Member Reported by erikc...@chromium.org, Sep 14 2017

Issue description

The current calculation uses committed memory for each process. It has the following drawbacks:

* Committed memory overcounts private memory footprint of allocators, including [PartitionAlloc, HeapAlloc, and the v8 allocator]. These allocators will commit regions of a given size [2 MB on PartitionAlloc, 500KB for v8, 128 KB for HeapAlloc], but actual usage of the region depends on allocation patterns. This has been observed to overestimate the private memory footprint of renderer processes by up to 20 MB. The bulk of this comes from PartitionAlloc - each new superpage consumes 2MB of committed memory, even if there is minimal usage of the superpage.

* GPU drivers will allocate large, committed regions with lower resident-memory usage. A difference of at least 50MB between committed + resident has been observed. This will cause private memory footprint to be non-trivially over-estimated on GPUs.

Unfortunately, there is no way to use system APIs to distinguish between committed memory that has never been touched, and committed memory that has been touched, but then compressed/swapped, so there is no generic solution to this problem. 

Specifically, we could:
Update PartitionAlloc to keep track of: 
  1) Total size of all committed regions.
  2) Total size of all blocks that have never been touched. [Basically, keep a high-water-marker for highest block].

Then a better private memory footprint estimate will be [committed - (1) + (2)].
 
Labels: -Pri-3 M-63 Pri-1
Owner: ajwong@chromium.org
Status: Assigned (was: Untriaged)
Based on a previous investigation by Bruce Dawson, the marginal performance cost of additional calls to VirtualAlloc should be relatively small. 

See discussion at https://groups.google.com/a/chromium.org/forum/?utm_medium=email&utm_source=footer#!msg/chromium-dev/ELSYMXnvbBc/2YXPa_pCBAAJ.

As such, we should investigate changing the behavior of partition alloc on Windows to be more lazy about committing memory in super pages.

ajwong: Please take a look at this. While you're at it, please also update the documentation at key_concepts.md.

Comment 2 by ajwong@chromium.org, Sep 19 2017

Summary: Update PartitionAlloc to only MEM_RESERVE the SuperPage until an actually allocation is made. (was: Optimization for Windows Private Memory Footprint)
Fixing title to match actual work.

Comment 3 by ajwong@chromium.org, Sep 19 2017

Blockedon: 766882
It is probably worthwhile to write a quick test program that reserves some address space and then walks through it touching and committing pages (touching the pages is critical). The test could try committing all the pages ahead of time, doing them one at a time, doing the 64 KB at a time, etc., and then time this (with the machine set to high performance mode or running for several seconds for timings to stabilize and then reporting the median or maybe fastest times). It would be good to confirm the overhead and use that to tweak the minimum number of pages that we commit at a time.

Comment 5 Deleted

Deleted comment #5 as I just read the code wrong and write up a confusingly incorrect summary.

Comment 7 by ajwong@chromium.org, Oct 12 2017

Okay, read through this (a lot) more.  Thanks to palmer@ for helping me work through the math and weave of functions.

Here's the current understanding.

The PartitionAlloc code actually *does* do commit/decommit in steady state after a set or "PartitionPages" have been assigned to a PartitionBucket. Thus, I don't think it's necessary to do the microbenchmark as it doesn't affect hot path or do significantly more work than steadystate.

The problem we are having is with the initial allocation of the SuperPage (which comes as a 2020kb of committed memory after guard pages get decommitted. See https://chromium.googlesource.com/chromium/src.git/+/master/base/allocator/partition_allocator/partition_alloc.cc#388) followed by fragmentation of the SuperPage because the upper layers of the PartitionAlloc code assumes that "not faulted" == Decommitted which is false on windows.

You can create a pathological case as follows:

artitionAlloc has multiple bucket sizes that quantizes your allocation size (which in debug, is padded by 32-bytes for cookie guards). The largest bucket size (despite documentation) is 0xf0000 = 983040.    Buckets reach for space from a "SuperPage" which is 2MB of which 32kb is guard and metadata.  This means that 1 SuperPage has an effective 2064384 bytes of data available. Even though SuperPages are said to be part of an "Extent", the PA logic doesn't merge them so it's not really a contiguous region  due to the metadata location as well as the guard regions (this is a terminology mistake...the SuperPage *is* the extent currently. oops).  Thus, if we want to screw up PA via fragmentation, we only need force to to allocate new SuperPages ineffeciently.

1 SuperPage can hold (2064384 /983040) = 2.1 allocations of the largest bucket.  0.1*983040 = 98304 = 96k.

So if you allocate (983040 - 32 - 1) followed by 97*1024 and touch those pages in a loop, you will result in a series of super pages with 2020kb of committed memory but only 1064kb touched where the 956kb left over are NEVER usable again. This is about 47% wastage in the pathological case.


Comment 8 by ajwong@chromium.org, Oct 12 2017

Here's a test case to show the problem.

https://chromium-review.googlesource.com/#/c/chromium/src/+/714367

I'm unclear what the impact is in production. Will spend a little bit of time looking at stats, but then probably just move on to trying to fix it.

There are 2 fixes required:
  (a) Allocate the SuperPage as Reserved only. Commit as needed before assigning to a Bucket.
  (b) See if the "extent" concept should actually merge sequential SuperPages to *actually* be an extent. This would remove fragmentation.

(a) Is medium sized work because the PageAllocator API abstraction needs fixing. And then a bunch of the comments + stats gathering have the mental model wrong and might be wrong. But we need to detangle to understand.

(b) Is large sized work. It might end up churning like 30%+ of the logic since it'd change all the metadata lookups and assumptions.

Currently focusing only on (a).

Comment 9 by ajwong@chromium.org, Oct 12 2017

Cc: haraken@chromium.org palmer@chromium.org
Adding palmer and haraken for awareness as PartitionAlloc OWNERS.
Your plan sounds reasonable to me. (a) makes sense.

I concur.
Project Member

Comment 12 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b43627f02312b1045039c35e68cf5ca17ff4ec86

commit b43627f02312b1045039c35e68cf5ca17ff4ec86
Author: Albert J. Wong <ajwong@chromium.org>
Date: Fri Oct 20 22:53:11 2017

Only commit used regions of a SuperPage.

Previously, PartitionAlloc would allocate a SuperPage (2MB)
as committed even before the regions inside it were given out
to PartitionBuckets for actual use. This was wasteful, particularly
in the case of SuperPage fragmentation because these committed
bytes would be leaked by heap itself.

Because these pages are never faulted in, this had 0 impact on the
amount of RAM being used, but in Windows, due to its committed
memory model, Pagefile space was still reserved causing unnecesasry
Pagefile growth in the best case, or in the worst case for systems
with restricted pagefile size (either due to  Windows's intrinsic limit
on max Pagefile size being a small multiple of physical ram, or
adverse system policies in managed Windows deployments) it this
memory commitment could adversely affect the ability of other
programs to launch.

This CL is a patch that fixes the problem, though longer term it
would be cleaner to restructure some of the Page Allocator APIs.

Bug:  765406 
Change-Id: I7b87466353b1e22735eb55e9a2af325f439d412e
Reviewed-on: https://chromium-review.googlesource.com/717942
Reviewed-by: Chris Palmer <palmer@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Commit-Queue: Chris Palmer <palmer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#510596}
[modify] https://crrev.com/b43627f02312b1045039c35e68cf5ca17ff4ec86/base/allocator/partition_allocator/PartitionAlloc.md
[modify] https://crrev.com/b43627f02312b1045039c35e68cf5ca17ff4ec86/base/allocator/partition_allocator/partition_alloc.cc
[modify] https://crrev.com/b43627f02312b1045039c35e68cf5ca17ff4ec86/base/allocator/partition_allocator/partition_alloc.h

Status: Fixed (was: Assigned)
The CL has landed and is sticking. We should no longer be leaking commit in this way.

Sign in to add a comment