New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 753007 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug

Blocking:
issue 751812



Sign in to add a comment

tcmalloc aggressively uses MADV_FREE when switching to a new Linux sysroot even when unsupported by the kernel

Project Member Reported by thomasanderson@chromium.org, Aug 7 2017

Issue description

To reproduce:
$ git cl patch 598737
$ gclient sync

Do a release build using the arguments of linux_chromium_rel_ng:
dcheck_always_on = true
ffmpeg_branding = "Chrome"
is_component_build = false
is_debug = false
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
symbol_level = 1
use_goma = true

MemoryInstrumentationTest.PrivateFootprintComputatio fails with:

../../content/browser/tracing/memory_instrumentation_browsertest.cc:107: Failure
Value of: after_kb - before_kb
Expected: (is >= -6656) and (is <= 6656)
  Actual: 66732
../../content/browser/tracing/memory_instrumentation_browsertest.cc:111: Failure
Value of: during_kb - after_kb
Expected: (is >= 63560) and (is <= 69560)
  Actual: -8
 

Comment 1 by hjd@chromium.org, Aug 8 2017

Cc: primiano@chromium.org ssid@chromium.org
Status: Started (was: Assigned)
+primiano & ssid

fyi the end-to-end CMM test is breaking on Debian Stretch blocking updating the bots from Jessie.

e.g. https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_rel_ng%2F518482%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests__with_patch_%2F0%2Flogs%2FMemoryInstrumentationTest.PrivateFootprintComputatio%2F0

Comment 2 by hjd@chromium.org, Aug 8 2017

Cc: erikc...@chromium.org
So I spent some time investigating this today, repo was easy with the instructions above (Thanks Tom!).

Looks like we correctly see the allocation but then never see the deallocation although the system knows the memory is free and will reuse it.
We saw this optimisation on MacOS for small allocations (<64mb) but now we're seeing even for very large allocations (>640mb) on Linux.

I tried various experiments with 2 allocations:

// numbers in kb, alloc and free are the amount of memory allocated/freed the rest are measurements of browser_dump->os_dump->private_footprint_kb

// 650mb then 650mb
before1 7560
alloc1  665600
during1 674524
free1   665600
after1  674536

before2 674536
alloc2  665600
during2 674536
free2   665600
after2  674540

// 650mb then 325mb
before1 7436
alloc1  665600
during1 674436
free1   665600
after1  674440

before2 674440
alloc2  332800
during2 674440
free2   332800
after2  674440

// 650mb then 1300mb
before1 7584
alloc1  665600
during1 674552
free1   665600
after1  674552

before2 674556
alloc2  1331200
during2 2008380
free2   1331200
after2  2008380

// 65kb then 65kb
before1 7404
alloc1  65
during1 9596
free1   65
after1  9688
before2 9688
alloc2  65
during2 9688
free2   65
after2  9756

// 650mb overlapped with 650mb
before1 7516
alloc1  665600
before2 674464
alloc2  665600
during1 1341376
during2 1341376
free2   665600
after2  1341376
free1   665600
after1  1341380

Note that is all cases we never observe the memory reported by private footprint to go down.
I also tried putting in a bunch of sleeps in case it was a timing issue but no dice.
Next I'm going to try catting statm directly while running the test to check it isn't a bug with the service code.


Which OS do I need in order to reproduce this? 
is this just sysroot dependent or depends also on the host OS (i.e. the actual libc version?)
I need to take a look with strace to see what happens under the hoods

I think it's only sysroot dependent and you can repro on at least Ubuntu Trusty
Cc: hjd@chromium.org
Owner: thomasanderson@chromium.org
So I did a quick check with strace and looks like we really end up mmap-ing the 65 MB array (note 65 not 650 as per #2, which is consistent with  kAllocSize = 65 * 1024 * 1024; in the code) but never munmap-ing it.
The actual computation code is working fine, so I have two theories here:

1) The switch to the sysroot is causing base::MakeUnique<char[]> to somehow mess up  the deleter (see https://groups.google.com/a/chromium.org/d/msg/chromium-dev/iQgMedVA8-k/IP6qa__3BQAJ). In which case this would be a real bug because we use MakeUnique<.*[]> in chrome and that would mean we'd start leaking those unique pointers.

2) This is some weird optimization in the allocator. However there are two things I cannot get in this case: (i) "retaining 65 MB after a free() seems very hard to digest as a plausible optimization";  (ii) AFAIK the allocator is still part of libc.so and unless I am missing something, we are not statically linking it here. If this is the case this wouldn't explain how I get this repro on Trusty by just tweaking the  sysroot.

thomasanderson@, unfortunately I am about to leave. Could you please check that this isn't actually (1) ?

If we are confident that there isn't a problem that causes us to leak memory but it's just a problem with the test:
- assign back the bug to hjd@
- hjd@:  just remove deallocation coverage in the test to unblock the sysroot work
- keep this bug open and assigned to me because I still need to understand what's going on.
I created a minified repro. I can rule out 1 and still convinced the test does the right thing.
I can repro even with malloc/free (see https://chromium-review.googlesource.com/c/612070/1/test.cc) which suggests to me for whatever reason the allocator is seriously flawn or I am missing something gigantic here

The minified repro works fine without the sysroot CL. Specifically with reference to the minified repro above.

With the old sysroot I get:
-----
VirtSize=19312 KB, RSS=2852 KB
malloc done, press a key to .reset()
VirtSize=86004 KB, RSS=69852 KB
reset done, press a key to exit()
VirtSize=86004 KB, RSS=3328 KB
-----

With new sysroot:
-----
VirtSize=19312 KB, RSS=2460 KB
malloc done, press a key to .reset()
VirtSize=86004 KB, RSS=69660 KB
reset done, press a key to exit()
VirtSize=86004 KB, RSS=69828 KB  <- !!!!!!!
-----

I think the key is in the strace output:

old:
---
mmap(0x1fafa6be1000, 68157440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x1fafa6be1000
...
madvise(0x1fafa6be1000, 68157440, MADV_DONTNEED) = 0
---

new:
---
mmap(0xa2e967af000, 68157440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xa2e967af000
...
madvise(0xa2e967af000, 68157440, 0x8 /* MADV_??? */) = -1 EINVAL (Invalid argument)
---

So it looks like:
- free() does NOT munmap() but just madvise, this behavior exists also in the current sysroot.
- in the new sysroot free() uses a madvise argument that the trusty kernel does not understand.

So, this is a real bug and could lead to real memory leaks in production code.
Summary: The allocator in the Debian Stretch sysroot uses a madvise argument for free() which doesn't seem supported on trusty (was: MemoryInstrumentationTest.PrivateFootprintComputatio fails on Linux when using Debian Stretch sysroot)
Ah there we go.
Kernel 4.5 introduced recently MADV_FREE, which brings delayed free semantics 
https://github.com/torvalds/linux/commit/854e9ed09dedf0c19ac8640e91bcc74bc3f9e5c9

which means that memory gets freed in a delayed only upon pressure.

In turn the "stretch" sysroot seem to have adopted that aggressively without performing and backwards compatible checks. So in essence free() in the new sysroot is using a syscall argument that kernels < 4.5 do not understand, effectively turning free() into a no-op (still a bit surprised, I though that free was immplemented in libc.so not in the sysroot, but maybe there is some header int he middle).

I am afraid that we cannot use this sysroot as-is, at least not in this state.
Also I wonder how many other surprises like this are there in the new sysroot (in terms of using bleeding edge kernel syscalls which are unsupported on current kernel versions)

First of all, kudos to hjd@ for adding this test!  It was the only test failure, and needless to say, it would have been disastrous to let this fly under the radar.

As for the fix, I think the issue is in tcmalloc.  0x8 is MADV_FREE, which was added in Linux kernel 4.5
https://github.com/gperftools/gperftools/issues/780

But there are some places in tcmalloc where we test if we should use MADV_FREE if the macro is defined.
eg. https://cs.chromium.org/chromium/src/third_party/tcmalloc/vendor/src/system-alloc.cc?type=cs&q=MADV_FREE&sq=package:chromium&l=67
Sure enough, MADV_FREE was *not* defined in the Jesise sysroot, but is in the Stretch sysroot.

I think a good short-term solution is to switch tcmalloc to always use MADV_DONTNEED on Linux, and long-term to add runtime detection for MADV_FREE.  primiano@ wdyt?
c#8 beat me to it!

We're using tcmalloc, not the libc allocator, correct?
Ahhh I completely forgot about tcmalloc (double facepalm, as I am on of the two owners of //base/allocator).
Yes I see the problem now, that code in tcmalloc was speculating on the fact that only FreeBSD supports MADV_FREE.
Adding an #if defined(OS_LINUX) to that (note tcmalloc/chromium not tcmalloc/vendor, the latter is the upstream reference) seems the right thing to do.
Also we should check if this has been fixed in tcmalloc upstream and if possible just cherry-pick that patch.
Summary: tcmalloc aggressively uses MADV_FREE when switching to a new Linux sysroot even when unsupported by the kernel (was: The allocator in the Debian Stretch sysroot uses a madvise argument for free() which doesn't seem supported on trusty)

Sign in to add a comment