Chrome on Chrome OS still uses an old snapshot of tcmalloc. We will use this bug to track the effort and progress of switching to an alternative implementation.
Mao recently compiled Chrome successfully with ToT tcmalloc in his sandbox, and we are now trying to measure the before/after impact of the switch.
I don't see the point of using tcmalloc for heap profiling.
Just use https://chromium.googlesource.com/chromium/src/+/master/chrome/profiling/README.md
tcmalloc heap profiler still doesn't intercept partition alloc and oilpan allocations.
Also, the current heap profiler has all the workarounds in place to deal with sandboxing (already deals with getting mmaps from the browser process).
Furthermore, the Chrome heap profiler is way faster, both due to its OOP design and due to the possibility of enabling sampling.
I think that trying to revive the tcmalloc heap profiler makes very little sense.
Also, updating to the latest tcmalloc is very attractive because Google has been working on several improvements for it.
We (chromeos-toolchain) are interested in applying/collaborating-on some of those improvements (like use of huge pages).
Is tcmalloc the long term plan for allocators on linux-like platforms? On Android, we do not use it afaik, so this is for just Linux and ChromeOS? What about using partition-alloc instead on these platforms? +palmer
I'd like to see some comparisons of the perf/memory for updating to latest version, have we got any results from the perf bots?
I can happily have the discussion on this bug, or on one of the CLs linked from https://crrev.com/c/961780; let me know what you'd prefer.
#7: Do you mean Transparent Huge Pages (in the Linux sense: `MADV_HUGEPAGE` as a parameter to `madvise`)? We (verwaest and I) believe that *not* using THPs, as Chrome and Chrome OS currently do not, might have value as a security mitigation. (Ping me privately for more detail if you like.)
> On Android, we do not use it afaik
Correct
> so this is for just Linux and ChromeOS?
Yup
> What about using partition-alloc instead on these platforms? +palmer
Curious here as well, and generally feel aligned with the question "is tcmalloc the long term plan for allocators on linux-like platforms?"
> I'd like to see some comparisons of the perf/memory for updating to latest version, have we got any results from the perf bots?
Strong +1 to this
On profiling, there may be some misconceptions about tcmalloc's allocation sampler (heapz). heapz is always on in ~every C++ (and Java and Go) server inside google, and costs <0.1% CPU and a few tens of kilobytes per process (proportional to memory allocated). The output isn't a trace, but a sampled callgraph of allocation sites.
It's technically true that OOPHP has no overhead when it's disabled, but the allocator shim has the cost of at least a branch, which is the same as heapz when disabled. It looks like OOPHP has a 50% CPU overhead when enabled. I'm not entirely sure what the OOPHP is reporting back; perhaps it's more detailed than heapz? Covering partition and oilpan is obviously nice, but it's somewhat orthogonal because tcmalloc's sampler could be extended (or copied) for those as well.
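To make the cost argument above concrete, here is a minimal sketch of a byte-interval allocation sampler. It is not tcmalloc's or OOPHP's actual code: the class name `AllocationSampler`, its methods, and the use of an exponentially distributed interval are all assumptions made for illustration. The point it shows is that the disabled path is a single branch, and the enabled path is usually just a compare and a subtract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical sampler for illustration only; names are not tcmalloc's API.
class AllocationSampler {
 public:
  // mean_interval_bytes == 0 disables sampling entirely.
  explicit AllocationSampler(std::size_t mean_interval_bytes,
                             std::uint64_t seed = 1)
      : mean_(mean_interval_bytes), rng_(seed) {
    if (mean_ != 0) bytes_until_sample_ = NextInterval();
  }

  // Returns true if this allocation should be recorded (for example, by
  // capturing a stack trace at the call site).
  bool RecordAllocation(std::size_t size) {
    if (mean_ == 0) return false;      // disabled: the cost is one branch
    if (size < bytes_until_sample_) {  // common case: count down and return
      bytes_until_sample_ -= size;
      return false;
    }
    bytes_until_sample_ = NextInterval();
    ++samples_;
    return true;
  }

  std::size_t samples() const { return samples_; }

 private:
  // Draws the number of bytes to skip before the next sample. A random
  // interval avoids always sampling the same site in periodic workloads.
  std::size_t NextInterval() {
    assert(mean_ > 0);
    std::exponential_distribution<double> dist(1.0 /
                                               static_cast<double>(mean_));
    return static_cast<std::size_t>(dist(rng_)) + 1;
  }

  std::size_t mean_;
  std::mt19937_64 rng_;
  std::size_t bytes_until_sample_ = 0;
  std::size_t samples_ = 0;
};
```

An allocator hook would call `RecordAllocation(size)` on every allocation and take a stack trace only when it returns true, so the output is a sampled callgraph rather than a full trace.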
On the broader topic of refreshing the allocator, the tcmalloc in ChromeOS is pretty ancient, and we've invested a lot into improving tcmalloc inside google3 (which is somewhat consistent with gperftools as well), with really impressive results. Whether the improvements translate to ChromeOS is an open question. 3.4% of CPU is spent in tcmalloc on ChromeOS, so there's definitely some opportunity. Whether tcmalloc is the best allocator for ChromeOS is also an open question, and performance may not be the only deciding factor. We should evaluate the options, ideally with A/B field tests.
I would like to restart this discussion, and find some owner on the Chromium side that can work with me (mainly on reviewing the changes) to uprev tcmalloc.
I ran some experiments on benchmarks with the old and the new tcmalloc.
The goal was to find a set of configuration parameters that don't introduce memory or performance regressions.
https://docs.google.com/document/d/1MDswP07GuBAzMmwPjGinKfT6NhVdKpznK7-Ed1VyYjQ/edit?usp=sharing
According to alkby@, some additional tuning work is possible to get back part of the performance gains that were lost by enabling aggressive decommit, but that is separate from this uprev effort.
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/73d72081d54976e5723d930eb8fa67559a22264d
commit 73d72081d54976e5723d930eb8fa67559a22264d
Author: Gabriel Marin <gmx@chromium.org>
Date: Wed Jul 25 22:18:47 2018
tcmalloc: use relative addresses with the windows addr2line wrapper
Modifies the Windows addr2line wrapper to expect addresses relative to
DllBase to better simulate how addr2line works with modules in Linux.
Windows DLLs have a concept of "default load address" which hints to the OS
where to load the binary image after relocation. The dbghelp.dll
symbolization library will load the module at this location in the virtual
address space meaning the caller of these functions would need to be aware
of the base address. This makes things unnecessarily complex in the face of
ASLR and also diverges from the behavior of addr2line when used with linux-
style DSOs. This CL simply adds the module base address to the incoming
addresses, thereby making the input relative addresses for the module which
both is easier to use and lines up better with linux's addr2line behavior.
These changes were made originally as part of CL
https://codereview.chromium.org/2730473002.
BUG=724399,b:70905156
Change-Id: I0abe9e0c380e7e60ae29a11021bb805b31718d08
Reviewed-on: https://chromium-review.googlesource.com/1147668
Commit-Queue: Gabriel Marin <gmx@chromium.org>
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Reviewed-by: Will Harris <wfh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#578096}
[modify] https://crrev.com/73d72081d54976e5723d930eb8fa67559a22264d/third_party/tcmalloc/chromium/src/windows/addr2line-pdb.c
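The address handling described in the commit message above amounts to simple arithmetic. Here is a minimal sketch, assuming 64-bit addresses. The helper names `ToAbsoluteAddress` and `ToRelativeAddress` are hypothetical and do not come from `addr2line-pdb.c`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not the wrapper's actual code.

// Converts a module-relative address (as the caller now supplies it) into
// the absolute address at which the symbolizer loaded the module.
std::uint64_t ToAbsoluteAddress(std::uint64_t module_base,
                                std::uint64_t relative) {
  assert(relative <= UINT64_MAX - module_base);  // guard against wraparound
  return module_base + relative;
}

// The inverse: what a caller does to a runtime address before passing it
// in, using the base at which the module was actually loaded (ASLR).
std::uint64_t ToRelativeAddress(std::uint64_t load_base,
                                std::uint64_t absolute) {
  assert(absolute >= load_base);  // address must lie at or above the base
  return absolute - load_base;
}
```

With this split, the caller only needs the runtime load base of the module, and the wrapper only needs the base at which the symbolization library loaded the image. The two bases do not have to match.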
Last I checked, open-source tcmalloc was very different from google-internal tcmalloc, and chromium tcmalloc is very different from both. This was admittedly a while ago; back then they had common ancestors but were basically 3 different things.
Is public tcmalloc now closer to google-internal tcmalloc? If not, is the proposal to use google-internal tcmalloc or public tcmalloc?
Upstream (open-source) tcmalloc and google-internal tcmalloc are still different things, but alkby@ has been working on upstreaming more of the internal changes.
Some changes (per-cpu caches) could not be open sourced because the upstream kernel didn't have the necessary support. There is a separate effort to upstream those kernel changes.
So, while the upstream gperftools doesn't have all the internal optimizations, it does get a subset of them and more are on the way.
The effort here is to use the newest public tcmalloc.
Re comment 26: Sounds great, thanks. Please make sure to upstream the chromium-specific changes we made (listed in third_party/tcmalloc/README.chromium) so that they don't get lost.
This bug requires manual review: DEPS changes referenced in bugdroid comments.
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Comment 1 by haraken@chromium.org, May 19 2017