New issue
Advanced search Search tips

Issue 3453 link

Starred by 71 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2016
Cc:
Components:
HW: ----
NextAction: ----
OS: ----
Priority: 2
Type: FeatureRequest



Sign in to add a comment

endless map growth with --perf-basic-prof

Reported by bgr...@netflix.com, Jul 21 2014

Issue description

I'm using the --perf-basic-prof feature so that Linux perf_events can profile stack traces for Node.js 0.11.13, and generate flame graphs. It works great.

However, the perf map file does grow endlessly, reaching over 5 Gbytes after 3 days of production workload:

sprod$ ls -lh /tmp/perf-29828.map 
-rw-r--r-- 1 root root 5.8G Jul 15 01:14 /tmp/perf-29828.map
sprod$ 
sprod$ ps -fp 29828
UID        PID  PPID  C STIME TTY          TIME CMD
root     29828  2675 57 Jul12 ?        1-06:06:36 node --trace-gc --trace-gc-nvp --perf-basic-prof app.js
sprod$ 
sprod$ date
Tue Jul 15 01:14:58 UTC 2014
sprod$ 
sprod$ tail -20 /tmp/perf-29828.map 
b1190e3da80 33 Stub:utf8Write
b1190e3db20 33 Stub:write
b1190e3dbc0 33 Stub:utf8Write
b1190e3dc60 3b Stub:stack
b1190e3dd00 33 Stub:write
b1190e3dda0 33 Stub:utf8Write
b1190e3de40 33 Stub:write
b1190e3dee0 33 Stub:utf8Write
b1190e3df80 3b Stub:stack
b1190e3d760 33 Stub:write
b1190e3d800 33 Stub:utf8Write
b1190e3aa00 33 Stub:write
b1190e3aaa0 33 Stub:utf8Write
b1190e3ab40 3b Stub:stack
b1190e39d60 33 Stub:write
b1190e39e00 33 Stub:utf8Write
b1190e39ea0 33 Stub:write
b1190e39f40 33 Stub:utf8Write
b1190e39fe0 3b Stub:stack

I understood that --perf-basic-prof would try not to move symbols, reducing churn in this file. But I have a lot of churn.
 
Owner: yangguo@chromium.org
Yang, anything actionable here?

Comment 2 by yun...@netflix.com, Jan 7 2015

Yang? Any chance you'll be able to get to this? This seems like a pretty popular request. We'd love to be able to run with this option in production.
We are also very keen on the this, our volumes are such that production monitoring isn't really possible for any useful length of time without massive memory use.

Comment 4 by i...@bnoordhuis.nl, Jan 8 2015

The utf8Write stubs are most likely the result from the application calling Buffer#write('string', 'utf8') a lot (on many different buffers, par for the course for the average node app.)

It can probably be mitigated somewhat in node but I doubt that it's solvable in general.  Code that generates lots of stubs will keep filling up the log.

If you file an issue at https://github.com/iojs/io.js/issues and cc me (@bnoordhuis), I'll see what I can do.  Can you include the output of `cut -f3 perf.map | sort | uniq | sort -nr | head -50` so we know what the biggest offenders are?

It's perhaps worth mentioning that one of our (StrongLoop's) commercial offerings installs a kind of low-pass filter for "uninteresting" events at run-time, to prevent the logs from getting too big.

We mainly use it for normal (--prof) profiling but there is no reason it wouldn't work for perf.map.  It's probably not suitable for upstreaming in its current shape but maybe Yang or someone else can take the idea and run with it.

Comment 5 by yjx...@gmail.com, Jan 8 2015

Not sure what this "iojs" is, but we're using node.js. In any case this is a fix that needs to be made in v8, and not the node runtime.

Comment 6 by i...@bnoordhuis.nl, Jan 8 2015

@5: https://iojs.org/ - in a nutshell, most node.js maintainers (and therefore most active development) moved over to io.js because of governance problems around joyent/node.  We call it a "spork" to emphasize that it's not a hard fork but more of a spiritual (but backwards compatible) successor.
Owner: jarin@chromium.org
Not familiar with that feature. Reassigning to Jaroslav.

Comment 8 by r...@uber.com, Jan 8 2015

We would be very keen to run this in production too, it would be incredibly helpful to be able to take profiling from any production machine if we were able to run this.

Comment 9 by yun...@netflix.com, Feb 11 2015

Hi v8 team? Any updates on this? Demand for this looks to be off the charts. It'd be really great to get some traction on this.

Comment 10 by shahr...@uber.com, Feb 11 2015

Hi there, it would be really great to see this fixed please. need this for sampling without restarting processes with special flags

Comment 11 by jarin@google.com, Feb 11 2015

It is unclear what can be fixed here. Any code that gets compiled needs to be written to the log. As long as we keep recompiling, the log will grow. To prevent the growth, you would have to structure your application so that code re-compilation is minimized. That mainly means keeping functions alive so that they are not collected and avoid deopts. Not sure this is really feasible.

You could also try to prune old entries in the log that overlap in memory with newer entries (which basically means that the older entry got garbage collected). I do not think v8 can help much here, we just dump code objects and their addresses/sizes into the log file.

Comment 12 by bgr...@netflix.com, Feb 11 2015

Yes, it's unclear, but the log (in the ticket description) I think has a clue: it shows three symbols that are continually written to different locations. Relocation? The original patch for --perf_basic_prof has (https://codereview.chromium.org/70013002/diff/180001/src/isolate.cc):

 2072   // If we are profiling with the Linux perf tool, we need to disable
 2073   // code relocation.
 2074   if (FLAG_perf_jit_prof || FLAG_perf_basic_prof) {
 2075     FLAG_compact_code_space = false;
 2076   }

This code was later removed (https://codereview.chromium.org/367033002/patch/120001/130008).

Might the problem be fixed if this code is returned?

Comment 13 by bgr...@netflix.com, Feb 11 2015

... Something not clear from the original log is the rate: the log is growing by about 1,000 symbols per second, which are the same three hot functions. Why recompile the same function hundreds of times per second? Can we avoid that?
Those are not functions, those are IC stubs. I doubt it's relocation [code compaction is disabled via negative flag implication, see https://codereview.chromium.org/367033002/diff/120001/src/flag-definitions.h]  - most likely some code is producing new maps all the time and then hits an IC that causes a new IC stub to be compiled for it.

As an interim solution just put !code->IsStub() into PerfBasicLogger::LogRecordedBuffer. I suspect you don't care much about stubs anyways. 

On the high level --perf-basic-prof is too *basic* for you to use it like this in production. JavaScript is way too dynamic so you simply can't expect that V8 at some point will reach a stable point where it does not compile any new code. It happens only to a certain programs (and even then V8 would GC code from time to time just to keep memory consumption at bay). 

You can write a custom JitCodeEventHandler that respects CODE_REMOVED event and keeps accurate map of code space in memory. Then if you want to attach perf and start recording you can ask this handler to dump precise map into /tmp/perf-${pid}.map instead of continuously appending to the /tmp/perf-${pid}.map

Comment 15 by bgr...@netflix.com, Feb 12 2015

Ah, thanks for the explanation. I hacked up a dirty workaround that we can use for further testing. IsStub() isn't in the Code object, and CompilationInfo isn't handy, so I just matched on the symbol name to exclude the "Stub:" symbols:

--- node/deps/v8/src/log.cc	2015-02-11 22:59:26.000000000 +0000
+++ node-brendan/deps/v8/src/log.cc	2015-02-12 03:35:17.172384034 +0000
@@ -284,9 +284,12 @@
                                        int length) {
   DCHECK(code->instruction_start() == code->address() + Code::kHeaderSize);
 
-  base::OS::FPrint(perf_output_handle_, "%llx %x %.*s\n",
-                   reinterpret_cast<uint64_t>(code->instruction_start()),
-                   code->instruction_size(), length, name);
+  if (strstr(name, "Stub:") != name) {
+    base::OS::FPrint(perf_output_handle_, "%llx %x %.*s\n",
+                     reinterpret_cast<uint64_t>(code->instruction_start()),
+                     code->instruction_size(), length, name);
+    Flush(perf_output_handle_);
+  }
 }

I also threw a Flush() in there. The by-default 4k buffering would mean we'd miss a handful of symbols when translating a profile, unless we stopped node first to flush the buffer.

Yes, an on-demand solution might be preferable. We have experience with both: Java's perf-map-agent began as an appender (like v8), then switched to a dump on demand only. The problem with on demand (at least, the way perf-map-agent works) is that it can require over a second of CPU time to do the full dump, and I'm usually trying this at the worst possible time: when Java is already CPU overloaded and I'm profiling to see why. The extra demand on CPU can cause a spike in request latency.

As for V8 compilation never reaching a stable point: an appender might still be ok for us. Our server instances typically have a short life span (days). The problem was that the Stub symbols were causing the log file to grow by ~2 Gbytes per day. If that's now ~2 Mbytes per day (we can find out with this hack), it's might be fine.

Comment 16 by shac...@peer5.com, Feb 12 2015

@15: Would love to get an update here once you have tested it! thanks
@bgregg: ah true there is no simple IsStub on the code object. you could ask inverse easily though: 

(code->kind() == Code::FUNCTION || code->kind() == Code::OPTIMIZED_FUNCTION) 

instead of !code->IsStub() that I suggested by mistake.  

Comment 18 by bgr...@netflix.com, Feb 24 2015

Filtering "Stub:", as I previously posted a patch for, reduced the growth from 1.6 Gbytes to 400 Mbytes per day. Still too high.

Matching on those code types as you suggested was much better. Using this:

diff -ur node/deps/v8/src/log.cc node-patched/deps/v8/src/log.cc
--- node/deps/v8/src/log.cc	2015-02-11 22:59:26.000000000 +0000
+++ node-patched/deps/v8/src/log.cc	2015-02-24 21:41:47.928289262 +0000
@@ -284,9 +284,13 @@
                                        int length) {
   DCHECK(code->instruction_start() == code->address() + Code::kHeaderSize);
 
-  base::OS::FPrint(perf_output_handle_, "%llx %x %.*s\n",
-                   reinterpret_cast<uint64_t>(code->instruction_start()),
-                   code->instruction_size(), length, name);
+  if (code->kind() == Code::FUNCTION ||
+      code->kind() == Code::OPTIMIZED_FUNCTION)  {
+    base::OS::FPrint(perf_output_handle_, "%llx %x %.*s\n",
+                     reinterpret_cast<uint64_t>(code->instruction_start()),
+                     code->instruction_size(), length, name);
+    Flush(perf_output_handle_);
+  }
 }

The growth is now projected to be less than 20 Mbytes/day, which is fine for us. We're still looking at how useful the profile is with only these symbols, and so far it's looking ok. Thanks!
Labels: Area-Runtime Type-FeatureRequest Priority-Medium
Status: Assigned

Comment 20 by bgr...@netflix.com, Jul 28 2015

G'Day; I think the previous patch posted is ok. Can anyone get it included in v8? Would be appreciated, thanks!

Comment 21 by yun...@netflix.com, Jul 28 2015

+1, would really love to see this get integrated in V8
Project Member

Comment 22 by bugdroid1@chromium.org, Aug 14 2015

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/9da3ab661fe7190fcb99bd99db30cf95913d3659

commit 9da3ab661fe7190fcb99bd99db30cf95913d3659
Author: ofrobots <ofrobots@google.com>
Date: Fri Aug 14 14:51:51 2015

New flag --perf_basic_prof_only_functions

Restricts linux perf-event code range reporting to functions only (i.e. on
stubs.) While this makes the gathered ticks less accurate, it reduces the
growth of the /tmp/perf-${pid}.map file.

BUG= v8:3453 
R=hablich@chromium.org,danno@chromium.org
LOG=N

Review URL: https://codereview.chromium.org/1292743002

Cr-Commit-Position: refs/heads/master@{#30179}

[modify] http://crrev.com/9da3ab661fe7190fcb99bd99db30cf95913d3659/src/flag-definitions.h
[modify] http://crrev.com/9da3ab661fe7190fcb99bd99db30cf95913d3659/src/log.cc

Status: Fixed (was: Assigned)
Closing as fixed.
Labels: Priority-2

Sign in to add a comment