
Issue metadata

Status: Fixed
Owner:
Closed: Dec 2013
HW: ----
NextAction: ----
OS: ----
Priority: 2
Type: Bug




If a function is always compiled through OSR, its optimized code is never cached

Project Member Reported by vegorov@chromium.org, Apr 17 2013

Issue description

Right now, InsertCodeIntoOptimizedCodeMap does not insert code produced for OSR into the cache. If I understand the reasoning correctly, this is done because we would like to give a function a chance to warm up normally next time and not be optimized via OSR.

This, however, penalizes short-lived (one-time-use) functions that are only ever compiled through OSR.

For example, this causes huge variance and lower scores in results produced by Benchmark.js (and, as a result, on jsperf.com).

To reproduce, download https://raw.github.com/bestiejs/benchmark.js/v1.0.0/benchmark.js and run the following test:

load("benchmark.js");

Benchmark.prototype.setup = function() {
  var numbers = [];
  for (var i = 0; i < 100; i++) {
    var val = (Math.random() * 100) | 0;
    numbers.push(val === 0 ? 1 : val);
  }
};

var suite = new Benchmark.Suite;

suite.add('smth', function() {
  var idx1 = (Math.random() * 100) | 0;
  var idx2 = (Math.random() * 100) | 0;
  numbers[idx1] + numbers[idx2];
})
.on('cycle', function(event) {
  print(String(event.target));
})
.run();

When you run it, the following is printed:

smth x 66,732,072 ops/sec ±24.71% (42 runs sampled)

Notice the huge variance of almost 25%.

This variance happens because the harness compiles a new "measuring" closure for each sample. Each closure looks like:

function () {
  /* prepare */
  while(i--){ /* body */ }
  return /* elapsed time */
}

Compilation of these closures actually hits the compilation cache, so all of them are backed by the same SharedFunctionInfo.

Each closure becomes hot and is optimized via OSR.

But the optimized code is not added to the cache, so we keep optimizing different closures and incrementing the optimization counter on the SharedFunctionInfo.

Eventually it hits the limit and we refuse to optimize new instances of the closure, which leads to an unoptimized sample that is obviously much slower.

After a bit of time we start optimizing things again, and the cycle repeats.

Here is an excerpt from --trace-opt along with the elapsed time for each sample collected by the harness (the --trace-opt output was adjusted to print the address of the SharedFunctionInfo in square brackets):

[marking  0x577c3980 for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 57f080c1 [4132e7d9] - took 0.044, 0.217, 0.058 ms]
 sample 0.076
[marking  0x57f26408 for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 57f26409 [4132e7d9] - took 0.044, 0.216, 0.059 ms]
 sample 0.076
[marking  0x57f78e50 for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 57f78e51 [4132e7d9] - took 0.044, 0.216, 0.060 ms]
 sample 0.075
[marking  0x57fcb904 for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 577080c1 [4132e7d9] - took 0.048, 0.233, 0.061 ms]
 sample 0.078
[marking  0x5772e788 for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 5772e789 [4132e7d9] - took 0.105, 0.267, 0.112 ms]
 sample 0.092
[marking  0x577815cc for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 577815cd [4132e7d9] - took 0.044, 0.211, 0.061 ms]
 sample 0.091
[marking  0x577d47a0 for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[disabled optimization for  [4132e7d9], reason: optimized too many times]
 sample 0.392
 sample 0.39
[marking  0x413b3f2c for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 413b3f2d [4132e7d9] - took 0.105, 0.239, 0.079 ms]
 sample 0.297
[marking  0x5767f660 for recompilation, reason: hot and stable, ICs with typeinfo: 24/26 (92%)]
[optimizing:  / 5767f661 [4132e7d9] - took 0.045, 0.217, 0.060 ms]
 sample 0.077
  
I did two experiments:

1) bumping the optimization counter threshold:

$ out/ia32.release/d8 --max-opt-count=100000 test.js 
smth x 109,226,981 ops/sec ±2.23% (68 runs sampled)

2) allowing code produced with an OSR entry to join the optimized code map:

$ out/ia32.release/d8 test.js 
smth x 115,338,811 ops/sec ±1.55% (70 runs sampled)

As you can see, both of them solve the issue.

I think we should count how many times a function was optimized through OSR and, if it happens several times, put the code into the cache.

Alternatively, we could always put OSR'd code into the cache. I don't really think that code quality will be an issue here.


 

Comment 1 by danno@chromium.org, Apr 18 2013

Owner: yangguo@chromium.org
Status: Assigned
Status: Fixed
OSR code is now cached in the optimized code map.
Labels: Priority-2
