New issue
Advanced search Search tips

Issue 878742 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 2
Type: Bug


Show other hotlists

Hotlists containing this issue:
Hotlist-4


Sign in to add a comment

Performance issue when performing additions of the form (a + b + c) | 0

Reported by linb...@gmail.com, Aug 29

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36

Steps to reproduce the problem:
1. Run performance test: https://jsbench.me/2ojlf3ex5j/1
2. See results of tests A and B
3. A should be roughly 50% slower on Chrome. That figure is 2% in Firefox.

What is the expected behavior?
Both tests should have similar performance. 

What went wrong?
The way in which d is added to t dramatically reduces performance if it is of the form `var t = (a + b + d) | 0;`,
rather than `var t = (a + b) | 0; t = t + d | 0;`

Did this work before? No 

Chrome version: 68.0.3440.106  Channel: stable
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: 

The code in question is a random number generator.
 
Note that the the "unsigned-right-shift zero" and "OR zero" operations are to keep numbers 32-bit, which also dramatically improve performance in Chrome when doing these calculations.
Labels: Needs-Triage-M68
Components: -Blink Blink>JavaScript
Could be a problem in jsbench code as I don't see any difference when I run A and B locally with 1000000 reps instead of 100.
How is your test set up? I tried building a test with 1m reps (see attachment) and it does show Test A is slower (but not as much as jsbench).

As for jsben.ch, it gives identical results to jsperf.com (probably similar codebase). However my results show a consistent 8-12% decrease in performance rather than 50%, which is still significant I think.


Test A: 2405.843017578125ms
Test B: 2193.088134765625ms

Test A: 2453.068115234375ms
Test B: 2174.673095703125ms

Test A: 2438.18798828125ms
Test B: 2187.712890625ms

Firefox for comparison:

Test A: 2017ms sfc32_perftest.htm:29:1
Test B: 2077ms sfc32_perftest.htm:56:1

Test A: 2064ms sfc32_perftest.htm:29:1
Test B: 2094ms sfc32_perftest.htm:56:1

Test A: 2019ms sfc32_perftest.htm:29:1
Test B: 2081ms sfc32_perftest.htm:56:1 
sfc32_perftest.htm
1.7 KB View Download

Comment 6 Deleted

Comment 7 Deleted

I had to re-post this comment as I've noticed some mistakes.
You need to isolate the same-named function sfc32 or rename it.

I see a 1-4% difference between runs which I think is negligible.
Same results in node.js 10.6.0

Test A0: 44.732177734375ms
Test B0: 43.89599609375ms

Test A1: 39.85693359375ms
Test B1: 38.78271484375ms

Test A2: 38.970947265625ms
Test B2: 41.472900390625ms

test.html
1.8 KB View Download
Keep in mind you're only testing subsequent calls to the function. Going that route, the speed gap is not as wide, but still measurable. Here's a jsbench.me that reflects your test scenerio: https://jsbench.me/9kjlgu4rv1/1. A is still slower.

I think the numbers are accurate. benchmarkjs (what jsbench.me uses) calculates standard deviation and standard error, making outliers more pronounced. In our tests, we're just comparing timestamps. 

----

I'm pretty sure I figured out the bug. The OR operator in `(a + b + d) | 0` does not get applied to `a`. It needs to be `(a + b | 0) + d | 0` for V8 to optimize things properly. With that simple change, it runs faster: https://jsbench.me/7vjlh9eq17/1

You can see the differing bytecode here (logged with --print-bytecode in Node v9.10): http://www.mergely.com/0k3Zdk6G/
Maybe that'll give clues.

Perhaps the real question is, why do these bitwise operators speed things up in Chrome but not FF? They aren't even necessary in FF and actually slows it down slightly. I haven't seen any articles or posts about it. Whenever I port 32-bit algorithms from C to JS, they're always faster in Chrome when numbers are "converted to 32-bit" using bitwise operators. So I have to apply bitwise operators when using non-bitwise addition/subtraction, etc. to force it back to 32-bit mode. 
Labels: Triaged-ET Target-70 M-70 FoundIn-70 OS-Linux OS-Mac
Status: Untriaged (was: Unconfirmed)
Able to reproduce the issue on Mac 10.13.3, Win-10 and Ubuntu 14.04 using chrome reported version #68.0.3440.106 and latest canary #70.0.3537.0.
This is a non-regression issue as it is observed from M60 old builds. 

Hence, marking it as untriaged to get more inputs from dev team.

Thanks...!!
Looking at it again, I think that the loss in performance is due to (a + b + c) not being explicit enough in definition, causing V8 to store (a + b) in a register, discarding the bitwise number format that was previously defined. 

I reduced the test case to the essential components here: https://jsbench.me/86jli0i8j8/1

Basically, despite masking a,b,c,d with an OR 0, some of this information is discarded when calculating (a + b + d).

      a |= 0; b |= 0; c |= 0; d |= 0;
      var t = a + b + d | 0; 
      c = t + (c | 0) | 0;
      d = 1 + (d | 0) | 0;

This affects performance drastically in the test, but can be mitigated by being more explicit:

      var t = (a | 0) + (b | 0) + (d | 0) | 0; 
      c = t + (c | 0) | 0;
      d = 1 + (d | 0) | 0;

This can then be further optimized to this:

      c |= 0; d |= 0;
      var t = (a + b | 0) + d | 0; 
      c = t + c | 0;
      d = 1 + d | 0;

The `c |= 0; d |= 0;` ensures that c and d are 32-bit when being added.  I'm not sure why (a + b | 0) works as opposed to (a | 0) + (b | 0). Probably because only the number being added to another number needs to be 32-bit.


Cc: hablich@chromium.org
Status: WontFix (was: Untriaged)
From the conversation it sounds like this can be closed ....?
The issue has not been fixed as far as I know. I've only supplied additional info for tracking the cause of the bug.

Can anyone working on V8 take alook?
Components: -Blink>JavaScript Blink>JavaScript>Compiler
Status: Available (was: WontFix)
Owner: jarin@chromium.org
Status: Assigned (was: Available)
I turned the https://jsbench.me/86jli0i8j8/1 into a simple repro case for d8:

======================================================================
var s = [0xCA566EB5, 0xD250CE2D, 0x6D2B79F5, 0x9E3779B9];

const foo = (function(a, b, c, d) {
  return function foo() {
    a |= 0;
    b |= 0;
    c |= 0;
    d |= 0;
    var t = a + b + d | 0;
    c = t + (c | 0) | 0;
    d = 1 + (d | 0) | 0;
    return t;
  }
})(s[0], s[1], s[2], s[3]);

const bar = (function(a, b, c, d) {
  return function bar() {
    d |= 0;
    var t = (a + b | 0) + d | 0;
    c = t + (c | 0) | 0;
    d = 1 + d | 0;
    return t;
  }
})(s[0], s[1], s[2], s[3]);

function test(f, n) {
  let result;
  for (let i = 0; i < n; ++i) {
    result = f();
  }
  return result;
}
for (var i = 0; i < 1e3; ++i) {
  test(foo, 1e4);
  test(bar, 1e4);
}

for (const f of [bar, foo]) {
  console.time(f.name);
  test(f, 1e8);
  console.timeEnd(f.name);
}
======================================================================

The difference between these two is within noise. 
Created a version of the original reported issue as well.

======================================================================
var s = [0xCA566EB5, 0xD250CE2D, 0x6D2B79F5, 0x9E3779B9];

const foo = (function(a, b, c, d) {
  return function foo() {
    a >>>= 0;
    b >>>= 0;
    c >>>= 0;
    d >>>= 0;
    d = d + 1 | 0;
    var t = (a + b + d) | 0;  // DIFF LINE
    a = b ^ b >>> 9;
    b = c + (c << 3) | 0;
    c = (c << 21 | c >>> 11);
    c = c + t | 0;
    return (t >>> 0) / 4294967296;
  }
})(s[0], s[1], s[2], s[3]);

const bar = (function(a, b, c, d) {
  return function bar() {
    a >>>= 0;
    b >>>= 0;
    c >>>= 0;
    d >>>= 0;
    d = d + 1 | 0;
    var t = (a + b) | 0;  // DIFF LINE
    a = b ^ b >>> 9;
    b = c + (c << 3) | 0;
    c = (c << 21 | c >>> 11);
    t = t + d | 0;  // DIFF LINE
    c = c + t | 0;
    return (t >>> 0) / 4294967296;
  }
})(s[0], s[1], s[2], s[3]);

function test(f, n) {
  let result;
  for (let i = 0; i < n; ++i) {
    result = f();
  }
  return result;
}
for (var i = 0; i < 1e3; ++i) {
  test(foo, 1e4);
  test(bar, 1e4);
}

for (const f of [bar, foo]) {
  console.time(f.name);
  test(f, 1e7);
  console.timeEnd(f.name);
}
======================================================================

Performance difference is within 5% range (latest V8 ToT).
Changing the benchmark to use local variables for the actual computations fixes most of the performance issues:

======================================================================
var s = [0xCA566EB5, 0xD250CE2D, 0x6D2B79F5, 0x9E3779B9];

const foo = (function(a1, b1, c1, d1) {
  return function foo() {
    var a = a1 >>> 0;
    var b = b1 >>> 0;
    var c = c1 >>> 0;
    var d = d1 >>> 0;
    d = d + 1 | 0;
    var t = (a + b + d) | 0;  // DIFF LINE
    a = b ^ b >>> 9;
    b = c + (c << 3) | 0;
    c = (c << 21 | c >>> 11);
    c = c + t | 0;
    a1 = a;
    b1 = b;
    c1 = c;
    d1 = d;
    return (t >>> 0) / 4294967296;
  }
})(s[0], s[1], s[2], s[3]);

const bar = (function(a1, b1, c1, d1) {
  return function bar() {
    var a = a1 >>> 0;
    var b = b1 >>> 0;
    var c = c1 >>> 0;
    var d = d1 >>> 0;
    d = d + 1 | 0;
    var t = (a + b) | 0;  // DIFF LINE
    a = b ^ b >>> 9;
    b = c + (c << 3) | 0;
    c = (c << 21 | c >>> 11);
    t = t + d | 0;  // DIFF LINE
    c = c + t | 0;
    a1 = a;
    b1 = b;
    c1 = c;
    d1 = d;
    return (t >>> 0) / 4294967296;
  }
})(s[0], s[1], s[2], s[3]);

function test(f, n) {
  let result;
  for (let i = 0; i < n; ++i) {
    result = f();
  }
  return result;
}
for (var i = 0; i < 1e3; ++i) {
  test(foo, 1e4);
  test(bar, 1e4);
}

for (const f of [bar, foo]) {
  console.time(f.name);
  test(f, 1e7);
  console.timeEnd(f.name);
}
======================================================================

Sign in to add a comment