Performance issue when performing additions of the form (a + b + c) | 0
Reported by
linb...@gmail.com,
Aug 29
|
|||||||
Issue descriptionUserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36 Steps to reproduce the problem: 1. Run performance test: https://jsbench.me/2ojlf3ex5j/1 2. See results of tests A and B 3. A should be roughly 50% slower on Chrome. That figure is 2% in Firefox. What is the expected behavior? Both tests should have similar performance. What went wrong? The way in which d is added to t dramatically reduces performance if it is of the form `var t = (a + b + d) | 0;`, rather than `var t = (a + b) | 0; t = t + d | 0;` Did this work before? No Chrome version: 68.0.3440.106 Channel: stable OS Version: 6.1 (Windows 7, Windows Server 2008 R2) Flash Version: The code in question is a random number generator.
,
Aug 29
,
Aug 30
,
Aug 30
Could be a problem in jsbench code as I don't see any difference when I run A and B locally with 1000000 reps instead of 100.
,
Aug 30
How is your test set up? I tried building a test with 1m reps (see attachment) and it does show Test A is slower (but not as much as jsbench). As for jsben.ch, it gives identical results to jsperf.com (probably similar codebase). However my results show a consistent 8-12% decrease in performance rather than 50%, which is still significant I think. Test A: 2405.843017578125ms Test B: 2193.088134765625ms Test A: 2453.068115234375ms Test B: 2174.673095703125ms Test A: 2438.18798828125ms Test B: 2187.712890625ms Firefox for comparison: Test A: 2017ms sfc32_perftest.htm:29:1 Test B: 2077ms sfc32_perftest.htm:56:1 Test A: 2064ms sfc32_perftest.htm:29:1 Test B: 2094ms sfc32_perftest.htm:56:1 Test A: 2019ms sfc32_perftest.htm:29:1 Test B: 2081ms sfc32_perftest.htm:56:1
,
Aug 30
I had to re-post this comment as I've noticed some mistakes. You need to isolate the same-named function sfc32 or rename it. I see a 1-4% difference between runs which I think is negligible. Same results in node.js 10.6.0 Test A0: 44.732177734375ms Test B0: 43.89599609375ms Test A1: 39.85693359375ms Test B1: 38.78271484375ms Test A2: 38.970947265625ms Test B2: 41.472900390625ms
,
Aug 31
Keep in mind you're only testing subsequent calls to the function. Going that route, the speed gap is not as wide, but still measurable. Here's a jsbench.me that reflects your test scenerio: https://jsbench.me/9kjlgu4rv1/1. A is still slower. I think the numbers are accurate. benchmarkjs (what jsbench.me uses) calculates standard deviation and standard error, making outliers more pronounced. In our tests, we're just comparing timestamps. ---- I'm pretty sure I figured out the bug. The OR operator in `(a + b + d) | 0` does not get applied to `a`. It needs to be `(a + b | 0) + d | 0` for V8 to optimize things properly. With that simple change, it runs faster: https://jsbench.me/7vjlh9eq17/1 You can see the differing bytecode here (logged with --print-bytecode in Node v9.10): http://www.mergely.com/0k3Zdk6G/ Maybe that'll give clues. Perhaps the real question is, why do these bitwise operators speed things up in Chrome but not FF? They aren't even necessary in FF and actually slows it down slightly. I haven't seen any articles or posts about it. Whenever I port 32-bit algorithms from C to JS, they're always faster in Chrome when numbers are "converted to 32-bit" using bitwise operators. So I have to apply bitwise operators when using non-bitwise addition/subtraction, etc. to force it back to 32-bit mode.
,
Aug 31
Able to reproduce the issue on Mac 10.13.3, Win-10 and Ubuntu 14.04 using chrome reported version #68.0.3440.106 and latest canary #70.0.3537.0. This is a non-regression issue as it is observed from M60 old builds. Hence, marking it as untriaged to get more inputs from dev team. Thanks...!!
,
Aug 31
Looking at it again, I think that the loss in performance is due to (a + b + c) not being explicit enough in definition, causing V8 to store (a + b) in a register, discarding the bitwise number format that was previously defined. I reduced the test case to the essential components here: https://jsbench.me/86jli0i8j8/1 Basically, despite masking a,b,c,d with an OR 0, some of this information is discarded when calculating (a + b + d). a |= 0; b |= 0; c |= 0; d |= 0; var t = a + b + d | 0; c = t + (c | 0) | 0; d = 1 + (d | 0) | 0; This affects performance drastically in the test, but can be mitigated by being more explicit: var t = (a | 0) + (b | 0) + (d | 0) | 0; c = t + (c | 0) | 0; d = 1 + (d | 0) | 0; This can then be further optimized to this: c |= 0; d |= 0; var t = (a + b | 0) + d | 0; c = t + c | 0; d = 1 + d | 0; The `c |= 0; d |= 0;` ensures that c and d are 32-bit when being added. I'm not sure why (a + b | 0) works as opposed to (a | 0) + (b | 0). Probably because only the number being added to another number needs to be 32-bit.
,
Sep 13
From the conversation it sounds like this can be closed ....?
,
Sep 13
The issue has not been fixed as far as I know. I've only supplied additional info for tracking the cause of the bug. Can anyone working on V8 take alook?
,
Sep 13
,
Sep 14
I turned the https://jsbench.me/86jli0i8j8/1 into a simple repro case for d8: ====================================================================== var s = [0xCA566EB5, 0xD250CE2D, 0x6D2B79F5, 0x9E3779B9]; const foo = (function(a, b, c, d) { return function foo() { a |= 0; b |= 0; c |= 0; d |= 0; var t = a + b + d | 0; c = t + (c | 0) | 0; d = 1 + (d | 0) | 0; return t; } })(s[0], s[1], s[2], s[3]); const bar = (function(a, b, c, d) { return function bar() { d |= 0; var t = (a + b | 0) + d | 0; c = t + (c | 0) | 0; d = 1 + d | 0; return t; } })(s[0], s[1], s[2], s[3]); function test(f, n) { let result; for (let i = 0; i < n; ++i) { result = f(); } return result; } for (var i = 0; i < 1e3; ++i) { test(foo, 1e4); test(bar, 1e4); } for (const f of [bar, foo]) { console.time(f.name); test(f, 1e8); console.timeEnd(f.name); } ====================================================================== The difference between these two is within noise.
,
Sep 14
Created a version of the original reported issue as well.
======================================================================
var s = [0xCA566EB5, 0xD250CE2D, 0x6D2B79F5, 0x9E3779B9];
const foo = (function(a, b, c, d) {
return function foo() {
a >>>= 0;
b >>>= 0;
c >>>= 0;
d >>>= 0;
d = d + 1 | 0;
var t = (a + b + d) | 0; // DIFF LINE
a = b ^ b >>> 9;
b = c + (c << 3) | 0;
c = (c << 21 | c >>> 11);
c = c + t | 0;
return (t >>> 0) / 4294967296;
}
})(s[0], s[1], s[2], s[3]);
const bar = (function(a, b, c, d) {
return function bar() {
a >>>= 0;
b >>>= 0;
c >>>= 0;
d >>>= 0;
d = d + 1 | 0;
var t = (a + b) | 0; // DIFF LINE
a = b ^ b >>> 9;
b = c + (c << 3) | 0;
c = (c << 21 | c >>> 11);
t = t + d | 0; // DIFF LINE
c = c + t | 0;
return (t >>> 0) / 4294967296;
}
})(s[0], s[1], s[2], s[3]);
function test(f, n) {
let result;
for (let i = 0; i < n; ++i) {
result = f();
}
return result;
}
for (var i = 0; i < 1e3; ++i) {
test(foo, 1e4);
test(bar, 1e4);
}
for (const f of [bar, foo]) {
console.time(f.name);
test(f, 1e7);
console.timeEnd(f.name);
}
======================================================================
Performance difference is within 5% range (latest V8 ToT).
,
Sep 14
Changing the benchmark to use local variables for the actual computations fixes most of the performance issues:
======================================================================
var s = [0xCA566EB5, 0xD250CE2D, 0x6D2B79F5, 0x9E3779B9];
const foo = (function(a1, b1, c1, d1) {
return function foo() {
var a = a1 >>> 0;
var b = b1 >>> 0;
var c = c1 >>> 0;
var d = d1 >>> 0;
d = d + 1 | 0;
var t = (a + b + d) | 0; // DIFF LINE
a = b ^ b >>> 9;
b = c + (c << 3) | 0;
c = (c << 21 | c >>> 11);
c = c + t | 0;
a1 = a;
b1 = b;
c1 = c;
d1 = d;
return (t >>> 0) / 4294967296;
}
})(s[0], s[1], s[2], s[3]);
const bar = (function(a1, b1, c1, d1) {
return function bar() {
var a = a1 >>> 0;
var b = b1 >>> 0;
var c = c1 >>> 0;
var d = d1 >>> 0;
d = d + 1 | 0;
var t = (a + b) | 0; // DIFF LINE
a = b ^ b >>> 9;
b = c + (c << 3) | 0;
c = (c << 21 | c >>> 11);
t = t + d | 0; // DIFF LINE
c = c + t | 0;
a1 = a;
b1 = b;
c1 = c;
d1 = d;
return (t >>> 0) / 4294967296;
}
})(s[0], s[1], s[2], s[3]);
function test(f, n) {
let result;
for (let i = 0; i < n; ++i) {
result = f();
}
return result;
}
for (var i = 0; i < 1e3; ++i) {
test(foo, 1e4);
test(bar, 1e4);
}
for (const f of [bar, foo]) {
console.time(f.name);
test(f, 1e7);
console.timeEnd(f.name);
}
======================================================================
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by linb...@gmail.com
, Aug 29