Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Starred by 2 users
Status: Fixed
Owner:
Closed: Oct 2
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment
Miserable performance when storing booleans in typed arrays
Reported by kevin.g...@gmail.com, Sep 9 2013 Back to list
UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36

Steps to reproduce the problem:
1. Create a test case that stores booleans in a JS Array
2. Measure performance
3. Replace JS array with a typed array (i.e. Uint8Array) and appropriate type conversions
4. Measure performance

What is the expected behavior?
The typed array version should not perform dramatically slower

What went wrong?
I've attached two test cases which demonstrate the problem.

ByteVsBoolArray.zip contains a test case that runs the same algorithm using two array types - a JS array containing Booleans, and a Uint8Array containing 0/1. In Chrome (both release and canary) both test cases deliver equivalent performance, which is awesome! In Firefox the JS array case was dramatically slower, so I updated my compiler to use Uint8Array to store booleans.

ByteVsBoolArray2.zip contains the same test case as produced by the updated compiler. Now, both test cases run incredibly fast in Firefox, and run miserably slow in Chrome. Interestingly the test case that originally used a boolean array - Program_SieveBools - is considerably slower in Chrome despite the fact that it now uses a byte array.

My guess is that either the type hint generated by the compiler - (isNotPrime[candidate]) | 0) - or the use of 'true' and 'false' - isNotPrime[multiple] = true; along with Array.Erase - is causing Crankshaft to generate bad code or fail to optimize the test cases equivalently.

Did this work before? No 

Chrome version: 29.0.1547.66  Channel: stable
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 11.8 r800

It seems reasonable to expect storing true/false into typed arrays to be efficient and coerce to 1/0 appropriately; doing this should be a cheap and consistent win across browsers in terms of memory usage if you're storing lots of boolean data. That it somehow isn't is surprising.

The fact that *both* test cases get slower, but by different amounts, when they were originally both blazing fast suggests that some sort of ICs or type information are getting polluted here and causing everything to fall apart.

Timing data:

ByteVsBoolArray in Chrome Version 29.0.1547.66 m
Sieve Bytes: 01948.00ms
Sieve Bools: 01931.00ms

ByteVsBoolArray2 in Chrome Version 29.0.1547.66 m
Sieve Bytes: 03633.00ms
Sieve Bools: 10158.00ms

ByteVsBoolArray in Firefox 25.0a2 (2013-09-07)
Sieve Bytes: 02570.00ms
Sieve Bools: 03625.00ms

ByteVsBoolArray2 in Firefox 25.0a2 (2013-09-07)
Sieve Bytes: 00286.00ms
Sieve Bools: 00321.00ms
 
ByteVsBoolArray.zip
168 KB Download
ByteVsBoolArray2.zip
172 KB Download
I tested my theory about what's wrong here and it appears to only be partially correct.

Fixing the uses of 'true' and 'false' (changing JSIL.Core.js line ~7232 to use '0' as default value for boolean, and Sieve.cs.js to store 1 into the array instead of 'true') improves the test case performance in Chrome but it is still miserable for no obvious reason:

Sieve Bytes: 01511.00ms
Sieve Bools: 05339.00ms

Removing the type hint (| 0) in the if statement doesn't seem to make a difference, so it's probably fine.

Changing which function is warmed in Program_Main (to compute the basis result and print it out) seems to have no impact on performance.
Cc: danno@chromium.org
Labels: -Cr-Content-JavaScript Cr-Blink-JavaScript
Status: Available
Comment 3 by danno@chromium.org, Sep 10 2013
Owner: dslomov@chromium.org
Status: Assigned
Labels: Cr-Blink-JavaScript-Language
Labels: Hotlist-Recharge
This issue likely requires triage.  The current issue owner may be inactive (i.e. hasn't fixed an issue in the last 30 days or commented in this particular issue in the last 90 days).  Thanks for helping out!

-Anthony
Cc: jochen@chromium.org
Owner: bmeu...@chromium.org
This is still happening. Use test case two to reproduce.

Labels: -Cr-Blink-JavaScript-Language Cr-Blink-JavaScript-Compiler
Labels: Performance
Cc: bmeu...@chromium.org tebbi@chromium.org yangguo@chromium.org
Labels: -OS-Windows OS-All
Owner: petermarshall@chromium.org
Hey Peter, please have a look at this when you have time. Close if it's no longer relevant.
Status: Started
Performing the fix suggested by Kevin (changing JSIL.Core.js line ~7232 to use '0' as default value for boolean, and Sieve.cs.js to store 1 into the array instead of 'true') now gives roughly the same performance as Firefox.

Another observation is that it is also quicker just to allocate another Uint8Array rather than clear the current one manually. The allocation path initializes the contents to 0, usually using memset or similar which is much faster than individual 1 byte sets using JS semantics. This probably increases memory usage somewhat and GC overhead, but I think it still comes out better.

I can reproduce the roughly 10x difference locally with the two attached simplified microbenchmarks. ta_set_bool.js runs in ~2.8 seconds, ta_set_int.js runs in ~0.26. I'll look into why that happens a bit more - we shouldn't expect to pay such a high price just for a bool -> int conversion.
ta_set_bool.js
234 bytes View Download
ta_set_int.js
227 bytes View Download
This happens because we get a StoreIC miss for storing booleans into typed arrays (well, for anything that isn't already a number). We end up in the runtime for every store/conversion. We could add this case to the generic handler (KeyedStoreGenericAssembler::EmitGenericElementStore). Initial prototyping shows that this will be roughly 2x slower for true/false compared to 0/1 (i.e. 5x faster than currently.)
Labels: -Performance Performance-Browser
Any updates?
Labels: -OS-All
Owner: bmeu...@chromium.org
Thanks to Peter for all the investigation. I've put this into a simple micro-benchmark:

=============< bench-typed-array-bool.js >===========================
if (typeof console === 'undefined') console = {log:print};

const arr = new Uint8Array(2000);

function typedArrayStoreBool() {
  for (let j = 0; j < arr.length; j++) {
    arr[j] = true;
  }
  for (let j = 0; j < arr.length; j++) {
    arr[j] = false;
  }
}

function typedArrayStoreInt() {
  for (let j = 0; j < arr.length; j++) {
    arr[j] = 1;
  }
  for (let j = 0; j < arr.length; j++) {
    arr[j] = 0;
  }
}

var TESTS = [
    typedArrayStoreBool,
    typedArrayStoreInt
];
var n = 1e4;

function test(fn) {
  var result;
  for (var i = 0; i < n; ++i) result = fn();
  return result;
}

for (var j = 0; j < TESTS.length; ++j) {
  test(TESTS[j]);
}

for (var j = 0; j < TESTS.length; ++j) {
  var startTime = Date.now();
  test(TESTS[j]);
  console.log(TESTS[j].name + ':', (Date.now() - startTime), 'ms.');
}
=====================================================================

Running on ToT we see:

=====================================================================
typedArrayStoreBool: 2975 ms.
typedArrayStoreInt: 44 ms.
=====================================================================

It seems to me that changing the ICs to stay MONOMORPHIC for Oddballs and letting TurboFan truncate them to Numbers appropriately would do the trick and make both code paths same performance-wise.
Project Member Comment 14 by bugdroid1@chromium.org, Oct 2
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/e65c0b257898ec1773ca01235c4d8cf02abbdbbd

commit e65c0b257898ec1773ca01235c4d8cf02abbdbbd
Author: Benedikt Meurer <bmeurer@chromium.org>
Date: Mon Oct 02 09:21:37 2017

[turbofan] Don't go MEGAMORPHIC when storing oddballs to typed arrays.

The KEYED_STORE_IC was never able to deal with stores to typed arrays
where the value being stored is not already a Number (i.e. either a Smi
or a HeapNumber). By extending it to also handle Oddballs (i.e. true,
false, undefined and null) and teaching TurboFan to also perform the
appropriate check plus the truncation to Number, we can easily support
this use case as well.

On the micro-benchmark in the bug report, we go from

  typedArrayStoreBool: 2975 ms.
  typedArrayStoreInt: 44 ms.

to

  typedArrayStoreBool: 43 ms.
  typedArrayStoreInt: 44 ms.

so that's roughly a 70x performance boost.

Bug:  chromium:287773 
Change-Id: I227419aeabc3f5b6793aa280a95448d03ac2f2dd
Reviewed-on: https://chromium-review.googlesource.com/691731
Commit-Queue: Benedikt Meurer <bmeurer@chromium.org>
Reviewed-by: Michael Stanton <mvstanton@chromium.org>
Reviewed-by: Jaroslav Sevcik <jarin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48257}
[modify] https://crrev.com/e65c0b257898ec1773ca01235c4d8cf02abbdbbd/src/code-stub-assembler.cc
[modify] https://crrev.com/e65c0b257898ec1773ca01235c4d8cf02abbdbbd/src/compiler/js-native-context-specialization.cc

Status: Fixed
Sign in to add a comment