New issue
Advanced search Search tips
Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 10
Cc:
Components:
HW: All
NextAction: ----
OS: All
Priority: 1
Type: Bug

Blocking:
issue chromium:808503



Sign in to add a comment

Gracefully handle out-of-bounds loads

Project Member Reported by bmeu...@chromium.org, Nov 2 2017

Issue description

Currently V8 doesn't deal well with out-of-bounds element loads in the KeyedLoadIC, but rather immediately transitions the IC to MEGAMORPHIC state (except for Strings which were fixed recently with  http://crbug.com/v8/7014 ). This was sort of a good strategy in the past when we actively advised developers to be careful with out-of-bounds accesses, however we still see this pattern a lot in the wild. One especially interesting version of the pattern is a loop like this:

while ((element = array[i++]) != null) {
  ...
}

This loop pumps all entries from the array (assuming that null or undefined are not valid entries) and stops on the first out-of-bounds read. The KeyedLoadIC here will go MEGAMORPHIC right when the loop finished for the first time. This pattern is popular for example in jQuery, as described in various places:

- http://ripsawridge.github.io/articles/blink-mysterium/
- https://github.com/jquery/jquery/pull/3769
- https://github.com/jquery/sizzle/pull/407

Ideally we'd convince everyone to not do this. But of course this is not possible, and due to the amazing self-healing in JavaScript, it's really hard to spot and avoid these kinds of bugs. So instead we should mitigate the problem and deal with these bugs gracefully, avoiding unnecessary performance cliffs.

Here's a micro-benchmark that illustrates the problem:

===========< bench-oob.js >====================================
"use strict";

function testInBounds(...args) {
  return args[0];
}
function testOutOfBounds(...args) {
  return args[1];
}

const N = 1e7;
const F = [testInBounds, testOutOfBounds];

function test(f, n) {
  var result;
  for (var i = 0; i < n; ++i) {
    result = f(1);
  }
  return result;
}

// Warmup
test(x => x, 50);
for (var i = 0; i < 10; ++i) {
  for (var f of F) test(f, 50);
}

// Measure
for (var f of F) {
  var start = Date.now();
  var o = test(f, N);
  var end = Date.now();
  print(f.name + ": " + (end - start) + " ms.");
}
===============================================================

On V8 ToT we see:

===============================================================
$ ./d8-master bench-oob.js
testInBounds: 106 ms.
testOutOfBounds: 288 ms.
===============================================================

So almost 3x worse for this very simple example.

We also hit this problem in the six-speed rest benchmarks:

- https://github.com/mozilla/arewefastyet/blob/master/benchmarks/six-speed/tests/rest.es5
- https://github.com/mozilla/arewefastyet/blob/master/benchmarks/six-speed/tests/rest.es6

Although these are arguably very bad tests.
 
Cc: mathias@chromium.org
Project Member

Comment 2 by bugdroid1@chromium.org, Nov 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/b7168573ed011113746873fd92b71e8bcc9dddb1

commit b7168573ed011113746873fd92b71e8bcc9dddb1
Author: Benedikt Meurer <bmeurer@chromium.org>
Date: Fri Nov 03 07:35:14 2017

[turbofan] Generalized OOB support for KeyedLoadIC.

This extends the support in TurboFan and the ICs for OOB loads to also
apply to typed arrays and receivers whose prototype chain is protected
by the "no elements" protector (aka the Array protector). TurboFan will
generate code to materialize undefined instead when it sees a load that
has the OOB bit set and add an appropriate code dependency on the global
protector. For typed arrays it doesn't even need to check the global
protector since elements are never looked up in the prototype chain
for typed arrays.

In the simple micro-benchmark from the bug we go from

  testInBounds: 103 ms.
  testOutOfBounds: 289 ms.

to

  testInBounds: 103 ms.
  testOutOfBounds: 102 ms.

which fixes the 3x slowdown and thus addresses the performance cliff. In
general it's still beneficial to make sure that you don't access out of
bounds, especially once we introduce a bounds check elimination pass to
TurboFan.

This also seems to improve the jQuery benchmark on the Speedometer test
suite by like 1-2% on average. And the SixSpeed rest benchmarks go from

  rest-es5: 25 ms.
  rest-es6: 23 ms.

to

  rest-es5: 6 ms.
  rest-es6: 4 ms.

so a solid 5.7x improvement there.

Bug: v8:6936,  v8:7014 ,  v8:7027 
Change-Id: Ie99699c69cc40057512e72fd40ae28107216c423
Reviewed-on: https://chromium-review.googlesource.com/750089
Commit-Queue: Benedikt Meurer <bmeurer@chromium.org>
Reviewed-by: Benedikt Meurer <bmeurer@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49095}
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/code-stub-assembler.cc
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/compiler/js-native-context-specialization.cc
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/compiler/js-native-context-specialization.h
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/feedback-vector.cc
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/ic/accessor-assembler.cc
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/ic/handler-configuration-inl.h
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/ic/handler-configuration.cc
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/ic/handler-configuration.h
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/ic/ic.cc
[modify] https://crrev.com/b7168573ed011113746873fd92b71e8bcc9dddb1/src/ic/ic.h

Cc: jkummerow@chromium.org
It seems that we now have two bits in the LoadHandler that essentially express the same: "convert hole to undefined" and "allow out of bounds" both mean that the prototype chain needs to be checked and then undefined is returned. I guess we could unify those?

WDYT verwaest@?
Status: Fixed (was: Started)
Project Member

Comment 5 by bugdroid1@chromium.org, Nov 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/fd150c79884f4ae522dd5c80350e90f4d8a7a2ba

commit fd150c79884f4ae522dd5c80350e90f4d8a7a2ba
Author: Benedikt Meurer <bmeurer@chromium.org>
Date: Sat Nov 04 12:06:31 2017

[turbofan] Generate the correct bounds when the array protector isn't valid.

The condition for bounds check generation was not in sync with the
condition that was used for the actual access, which lead to invalid
memory accesses when the array protector was invalid.

Tbr: tebbi@chromium.org
Bug:  chromium:781506 ,  chromium:781494 ,  chromium:781457 ,  chromium:781285 ,  chromium:781381 ,  chromium:781380 , v8:6936,  v8:7014 ,  v8:7027 
Change-Id: Ia5b2ad02940292572ed9b37abd3f9ffaa6d7a26b
Reviewed-on: https://chromium-review.googlesource.com/753590
Reviewed-by: Benedikt Meurer <bmeurer@chromium.org>
Commit-Queue: Benedikt Meurer <bmeurer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49124}
[modify] https://crrev.com/fd150c79884f4ae522dd5c80350e90f4d8a7a2ba/src/compiler/js-native-context-specialization.cc
[add] https://crrev.com/fd150c79884f4ae522dd5c80350e90f4d8a7a2ba/test/mjsunit/regress/regress-crbug-781506-1.js
[add] https://crrev.com/fd150c79884f4ae522dd5c80350e90f4d8a7a2ba/test/mjsunit/regress/regress-crbug-781506-2.js
[add] https://crrev.com/fd150c79884f4ae522dd5c80350e90f4d8a7a2ba/test/mjsunit/regress/regress-crbug-781506-3.js

Project Member

Comment 6 by bugdroid1@chromium.org, Nov 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/a9a1671345fa28c23856cc15e73ed24af400dd2a

commit a9a1671345fa28c23856cc15e73ed24af400dd2a
Author: Benedikt Meurer <bmeurer@google.com>
Date: Mon Nov 20 09:43:35 2017

[cleanup] Rename "array protector" to "no elements protector".

The "array protector" now guards the Object.prototype, the
Array.prototype and the String.prototype, so the name was a
bit misleading nowadays. So the new name "no elements protector"
was chosen.

Bug: v8:6936,  v8:7014 ,  v8:7027 
Change-Id: I9a9d7caa2caf0ac9e78cc6658de2f0506970dfa2
Reviewed-on: https://chromium-review.googlesource.com/778162
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Commit-Queue: Benedikt Meurer <bmeurer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49471}
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/builtins/builtins-array-gen.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/builtins/builtins-call-gen.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/code-stub-assembler.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/code-stub-assembler.h
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/compiler/js-builtin-reducer.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/compiler/js-call-reducer.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/compiler/js-native-context-specialization.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/elements.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/heap/heap.h
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/heap/setup-heap-internal.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/ic/accessor-assembler.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/ic/ic.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/isolate.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/isolate.h
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/src/objects.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/test/cctest/test-api.cc
[modify] https://crrev.com/a9a1671345fa28c23856cc15e73ed24af400dd2a/tools/v8heapconst.py

Blocking: chromium:808503
I personally feel like this issue is not yet fixed.

The deopt can still hit very hard.

```js
function deopt(val, offset) {
  if (val[offset] === undefined)
    throw new Error('You shall not pass')

  return val[offset++] +
    val[offset++] * 2 ** 8 +
    val[offset++] * 2 ** 16 +
    val[offset] * 2 ** 24
}

function bench(fn, input) {
  console.time(`Runtime ${fn.name}`)
  for (var i = 0; i < 1e8; i++) {
    fn(input, 0)
  }
  console.timeEnd(`Runtime ${fn.name}`)
}

function fail(fn, val) {
  try {
    fn(input, val)
  } catch (e) {}
}

const input = Buffer.allocUnsafe(8)
intput[5] = 15

[deopt, deopt, deopt, deopt].forEach((fn, i) => {
  if (i === 1) {
    fail(deopt, -1)
  }
  bench(fn, input)
})
```

This yields the following:

Runtime deopt: 69.718ms
Runtime deopt: 1095.463ms
Runtime deopt: 1087.021ms
Runtime deopt: 1111.286ms

It highly depends on the used input to produce the OOB. But using multiple "good values" will also badly deopt.
#8: Beware of microbenchmarks! In the fast case, the optimizing compiler can actually drop all the array accesses, because the values aren't used for anything. So you're comparing the difference between doing nothing and doing *something*.

Benedikt: it seems that negative indices still cause the KeyedLoadIC to transition to megamorphic state. That makes sense for JSArrays, where negative indices have to be converted to a string and handled as a named property load. For TypedArrays, we could make them take the non-generic OOB path instead (if we care about this case -- I'm not sure).

If the "-1" index is replaced by a positive OOB access (e.g. "100"), then the performance delta shrinks to about 2x, which is probably unavoidable, as potentially-OOB accesses always must perform more work than in-bounds accesses. In real code, the difference should be smaller, but in a microbenchmark 2x seems plausible.

Comment 10 Deleted

The result when fixing the benchmark is still significant:

```js
function deopt(val, offset) {
  if (val[offset] === undefined)
    throw new Error('You shall not pass')

  return val[offset++] +
    val[offset++] * 2 ** 8 +
    val[offset++] * 2 ** 16 +
    val[offset] * 2 ** 24
}

function bench(fn, input) {
  var tmp = 0
  console.time(`Runtime ${fn.name}`)
  for (var i = 0; i < 1e8; i++) {
    tmp += fn(input, i%5)
  }
  console.timeEnd(`Runtime ${fn.name}`)
  return tmp
}

function fail(fn, val) {
  try {
    fn(input, val)
  } catch (e) {}
}

const input = new Uint8Array(8)

[deopt, deopt, deopt, deopt].forEach((fn, i) => {
  if (i === 1) {
    fail(deopt, '0')
  }
  bench(fn, input)
})
```

Runtime deopt: 627.946ms
Runtime deopt: 1820.517ms
Runtime deopt: 1813.852ms
Runtime deopt: 1830.815ms

And without fail call:

Runtime deopt: 648.320ms
Runtime deopt: 598.450ms
Runtime deopt: 572.704ms
Runtime deopt: 573.872ms

OOB access with a offset between TypedArray#length - (2 ** 32 - 2):

Runtime deopt: 620.592ms
Runtime deopt: 840.127ms
Runtime deopt: 842.346ms
Runtime deopt: 839.631ms

This deoptimization will also be caused by e.g. `null`, `undefined`, `{}`, by numbers above 2 ** 32 and by valid accesses with strings e.g. '0'.

Having a small overhead for OOB is acceptable out of my perspective and expected but this performance cliff here is still pretty drastic.

(I removed my former comment as it still contained a mistake)
I believe that the difference between 620ms and 840ms is not so bad, given how much more work the optimized code has to do if it needs to handle the OOB case without deopt (note that handling OOB gracefully means that we have to compute the additions and multiplications as double precision floating point operations rather than integer ones). Could you explain why you think this can be faster?
#12: I have no issue with the minor performance overhead from 620ms to 840ms. That is fine. I am worried about the case from 627ms to 1820ms.

As I do not know that much about the V8 internals, I can not properly tell how it is possible to optimize those cases further.
But I guess it should be possible to have a typecheck and return undefined right away if the type is not JS `number` or `string`. That would already prevent the deopt in many of my examples.
Checking if the number if greater or equal zero or above 2 ** 32 - 2 should probably also be fine. So those deopts would be gone as well.
Handling strings could maybe be done by first converting them to a integer and have a NaN check in that case. That is likely still quite some overhead but I guess much less than the current deopt.
I am not sure how much performance it costs to check for floating point numbers, so I can not tell if it is possible to have a early return in that case.

My ideas here might be naive as I do not know the V8 internals but I hope there is still something possible to optimize.
Right, I thought you were asking about the OOB case.

For the other cases, we know how to optimize them, but there are trade-offs. If we generate code for all that you suggest, we will pay with compile time, memory and complexity. In your particular examples, I am not convinced we want to pay that price.

I am not sure what you mean by "typecheck and return undefined right away". The spec says we need to call toString method on an object in that case (or perhaps valueOf first, because we do the increment). Since we could be calling arbitrary function there, we could not reuse the shape check for the val array, the length load and the backing store load for any of he subsequent loads, so it would not be very fast anyway. Naturally, we could also track what types we have seen for indices, but that is a lot of complexity in the IC system that we do not want to introduce if it is not necessary. (Also note that collecting type feedback costs precious startup time, so we need to be super careful there.) 

I guess it all boils down to the question whether you have a real world use case where those optimizations matter. We fixed the OOB case because there was some widely used code that suffers from OOB being slow (for example, jQuery).
As a side note, a word on terms: a "deopt" is the specific event where during execution of optimized code for a given function, an embedded check detects that an assumption made during compilation (that enabled optimizations) no longer holds, so the optimized code has to be thrown away -- the function is "de-optimized". A few such deoptimizations are perfectly fine and not a problem, as functions will get reoptimized later (with updated assumptions).

In all your cases, there are no deopt events that would significantly impact performance. The issue is that during bytecode execution, certain type feedback is collected for "weird" index values, which then causes the optimizing compiler to generate slower optimized code than it otherwise would. As #12/#14 explained, that is mostly mandated by the fact that a JS engine simply has to do more work in those cases.
Status: Started (was: Fixed)
I think it makes total sense to support negative indices as well. So I'll add that.

As jarin@ pointed out, supporting arbitrary values is not going to be easy and probably not going to be fast either. I don't see a compelling argument why string indices for typed arrays should be fast.
Thanks a lot for pointing this out. I did not think much about the overhead before but it makes a lot of sense of course.

bmeurer@ thanks for having another look at it and to add that.

When it comes down to some other "more real world" cases that might be used to cause a OOB it will probably mainly be `undefined`, `null` and `NaN`. Especially undefined and `NaN` are cases that I would also like to support.
Some people might also use strings as offset.
Besides those values I do not think that anyone would really put in any other faulty values.

I can not tell how much extra overhead it would mean to add some of these as well, so it is of course something you have to decide on.
Fix in flight for negative indices: https://chromium-review.googlesource.com/c/v8/v8/+/911574

For NaN, undefined and null, this is not easy, since those are looked up as strings even on typed arrays. Wouldn't be better to spit out warnings when you accidentially add a property "NaN" or "undefined" to an object? Or try to look that up?

Mathias had been thinking about making the "array abuse" flag working again.
Project Member

Comment 19 by bugdroid1@chromium.org, Feb 9

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/317fad950e8743656f78175253646365d2512096

commit 317fad950e8743656f78175253646365d2512096
Author: Benedikt Meurer <bmeurer@chromium.org>
Date: Fri Feb 09 20:05:37 2018

[ic] Support negative indices for typed array OOB accesses.

Extend the current OOB support for typed arrays to also handle the
negative integer indices in the fast-path. This is safe because in
ECMAScript we never look up integer indexed properties (including
negative indices) on typed arrays in the prototype chain.

This reduces the performance cliff shown in the benchmark on the
relevant bug from

  console.timeEnd: Runtime deopt, 596.185000
  console.timeEnd: Runtime deopt, 1444.289000
  console.timeEnd: Runtime deopt, 1445.191000
  console.timeEnd: Runtime deopt, 1443.008000

to

  console.timeEnd: Runtime deopt, 590.017000
  console.timeEnd: Runtime deopt, 784.899000
  console.timeEnd: Runtime deopt, 792.428000
  console.timeEnd: Runtime deopt, 786.740000

which corresponds to a 2x improvement overall. It's not for free,
especially not in this benchmark, but the cliff isn't as bad as
it was previously.

Bug:  v8:7027 
Change-Id: Icf8a7ee87bb7ebc54f82c1b9166fc5e78c12bc0e
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_chromium_rel_ng
Reviewed-on: https://chromium-review.googlesource.com/911574
Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
Commit-Queue: Benedikt Meurer <bmeurer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#51222}
[modify] https://crrev.com/317fad950e8743656f78175253646365d2512096/src/code-stub-assembler.cc
[modify] https://crrev.com/317fad950e8743656f78175253646365d2512096/src/compiler/js-native-context-specialization.cc
[modify] https://crrev.com/317fad950e8743656f78175253646365d2512096/src/compiler/type-cache.h
[modify] https://crrev.com/317fad950e8743656f78175253646365d2512096/src/ic/accessor-assembler.cc
[modify] https://crrev.com/317fad950e8743656f78175253646365d2512096/src/ic/ic.cc
[modify] https://crrev.com/317fad950e8743656f78175253646365d2512096/src/ic/keyed-store-generic.cc

Status: Fixed (was: Started)

Sign in to add a comment