Project: v8 Issues People Development process History Sign in
New issue
Advanced search Search tips
Starred by 2 users
Status: Fixed
Owner:
Closed: Aug 8
Cc:
Components:
HW: All
OS: All
Priority: 2
Type: Bug



Sign in to add a comment
object creation seems to slow down
Reported by matteo.c...@gmail.com, Aug 2 Back to list
Version: 6.0.286.45
OS: Mac OS X 10.12
Architecture: x64

What steps will reproduce the problem?

run: https://gist.github.com/mcollina/9eb14178619460789f9db20e06ed759c

Using V8 6.0.286.45 (from node), I got:

empty loop: 62.695ms
literal: 2367.563ms

With V8 5.8.283.41 (Node 8.2.1), I got:

empty loop: 47.828ms
literal: 1010.933ms

While empty-looping has slowed down, allocating objects seems to have slow 2x.

Things are getting better on V8 6.1.459:

empty loop: 67.859ms
literal: 1758.019ms


 
Cc: bmeu...@chromium.org cbruni@chromium.org
Labels: Performance HW-All OS-All Priority-2
Status: Available
Maybe related to object literals? +cc bmeurer, cbruni.
Owner: jgruber@chromium.org
I'll have a look tomorrow.
I have checked also with constructors and classes, and I have observed the same behavior. I'm not sure how to measure the impact of this in production applications.
I would bet this is related to the loop being simply eliminated. I expect there are different results if there's a meaningful effect occurring in the loop body
then again, I dunno --- it seems like on OSR entry, the `literals` function should see similar benefits, I would be surprised if objects were still allocated/the loop was still there. But maybe the baseline case + GC pauses would be enough to make the 1000-2000ms difference.

Anyways, leaving the discussion to more knowledgable people :)
It does seem the loop is not eliminated and the objects are being created. 

My bad the numbers above are for V8 5.1.281.103 (Node 6).

empty loop: 48.482ms
literal: 1039.456ms

V8 5.8 has the same performance of V8 6.0.

Both versions are in the 1550-1560 Scavenge run cycles.
Cc: u...@chromium.org
I think your benchmark is not measuring what you want to measure. This seems to be dominated by the store buffer overflow on the assignment to the "obj" global object property (in case of d8) or context slot (in case of node). If you modify the benchmark slightly (in a safe way that both d8 and node behave similarly), then you actually measure the allocation overhead I think:

========================================
(function() {
  var max = 100000000

  console.time('empty loop')

  function loop () {
    for (var i = 0; i < max; i++) {}
  }

  loop()

  console.timeEnd('empty loop')

  console.time('literal')

  function literal () {
    var obj = null
    for (var i = 0; i < max; i++) {
        obj = { i }
      }
    return obj;
  }

  literal()
  console.timeEnd('literal')
})();
========================================

Now the results look roughly the same (within noise):

============================================================
$ ./d8-6.0.286.36 matteo.js
empty loop: 108.429000
literal: 468.233000
$ ./d8-6.1.534.15 matteo.js
empty loop: 108.305000
literal: 471.232000
$ out/Release/d8 matteo.js
empty loop: 108.241000
literal: 480.974000
$ ~/Applications/node-v6.10.0-linux-x64/bin/node matteo.js
empty loop: 81.328ms
literal: 460.296ms
$ ~/Applications/node-v8.1.2-linux-x64/bin/node matteo.js
empty loop: 82.764ms
literal: 459.173ms
============================================================

But it's also possible that I'm missing something here.
Just to clarify, why the previous one regressed between V8 5.1 and V8 6.0?
Assigning things to an outer context has regressed?

(function () {
var max = 100000000

console.time('empty loop')

function loop () {
  for (var i = 0; i < max; i++) {}
}

loop()

console.timeEnd('empty loop')

console.time('assignment')
var n = 0

function literal () {
  for (var i = 0; i < max; i++) {
    n = i
  }
}

literal()
console.timeEnd('assignment')
})()

the above take the same amount of time in both Node 6.11 and 8.2.1, but it increases 30% in node master with V8 60, and it drops again with V8 61.
Cc: mlippautz@chromium.org
Components: GC
AFAIK the way the store buffer is implemented changed significantly since 5.1 (which is ancient BTW). I added the GC folks to help answering the question.
Attached is a profile of the initial benchmark. We indeed spend around 34% of time on store buffer management. 21% just on '__lll_lock_wait', which I assume is also part of store buffers.

Anyway, the modified benchmark from #7 gives me the following results on ToT:

$ out/release/d8 ~/crbug-v8-6664-obj-creation.js 
empty loop: 112.856000
literal: 80.606000
crbug-v8-6664.png
104 KB View Download
crbug-v8-6664-obj-creation.js
303 bytes View Download
FYI: the _lll_lock_wait calls come from StoreBuffer::FlipStoreBuffers.
Cc: -u...@chromium.org jgruber@chromium.org
Owner: u...@chromium.org
Bisected this down to:

9d1488e4b0b3d9b2a630c602346f1b8417f2a7c4 is the first bad commit
commit 9d1488e4b0b3d9b2a630c602346f1b8417f2a7c4
Author: hpayer <hpayer@chromium.org>
Date:   Wed Nov 30 04:17:00 2016 -0800

    [heap] Reduce store buffer size to increase chance to run concurrent store buffer processing thread more often.

Prior to this CL: literal: 1644.279
After    this CL: literal: 2222.33

Increasing kStoreBufferSize decreases the regression.

Assigning to current memory sheriff. Ulan, could you please take a look?
jgruber@, my numbers are bit different. I run 50 times each of the three revisions (raw results attached).

Averages [with min-max range]:
5.1.281.75 (91b368): literal: 1232.2  [1222-1248]
before CL  (4d75ea): literal: 2185.86 [1892-2478]
after CL   (9d1488): literal: 2511.24 [2262-2800]

The regression of the CL was about 325ms.
There seems to be another regression of 950ms between the 5.1 and the 4d75ea.

Note that at 4d75ea the variability is huge. Maybe you hit the lower end of the range in #12?

5.1.txt
751 bytes View Download
before.txt
751 bytes View Download
after.txt
750 bytes View Download
create.js
288 bytes View Download
Status: Assigned
> There seems to be another regression of 950ms between the 5.1 and the 4d75ea.
About 200ms regression happens before 1fb449.
I bisected the remaining 750ms to

commit a9e6bbba263c98090f96bb0dccff09d8ffb86c0a (refs/bisect/bad)
Author: hpayer <hpayer@chromium.org>
Date:   Fri Nov 11 06:00:55 2016 -0800

    [heap] Reland concurrent store buffer processing.
    
    BUG=chromium:648973,  chromium:648568 
    
    Review-Url: https://codereview.chromium.org/2493083003
    Cr-Commit-Position: refs/heads/master@{#40928}

So most of the regression comes from the store buffer changes.
Labels: HW-x64
This simple optimization removes about 1000ms from the runtime on TOT:
https://chromium-review.googlesource.com/c/600908

Labels: -HW-x64
Can some of you check if this is what is happening in https://github.com/nodejs/node/blob/master/benchmark/timers/set-immediate-breadth.js


This is the comparison between master and node 6.x:

timers/set-immediate-breadth.js millions=10           -35.94 %        *** 1.835500e-05


It is using a similar pattern:
https://github.com/nodejs/node/blob/0d22858d67f5f8f7959a55ceca23adafe12827d5/lib/timers.js#L686-L733

I have tried applying the patch and it did not help, so it might be something different after all.
#17: Are store buffers prominent in --prof output? (Run the benchmark with --prof, then ${v8dir}/tools/linux-tick-processor <produced_output_file>.log)
Project Member Comment 19 by bugdroid1@chromium.org, Aug 7
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/35f9b26601bb50d9fe968c4a6069fbb1968f008a

commit 35f9b26601bb50d9fe968c4a6069fbb1968f008a
Author: Ulan Degenbaev <ulan@chromium.org>
Date: Mon Aug 07 09:09:32 2017

[heap] De-duplicate insertions to the old-to-new remembered set.

Bug:  v8:6663 
Change-Id: I8bf7169c21141a34e3bcb0bb2193ceb1746b33b2
Reviewed-on: https://chromium-review.googlesource.com/600908
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Commit-Queue: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47186}
[modify] https://crrev.com/35f9b26601bb50d9fe968c4a6069fbb1968f008a/src/heap/store-buffer.cc

Store buffer are not prominent: https://gist.github.com/mcollina/c5d25814d540cad6e9e24098ce8c446c.

It's something else.
I opened a new bug for the issues in #17 and #20: https://crbug.com/v8/6686.

Ulan, can we close this one?
Status: Fixed
Sign in to add a comment