Starred by 9 users
Status: Fixed
Closed: May 2013
15x slower than spidermonkey on skinning benchmark
Reported by alonza...@gmail.com, Jul 7 2012
Attached is a benchmark of vertex skinning compiled with emscripten, in HTML and shell versions (the latter is now possible after typed arrays were added to d8, which is great). After running it prints out the number of milliseconds it took.

In both the shell and the browser, v8 takes 15 seconds while spidermonkey takes 1 second. Similar ratios show up in a few other emscripten-compiled benchmarks and projects as well. Please let me know if there is something we can tweak in emscripten to work around this problem.

 
skinning.html (99 KB)
skinning.js (99 KB)
I looked at it briefly sometime ago and here are my observations.

The benchmark spends all its time in a single function that is called only once.

That function has many locals, and we fail to perform OSR on it because of constraints on the LUnallocated operand encoding.

However, just fixing this is not enough. If I steal some bits in the encoding to increase the number of locals that OSR can handle, I see the following: we OSR the function once, then deoptimize on a soft deoptimization (@deoptimize), and never attempt to OSR the function again unless I artificially bump the loop nesting level to kMaxLoopNestingMarker immediately. I did not look into why this happens, but it might be caused by a bug in our stubs that perform counting on back edges.

It seems that there are two major issues here:

1) the LUnallocated encoding prevents us from doing OSR;
2) some issue with back edge counting prevents us from re-attempting OSR after a deoptimization.

An additional issue is the soft deoptimization itself. Type propagation inside the function might have allowed us to avoid it, but I did not analyze that.
Regarding LUnallocated encoding: I've seen the same on the Emscripten version of Box2D at http://kripken.github.com/misc-js-benchmarks/box2d/.

I'll submit a small CL so that we can see the reason for disabling optimization a bit easier.
Here is an updated version of this benchmark with additional emscripten optimizations that were not in the previous version (we can now remove more temporary variables than before, which I hoped would help). However, v8 is still 14.22x slower on this benchmark.
skinning.js (49.9 KB)
Status: Accepted
I've tried the skinning.js from comment #3 with our current v8 from bleeding_edge, and it finishes after 30ms, while the skinning.js from the original post still takes 13.2 seconds. In the 30ms run, nothing is actually crankshafted (apart from valueOf/IsPrimitive/toString), but I can't see the reason for this. There are no deopts and there is no OSR, which in summary seems a bit strange. Although 30ms looks OK compared to 13.2 seconds, we should look into this, I think.

Which version of v8 did you use for your testing? Is it possible to get a non-minified version of the latest skinning.js?
Comment 5 by alonza...@gmail.com, Oct 31 2012
Sorry, I forgot to include the command-line arguments in that last version. Attached is a version with the arguments, also non-minified as you asked.

skinning.nice.js (119 KB)
A simple test case to demonstrate the 'many locals' issue. V8 is more than 3X slower than SpiderMonkey.
locals.js (7.1 KB)
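
The attached locals.js is not reproduced here. As a rough illustration of the kind of test being described, the sketch below generates a function with a configurable number of local variables and a single hot loop, then times one call to it. The names makeManyLocals and timeOnce are hypothetical, and the local counts (8 and 256) are arbitrary choices, not thresholds taken from V8.

```javascript
// Hypothetical sketch, NOT the attached locals.js.
// Builds a function with `n` locals (n >= 1) that are all updated in one hot loop.
function makeManyLocals(n) {
  var decls = [];
  var updates = [];
  var names = [];
  for (var i = 0; i < n; i++) {
    decls.push('var v' + i + ' = ' + i + ';');
    updates.push('v' + i + ' = (v' + i + ' + 1) | 0;');
    names.push('v' + i);
  }
  var body =
    decls.join('\n') + '\n' +
    'for (var it = 0; it < iterations; it++) {\n' +
    updates.join('\n') + '\n' +
    '}\n' +
    'return (' + names.join(' + ') + ') | 0;';
  // The generated function takes the loop trip count as its only argument.
  return new Function('iterations', body);
}

// Calls fn exactly once, so any optimized code for the loop has to be
// entered while the function is already running.
function timeOnce(fn, iterations) {
  var start = Date.now();
  var result = fn(iterations);
  return { result: result, ms: Date.now() - start };
}

var few = timeOnce(makeManyLocals(8), 100000);
var many = timeOnce(makeManyLocals(256), 100000);
console.log('8 locals: ' + few.ms + ' ms, 256 locals: ' + many.ms + ' ms');
```

Comparing the two timings across engines, or across V8 revisions, gives a rough idea of whether the number of locals affects how the single long-running call is handled. The absolute numbers depend on the machine and engine version.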
Here's an updated version of this benchmark. v8 (latest svn) is now 4.5x slower than spidermonkey on my machine, which is a significant improvement over the original 15x from before.

skinning.js (105 KB)
I took a quick look, and things are a bit strange: As before, basically all time is spent in a single function (_main). The problem is that we do OSR (on-stack replacement) very late, so we run almost all the time in non-optimized code. Fractions of a second before we terminate, we finally do OSR, deopting almost immediately, because we boldly go where no v8 has gone before... :-P (not nice, but not the main problem)

In a nutshell: The benchmark currently measures the performance of our non-optimized code, because we make an extremely bad OSR decision. Someone should definitely look into this.
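
To make the shape of the problem concrete, here is a minimal sketch. It assumes nothing about the real benchmark beyond what is described in this thread: all the work happens in one function that is called once. The second variant is one possible way to depend less on OSR, by moving the inner loop into a helper that is called many times, so the helper can be optimized through ordinary hot-function detection. This is an illustration only, not what Emscripten or V8 actually changed, and mainMonolithic, mainSplit and step are hypothetical names.

```javascript
// Hypothetical sketch; function names are illustrative, not from the benchmark.

// Shape of the problem: one function, called once, containing the whole hot
// loop. Optimized code for it can only be entered through OSR.
function mainMonolithic(data, rounds) {
  var acc = 0;
  for (var r = 0; r < rounds; r++) {
    for (var i = 0; i < data.length; i++) {
      acc = (acc + data[i] * 3) | 0;
    }
  }
  return acc;
}

// Same work, with the inner loop moved into a helper that is invoked once
// per round, so it becomes hot through repeated calls.
function step(data, acc) {
  for (var i = 0; i < data.length; i++) {
    acc = (acc + data[i] * 3) | 0;
  }
  return acc;
}

function mainSplit(data, rounds) {
  var acc = 0;
  for (var r = 0; r < rounds; r++) {
    acc = step(data, acc);
  }
  return acc;
}

var data = new Int32Array(1024);
for (var k = 0; k < data.length; k++) data[k] = k;

var a = mainMonolithic(data, 2000);
var b = mainSplit(data, 2000);
console.log('results match: ' + (a === b));
```

Both variants compute the same value. Whether the split version is faster depends on the engine and its version, so it is worth timing both before changing generated code.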
Status: Fixed
We fixed this a while ago.