New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 769607 link

Starred by 7 users

Issue metadata

Status: WontFix
Owner:
Last visit > 30 days ago
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug-Regression



Sign in to add a comment

asm.js loading time is extremally slow compared against FF(regression)

Reported by caiiiy...@gmail.com, Sep 28 2017

Issue description

UserAgent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0

Steps to reproduce the problem:
1. Open attached html file in chrome
2. Wait until script prints 'Ok' in black window

What is the expected behavior?
Html page loads immediately and print Ok. (like in FF)

What went wrong?
Page loads up to 1-2 minutes (evenmore om mobiles). Browser does not respond.

Did this work before? Yes Don't remember, but sure that before it works fast

Chrome version: 61.0.3163.100 57c9d07b416b5a2ea23d28247300e4af36329bdc-refs/branch-heads/3163@{#1250} V8 6.1.534.41  Channel: stable
OS Version: Ubuntu 16.04.2 LTS
Flash Version: 27.0.0.130

Also affects chromium 61.0.3163.79

This html page was produced by emscipten compiler. You can build this page with this command:

./em++ bug.cpp --std=c++11 -O3 -g2 -o  bug.html

This cpp file is part of some project, and this code is used to initialize project resources. It compiles into big functions '__GLOBAL__sub_I_bug_cpp' that takes very long time to execute. But if you remove "use asm"; from js, then this problem is disappear. So I think that problem is related to asm.js.

In latest profile session I saw that function called js-to-wasm#24:export:js-to-wasm#24 taeks 24 sec. I have no idea what this function should do.

If you run this page in FF, then it prints Ok after 1 second and even less.

Also I found if I replace all string literals in cpp to "", and recompile then page start loads faster, but still much slower then FF.

Original thread on emscripten-discuss (little more info):
https://groups.google.com/forum/#!topic/emscripten-discuss/ViKh5zwWKow
 
bug.html
101 KB View Download
bug.js
1.3 MB View Download
bug.cpp
1.0 MB View Download

Comment 1 by caiiiy...@gmail.com, Sep 28 2017

Sorry, I attached wrong file. To avoid errors I make an archive
bug.html.mem
402 bytes Download
bug.zip
298 KB Download
Components: -Blink Blink>JavaScript
Cc: ligim...@chromium.org
Labels: -Pri-2 Needs-Triage-M61 Needs-Bisect Pri-1

Comment 4 by danno@chromium.org, Sep 28 2017

Cc: titzer@chromium.org bradnelson@chromium.org mstarzinger@chromium.org
Cc: ahaas@chromium.org clemensh@chromium.org
Components: -Blink>JavaScript Blink>JavaScript>WebAssembly
Labels: -Needs-Bisect -Via-Wizard-Javascript
Owner: bradnelson@chromium.org
Status: Assigned (was: Unconfirmed)
Context for folks following along on this one.

This program consists of a trivial main, plus 1.5 MB of static data that gets initialized in code:

const ResRecord RR3R4_R8R2TR2[] =
{
    {(ResR3ntry*)R4R3T_R4TUR11_R2UTOHR3R2R5, 1},
    {(ResR3ntry*)R4R3T_R4TUR11_R2UTOHR3R2R5_R2R5R7R6NR3R8, 1},
    {(ResR3ntry*)R4R3T_R4TUR11_R1R2R5R5OON_R6UN, 1},
    {(ResR3ntry*)R4R3T_R4TUR11_R1R3R2M, 1},
    {(ResR3ntry*)R4R3T_R4TUR11_R4TUR11132, 1},
....

int main(int argc, char** argv) {
    printf("Ok\n");
    return 0;
}

In asm.js, the static initialization turns into a single 1.5 MB function that gets called before main.
Page after page of this sort of thing:
 HEAPF32[i4 + 772 >> 2] = 1087.0;
 HEAPF32[i4 + 776 >> 2] = 93.0;
 HEAPF32[i4 + 780 >> 2] = 35.0;
 HEAPF32[i4 + 784 >> 2] = 144.0;
 HEAPF32[i4 + 788 >> 2] = 1124.0;
 HEAPF32[i4 + 792 >> 2] = 93.0;
 HEAPF32[i4 + 796 >> 2] = 35.0;
 HEAPF32[i4 + 800 >> 2] = 0.0;
 HEAPF32[i4 + 804 >> 2] = 0.0;

Validation + initial compile is plenty fast:
bug.js:1359: Converted asm.js to WebAssembly: success, asm->wasm: 29.695 ms, compile: 1.081 ms, 679084 bytes
bug.js:1359: Instantiated asm.js: success, 1.419 ms

The issue is that lazy compile doesn't buy us anything as the static initialize function gets hit and takes ~8 secs to compile.
The same program back pre-validator is even slower on the same function with --always-opt, but since the init function was never hot, it was done with fullcodegen.

It seems we're spending all out time scheduling:
                             -----------------------------------------------------------------------------------
                test codegen      1.771 (  0.0%)     2264272 (  0.6%)      82352      82352

                             -----------------------------------------------------------------------------------
                stub codegen      0.199 (  0.0%)      107824 (  0.0%)      26624      26624

           Wasm optimization     65.031 (  0.8%)     3981072 (  1.0%)    2119856    2119856
         late graph trimming     25.698 (  0.3%)     1921096 (  0.5%)    1694208    1694208
                  scheduling   7293.394 ( 85.4%)    14893984 (  3.8%)    9137512    9137512
         select instructions     21.502 (  0.3%)    15695072 (  4.1%)   12143048   13189368
                             -----------------------------------------------------------------------------------
                initializing   7407.462 ( 86.8%)    38129568 (  9.8%)   13189368   13189368


Need to dig into profiling more but this seems relatively hot:
31.09% v8::internal::compiler::ScheduleEarlyNodeVisitor::VisitNode(v8::internal::compiler::Node*)

Overall, it seems we have something non-linear in scheduling.
I wonder if short term we could try something like turning on the interpreter for functions over a size threshold?

Comment 6 by titzer@chromium.org, Sep 29 2017

That's a really pathological program. I would encourage the reporter to consider initializing the memory in a more efficient way, because engines will trip over this in one way or another.

In V8 we have two options:

1. Interpreter (as mentioned by Brad, a size heuristic would be fine)
2. Baseline compile

I'd prefer not to employ the interpreter as a production tier, but it can work as a stopgap measure.

Comment 7 by titzer@chromium.org, Sep 29 2017

BTW are their any branches in the control flow?

Another hotspot in the scheduler is computing the dominator tree in programs with long chains of diamonds. There are some tricks we could do to mitigate the hotspots, but I expect that the scheduling time is still going to be massive.
There are some loops as well in the init function (for big spans of zeros).
 
We might also be able to look for a store only pattern like this in the validator, but while ideally we could turn it into a data section, emscripten's js glue is expecting to call this function explicitly, so we'd have to be careful.

Labels: -Pri-1 -Needs-Triage-M61 Pri-2
Actually these can't be made into data (they come out as code because they happen after an allocation).

Another option is to have the asm-parser chop functions up above a certain size up into sub-functions.
For this pathological case might not be worth it, but noticing that unity has 64KB functions and epic has 100KB functions, might be worth it. Need to measure how steep the non-linear behavior is.

Lowering priority slightly.

I want to notice, that this cpp code is not handwrited. It's a result of working assets pipeline from one of famous game engine. I'am talking about big game that uses 100-200Mb of different resources, and initialize in that way only 1-2Mb. It works for native environment but maybe not fit for js. Anyway this code can't be rewritten, I can only remove "use asm" for now.
@caiiiycuk, Useful to know it's not synthetic.
Will investigate the possibility of mitigation.

I also found a problem in Chromium 61 that didn't exist in 60 and that has similar symptoms (code I have in a WebWorker, an emscripten-compiled version of the "crunch" texture decompressor), loops on 100% apparently forever (I haven't seen it finish yet).

The same code loads in 60 in under a second.

I have gdb'd that part and the gdb suggests that this function containing an infinite loop is never exited:

https://github.com/v8/v8/blob/17d86d76fb6e452436352b0b7cc19051902d2e44/src/asmjs/asm-parser.cc#L2475-L2503

The working and not-working relevant versions of v8 are, according to chrome://version:

https://github.com/v8/v8/blame/6.0.286/src/asmjs/asm-parser.cc#L2423
https://github.com/v8/v8/blame/6.1.534/src/asmjs/asm-parser.cc#L2425

I haven't managed to make a minimal repro, but I could with relatively little effort make available to Chromium devs a non-minimal version of the site that exhibits the problem (please let me know if that would help).

Again though, I'm not sure if it's the same bug or a slightly different one; mine seems to be in the asm.js parser.
With the help of Torne on #chromium IRC and binary build bisection (https://www.chromium.org/developers/bisect-builds-py and the mentioned OmahaProxy to look up the right numbers), I found that my problem was introduced in commit

  Adding field trial for asm.js -> Wasm by Brad Nelson ยท 3 months ago

And setting chrome://flags/#enable-asm-webassembly fixes my problem.
Owner: aseemgarg@chromium.org
Aseem will look into if we can chop up large functions as they pass through asm-parser.

I filed a separate bug for the infinite-loop problem in https://bugs.chromium.org/p/chromium/issues/detail?id=771428 and also attached a patch.
Labels: -Pri-2 ReleaseBlock-Stable M-63 Pri-1
Adding blocker labels so this regression does not gets overlooked.
aseemgarg@,
Friendly ping to get an update on this issue as it is marked as stable blocker.
Thanks..!
Status: Started (was: Assigned)
Getting started on it now.
Thanks for the update.

FYI: M63 is branching soon, we will be taking only critical merges. It would be great to have a fix ASAP.
caiiiycuk@gmail.com, btw, which engine is this coming from?

bradnelson@chromium.org, Sorry I can't say the name because of NDA (it's proprietary engine), it's very similar to cocos2d and I think it's inspired by cocos2d.
aseemgarg@,
Friendly ping to get an update on this issue as it is marked as stable blocker.
Thanks..!

I should be done with the CL today. But, we won't be landing it just yet. We still need to figure out if the problem is big enough to warrant a solution (it does help in this case). We will have to leave the bug open for a short while.
Gentle ping to get an update on this issue as per C#23 as it is marked as stable blocker.

Thanks..!

Created CL - https://chromium-review.googlesource.com/c/chromium/src/+/707747
It is failing on poppler.js that tries to trace the generated wasm code. I still need to figure out what's the right solution here. I'll update once we agree on a strategy..
Sorry linked wrong CL in previous comment. See the following instead:
https://chromium-review.googlesource.com/c/v8/v8/+/728305
Labels: -ReleaseBlock-Stable -M-63
Since we are not sure we'll address it just yet, removing the blocker label. 

I will be investigating whether this solution (breaking the function) has any wins for other benchmarks. If it doesn't, we'll probably let this regression stay and let either the baseline compiler take care of it when it comes or look for a different solution.
Status: WontFix (was: Started)
I investigated splitting asm and we didn't see any wins for other benchmarks (presumably because they don't have such large functions). Hence, we've decided not to fix it for now. We do need to address this in WebAssembly with larger functions under discussion. Filed the following bug to track that: https://bugs.chromium.org/p/chromium/issues/detail?id=780165
See the following for perf impact of splitting for other benchmarks:
https://docs.google.com/a/google.com/document/d/1D7sTUjfFAfn_Bz8B5SfCJcSbQjZ1JA6sTZ_LA8VUlYs/edit?usp=sharing

Sign in to add a comment