Issue metadata
Sign in to add a comment
|
asm.js loading time is extremally slow compared against FF(regression)
Reported by
caiiiy...@gmail.com,
Sep 28 2017
|
||||||||||||||||||||||
Issue descriptionUserAgent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0 Steps to reproduce the problem: 1. Open attached html file in chrome 2. Wait until script prints 'Ok' in black window What is the expected behavior? Html page loads immediately and print Ok. (like in FF) What went wrong? Page loads up to 1-2 minutes (evenmore om mobiles). Browser does not respond. Did this work before? Yes Don't remember, but sure that before it works fast Chrome version: 61.0.3163.100 57c9d07b416b5a2ea23d28247300e4af36329bdc-refs/branch-heads/3163@{#1250} V8 6.1.534.41 Channel: stable OS Version: Ubuntu 16.04.2 LTS Flash Version: 27.0.0.130 Also affects chromium 61.0.3163.79 This html page was produced by emscipten compiler. You can build this page with this command: ./em++ bug.cpp --std=c++11 -O3 -g2 -o bug.html This cpp file is part of some project, and this code is used to initialize project resources. It compiles into big functions '__GLOBAL__sub_I_bug_cpp' that takes very long time to execute. But if you remove "use asm"; from js, then this problem is disappear. So I think that problem is related to asm.js. In latest profile session I saw that function called js-to-wasm#24:export:js-to-wasm#24 taeks 24 sec. I have no idea what this function should do. If you run this page in FF, then it prints Ok after 1 second and even less. Also I found if I replace all string literals in cpp to "", and recompile then page start loads faster, but still much slower then FF. Original thread on emscripten-discuss (little more info): https://groups.google.com/forum/#!topic/emscripten-discuss/ViKh5zwWKow
,
Sep 28 2017
,
Sep 28 2017
,
Sep 28 2017
,
Sep 29 2017
Context for folks following along on this one.
This program consists of a trivial main, plus 1.5 MB of static data that gets initialized in code:
const ResRecord RR3R4_R8R2TR2[] =
{
{(ResR3ntry*)R4R3T_R4TUR11_R2UTOHR3R2R5, 1},
{(ResR3ntry*)R4R3T_R4TUR11_R2UTOHR3R2R5_R2R5R7R6NR3R8, 1},
{(ResR3ntry*)R4R3T_R4TUR11_R1R2R5R5OON_R6UN, 1},
{(ResR3ntry*)R4R3T_R4TUR11_R1R3R2M, 1},
{(ResR3ntry*)R4R3T_R4TUR11_R4TUR11132, 1},
....
int main(int argc, char** argv) {
printf("Ok\n");
return 0;
}
In asm.js, the static initialization turns into a single 1.5 MB function that gets called before main.
Page after page of this sort of thing:
HEAPF32[i4 + 772 >> 2] = 1087.0;
HEAPF32[i4 + 776 >> 2] = 93.0;
HEAPF32[i4 + 780 >> 2] = 35.0;
HEAPF32[i4 + 784 >> 2] = 144.0;
HEAPF32[i4 + 788 >> 2] = 1124.0;
HEAPF32[i4 + 792 >> 2] = 93.0;
HEAPF32[i4 + 796 >> 2] = 35.0;
HEAPF32[i4 + 800 >> 2] = 0.0;
HEAPF32[i4 + 804 >> 2] = 0.0;
Validation + initial compile is plenty fast:
bug.js:1359: Converted asm.js to WebAssembly: success, asm->wasm: 29.695 ms, compile: 1.081 ms, 679084 bytes
bug.js:1359: Instantiated asm.js: success, 1.419 ms
The issue is that lazy compile doesn't buy us anything as the static initialize function gets hit and takes ~8 secs to compile.
The same program back pre-validator is even slower on the same function with --always-opt, but since the init function was never hot, it was done with fullcodegen.
It seems we're spending all out time scheduling:
-----------------------------------------------------------------------------------
test codegen 1.771 ( 0.0%) 2264272 ( 0.6%) 82352 82352
-----------------------------------------------------------------------------------
stub codegen 0.199 ( 0.0%) 107824 ( 0.0%) 26624 26624
Wasm optimization 65.031 ( 0.8%) 3981072 ( 1.0%) 2119856 2119856
late graph trimming 25.698 ( 0.3%) 1921096 ( 0.5%) 1694208 1694208
scheduling 7293.394 ( 85.4%) 14893984 ( 3.8%) 9137512 9137512
select instructions 21.502 ( 0.3%) 15695072 ( 4.1%) 12143048 13189368
-----------------------------------------------------------------------------------
initializing 7407.462 ( 86.8%) 38129568 ( 9.8%) 13189368 13189368
Need to dig into profiling more but this seems relatively hot:
31.09% v8::internal::compiler::ScheduleEarlyNodeVisitor::VisitNode(v8::internal::compiler::Node*)
Overall, it seems we have something non-linear in scheduling.
I wonder if short term we could try something like turning on the interpreter for functions over a size threshold?
,
Sep 29 2017
That's a really pathological program. I would encourage the reporter to consider initializing the memory in a more efficient way, because engines will trip over this in one way or another. In V8 we have two options: 1. Interpreter (as mentioned by Brad, a size heuristic would be fine) 2. Baseline compile I'd prefer not to employ the interpreter as a production tier, but it can work as a stopgap measure.
,
Sep 29 2017
BTW are their any branches in the control flow? Another hotspot in the scheduler is computing the dominator tree in programs with long chains of diamonds. There are some tricks we could do to mitigate the hotspots, but I expect that the scheduling time is still going to be massive.
,
Sep 29 2017
There are some loops as well in the init function (for big spans of zeros). We might also be able to look for a store only pattern like this in the validator, but while ideally we could turn it into a data section, emscripten's js glue is expecting to call this function explicitly, so we'd have to be careful.
,
Sep 30 2017
Actually these can't be made into data (they come out as code because they happen after an allocation). Another option is to have the asm-parser chop functions up above a certain size up into sub-functions. For this pathological case might not be worth it, but noticing that unity has 64KB functions and epic has 100KB functions, might be worth it. Need to measure how steep the non-linear behavior is. Lowering priority slightly.
,
Sep 30 2017
I want to notice, that this cpp code is not handwrited. It's a result of working assets pipeline from one of famous game engine. I'am talking about big game that uses 100-200Mb of different resources, and initialize in that way only 1-2Mb. It works for native environment but maybe not fit for js. Anyway this code can't be rewritten, I can only remove "use asm" for now.
,
Oct 2 2017
@caiiiycuk, Useful to know it's not synthetic. Will investigate the possibility of mitigation.
,
Oct 2 2017
I also found a problem in Chromium 61 that didn't exist in 60 and that has similar symptoms (code I have in a WebWorker, an emscripten-compiled version of the "crunch" texture decompressor), loops on 100% apparently forever (I haven't seen it finish yet). The same code loads in 60 in under a second. I have gdb'd that part and the gdb suggests that this function containing an infinite loop is never exited: https://github.com/v8/v8/blob/17d86d76fb6e452436352b0b7cc19051902d2e44/src/asmjs/asm-parser.cc#L2475-L2503 The working and not-working relevant versions of v8 are, according to chrome://version: https://github.com/v8/v8/blame/6.0.286/src/asmjs/asm-parser.cc#L2423 https://github.com/v8/v8/blame/6.1.534/src/asmjs/asm-parser.cc#L2425 I haven't managed to make a minimal repro, but I could with relatively little effort make available to Chromium devs a non-minimal version of the site that exhibits the problem (please let me know if that would help). Again though, I'm not sure if it's the same bug or a slightly different one; mine seems to be in the asm.js parser.
,
Oct 2 2017
With the help of Torne on #chromium IRC and binary build bisection (https://www.chromium.org/developers/bisect-builds-py and the mentioned OmahaProxy to look up the right numbers), I found that my problem was introduced in commit Adding field trial for asm.js -> Wasm by Brad Nelson ยท 3 months ago And setting chrome://flags/#enable-asm-webassembly fixes my problem.
,
Oct 4 2017
Aseem will look into if we can chop up large functions as they pass through asm-parser.
,
Oct 4 2017
I filed a separate bug for the infinite-loop problem in https://bugs.chromium.org/p/chromium/issues/detail?id=771428 and also attached a patch.
,
Oct 4 2017
Adding blocker labels so this regression does not gets overlooked.
,
Oct 10 2017
aseemgarg@, Friendly ping to get an update on this issue as it is marked as stable blocker. Thanks..!
,
Oct 10 2017
Getting started on it now.
,
Oct 11 2017
Thanks for the update. FYI: M63 is branching soon, we will be taking only critical merges. It would be great to have a fix ASAP.
,
Oct 12 2017
caiiiycuk@gmail.com, btw, which engine is this coming from?
,
Oct 12 2017
bradnelson@chromium.org, Sorry I can't say the name because of NDA (it's proprietary engine), it's very similar to cocos2d and I think it's inspired by cocos2d.
,
Oct 18 2017
aseemgarg@, Friendly ping to get an update on this issue as it is marked as stable blocker. Thanks..!
,
Oct 18 2017
I should be done with the CL today. But, we won't be landing it just yet. We still need to figure out if the problem is big enough to warrant a solution (it does help in this case). We will have to leave the bug open for a short while.
,
Oct 24 2017
Gentle ping to get an update on this issue as per C#23 as it is marked as stable blocker. Thanks..!
,
Oct 24 2017
Created CL - https://chromium-review.googlesource.com/c/chromium/src/+/707747 It is failing on poppler.js that tries to trace the generated wasm code. I still need to figure out what's the right solution here. I'll update once we agree on a strategy..
,
Oct 24 2017
Sorry linked wrong CL in previous comment. See the following instead: https://chromium-review.googlesource.com/c/v8/v8/+/728305
,
Oct 25 2017
Since we are not sure we'll address it just yet, removing the blocker label. I will be investigating whether this solution (breaking the function) has any wins for other benchmarks. If it doesn't, we'll probably let this regression stay and let either the baseline compiler take care of it when it comes or look for a different solution.
,
Oct 31 2017
I investigated splitting asm and we didn't see any wins for other benchmarks (presumably because they don't have such large functions). Hence, we've decided not to fix it for now. We do need to address this in WebAssembly with larger functions under discussion. Filed the following bug to track that: https://bugs.chromium.org/p/chromium/issues/detail?id=780165 See the following for perf impact of splitting for other benchmarks: https://docs.google.com/a/google.com/document/d/1D7sTUjfFAfn_Bz8B5SfCJcSbQjZ1JA6sTZ_LA8VUlYs/edit?usp=sharing |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by caiiiy...@gmail.com
, Sep 28 2017402 bytes
402 bytes Download
298 KB
298 KB Download