Surprising performance results in GopherWasm vs GopherJS in Chrome vs Firefox |
|||||||
Issue descriptionFirefox seems to be performing better with wasm binaries created from Go code. Article: https://dev.to/hajimehoshi/gopherjs-vs-webassembly-for-go-148m Main takeway: (GopherJS on Chrome) > (WebAssembly on Firefox) > (GopherJS on Firefox) > (WebAssembly on Chrome)
,
Jun 18 2018
Assigning herhut@ to take a look.
,
Jun 20 2018
,
Jun 20 2018
I have investigated this a bit and it seems that "syscall/js.valueCall" in wasm_exec.js is the culprit. All invokes from the wasm code to the JavaScript side get funneled through this interface (including webGL calls). According to the profilers, Chrome spends a lot more time there than Firefox. sigurds@, could you take a look?
,
Jun 20 2018
I've taken a look and most of the time is spend in WASM, not in the call-back to javascript. This can be verified by disabling untrusted code mitigations google-chrome-unstable --js-flags="--nountrusted-code-mitigations" --no-sandbox which enables switches in wasm code again (16pfs with mitigations to 40fps without). I don't think this is a JS issue.
,
Sep 27
,
Sep 28
,
Sep 28
After more investigation, the root cause seems to be that we do particularly bad register allocation for switches inside of loops in the presence of many live locals. Go compiles its code into a switch over basic block numbers to allow it to jump between blocks and also to resume at a basic block position inside a function after a yield or de-scheduling. Hence all compiled go code is affected by this. Furthermore, as liftoff uses branch trees to implement switches, go code will also not perform well when compiled with liftoff. I will use this bug to track benchmarks that are affected by this pattern.
,
Sep 28
,
Jan 8
herhut@ - Will the fixes mentioned in v8:8232 #4 also improve this use case?
,
Jan 8
Unfortunately those changes won't impact the go performance too much. However, improvements for the go case are also underway.
,
Jan 18
(4 days ago)
I investigated some more and another issue is the particular pattern that globals are used in the go code. We see a lot of SetGlobal<n> GetGlobal<n> which currently gets compiled naively into a store followed by a load. The load is redundant but we current do not have a redundant load elimination that would work for webassembly.
,
Today
(20 hours ago)
The following revision refers to this bug: https://chromium.googlesource.com/v8/v8.git/+/f25656aee284d3f32aed01895c4b5d1a89c02166 commit f25656aee284d3f32aed01895c4b5d1a89c02166 Author: Stephan Herhut <herhut@chromium.org> Date: Tue Jan 22 15:42:04 2019 [wasm] Avoid zero extension after truncate In wasm code, we sometimes see the pattern <some 64 bit expression> i32.wrap/i64 i32.load where we generate an instruction to extend the 32 bit offset into a zero extended 64 bit value for the actual load. However, the preceeding truncate already yields a zero extended 32 bit value, so the extra instruction is not needed. Even more, it might get in the way of munching more computation into the final load. This change adds information about the zero extending behavior to the existing optimization that avoids the zero extension. Bug: chromium:853685 Change-Id: Iab9179379923ecb88651df6091b3d9408341cf4c Reviewed-on: https://chromium-review.googlesource.com/c/1421839 Reviewed-by: Michael Starzinger <mstarzinger@chromium.org> Reviewed-by: Jaroslav Sevcik <jarin@chromium.org> Commit-Queue: Stephan Herhut <herhut@chromium.org> Cr-Commit-Position: refs/heads/master@{#58998} [modify] https://crrev.com/f25656aee284d3f32aed01895c4b5d1a89c02166/src/compiler/backend/x64/instruction-selector-x64.cc |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by hablich@chromium.org
, Jun 18 2018Components: Blink>JavaScript>WebAssembly
Labels: -Pri-3 Pri-2
Status: Available (was: Unconfirmed)