2.1%-51.5% regression in asm.js perf at 41671:41671
Issue description, by bradnelson@google.com, Dec 13 2016

This seems to regress a range of benchmarks on ia32: https://chromium.googlesource.com/v8/v8/+/5c1babcc16e1b568d88af9eb4bc84e050e631039

Including:
- Emscripten: Fannkuch, Zlib, MemOps, Life, Bullet
- AreWeFastYet: BulletLoadTime, Fasta, Fannkuch
- JetStream: towers.c, quicksort.c, gcc-loops.cpp, bigfib.cpp
Comment by Shiyu, Dec 15 2016
Hi Bradnelson, I cannot reproduce these regressions on a HSW Ubuntu 14.04 machine. And AreWeFastYet shows an improvement for jetstream-gcc-loops.cpp on a Win 8 32-bit machine: https://arewefastyet.com/#machine=17&view=breakdown&suite=jetstream. Could you please share some info about which platform these regressions occurred on, and whether any d8 runtime flags were used? I'd very much appreciate your help, and I'm looking forward to figuring out the reason for these regressions. Thank you!
Comment by bradnelson@google.com, Dec 15 2016
Hi Shiyu, our internal Chromium performance monitoring showed a big regression on the Emscripten MemOps and Life benchmarks on ia32 with the change. That's with the --validate-asm flag, btw. Fasta also seems affected.
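(Reproduction sketch, in case it helps: assuming an ia32 build of d8, the asm.js path under discussion is exercised by running the Emscripten-compiled benchmark with that flag, roughly "d8 --validate-asm memops.js". The file name memops.js is just a placeholder for the benchmark script; the actual harness invocation may differ.)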
Comment by bradnelson@google.com, Dec 17 2016
FWIW, the revert (41696) seems to have regressed these items (i.e. they had been improved by the original change):

673861 ❌ 41696 internal.client.v8 ia32 v8 Embenchen-asm_wasm/ZLib          2.2%
673861 ❌ 41696 internal.client.v8 ia32 v8 Massive-asm_wasm/SQLite          4.8%
673861 ❌ 41696 internal.client.v8 ia32 v8 Embenchen-asm_wasm/Corrections   1.5%
673861 ❌ 41696 internal.client.v8 ia32 v8 Emscripten-asm_wasm/Primes       6.6%
673861 ❌ 41696 internal.client.v8 ia32 v8 JetStream-asm_wasm/dry.c        20.5%

Whereas the original change (41671) regressed these:

673861 ❌ 41671 internal.client.v8 ia32 v8 Emscripten-asm_wasm/MemOps          50.7%
673861 ❌ 41671 internal.client.v8 ia32 v8 Emscripten-asm_wasm/Fasta            8.2%
673861 ❌ 41671 internal.client.v8 ia32 v8 AreWeFastYet-asm_wasm/Fasta          7.7%
673861 ❌ 41671 internal.client.v8 ia32 v8 JetStream-asm_wasm/towers.c          4.1%
673861 ❌ 41671 internal.client.v8 ia32 v8 Emscripten-asm_wasm/ZLib             2.4%
673861 ❌ 41671 internal.client.v8 ia32 v8 Emscripten-asm_wasm/Life             5.2%
673861 ❌ 41671 internal.client.v8 ia32 v8 JetStream-asm_wasm/bigfib.cpp        2.4%
673861 ❌ 41671 internal.client.v8 ia32 v8 AreWeFastYet-asm_wasm/Life           9.6%
673861 ❌ 41671 internal.client.v8 ia32 v8 JetStream-asm_wasm/gcc-loops.cpp     3.3%
673861 ❌ 41671 internal.client.v8 ia32 v8 JetStream-asm_wasm/quicksort.c       2.6%
673861 ❌ 41671 internal.client.v8 ia32 v8 AreWeFastYet-asm_wasm/BulletLoadTime 2.3%
673861 ❌ 41671 internal.client.v8 ia32 v8 AreWeFastYet-asm_wasm/MemOps        51.5%
673861 ❌ 41671 internal.client.v8 ia32 v8 Emscripten-asm_wasm/Bullet           2.1%
673861 ❌ 41671 internal.client.v8 ia32 v8 AreWeFastYet-asm_wasm/Skinning       1.3%
673861 ❌ 41671 internal.client.v8 ia32 v8 Embenchen-asm_wasm/MemOps           50.7%
673861 ❌ 41671 internal.client.v8 ia32 v8 Emscripten-asm_wasm/Fannkuch         4.8%
673861 ❌ 41671 internal.client.v8 ia32 v8 Embenchen-asm_wasm/Fasta             7.7%
673861 ❌ 41671 internal.client.v8 ia32 v8 JetStream-Ignition/towers.c          4.0%
673861 ❌ 41671 internal.client.v8 ia32 v8 Embenchen-asm_wasm/Fannkuch          3.9%
673861 ❌ 41671 internal.client.v8 ia32 v8 AreWeFastYet-asm_wasm/Fannkuch       4.1%
Comment by bradnelson@google.com, Dec 17 2016
Digging into MemOps, I think some case with a zero immediate isn't getting handled right. It looks like it's instead ending up being loaded into a register first (offset 78 in the second listing below).

Before the change (zero base folded into the addressing mode):

 67 83e2fc         and edx,0xfc
 70 83c204         add edx,0x4
 73 81fa00001000   cmp edx,0x100000          ;; wasm memory size reference
 79 0f83c6020000   jnc 795 (0x4d189bfb)
 85 8b9200000000   mov edx,[edx+0x0]         ;; wasm memory reference
 91 81fa00001000   cmp edx,0x100000          ;; wasm memory size reference
 97 0f83ad020000   jnc 788 (0x4d189bf4)
103 0fbe9200000000 movsx_b edx,[edx+0x0]     ;; wasm memory reference
110 83ea30         sub edx,0x30
113 33f6           xor esi,esi
115 b804000000     mov eax,0x4
120 83fa06         cmp edx,0x6
123 0f8307000000   jnc 136 (0x4d189968)
129 ff2495049c184d jmp [edx*4+0x4d189c04]    ;; internal reference

After the change (zero base first materialized in edi):

 72 83e2fc         and edx,0xfc
 75 83c204         add edx,0x4
 78 bf00000000     mov edi,(nil)             ;; wasm memory reference
 83 81fa00001000   cmp edx,0x100000          ;; wasm memory size reference
 89 0f83bb020000   jnc 794 (0x5dd09c7a)
 95 8b1417         mov edx,[edi+edx*1]
 98 81fa00001000   cmp edx,0x100000          ;; wasm memory size reference
104 0f83a5020000   jnc 787 (0x5dd09c73)
110 0fbe1417       movsx_b edx,[edi+edx*1]
114 83ea30         sub edx,0x30
117 33f6           xor esi,esi
119 b804000000     mov eax,0x4
124 83fa06         cmp edx,0x6
127 0f8307000000   jnc 140 (0x5dd099ec)
133 ff2495849cd05d jmp [edx*4+0x5dd09c84]    ;; internal reference
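To make the failure mode concrete, here is a minimal C++ sketch of the address-mode decision involved. This is not V8's actual InstructionSelector code; Node, MemOperand, and SelectMemOperand are invented names for illustration. The point is the branch that either folds a constant base into the ia32 displacement or materializes it in a register:

#include <cstdint>

// Hypothetical IR node -- stands in for the compiler's value nodes.
struct Node {
  bool is_constant = false;
  int32_t constant_value = 0;  // Meaningful only when is_constant is true.
};

// Simplified ia32 memory operand: [base + index*1 + displacement].
struct MemOperand {
  Node* base = nullptr;   // nullptr means "no base register".
  Node* index = nullptr;
  int32_t displacement = 0;
};

// Choose an addressing mode for base + index. Folding a constant base into
// the displacement yields e.g. "movsx_b edx,[edx+0x0]". Missing the fold
// forces a separate "mov edi,0x0" followed by "movsx_b edx,[edi+edx*1]",
// which is the extra instruction at offset 78 in the regressed listing.
MemOperand SelectMemOperand(Node* base, Node* index) {
  MemOperand mem;
  mem.index = index;
  if (base->is_constant) {
    mem.displacement = base->constant_value;  // Fold: no base register needed.
  } else {
    mem.base = base;  // General case: keep the base in a register.
  }
  return mem;
}

In the regressed listing, the selection effectively takes the else branch for a constant base of 0, costing an extra mov and occupying edi across both memory accesses in the hot loop.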
Comment by Shiyu, Dec 21 2016
Hi Bradnelson, thanks for the detailed performance data and analysis above! Yes, the regression was caused by wrongly loading an immediate into a register in some asm-wasm cases. I have fixed this issue and applied the optimization to the asm-wasm pipeline in https://codereview.chromium.org/2593483002/. Improvements are observed for several asm-wasm cases. Please take a look. Thanks!
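Assuming the fix restores the address-mode folding, the MemOps inner loop should again match the "before" listing in the previous comment, with the zero base folded into the displacement ([edx+0x0]) rather than materialized with a separate mov. In a build with disassembler support this can be spot-checked by dumping generated code, e.g. with d8's --print-code flag.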