Issue 673861

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression




2.1%-51.5% regression in asm.js perf at 41671:41671

Project Member Reported by bradnelson@google.com, Dec 13 2016

Issue description

This seems to regress a range of benchmarks on ia32:
https://chromium.googlesource.com/v8/v8/+/5c1babcc16e1b568d88af9eb4bc84e050e631039

Including:
Emscripten for Fannkuch, Zlib, MemOps, Life, Bullet
AreWeFastYet for BulletLoadTime, Fasta, Fannkuch
JetStream for towers.c, quicksort.c, gcc-loops.cpp, bigfib.cpp


 
Cc: bmeu...@chromium.org
Hi Bradnelson, I cannot reproduce these regressions on a HSW Ubuntu 14.04 machine. Also, AreWeFastYet shows an improvement for jetstream-gcc-loops.cpp on a Win 8 32-bit machine: https://arewefastyet.com/#machine=17&view=breakdown&suite=jetstream
Could you please share which platform these regressions occurred on and whether any d8 runtime flags were used? I really appreciate your help and look forward to figuring out the reason for these regressions. Thank you!

Comment 4 by titzer@chromium.org, Dec 15 2016

Hi Shiyu,

Our internal Chromium performance monitoring showed a big regression on the Emscripten MemOps and Life benchmarks on ia32 with the change. That's with the --validate-asm flag, btw. Fasta also seems affected.
FWIW, the revert seems to have regressed these items (i.e. they had been improved by the change):

		673861 ❌  	41696	internal.client.v8	ia32	v8	Embenchen-asm_wasm/ZLib	2.2%
		673861 ❌  	41696	internal.client.v8	ia32	v8	Massive-asm_wasm/SQLite	4.8%
		673861 ❌  	41696	internal.client.v8	ia32	v8	Embenchen-asm_wasm/Corrections	1.5%
		673861 ❌  	41696	internal.client.v8	ia32	v8	Emscripten-asm_wasm/Primes	6.6%
		673861 ❌  	41696	internal.client.v8	ia32	v8	JetStream-asm_wasm/dry.c	20.5%

Whereas the original change regressed these:
		673861 ❌  	41671	internal.client.v8	ia32	v8	Emscripten-asm_wasm/MemOps	50.7%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Emscripten-asm_wasm/Fasta	8.2%
		673861 ❌  	41671	internal.client.v8	ia32	v8	AreWeFastYet-asm_wasm/Fasta	7.7%
		673861 ❌  	41671	internal.client.v8	ia32	v8	JetStream-asm_wasm/towers.c	4.1%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Emscripten-asm_wasm/ZLib	2.4%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Emscripten-asm_wasm/Life	5.2%
		673861 ❌  	41671	internal.client.v8	ia32	v8	JetStream-asm_wasm/bigfib.cpp	2.4%
		673861 ❌  	41671	internal.client.v8	ia32	v8	AreWeFastYet-asm_wasm/Life	9.6%
		673861 ❌  	41671	internal.client.v8	ia32	v8	JetStream-asm_wasm/gcc-loops.cpp	3.3%
		673861 ❌  	41671	internal.client.v8	ia32	v8	JetStream-asm_wasm/quicksort.c	2.6%
		673861 ❌  	41671	internal.client.v8	ia32	v8	AreWeFastYet-asm_wasm/BulletLoadTime	2.3%
		673861 ❌  	41671	internal.client.v8	ia32	v8	AreWeFastYet-asm_wasm/MemOps	51.5%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Emscripten-asm_wasm/Bullet	2.1%
		673861 ❌  	41671	internal.client.v8	ia32	v8	AreWeFastYet-asm_wasm/Skinning	1.3%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Embenchen-asm_wasm/MemOps	50.7%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Emscripten-asm_wasm/Fannkuch	4.8%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Embenchen-asm_wasm/Fasta	7.7%
		673861 ❌  	41671	internal.client.v8	ia32	v8	JetStream-Ignition/towers.c	4.0%
		673861 ❌  	41671	internal.client.v8	ia32	v8	Embenchen-asm_wasm/Fannkuch	3.9%
		673861 ❌  	41671	internal.client.v8	ia32	v8	AreWeFastYet-asm_wasm/Fannkuch	4.1%

Digging into MemOps, I think some case with a zero immediate isn't getting handled right.
It looks like it instead ends up being loaded into a register first (offset 78 below):

<  67 83e2fc and edx,0xfc
<  70 83c204 add edx,0x4
<  73 81fa00001000 cmp edx,0x100000 ;; wasm memory size reference
<  79 0f83c6020000 jnc 795 (0x4d189bfb)                                                                             
<  85 8b9200000000 mov edx,[edx+0x0] ;; wasm memory reference                                                       
<  91 81fa00001000 cmp edx,0x100000 ;; wasm memory size reference                                                   
<  97 0f83ad020000 jnc 788 (0x4d189bf4)                                                                             
<  103 0fbe9200000000 movsx_b edx,[edx+0x0] ;; wasm memory reference                                                
<  110 83ea30 sub edx,0x30                                                                                          
<  113 33f6 xor esi,esi                                                                                             
<  115 b804000000 mov eax,0x4                                                                                       
<  120 83fa06 cmp edx,0x6                                                                                           
<  123 0f8307000000 jnc 136 (0x4d189968)                                                                            
<  129 ff2495049c184d jmp [edx*4+0x4d189c04] ;; internal reference                                                  
---                                                                                                                 
>  72 83e2fc and edx,0xfc                                                                                           
>  75 83c204 add edx,0x4                                                                                            
>  78 bf00000000 mov edi,(nil) ;; wasm memory reference                                                             
>  83 81fa00001000 cmp edx,0x100000 ;; wasm memory size reference                                                   
>  89 0f83bb020000 jnc 794 (0x5dd09c7a)                                                                             
>  95 8b1417 mov edx,[edi+edx*1]                                                                                    
>  98 81fa00001000 cmp edx,0x100000 ;; wasm memory size reference                                                   
>  104 0f83a5020000 jnc 787 (0x5dd09c73)                                                                            
>  110 0fbe1417 movsx_b edx,[edi+edx*1]                                                                             
>  114 83ea30 sub edx,0x30                                                                                          
>  117 33f6 xor esi,esi                                                                                             
>  119 b804000000 mov eax,0x4                                                                                       
>  124 83fa06 cmp edx,0x6                                                                                           
>  127 0f8307000000 jnc 140 (0x5dd099ec)                                                                            
>  133 ff2495849cd05d jmp [edx*4+0x5dd09c84] ;; internal reference   
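
The difference between the two listings can be checked mechanically. Below is a minimal Python sketch, not part of V8 or its tooling, that parses the five memory-related lines copied from the diff above, counts encoded bytes from the hex column, and flags any instruction that puts a `wasm memory reference` into a register instead of using it inside a memory operand. The helper names (`parse`, `materializes_memory_base`, `summarize`) are hypothetical, and the regular expression assumes the line layout shown in this thread only.

```python
import re

# Lines copied from the disassembly above ("<" = before the change, ">" = after).
BEFORE = [
    "85 8b9200000000 mov edx,[edx+0x0] ;; wasm memory reference",
    "103 0fbe9200000000 movsx_b edx,[edx+0x0] ;; wasm memory reference",
]
AFTER = [
    "78 bf00000000 mov edi,(nil) ;; wasm memory reference",
    "95 8b1417 mov edx,[edi+edx*1]",
    "110 0fbe1417 movsx_b edx,[edi+edx*1]",
]

# offset, hex encoding, mnemonic, operands, optional ";; note"
LINE = re.compile(r"^(\d+)\s+([0-9a-f]+)\s+(\S+)\s+([^;]*?)\s*(?:;;\s*(.*))?$")


def parse(line):
    """Split one disassembly line into its fields (assumed layout, see above)."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized disassembly line: {line!r}")
    offset, code, mnemonic, operands, note = match.groups()
    return {
        "offset": int(offset),
        "size": len(code) // 2,  # two hex digits per encoded byte
        "mnemonic": mnemonic,
        "operands": operands,
        "note": note or "",
    }


def materializes_memory_base(insn):
    """True when the memory reference goes into a register, not a memory operand."""
    return "wasm memory reference" in insn["note"] and "[" not in insn["operands"]


def summarize(lines):
    insns = [parse(line) for line in lines]
    return {
        "bytes": sum(insn["size"] for insn in insns),
        "materialized": sum(materializes_memory_base(insn) for insn in insns),
    }


if __name__ == "__main__":
    print("before:", summarize(BEFORE))
    print("after: ", summarize(AFTER))
```

On these lines the sketch reports 13 bytes and no materialized base before the change, and 12 bytes with one materialized base (`mov edi,(nil)`) after it. The encoding is therefore not larger; what differs is the extra instruction and the extra register (`edi`) that it occupies.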

Hi Bradnelson, thanks for the detailed performance data and analysis above!

Yes, the regression is caused by wrongly loading an immediate into a register in some asm-wasm cases. I have fixed this issue and applied the optimization to the asm-wasm pipeline in https://codereview.chromium.org/2593483002/. Some asm-wasm cases also show improvements.

Please take a look. Thanks!
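
To illustrate the idea behind the fix (not the actual change, which is in the code review linked above), here is a toy Python sketch of the two operand shapes seen in the disassembly. It assumes the memory base is a constant known at code generation time and models it as a plain integer, whereas the real code treats it as a relocatable `wasm memory reference`. All function names are hypothetical and are not V8 APIs.

```python
def naive_operand(base_imm, index_reg, scratch="edi"):
    """Always load the constant base into a scratch register first.

    This mirrors the shape seen after the original change:
    mov edi,<base> followed by [edi+edx*1].
    """
    setup = [f"mov {scratch},{base_imm:#x}"]
    return setup, f"[{scratch}+{index_reg}*1]"


def folded_operand(base_imm, index_reg):
    """Fold the constant base into the displacement: [edx+<base>].

    This mirrors the shape seen before the change; no setup instruction
    and no scratch register are needed.
    """
    return [], f"[{index_reg}+{base_imm:#x}]"


def select_operand(base, index_reg):
    """Pick an operand shape for base + index.

    `base` is either an int (constant base) or a register name (str).
    """
    if isinstance(base, int):
        return folded_operand(base, index_reg)
    # Base already lives in a register: use base+index addressing directly.
    return [], f"[{base}+{index_reg}*1]"


if __name__ == "__main__":
    print(naive_operand(0, "edx"))
    print(select_operand(0, "edx"))
```

The point of the sketch is only the case split: a constant base is folded into the displacement, and a register is spent on the base only when the base is not a constant.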
