New issue
Advanced search Search tips

Issue 658131 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

Script fragments execution degrades the page-loading performance

Reported by lingyun....@intel.com, Oct 21 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
Platform: 8609.0.0 cyan

Steps to reproduce the problem:
1. Use Telemetry page_cycler to run the Gmail page in pageset top_25
2. Get the results of warm_times and the corresponding profiling traces 

What is the expected behavior?
Possibly merge consecutive script fragments to execute them together at once and reduce the execution overhead.

What went wrong?
From the attached trace of Gmail loading, we could observe that when parsing the same url resource, there are multiple times of executing 'EvaluateScript' due to consecutive script fragments in the way of '<script>..</script><script>..</script>', which introduces extra overhead in executing scripts. 

Did this work before? N/A 

Chrome version: 55.0.2841.0  Channel: n/a
OS Version: 
Flash Version:
 
gmail_trace.png
735 KB View Download
Cc: hong.zh...@intel.com
The CL to reduce scripts execution overhead and improve page-loading performance: 
https://codereview.chromium.org/2438263002

The attached Gmail trace is captured with the CL, which shows that consecutive script fragments are merged and executed at once.

We have tested the CL on Cyan ChromeBook, using Telemetry page_cycler to run the Gmail in pageset top_25, the results show ~20% performance improvement in warm_times.

gmail_trace_with_CL.png
543 KB View Download
add telemetry detailed results
results-without-CL.html
131 KB View Download
results-with-CL.html
133 KB View Download
with warm_times metrics, telemetry can deliver 1783.11 without the patch, and 1358.00 with it.
Add gmail html file and trace files w/ and w/o patch. Below are some break-down statistics (in ms).

Function	  w/ patch w/o patch    Function w/ patch w/o patch
parseHTML	  620.334  802.416			
HTMLScriptRunner
::execute	  609.045  784.206			
evaluateScript	  606.036  776.525			
v8.compile	  582.905  562.822	v8.run	  22.875   207.908
gmail_js.html
1.1 MB View Download
trace_gmail_js_withoutPatch.json.gz
2.8 MB Download
trace_gmail_js_withPatch.json.gz
3.6 MB Download
Add test html file and trace files w/ and w/o patch. Below are some break-down statistics (in ms).

Function	  w/ patch w/o patch	Function w/ patch  w/o patch			
HTMLScriptRunner
::execute	   1.349    45.274			
evaluateScript	   1.267    38.781			
v8.compile	   1.016    22.539	 v8.run	   0.198    10.044
Test-consecutiveScript.html
6.1 KB View Download
trace_consecutiveScripts_withoutPatch.json.gz
2.6 MB Download
trace_consecutiveScripts_withPatch.json.gz
3.0 MB Download
Add trace files of loading gmail w/ and w/o patch on Chromebook Cyan.
gmail_trace_without_patch.png
772 KB View Download
gmail_trace_without_patch.zip
4.0 MB Download
gmail_trace_with_patch.png
756 KB View Download
gmail_trace_with_patch.zip
3.7 MB Download
Cc: kinuko@chromium.org csharrison@chromium.org
Components: Blink>HTML>Script Blink>Loader
Adding Blink>Loader and Blink>HTML>Script. Thanks for the traces.

Comment 9 by kinuko@chromium.org, Nov 11 2016

Cc: jochen@chromium.org
Components: Blink>JavaScript
+jochen@ hoping he has more insights, it looks the major diff comes from v8.execute.
there is a constant overhead per script tag, and doing this once as opposed to several time has a noticeable difference.

I'm not sure, however, whether it's actually spec compliant to merge multiple script blocks, and I'd rather not add a one-off thing just for one test - web sites don't have to split up their script like this for no good reason.

Comment 11 by zcorpan@gmail.com, Nov 13 2016

I have not checked the CL, but a simple example to consider:

<div id=x>
 <script> x.remove() </script><script>alert('not reached')</script>
</div>

The second script should not run, because when the HTML parser inserts it (into the div), it's not in the Document, so it does not execute.

It's also easy to come up with examples using document.write() etc where either execution order would change or scripts that should not execute do, etc.

Also, what if the first script fails to compile? Or does compile but throws?
Labels: -Pri-2 Pri-3
Owner: hong.zh...@intel.com
Status: Assigned (was: Unconfirmed)
Zheng, I'm going to assign this to you as a proxy for your colleague Cai.

Cai, I think this investigation is worth pursuing. It would be good to understand the overhead of script tags and whether any of that work can be done more efficiently. I look forward to seeing your results investigating v8.execute, etc. as you mention in the review thread.

I strongly suspect that the semantic change of merging script tags will break some sites which have come to rely on the exact structure of the document, so we don't want to take this change as-is. There must be another approach that is both efficient and compatible.

It would be interesting to investigate this change on other hardware. I doubt it is Chrome OS specific but it would be interesting to know how the performance scales down to phones as well.
Thanks for your comments.

Since the major difference lies in v8.execute, I've collected the 'total_compiled_code_size' to see if the difference in compiled code leads to the performance gap in v8.execute, below are some data by running gmail_js.html locally.
                     w/ patch	w/o patch
parseHTML(ms)         613.364	815.857					
v8.compile(ms)        577.971	596.372	         
v8.execute(ms)        18.418	191.838
total_compiled_
code_size             934666    1013166

The v8.execute w/ patch is x10 faster (18.418 vs. 191.838) and the compiled code size is ~8% smaller than those of w/o patch. I tried to print the compiled code for comparison using CodeGenerator::PrintCode, but failed during building Chromium binary. I wonder if my direction is correct by analyzing the difference in compiled code from v8.compile. If yes, do I have a better way to get the compiled code instead of using CodeGenerator::PrintCode? jochen@ domoinicc@

Besides, I also collected the telemetry data of loading Gmail on Galaxy S6, the warm_times w/ and w/o patch are 2126ms and 2304ms respectively, which also demonstrates ~10% performance gap, so I think this may be a common issue.
A minor correction: the gmail loading warm_times for Samsung Galaxy S6 w/ and w/o patch should be 2126ms and 2403ms, respectively. 
Sorry for the typo in last comment.
I also do a break-down in v8.execute, the hotspot and the major performance gap is 'CALL_GENERATED_CODE', data shown as below. 
                        w/ patch	w/o patch
parseHTML(ms)            631.006	810.314					
v8.compile(ms)           595.478	593.862	         
v8.execute(ms)           18.697	        187.589
CALL_GENERATED_CODE(ms)  9.123          168.122

But now we are stuck in here and don't know how to have a deeper break-down inside 'CALL_GENERATED_CODE'. You guys are experts and may have more insights on this, hope we could get some help.
Interesting numbers, thank you for sharing.

Toon and Camillo gave a talk about more detailed V8 profiling at BlinkOn 6, there may be something useful here: https://www.youtube.com/watch?v=xCx4uC7mn6Y

Here's some instructions: https://github.com/v8/v8/wiki/Using%20V8%E2%80%99s%20internal%20profiler
my suspicion is that CALL_GENERATED_CODE does almost nothing except for the last <script> tag, we just invoke it 100x as often without the patch

Sign in to add a comment