Script fragments execution degrades the page-loading performance
Reported by
lingyun....@intel.com,
Oct 21 2016
|
||||
Issue descriptionUserAgent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36 Platform: 8609.0.0 cyan Steps to reproduce the problem: 1. Use Telemetry page_cycler to run the Gmail page in pageset top_25 2. Get the results of warm_times and the corresponding profiling traces What is the expected behavior? Possibly merge consecutive script fragments to execute them together at once and reduce the execution overhead. What went wrong? From the attached trace of Gmail loading, we could observe that when parsing the same url resource, there are multiple times of executing 'EvaluateScript' due to consecutive script fragments in the way of '<script>..</script><script>..</script>', which introduces extra overhead in executing scripts. Did this work before? N/A Chrome version: 55.0.2841.0 Channel: n/a OS Version: Flash Version:
,
Oct 21 2016
The CL to reduce scripts execution overhead and improve page-loading performance: https://codereview.chromium.org/2438263002 The attached Gmail trace is captured with the CL, which shows that consecutive script fragments are merged and executed at once. We have tested the CL on Cyan ChromeBook, using Telemetry page_cycler to run the Gmail in pageset top_25, the results show ~20% performance improvement in warm_times.
,
Oct 25 2016
add telemetry detailed results
,
Oct 25 2016
with warm_times metrics, telemetry can deliver 1783.11 without the patch, and 1358.00 with it.
,
Nov 7 2016
Add gmail html file and trace files w/ and w/o patch. Below are some break-down statistics (in ms). Function w/ patch w/o patch Function w/ patch w/o patch parseHTML 620.334 802.416 HTMLScriptRunner ::execute 609.045 784.206 evaluateScript 606.036 776.525 v8.compile 582.905 562.822 v8.run 22.875 207.908
,
Nov 7 2016
Add test html file and trace files w/ and w/o patch. Below are some break-down statistics (in ms). Function w/ patch w/o patch Function w/ patch w/o patch HTMLScriptRunner ::execute 1.349 45.274 evaluateScript 1.267 38.781 v8.compile 1.016 22.539 v8.run 0.198 10.044
,
Nov 11 2016
Add trace files of loading gmail w/ and w/o patch on Chromebook Cyan.
,
Nov 11 2016
Adding Blink>Loader and Blink>HTML>Script. Thanks for the traces.
,
Nov 11 2016
+jochen@ hoping he has more insights, it looks the major diff comes from v8.execute.
,
Nov 12 2016
there is a constant overhead per script tag, and doing this once as opposed to several time has a noticeable difference. I'm not sure, however, whether it's actually spec compliant to merge multiple script blocks, and I'd rather not add a one-off thing just for one test - web sites don't have to split up their script like this for no good reason.
,
Nov 13 2016
I have not checked the CL, but a simple example to consider:
<div id=x>
<script> x.remove() </script><script>alert('not reached')</script>
</div>
The second script should not run, because when the HTML parser inserts it (into the div), it's not in the Document, so it does not execute.
It's also easy to come up with examples using document.write() etc where either execution order would change or scripts that should not execute do, etc.
Also, what if the first script fails to compile? Or does compile but throws?
,
Nov 21 2016
Zheng, I'm going to assign this to you as a proxy for your colleague Cai. Cai, I think this investigation is worth pursuing. It would be good to understand the overhead of script tags and whether any of that work can be done more efficiently. I look forward to seeing your results investigating v8.execute, etc. as you mention in the review thread. I strongly suspect that the semantic change of merging script tags will break some sites which have come to rely on the exact structure of the document, so we don't want to take this change as-is. There must be another approach that is both efficient and compatible. It would be interesting to investigate this change on other hardware. I doubt it is Chrome OS specific but it would be interesting to know how the performance scales down to phones as well.
,
Nov 22 2016
Thanks for your comments.
Since the major difference lies in v8.execute, I've collected the 'total_compiled_code_size' to see if the difference in compiled code leads to the performance gap in v8.execute, below are some data by running gmail_js.html locally.
w/ patch w/o patch
parseHTML(ms) 613.364 815.857
v8.compile(ms) 577.971 596.372
v8.execute(ms) 18.418 191.838
total_compiled_
code_size 934666 1013166
The v8.execute w/ patch is x10 faster (18.418 vs. 191.838) and the compiled code size is ~8% smaller than those of w/o patch. I tried to print the compiled code for comparison using CodeGenerator::PrintCode, but failed during building Chromium binary. I wonder if my direction is correct by analyzing the difference in compiled code from v8.compile. If yes, do I have a better way to get the compiled code instead of using CodeGenerator::PrintCode? jochen@ domoinicc@
Besides, I also collected the telemetry data of loading Gmail on Galaxy S6, the warm_times w/ and w/o patch are 2126ms and 2304ms respectively, which also demonstrates ~10% performance gap, so I think this may be a common issue.
,
Nov 23 2016
A minor correction: the gmail loading warm_times for Samsung Galaxy S6 w/ and w/o patch should be 2126ms and 2403ms, respectively. Sorry for the typo in last comment.
,
Nov 30 2016
I also do a break-down in v8.execute, the hotspot and the major performance gap is 'CALL_GENERATED_CODE', data shown as below.
w/ patch w/o patch
parseHTML(ms) 631.006 810.314
v8.compile(ms) 595.478 593.862
v8.execute(ms) 18.697 187.589
CALL_GENERATED_CODE(ms) 9.123 168.122
But now we are stuck in here and don't know how to have a deeper break-down inside 'CALL_GENERATED_CODE'. You guys are experts and may have more insights on this, hope we could get some help.
,
Dec 5 2016
Interesting numbers, thank you for sharing. Toon and Camillo gave a talk about more detailed V8 profiling at BlinkOn 6, there may be something useful here: https://www.youtube.com/watch?v=xCx4uC7mn6Y Here's some instructions: https://github.com/v8/v8/wiki/Using%20V8%E2%80%99s%20internal%20profiler
,
Dec 5 2016
my suspicion is that CALL_GENERATED_CODE does almost nothing except for the last <script> tag, we just invoke it 100x as often without the patch |
||||
►
Sign in to add a comment |
||||
Comment 1 by hong.zh...@intel.com
, Oct 21 2016