Issue 593477

Starred by 10 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Feature

Blocked on:
issue 685573
issue v8:5516




v8 code caching does not seem to work well on Facebook

Reported by bmau...@fb.com, Mar 9 2016

Issue description

Version: Chrome 51 (canary)
OS: OS X

What steps will reproduce the problem?
(1) Load www.facebook.com multiple times in the browser
(2) exit the browser, open about:tracing and record all enabled by default categories

What is the expected output? What do you see instead?

I would expect the overhead of JS parsing to be nearly completely eliminated by the V8 code caching feature. However, we still see a huge amount of time spent in parsing. I've attached two traces: one of an initial load of facebook.com with a clean cache, and one with a well-primed cache.

In the initial load, we see that when a JS file is first downloaded, a V8.CompileScript event occurs with a V8.Parse child that has many V8.PreParse events. These events take a non-trivial amount of time -- V8.CompileScript takes 165 ms across the entire document.

On the repeated load, V8.CompileScript has a child of V8.CompileDeserialize. Looking at the V8 source, this seems to be pulling the code out of the serialized cache. We see that V8.CompileScript has been reduced to 47 ms and that there are 85 instances of V8.CompileDeserialize, suggesting that the compiled code cache is being well used. Manually looking at the remaining V8.CompileScript instances, they seem to be small JS files (I'm guessing there's a size threshold) and JS inlined in the HTML. I can't find any large JS files that weren't cached.

However, on the repeated load we see that V8.ParseLazy accounts for 376 ms of wall-clock time -- about the same as on the initial load. Looking at the V8 source code, ParseLazy seems to be used when a function has been found but the inside of the function has not yet been parsed.

At Facebook we use a transpiler to combine individual modules into a single file. For example:


__d('getActiveElement', [], function (module) {
  var MY_CONST = 1;
  module.exports = function getActiveElement() {
    ...
  };
});

This way, the value MY_CONST is private to getActiveElement. Based on these traces, it seems like V8 is caching the outline of the file but still has to call ParseLazy to parse the inner functions.

This seems fairly substantial in terms of the overall performance of Facebook -- in this trace parsing accounted for over 10% of all CPU time on the main thread. 
 
trace_initialload.json.gz (5.6 MB)
trace_repeat_load.json.gz (4.7 MB)
Labels: Needs-Bisect
Owner: horo@chromium.org
horo - who is the right person to dig into this? Naively this seems like something very much worth investigating.

Comment 3 by horo@chromium.org, Mar 10 2016

Owner: yangguo@chromium.org
I don't think we get much benefit from the V8 code cache for inner functions.
yangguo@ knows more about it.

Comment 4 by horo@chromium.org, Mar 10 2016

Cc: horo@chromium.org
Status: Assigned (was: Untriaged)
This is due to lazy parsing. When you compile a script, only the toplevel code gets compiled. Functions contained in it are remembered, but code for them is not generated until a function is actually called. The rationale is that in many cases it makes no sense to eagerly compile functions when we don't know whether they will be called at all. Compiling them eagerly increases the memory footprint and may not even help if the code gets garbage collected before it ever gets executed.

For the code cache this is of course less than ideal, since using the code cache only helps you bypass compiling the toplevel script, not the inner functions. Lazily compiling inner functions still requires a new parsing pass.
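A minimal sketch of this behavior (illustrative only, not V8 internals):

var x = 1;          // toplevel code: compiled when the script is compiled,
                    // and this is what the code cache can store
function f() {      // f is recorded here, but its body is not compiled yet
  return x + 1;
}
f();                // first call: f's body is parsed and compiled only now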

There is an exception to lazy parsing: functions wrapped in parentheses are compiled eagerly, because of the IIFE pattern. So if, instead of the usual

function foo() {}

you define them wrapped in parentheses

var foo = (function foo() {})

then you can make sure the function is eagerly compiled and included in the code cache. But this is a hack that might cease working at any point, though it has worked for the past few years.
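Applied to the __d module pattern from the issue description, that would look something like the following sketch (hedged: it relies on the heuristic described above staying in place, and only the wrapped factory becomes eager -- functions nested inside it stay lazy):

__d('getActiveElement', [], (function (module) {  // parenthesized, so compiled eagerly
  var MY_CONST = 1;
  module.exports = function getActiveElement() {  // still lazy unless also wrapped
    ...
  };
}));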

Long term, the V8 team is working on transitioning to a bytecode interpreter for initial startup. Bytecode may have a small enough memory footprint that we can always compile eagerly.
Labels: -Type-Bug Type-Feature
Labels: -Pri-2 Pri-3
Labels: -Needs-Bisect -OS-Mac OS-All
Cc: jochen@chromium.org vogelheim@chromium.org

Comment 10 by bmau...@fb.com, Mar 10 2016

The IIFE pattern is interesting. My main worry, though, is that enough of our users don't have our JS in the cache that I'd hate to do eager work for those users just to speed up the cached case.

Would it be worth experimenting with treating all functions as IIFEs for the sake of code caching? It'd be great if we could get the best of both worlds here -- lazy parsing when the user isn't caching, eager compilation if the result is going to be stored.
If a function is on the critical path of starting up, then compiling it eagerly is a win regardless of whether it ends up in the cache. It's not possible for V8 to tell upfront which functions will be on the critical path, though.

The issue with eagerly compiling all functions for code caching is that compiled code takes a considerable amount of memory. Including it in the code cache both increases the footprint on disk and makes serializing and deserializing take longer.
Just to make sure I understand: is the conclusion here that we should mark this issue as blocked on the bytecode interpreter for initial startup, or are there other shorter-term approaches we're planning to explore too?
Components: -Blink>JavaScript>Performance Blink>JavaScript
Labels: Performance
Friendly ping
Cc: yangguo@chromium.org
Owner: verwa...@chromium.org
I don't think this is actionable from a code caching point of view. We will get eager compilation in combination with code caching once Ignition launches, but in the meantime, maybe there are other options to explore to speed up startup.
Cc: cbruni@chromium.org rmcilroy@chromium.org hablich@chromium.org
As Yang mentioned, we are exploring eagerly compiling everything as part of the move to our Ignition bytecode interpreter (which you can test with the --ignition flag). Our initial measurements show that, at least on cold start-up, this actually hurts Facebook performance, since Facebook is nicely tuned for Chrome's current behavior.

Down the road, the advantages are that we avoid multiple parse cycles, compiling to bytecode is cheaper than compiling to native code, and the memory footprint is reduced. The downsides are that we need to compile more code (worse for pages that ship a lot of possibly unused code) and that initial JS execution is slower.

We're also looking into speeding up our parser, but that obviously won't deliver performance as good as a code-cache hit.

Another note: there are two types of code caches, on-disk and in-memory. The on-disk cache behaves as Yang described. The in-memory cache will hit if the new page is opened in the same V8 isolate as a previous page; that cache should also contain unoptimized code from a previous run, even if it was lazily compiled.
Just as a follow-up: we're currently exploring solving this by generating more data in our preparser: the locations of all functions, their free variables, and flags. That way we never need to look at an inner function again to compile an outer function.
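A hypothetical sketch of that kind of per-function metadata (the shape and field names are illustrative, not V8's actual format):

// Enough information to compile the outer function without re-reading
// the source of getActiveElement (all names and values hypothetical).
var preparseData = {
  functions: [
    { name: 'getActiveElement',
      start: 34, end: 152,            // source offsets of the function
      freeVariables: ['MY_CONST'],    // outer-scope variables it references
      flags: { strictMode: false } }  // miscellaneous parse flags
  ]
};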

The described pattern, which I agree is surely very common, currently hits a pretty stupid path in V8: we fully parse the inner functions, but then don't compile them and throw away the AST we generated. Even worse, every function nested n levels deep is parsed like that n times. As a first step we'll support this pattern in our preparser. If we instead keep around info about free variables and function boundaries, we won't have to look at inner functions again. The final result will be similar to what we discussed during BlinkOn, except generated on-the-fly on the V8 side, especially in the warm start-up case.
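An illustration of that repeated cost on a nested example (names made up):

function a() {      // parsing a's body (on first call) also preparses b and c
  function b() {    // parsing b's body preparses c again
    function c() {} // c, nested two levels deep, is looked at twice before
                    // its own parse -- once per enclosing function
  }
}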
Just checking in - what are the next steps for this?
Blockedon: v8:5516
Here's the latest design doc on skipping inner functions:
https://docs.google.com/document/u/1/d/1TqpdGeLmURL2gc18s6PwNeyZOvayQJtJ16TCn0BEt48/edit?usp=sharing

We're planning a few sessions at BlinkOn to discuss progress on the general overhaul of the parser.

Comment 20 by marja@chromium.org, Jan 26 2017

Skipping inner functions will make parsing faster, but won't cause any more code to end up in the code cache.

At the moment there's no way to store whatever data we'd get from inner functions to speed up the subsequent parse, if we're going to generate a code cache. It's either-or.

However, what will really help here is the "incremental code caching" approach (-> vogelheim@), where we'd add functions into the code cache as we compile them, even though they weren't part of the initial compile.

Comment 21 by marja@chromium.org, Jan 26 2017

Blockedon: 685573
I think Yang's comment is still based on our assumption back then that we should just eagerly compile everything. However, that would just burn through battery and memory if large portions of shipped code actually go unused.

We do want both incremental code caching and caching of the metadata we're going to generate to avoid reparsing. It shouldn't be either-or in the long run.

Comment 23 by wen...@gmail.com, Feb 23 2017

The design doc posted in #19 is not public. Could it be made publicly accessible? Thanks in advance.
Labels: -Performance Performance-Loading
