
Issue 714030


Issue metadata

Status: Fixed
Closed: Nov 21
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

issue 351146


ScopedFrameBlamer is too expensive for custom elements

Reported by a project member, Apr 21 2017

Issue description

Chrome Version: ToT/r465173
OS: Linux/all

What steps will reproduce the problem?
(1) Observe what a tight loop like this costs:

class X extends HTMLElement {
  constructor() { super(); }  // bodies were elided in the original report;
  connectedCallback() {}      // this is a minimal reconstruction
}
customElements.define('x-x', X);
for (let i = 0; i < 1000; i++) {
  let elem = new X();
}

Here's a jsbin:

What is the expected result?

This is pretty fast.

What happens instead?

Custom elements are 5x+ the cost of a DIV.

I did some profiling and it seems V8ScriptRunner::CallFunction has a lot of overhead, particularly ScopedFrameBlamer. Here's a screenshot:

(This is a different workload from the code above; it creates and then appends a ton of different elements. This is after I've applied a bunch of bindings and custom element optimizations in a local patch.)

So I have two questions:

1. ScopedFrameBlamer is a pretty large outlier here compared to all the other sort of probing things we have. Event handling and other things use CallFunction and we need it to be fast. Can we afford ScopedFrameBlamer?
2. Is V8ScriptRunner::CallFunction the right thing for custom elements to be using? I noticed someone has worked around this by adding CallInternalFunction, which triggers less instrumentation. If we're running a custom element callback or event handler, won't the frame already be appropriately blamed? Maybe the ScopedFrameBlamer can be moved into the caller (and maybe it could use the scoped tracing stuff instead of Enter/Exit so it doesn't look up the thread ID twice, etc.).
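A hypothetical sketch of the "already blamed" idea in question 2: a guard that does the Enter/Exit work only at the outermost scope on a thread, so nested calls (a custom element callback invoked while the frame is already on the stack) become nearly free. All names here (FrameBlameScope, the counters) are illustrative stand-ins, not Blink's actual API.

```cpp
#include <cassert>

// Counts stand in for the real BlameContext::Enter()/Exit() work.
class FrameBlameScope {
 public:
  FrameBlameScope() : outermost_(depth_ == 0) {
    if (outermost_)
      ++enter_count_;  // only the outermost scope pays for Enter()
    ++depth_;
  }
  ~FrameBlameScope() {
    --depth_;
    if (outermost_)
      ++exit_count_;  // and for Exit()
  }

  static int enter_count_;
  static int exit_count_;

 private:
  static thread_local int depth_;  // nesting depth on this thread
  bool outermost_;
};

thread_local int FrameBlameScope::depth_ = 0;
int FrameBlameScope::enter_count_ = 0;
int FrameBlameScope::exit_count_ = 0;
```

With this shape, a callback that re-enters CallFunction while a blame scope is already active costs only an integer compare and increment.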
Components: Blink>Bindings
yukishiino mentioned jbroman might be the person most closely working on this on the bindings side.

One thing to point out about the trace: the work on the left (creating the element) and the work on the right (running its connected callback) are both, if you squint, "work done by some frame", but the one on the left doesn't have this ScopedFrameBlamer thing. Assuming the blame on the left is correct (please don't make it slower!), can we use the same reasoning for the blame on the right?
I think it would make sense to introduce CallFunctionFast for custom elements, which does much less instrumentation.

The historical assumption was that CallFunction wouldn't be called frequently. This was true in the days when we had only setTimeout, event listeners, etc. However, custom elements changed the game.

I think that each callback of custom elements is cheap and won't be worth instrumenting. (At the very least, it's more important that they run fast than that they're properly instrumented.) So I think it would make sense to introduce CallFunctionFast for those use cases.

Note: CallInternalFunction is not intended to be a fast version of CallFunction. CallInternalFunction is a version that doesn't run microtasks (this behavior is required by the spec).

Note: I think we should also consider not invoking empty custom element callbacks.

Status: Available (was: Untriaged)
OK, I'll look at adding a CallFunctionFast. Microtask behavior of CallInternalFunction noted.

We already only invoke callbacks the author has provided; see the callers of CustomElementDefinition::Has*Callback.
Was Chrome tracing enabled when you measured this? FrameBlamer's overhead should be ~0 unless tracing is turned on.
I wouldn't say it's free; the cost is:

If GetFrameBlameContext() is null: 6 out-of-line calls and 2 virtual calls.
If GetFrameBlameContext() is not null: 6 out-of-line calls and 4 virtual calls, plus all the work inside TRACE_EVENT_API_ADD_TRACE_EVENT and friends.

It also does the same work redundantly in both the constructor and the destructor.

We expect developers to use timeline and tracing to debug performance issues, so we need to make sure that the overhead of the instrumentation isn't so high it skews all those numbers.

In general, ScopedFrameBlamer seems very heavyweight for something that bottoms out in a tracing macro. Can we make it cheaper, and avoid plumbing it all the way out through the public API? :D
I did not have tracing on, but I was running with --no-sandbox --js-flags="--perf-basic-prof"; would that do it?

+1 to esprehn's observations. It looks like the tracing machinery has an RAII helper which might do slightly less work, but because of the layers and plumbing, ScopedFrameBlamer can't easily use it.

From discussion with haraken, it sounds like I should add something with less instrumentation for these calls, which happen when the frame already has something acting on the stack. I will do that.
The layers of indirection could probably also be simpler; the LocalFrame could hold the BlameContext directly instead of having to request it through two layers of virtual methods.
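A minimal sketch of that simplification, with illustrative names (this LocalFrame/GetBlameContext shape is an assumption for the example, not Blink's real signatures): the frame caches the BlameContext pointer at construction, so the scoped blamer reads one non-virtual, inlinable member instead of chasing two layers of virtual methods on every scope.

```cpp
#include <cassert>

// Stand-in for the real BlameContext; just counts Enter() calls.
struct BlameContext {
  int enters = 0;
  void Enter() { ++enters; }
  void Exit() {}
};

class LocalFrame {
 public:
  explicit LocalFrame(BlameContext* context) : blame_context_(context) {}
  // One non-virtual accessor instead of two virtual hops.
  BlameContext* GetBlameContext() const { return blame_context_; }

 private:
  BlameContext* blame_context_;  // cached once, at frame creation
};

class ScopedFrameBlamer {
 public:
  explicit ScopedFrameBlamer(LocalFrame& frame)
      : context_(frame.GetBlameContext()) {  // resolved once, reused in dtor
    if (context_)
      context_->Enter();
  }
  ~ScopedFrameBlamer() {
    if (context_)
      context_->Exit();
  }

 private:
  BlameContext* context_;
};
```

Caching the pointer in the blamer also removes the duplicated lookup work noted earlier: the destructor reuses what the constructor already resolved.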
FWIW, we are going through a redesign of our CPU attribution probes, and I'll make sure these kinds of efficiency aspects are taken into account. As discussed, we can probably make things faster by caching, but it might also turn out that deterministically instrumenting paths like these just isn't feasible and we need a more statistical approach.
Awesome. Sampling FTW.
Status: Started (was: Available)
I've got a patch that makes ScopedFrameBlamer ~4x faster in the disabled case, and removes some duplicated work in enabled cases. That's not to say we won't eventually want more here, but it's a start.
Comment 12 by a project member, Jun 22 2017

The following revision refers to this bug:

commit 3c56c268f1477a1ab6be286a50e062592ba69d94
Author: Jeremy Roman <>
Date: Thu Jun 22 04:45:44 2017

Make ScopedFrameBlamer more lightweight, especially when tracing is off.

With this patch, non-trivial work is done only if the tracing category
"blink" (which is the one used to report this information) is enabled.
In this case, ScopedFrameBlamer need only check this and ~ScopedFrameBlamer
is trivial enough to inline.

Additionally, duplicate calls to virtual client methods are eliminated
in the case where frame blaming is enabled.

This yields a roughly 4x speedup to a ScopedFrameBlamer microbenchmark
(measured on Linux Z620), and trims about 1/3 of the time off a benchmark
that repeatedly connects a custom element (invoking a trivial
connectedCallback) based on the one in bug 714030.

Bug: 714030
Change-Id: I9152cbd1f55c3fbb62117d7025c92e6476157e65
Commit-Queue: Jeremy Roman <>
Reviewed-by: Dominic Cooney <>
Reviewed-by: Kentaro Hara <>
Cr-Commit-Position: refs/heads/master@{#481438}
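The pattern the commit describes can be sketched roughly as follows (a standalone illustration, not the actual Chromium code; the atomic flag and counter stand in for the real "blink" tracing-category check and the BlameContext/trace-event work): the constructor does one cheap enabled check, only takes the out-of-line slow path when tracing is on, and leaves the destructor trivial enough to inline when it's off.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for "is the 'blink' tracing category enabled?".
std::atomic<bool> g_blink_category_enabled{false};
// Counts the expensive Enter()/Exit()-style work actually performed.
int g_blame_work = 0;

class ScopedFrameBlamer {
 public:
  ScopedFrameBlamer()
      : enabled_(g_blink_category_enabled.load(std::memory_order_relaxed)) {
    if (enabled_)
      LeaveBreadcrumb();  // slow path: only when tracing is on
  }
  ~ScopedFrameBlamer() {
    if (enabled_)
      ++g_blame_work;  // stands in for Exit(); trivial no-op otherwise
  }

 private:
  void LeaveBreadcrumb() { ++g_blame_work; }  // stands in for Enter() + trace event
  bool enabled_;  // cached once; dtor doesn't re-check the category
};
```

With tracing off, each scope costs one relaxed atomic load and two branches, which is consistent with the ~4x microbenchmark speedup the commit reports for the disabled case.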

Status: Fixed (was: Started)
