Starred by 8 users
Status: Started
Owner:
Last visit > 30 days ago
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug


Creating polymer elements should be similarly as expensive as <input>
Project Member Reported by esprehn@chromium.org, Mar 11 2014
This is a tracking bug for making polymer element creation faster.

While we likely can't get to exactly the same speed, we should be able to get to a point where the difference is <= 25%.
 
Comment 1 Deleted
Comment 3 Deleted
Comment 4 by adamk@chromium.org, Mar 11 2014
Labels: hotlist-toolkit
Comment 5 by adamk@chromium.org, Mar 11 2014
Cc: dominicc@chromium.org kenjibaheux@chromium.org
 Issue 351113  has been merged into this issue.
Comment 6 by adamk@chromium.org, Mar 11 2014
Here's a test case from sorvell (originally attached to  issue 351113 )
input.html
993 bytes
Labels: -Cr-Blink Cr-Blink-WebComponents
Status: Available
Cc: -kenjibaheux@chromium.org
Labels: -hotlist-toolkit hotlist-Toolkit
Labels: -hotlist-toolkit Hotlist-Polymer
Components: -Blink>WebComponents Blink>HTML>CustomElements Blink>DOM>ShadowDOM
Blockedon: 659918
We probably need to revisit this with a new Polymer 2.0 benchmark. There's also something called polymetrics which might be interesting.
Owner: dominicc@chromium.org
Status: Started
Here's a port of the attachment in comment 6 to "v1". We have some work to do; it takes 3-5x the time of input.
index.html
883 bytes
This needs digging into with the C++ and V8 profilers. Using the tracing from  Issue 350785  shifts the cost profile a bit; x-input is a factor of ~2x slower again with tracing on.
Poked at this a bit, but my only conclusion is that I need to work on the benchmark so it gives statistically reliable results.

Some random observations:
- GC root finding is not as efficient as it could be for shadow roots; there are some ideas at https://codereview.chromium.org/2463293003, and blink::Node::parentOrShadowHostOrTemplateHostNodeForDocumentFragment should be inlined
- CEv1 allocates more for reactions than CEv0 does, maybe it shouldn't; this benchmark has 5k elements + shadow roots alive so we spend a lot of time tracing live elements, less memory pressure would help
- definition finding maps through strings when it could use numeric indices
- callAsFunction, callAsConstructor look "fat." linux perf indicates they may call v8_inspector::V8Debugger::getGroupId 3? times... I don't understand where the third call comes from. I also wonder if ending function eval could have a fastpath for when the debugger agent has not changed.
- the memory barriers from tracing look expensive? I may want to back out tracing HTMLConstructor and maybe the reaction queues, since in theory you could cover them with performance marks around super() and looking at JS call stacks. Not sure.
One more random thought: There used to be an optimization called NewObject, maybe it could apply to some createElement cases. But staring at the bindings code all I see NewObject doing is a debug assertion about new objects.
Cc: haraken@chromium.org
Summary: Creating polymer elements should be similarly as expensive as <input> (was: [META] Creating polymer elements should be similarly as expensive as <input>)
Retitling because this is no longer 'meta'.

Another way to look at this is 'how much can you do' until you're slower than input?

On my beefy desktop, it's this:

class MyInput extends HTMLElement {
  constructor() {
    super();
  }
};

The budget may already be (just) blown at this point; anything more than this pushes you over the edge. Each of the following things, on its own, blows the budget. Here they are in order of increasing cost:

- set an expando in the constructor (probably blows the budget)
- create a DIV and drop it on the floor
- no-op connected callback
- attach open ShadowRoot

The gestalt from profiles is that we spend a lot of time in memory management related gubbins, so using less memory would be one place to start. INPUT has the distinct advantage of not requiring wrappers for its shadow DOM or shadow content.

Attached is a new copy of the benchmark I'm using which does resampling to compute a confidence interval and things.
index.html
6.0 KB
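For readers without the attachment, here is a minimal sketch of one common way to get a confidence interval by resampling: a percentile bootstrap over per-run timings. This is an assumption about the general technique, not a copy of the attached benchmark; the names mulberry32 and bootstrapMeanCI are hypothetical, and the seeded generator is only there to make runs repeatable.

```javascript
// Small seeded PRNG (mulberry32) so that a run can be reproduced.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Percentile bootstrap: resample the timings with replacement, take the
// mean of each resample, and read the interval off the sorted means.
function bootstrapMeanCI(samples, resamples = 2000, confidence = 0.95,
                         random = Math.random) {
  const n = samples.length;
  const means = new Array(resamples);
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
      sum += samples[Math.floor(random() * n)];
    }
    means[r] = sum / n;
  }
  means.sort((x, y) => x - y);
  const alpha = (1 - confidence) / 2;
  const lo = means[Math.floor(alpha * (resamples - 1))];
  const hi = means[Math.ceil((1 - alpha) * (resamples - 1))];
  const mean = samples.reduce((s, x) => s + x, 0) / n;
  return { mean, lo, hi };
}

// Example with made-up timings in milliseconds (illustrative only).
const exampleCI = bootstrapMeanCI([4.3, 4.9, 5.1, 5.6, 4.7, 5.0], 500, 0.95,
                                  mulberry32(1));
console.log(exampleCI);
```

In a browser, the samples array would hold repeated measurements of the same workload, one entry per run.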
Can we get some performance breakdown per method (like https://bugs.chromium.org/p/chromium/issues/detail?id=636655#c7)?

keishi@ would know how to get the performance breakdown on Mac.

Here are some top-down profiles: input.txt is an INPUT element; cev1nosd.txt is basically what you see in Comment 17. Both were run in a driver loop like this:

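function time(tag) {
  // Container attached to the document, so the new elements get connected.
  const d = document.createElement('div');
  document.body.appendChild(d);
  const start = performance.now();
  // Create and append 1000 elements of the given tag.
  for (let i = 0; i < 1000; i++) {
    d.appendChild(document.createElement(tag));
  }
  const elapsed = performance.now() - start;
  // Clean up outside the timed region.
  d.remove();
  return elapsed;
}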

Here are the mean measurements:

input: 4.9011 [4.2894-5.7474ms, 95% CI]
z-input: 6.1008 [5.7027-6.6073ms, 95% CI]

Note: z-input = "cev1nosd".

Note that these *may* take the same time--the 95% CIs overlap--but remember that INPUT is creating INPUT, ShadowRoot and DIV, wiring them up, and doing form association on attach. z-input is just creating Z-INPUT and no other elements; no connected callback; etc.

This should be much cheaper!
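The "95% CIs overlap" remark above can be checked mechanically. The sketch below uses the two intervals quoted in this comment; intervalsOverlap is a hypothetical helper name. Overlap is only a rough screen: it means the data cannot rule out equal means, not that the means are equal.

```javascript
// Two closed intervals overlap when each starts before the other ends.
function intervalsOverlap(a, b) {
  return a.lo <= b.hi && b.lo <= a.hi;
}

// Means and 95% CIs in milliseconds, copied from the measurements above.
const inputTiming = { mean: 4.9011, lo: 4.2894, hi: 5.7474 };
const zInputTiming = { mean: 6.1008, lo: 5.7027, hi: 6.6073 };

console.log(intervalsOverlap(inputTiming, zInputTiming)); // true
```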
And here are the profiles:
profiles.tar.bz2
234 KB
Brief update, I have been looking at this again. Here's where a custom element with a constructor stands versus DIV:

z-input: 3.0720 [2.6697-3.7316ms, 95% CI] (z-input is *just* one element)
div: 0.7525 [0.6796-0.8665ms, 95% CI]

Most of the costs are associated with garbage collection. All custom elements have to have rare data which brings with it a ton more work.
Here are some ideas I will explore:

- CustomElementDefinition could cache its QualifiedName so CreateElementForConstructor doesn't go through interning one each time.
- V8PerIsolateData::HasInstance could have a cache instead of doing lookups each time.
- One or two reactions could be stored inline and then spilled to the heap so that typical cases don't heap allocate queues.
- Eliminate tracing around fine-grained operations like HtmlConstructor. Probably there's cheaper and more useful profiling we could do anyway by aggregating by type in the backend.
- Investigate whether there are any optimizations at creation time. We know the element is going to end up with rare data, for instance.
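To illustrate the first idea in the list above (caching the qualified name on the definition), here is a small JavaScript model of the pattern. The real code is C++ inside Blink; everything here (internQualifiedName, Definition, the counter) is a hypothetical stand-in used only to show that the interning lookup runs once per definition rather than once per element.

```javascript
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Stand-in for an interning table: one shared object per (namespace, name).
let internCalls = 0;
const internTable = new Map();
function internQualifiedName(namespaceURI, localName) {
  internCalls++; // Counts how often the (comparatively slow) lookup runs.
  const key = `${namespaceURI}|${localName}`;
  let name = internTable.get(key);
  if (!name) {
    name = Object.freeze({ namespaceURI, localName });
    internTable.set(key, name);
  }
  return name;
}

// Hypothetical definition object that remembers its qualified name.
class Definition {
  constructor(localName) {
    this.localName = localName;
    this.cachedName = null;
  }
  qualifiedName() {
    if (this.cachedName === null) {
      this.cachedName = internQualifiedName(XHTML_NS, this.localName);
    }
    return this.cachedName;
  }
}

const definition = new Definition('z-input');
for (let i = 0; i < 1000; i++) {
  definition.qualifiedName(); // Only the first call reaches the intern table.
}
console.log(internCalls); // 1
```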
I should add that I'm doing some other things which should help over in Issue 710184. For example it can find the custom element definition for a constructor with an index lookup instead of string hashing and lookups.
One more random idea: We know at creation time that these could be/will be custom elements. We could have a subtype of HTMLElement with additional space for the custom element state. We would free up space in ElementRareData and avoid allocating rare data for custom elements--the question is, how many?

This could complicate a whole lot of other stuff; it's a trick you only get to play once.

It would be pertinent to see how many other NRD/ERD fields are set by typical custom elements (other than the layout object pointer which is spilled to it.)
Status update--I've done some prototyping. TL;DR:

- I have some hope we can make things faster.
- You should probably call constructors directly and avoid createElement, if you can, until the bindings are faster.
- Current WIP creating and appending (but not laying out) a custom element is still ~2.5x the cost of a DIV.
- Current WIP, benchmark in comment 13, we're roughly 3x the cost of INPUT, down from 4x on 57.0.2987.133. However, I haven't done any callback optimization yet.
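For anyone who wants to compare the two creation paths mentioned above, here is a rough sketch. timePerElement is a hypothetical helper, not the benchmark used for the numbers in this comment, and the browser-only part is guarded so the snippet does nothing outside a page with custom elements support.

```javascript
// Average time per call of `create`, in the units returned by `now`.
function timePerElement(create, count, now = () => performance.now()) {
  const start = now();
  for (let i = 0; i < count; i++) {
    create();
  }
  return (now() - start) / count;
}

// Browser-only part: compare createElement with calling the constructor.
if (typeof customElements !== 'undefined' && typeof document !== 'undefined') {
  class MyComponent extends HTMLElement {
    constructor() {
      super();
    }
  }
  customElements.define('my-component', MyComponent);

  const viaCreateElement =
      timePerElement(() => document.createElement('my-component'), 10000);
  const viaConstructor = timePerElement(() => new MyComponent(), 10000);
  console.log({ viaCreateElement, viaConstructor });
}
```

Unlike the measurements in this comment, this sketch does not append the elements to the document, so absolute numbers will differ.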

Details:

I uploaded the WIP to https://chromium.googlesource.com/experimental/chromium/src refs/wip/dominicc/faster-custom-elements.

Here's the average time to create one element and append it to the DOM. The definition of my component is:

class MyComponent extends HTMLElement {
  constructor() {
    super();
  }
}

57.0.2987.133 (current stable):

input: 0.003990 [0.003884-0.004099ms, 95% CI]
div: 0.000873 [0.000851-0.000900ms, 95% CI]
my-component: 0.003105 [0.003003-0.003240ms, 95% CI]
call constructor: 0.002132 [0.001977-0.002340ms, 95% CI]

Before, r465173:

input: 0.003187 [0.003083-0.003290ms, 95% CI]
div: 0.000602 [0.000588-0.000619ms, 95% CI]
my-component: 0.002648 [0.002473-0.002843ms, 95% CI]
call constructor: 0.001863 [0.001656-0.002160ms, 95% CI]

After:

input: 0.003382 [0.003272-0.003500ms, 95% CI]
div: 0.000595 [0.000584-0.000607ms, 95% CI]
my-component: 0.002138 [0.001999-0.002309ms, 95% CI]
call constructor: 0.001372 [0.001250-0.001494ms, 95% CI]

(Treat these CIs with care, because I'm creating and appending 10,000 elements and then dividing that aggregate cost by 10,000. It's just slightly more convenient for comparing results when I dial the number of elements up and down. These all append but don't do layout.)
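As a worked reading of the "Before" and "After" means above, the relative saving per element can be computed as follows. relativeReduction is a hypothetical helper; the inputs are the mean values quoted above, in milliseconds.

```javascript
// Fraction of the "before" time that was saved.
function relativeReduction(before, after) {
  return (before - after) / before;
}

// Mean per-element times in milliseconds, copied from the lists above.
const beforeMs = { 'my-component': 0.002648, 'call constructor': 0.001863 };
const afterMs = { 'my-component': 0.002138, 'call constructor': 0.001372 };

for (const label of Object.keys(beforeMs)) {
  const percent = 100 * relativeReduction(beforeMs[label], afterMs[label]);
  console.log(`${label}: ${percent.toFixed(1)}% less time per element`);
}
```

With these inputs the savings come out at roughly 19% for createElement('my-component') and roughly 26% for calling the constructor directly.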

"After" has the following optimizations:

- Only set prototypes if the author changed the constructor's prototype.
- Don't allocate rare data for custom elements; we know at creation time they're custom elements. (Only v1 autonomous custom elements.)
- Custom elements cache and reuse qualified names.
- Don't trace in the built-in HTMLElement constructor (i.e. super()).
- Cache the JavaScript string "prototype"; many things use it.
- Map constructors to their definitions with a private property that's an index into the registry's list of definitions, instead of looking them up by name.
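The last optimization in the list above (finding a definition through an index stored on the constructor, rather than by name) can be modelled in plain JavaScript. This is an illustration only: Registry and definitionIndex are hypothetical, and a Symbol-keyed property stands in for the private property that the real C++/V8 code would use.

```javascript
// Stand-in for a private property holding the definition's index.
const definitionIndex = Symbol('definitionIndex');

class Registry {
  constructor() {
    this.byName = new Map(); // Name-based lookup (string hashing).
    this.definitions = [];   // Index-based lookup.
  }

  define(name, ctor) {
    const definition = { name, ctor };
    ctor[definitionIndex] = this.definitions.length;
    this.definitions.push(definition);
    this.byName.set(name, definition);
    return definition;
  }

  lookupByName(name) {
    return this.byName.get(name);
  }

  lookupByConstructor(ctor) {
    // Own-property check: a subclass inherits statics from its parent
    // constructor and must not be mistaken for the registered class.
    if (!Object.prototype.hasOwnProperty.call(ctor, definitionIndex)) {
      return undefined;
    }
    return this.definitions[ctor[definitionIndex]];
  }
}

class A {}
class B {}
const registry = new Registry();
registry.define('x-a', A);
registry.define('x-b', B);
console.log(registry.lookupByConstructor(B) === registry.lookupByName('x-b'));
```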

The next things on the list are more in bindings, but it would be nice if:

- Thunking was faster generally. There's a big difference between createElement('my-component') and new MyComponent just because createElement has to thunk into and out of C++ one extra time.
- Getting the current window (LocalDOMWindow::From) is expensive. I think I can cache that.

I will probably broaden my benchmark a bit soon to include a callback.
The rare data is interesting: <input> has a ShadowRoot, so it allocates a RareData, ElementShadow and ShadowRoot object. That should be more expensive than a custom element without a ShadowRoot in terms of memory allocations.
Blockedon: 714030
Yes. I think vanilla component/DIV is a useful comparison and component w/ shadow root/INPUT is a useful comparison.

I am going to look at callbacks next. I have discovered that frame blamer is super expensive and filed Issue 714030.

I think it could make sense to allocate the space for the inevitable ShadowRoot* and needs distribution bit in the inline component space. I need to do some profiling of real components and see if they have ERD for ShadowRoots or also for other reasons.