Starred by 10 users

Issue metadata

Status: Available
Owner: ----
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug


Issue 351146: Creating polymer elements should be similarly as expensive as <input>

Reported by, Mar 11 2014 Project Member

Issue description

This is a tracking bug for making polymer element creation faster.

While we likely can't get to exactly the same speed, we should be able to get to a point where the difference is <= 25%.

Comment 1 Deleted

Comment 3 Deleted

Comment 4 by, Mar 11 2014

Labels: hotlist-toolkit

Comment 5 by, Mar 11 2014

 Issue 351113  has been merged into this issue.

Comment 6 by, Mar 11 2014

Here's a test case from sorvell (originally attached to  issue 351113 )
Attachment: 993 bytes

Comment 7 by, Dec 11 2015

Labels: -Cr-Blink Cr-Blink-WebComponents
Status: Available

Comment 8 by, Jun 24 2016

Labels: -hotlist-toolkit hotlist-Toolkit

Comment 9 by, Aug 10 2016

Labels: -hotlist-toolkit Hotlist-Polymer

Comment 10 by, Oct 12 2016

Components: -Blink>WebComponents Blink>HTML>CustomElements Blink>DOM>ShadowDOM

Comment 11 by, Oct 27 2016

Blockedon: 659918

Comment 12 by, Nov 1 2016

We probably need to revisit this with a new Polymer 2.0 benchmark. There's also something called polymetrics which might be interesting.

Comment 13 by, Nov 1 2016

Status: Started (was: Available)
Here's a port of the attachment in comment 6 to "v1". We have some work to do; it takes 3-5x the time of input.
Attachment: 883 bytes

Comment 14 by, Nov 1 2016

This needs digging into with the C++ and V8 profilers. Using the tracing from  Issue 350785  shifts the cost profile a bit; x-input is a factor of ~2x slower again with tracing on.

Comment 15 by, Nov 1 2016

Poked at this a bit, but my only conclusion is that I need to rework the benchmark so it gives statistically reliable results.

Some random observations:
- GC root finding is not as efficient as it could be for shadow roots. There are some ideas (link not preserved), but at least blink::Node::parentOrShadowHostOrTemplateHostNodeForDocumentFragment should be inlined
- CEv1 allocates more for reactions than CEv0 does, maybe it shouldn't; this benchmark has 5k elements + shadow roots alive so we spend a lot of time tracing live elements, less memory pressure would help
- definition finding maps through strings when it could use numeric indices
- callAsFunction, callAsConstructor look "fat." linux perf indicates they may call v8_inspector::V8Debugger::getGroupId 3? times... I don't understand where the third call comes from. I also wonder if ending function eval could have a fast path for when the debugger agent has not changed.
- the memory barriers from tracing look expensive? I may want to back out tracing HTMLConstructor and maybe the reaction queues, since in theory you could cover them with performance marks around super() and looking at JS call stacks. Not sure.

Comment 16 by, Nov 1 2016

One more random thought: There used to be an optimization called NewObject, maybe it could apply to some createElement cases. But staring at the bindings code all I see NewObject doing is a debug assertion about new objects.

Comment 17 by, Nov 2 2016

Summary: Creating polymer elements should be similarly as expensive as <input> (was: [META] Creating polymer elements should be similarly as expensive as <input>)
Retitling because this is no longer 'meta'.

Another way to look at this is 'how much can you do' until you're slower than input?

On my beefy desktop, it's this:

class MyInput extends HTMLElement {
  constructor() {
    super();
  }
}

The budget may already be (just) blown at this point; anything more than this pushes you over the edge. Each of the following things blows the budget on its own. Here they are in order of increasing cost:

- set an expando in the constructor (probably blows the budget)
- create a DIV and drop it on the floor
- no-op connected callback
- attach open ShadowRoot

The gestalt from profiles is that we spend a lot of time in memory management related gubbins, so using less memory would be one place to start. INPUT has the distinct advantage of not requiring wrappers for its shadow DOM or shadow content.

Attached is a new copy of the benchmark I'm using which does resampling to compute a confidence interval and things.
Attachment: 6.0 KB
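
The attached benchmark itself is not reproduced in this thread, so the following is only a minimal sketch of the kind of resampling it describes: a percentile bootstrap over timing samples. The function names, the resample count, and the sample timings are all assumptions made for illustration, not taken from the attachment.

```javascript
// Percentile bootstrap for the mean of a set of timing samples.
// Hypothetical helper names; `rng` is injectable so runs can be made reproducible.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function bootstrapMeanCI(samples, options = {}) {
  const resamples = options.resamples || 2000;
  const level = options.level || 0.95;
  const rng = options.rng || Math.random;

  // Draw `resamples` resamples with replacement and record each resample's mean.
  const means = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < samples.length; i++) {
      sum += samples[Math.floor(rng() * samples.length)];
    }
    means.push(sum / samples.length);
  }
  means.sort((a, b) => a - b);

  // Take the central `level` fraction of the sorted resample means.
  const alpha = (1 - level) / 2;
  const lo = means[Math.floor(alpha * (resamples - 1))];
  const hi = means[Math.ceil((1 - alpha) * (resamples - 1))];
  return { mean: mean(samples), lo, hi };
}

// Made-up timings in milliseconds, for illustration only.
const ci = bootstrapMeanCI([4.3, 5.1, 4.8, 5.6, 4.9, 4.7, 5.2]);
console.log(`${ci.mean.toFixed(4)} [${ci.lo.toFixed(4)}-${ci.hi.toFixed(4)}ms, 95% CI]`);
```

The output line follows the "mean [lo-hi ms, 95% CI]" shape used for the measurements in this thread; the actual numbers vary from run to run because the resampling is random.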

Comment 18 by, Nov 2 2016

Can we get a performance breakdown per method (the example link was not preserved)?

keishi@ would know how to get the performance breakdown on Mac.

Comment 19 by, Nov 4 2016

Here are some top-down profiles: input.txt is for an INPUT element; cev1nosd.txt is basically what you see in Comment 17. Both were run in a driver loop like this:

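function time(tag) {
  var d = document.createElement('div');
  var start = performance.now();
  for (let i = 0; i < 1000; i++) {
    // Loop body was lost from the original; presumably it creates `tag` and appends it.
    d.appendChild(document.createElement(tag));
  }
  let time = (performance.now() - start);
  return time;
}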

Here are the mean measurements:

input: 4.9011 [4.2894-5.7474ms, 95% CI]
z-input: 6.1008 [5.7027-6.6073ms, 95% CI]

Note: z-input = "cev1nosd".

Note that these *may* take the same time--the 95% CIs overlap--but remember that INPUT is creating INPUT, ShadowRoot and DIV, wiring them up, and doing form association on attach. z-input is just creating Z-INPUT and no other elements; no connected callback; etc.
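
As a quick check of the overlap claim, here is a small sketch that tests whether two closed intervals intersect, using the two 95% CIs quoted above. The helper name is made up for this example.

```javascript
// Two closed intervals overlap when each one starts before the other ends.
function intervalsOverlap(a, b) {
  return a.lo <= b.hi && b.lo <= a.hi;
}

// 95% CIs from the measurements above, in milliseconds.
const inputCI = { lo: 4.2894, hi: 5.7474 };
const zInputCI = { lo: 5.7027, hi: 6.6073 };

// Prints true: z-input's lower bound (5.7027) is below input's upper bound (5.7474).
console.log(intervalsOverlap(inputCI, zInputCI));
```

The overlap is narrow (about 0.045 ms), which is why the comment only says the two *may* take the same time.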

This should be much cheaper!

Comment 20 by, Nov 4 2016

And here are the profiles:
Attachment: 234 KB

Comment 21 by, Apr 17 2017

Brief update, I have been looking at this again. Here's where a custom element with a constructor stands versus DIV:

z-input: 3.0720 [2.6697-3.7316ms, 95% CI] (z-input is *just* one element)
div: 0.7525 [0.6796-0.8665ms, 95% CI]

Most of the costs are associated with garbage collection. All custom elements have to have rare data which brings with it a ton more work.

Comment 22 by, Apr 19 2017

Here are some ideas I will explore:

- CustomElementDefinition could cache its QualifiedName so CreateElementForConstructor doesn't go through interning one each time.
- V8PerIsolateData::HasInstance could have a cache instead of doing lookups each time.
- One or two reactions could be stored inline and then spilled to the heap so that typical cases don't heap allocate queues.
- Eliminate tracing around fine-grained operations like HtmlConstructor. Probably there's cheaper and more useful profiling we could do anyway by aggregating by type in the backend.
- Investigate whether there are any optimizations available at creation time. We know the element is going to end up with rare data, for instance.
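
To make the "store one or two reactions inline, then spill" idea concrete, here is a rough JavaScript sketch of the data structure. The real change would be in Blink's C++ reaction queues; the class and field names here are hypothetical, and JavaScript objects only approximate the allocation behavior being discussed.

```javascript
// Hypothetical queue that keeps the first two reactions in fixed slots and
// only allocates an overflow array when a third reaction arrives.
class InlineReactionQueue {
  constructor() {
    this.first = undefined;
    this.second = undefined;
    this.overflow = null; // allocated lazily, only on spill
    this.length = 0;
  }

  enqueue(reaction) {
    if (this.length === 0) {
      this.first = reaction;
    } else if (this.length === 1) {
      this.second = reaction;
    } else {
      if (this.overflow === null) this.overflow = [];
      this.overflow.push(reaction);
    }
    this.length++;
  }

  // Runs queued reactions in FIFO order and empties the queue.
  // Simplification: reactions enqueued while draining wait for the next drain.
  drain(run) {
    const { first, second, overflow, length } = this;
    this.first = undefined;
    this.second = undefined;
    this.overflow = null;
    this.length = 0;
    if (length > 0) run(first);
    if (length > 1) run(second);
    if (overflow !== null) {
      for (const reaction of overflow) run(reaction);
    }
  }
}

const queue = new InlineReactionQueue();
queue.enqueue('connected');
queue.enqueue('attributeChanged');
console.log(queue.overflow === null); // true: two reactions fit inline

queue.enqueue('disconnected');
const order = [];
queue.drain((reaction) => order.push(reaction));
console.log(order.join(', ')); // connected, attributeChanged, disconnected
```

The bet behind this layout is that most elements only ever queue one or two reactions, so the common case never touches the overflow path.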

Comment 23 by, Apr 19 2017

I should add that I'm doing some other things which should help over in Issue 710184. For example it can find the custom element definition for a constructor with an index lookup instead of string hashing and lookups.

Comment 24 by, Apr 19 2017

One more random idea: We know at creation time that these could be/will be custom elements. We could have a subtype of HTMLElement with additional space for the custom element state. We would free up space in ElementRareData and avoid allocating rare data for custom elements--the question is, how many?

This could complicate a whole lot of other stuff; it's a trick you only get to play once.

It would be pertinent to see how many other NRD/ERD fields are set by typical custom elements (other than the layout object pointer which is spilled to it.)

Comment 25 by, Apr 20 2017

Status update--I've done some prototyping. TL;DR:

- I have some hope we can make things faster.
- You should probably call constructors directly and avoid createElement, if you can, until the bindings are faster.
- Current WIP creating and appending (but not laying out) a custom element is still ~2.5x the cost of a DIV.
- Current WIP, benchmark in comment 13, we're roughly 3x the cost of INPUT, down from 4x on 57.0.2987.133. However I haven't done any callback optimization yet.


I uploaded the WIP to refs/wip/dominicc/faster-custom-elements.

Here's the average time to create one element and append it to the DOM. The definition of my component is:

class MyComponent extends HTMLElement {
  constructor() {
    super();
  }
}

57.0.2987.133 (current stable):

input: 0.003990 [0.003884-0.004099ms, 95% CI]
div: 0.000873 [0.000851-0.000900ms, 95% CI]
my-component: 0.003105 [0.003003-0.003240ms, 95% CI]
call constructor: 0.002132 [0.001977-0.002340ms, 95% CI]

Before, r465173:

input: 0.003187 [0.003083-0.003290ms, 95% CI]
div: 0.000602 [0.000588-0.000619ms, 95% CI]
my-component: 0.002648 [0.002473-0.002843ms, 95% CI]
call constructor: 0.001863 [0.001656-0.002160ms, 95% CI]


After:

input: 0.003382 [0.003272-0.003500ms, 95% CI]
div: 0.000595 [0.000584-0.000607ms, 95% CI]
my-component: 0.002138 [0.001999-0.002309ms, 95% CI]
call constructor: 0.001372 [0.001250-0.001494ms, 95% CI]

(Treat these CIs with care, because I'm creating and appending 10,000 elements and then dividing that aggregate cost by 10,000. It's just slightly more convenient for comparing results when I dial the number of elements up and down. These all append but don't do layout.)
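
The aggregate-then-divide measurement described above can be sketched roughly as follows. This is not the benchmark used for the numbers in this comment; `perElementMs` and its `create` parameter are hypothetical names, and the stand-in workload exists only so the sketch runs outside a browser.

```javascript
// Use the high-resolution clock where available; fall back to Date.now().
const now = typeof performance !== 'undefined'
  ? () => performance.now()
  : () => Date.now();

// Times `count` calls to `create` as one batch and returns the mean cost
// per call in milliseconds.
function perElementMs(create, count = 10000) {
  const start = now();
  for (let i = 0; i < count; i++) {
    create(i);
  }
  return (now() - start) / count;
}

// Stand-in workload so the sketch runs anywhere; not a DOM measurement.
const sink = [];
const ms = perElementMs(() => sink.push({}), 10000);
console.log(`stand-in: ${ms.toFixed(6)}ms per call`);
```

In a page, `create` could be something like `() => parent.appendChild(document.createElement('my-component'))` for the createElement path or `() => parent.appendChild(new MyComponent())` for the direct-constructor path, where `parent` is a detached DIV.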

"After" has the following optimizations:

- Only set prototypes if the author changed the constructor's prototype.
- Don't allocate rare data for custom elements; we know at creation time they're custom elements. (Only v1 autonomous custom elements.)
- Custom elements cache and reuse qualified names.
- Don't trace in the built-in HTMLElement (ie super()).
- Cache the JavaScript string "prototype"; many things use it.
- Map constructors to their definitions with a private property that's an index into the registry's list of definitions, instead of looking them up by name.
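
The last optimization in that list can be illustrated in plain JavaScript. This is only a sketch of the idea under assumed names (`Registry`, `kDefinitionIndex`, `definitionForConstructor`); Blink's actual implementation is C++ using a V8 private property, and none of these identifiers come from it.

```javascript
// Hypothetical stand-in for the private property holding the definition's index.
const kDefinitionIndex = Symbol('definitionIndex');

class Registry {
  constructor() {
    this.definitions = []; // index -> definition
  }

  define(name, ctor) {
    const definition = { name, ctor };
    // Stamp the constructor with its position in the definitions list.
    ctor[kDefinitionIndex] = this.definitions.length;
    this.definitions.push(definition);
    return definition;
  }

  // Array index lookup; no string hashing or name comparison involved.
  definitionForConstructor(ctor) {
    // Own-property check: a subclass inherits static properties from its
    // parent constructor but must not resolve to the parent's definition.
    if (!Object.prototype.hasOwnProperty.call(ctor, kDefinitionIndex)) {
      return undefined;
    }
    return this.definitions[ctor[kDefinitionIndex]];
  }
}

// Plain classes stand in for custom element constructors here.
class MyComponentCtor {}
class UnregisteredSubclass extends MyComponentCtor {}

const registry = new Registry();
registry.define('my-component', MyComponentCtor);

console.log(registry.definitionForConstructor(MyComponentCtor).name); // my-component
console.log(registry.definitionForConstructor(UnregisteredSubclass)); // undefined
```

The own-property check matters because class constructors inherit static properties through the prototype chain, so an undefined subclass would otherwise pick up its parent's index.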

The next things on the list are more in bindings, but it would be nice if:

- Thunking was faster generally. There's a big difference between createElement('my-component') and new MyComponent just because createElement has to thunk into and out of C++ one extra time.
- Getting the current window (LocalDOMWindow::From) is expensive. I think I can cache that.

I will probably broaden my benchmark a bit soon to include a callback.

Comment 26 by, Apr 20 2017

The rare data is interesting, <input> has a ShadowRoot so it allocates a RareData, ElementShadow and ShadowRoot object. That should be more expensive than a custom element without a ShadowRoot in terms of memory allocations.

Comment 27 by, Apr 21 2017

Blockedon: 714030

Comment 28 by, Apr 21 2017

Yes. I think vanilla component/DIV is a useful comparison and component w/ shadow root/INPUT is a useful comparison.

I am going to look at callbacks next. I have discovered that frame blamer is super expensive and filed  Issue 714030 .

I think it could make sense to allocate the space for the inevitable ShadowRoot* and needs distribution bit in the inline component space. I need to do some profiling of real components and see if they have ERD for ShadowRoots or also for other reasons.

Comment 29 by, Jan 15 2018

Owner: ----
Status: Available (was: Started)
Bulk edit bugs owned by dominicc@

Comment 30 by, Jan 23 2018

Labels: Performance

Comment 31 by, Jan 23

Project Member
Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

- Your friendly Sheriffbot

Comment 32 by, Jan 24

Labels: -Hotlist-Recharge-Cold
Status: Available (was: Untriaged)
