New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 901063 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 1
Type: Bug

Blocking:
issue 896643



Sign in to add a comment

[Shadow DOM] Element projection performance regression in v1 (vs. v0)

Project Member Reported by keanulee@chromium.org, Nov 1

Issue description

Chrome Version: 70.0.3538.77
OS: macOS 10.13.6 (17G65)

With Shadow DOM v1 (.attachShadow()/<slot>), measuring the offsetHeight of projected child elements is much slower than with Shadow DOM v0 (.createShadowRoot()/<content>).

What steps will reproduce the problem?
(1) Open https://jsbin.com/ravolid/edit?html,output
(2) Observe v1 time measurement is much greater than v0

What is the expected result?

v1 time measurement should be at least on par with v0


What happens instead?

v1 performance seems to be regressed.

 
Blockedon: 896643
Labels: -Pri-3 OS-Chrome OS-Linux OS-Mac OS-Windows Pri-2
Blockedon: -896643
Blocking: 896643
See attached DevTools screenshot, showing the main difference being time spent in style recalc.

I made a few changes to the repro to fix bias against v1 - namely clearing the v0 DOM entries so they don't affect the number of affected nodes in v1's style recalc. It shaves 10% or so off the performance degradation, but the results are still almost 10x worse:

https://winter-option.glitch.me
Screen Shot 2018-11-01 at 7.54.54 PM.png
175 KB View Download
Cc: hayato@chromium.org
Owner: rakina@chromium.org
Status: Assigned (was: Untriaged)
rakina@, could you have a chance to take a look?

If you need my help, please feel free to ask me.
Cc: yoichio@chromium.org yosin@chromium.org
yoichio@, yosin@, if you are interested in, please feel free to investigate this.

I'm now in delegating mode.
@rakina: Any updates? This is blocking WebUI's Shadow DOM v1 migration.
Labels: -Pri-2 Pri-1
I think this is due to slot assignment calculation being slow.  HTMLSlotElement::AssignedNodeNextTo and  HTMLSlotElement::AssignedNodePreviousTo are both O(N). I think it's time to modify it to use maps. I'll send a CL soon.
It sounds that that this is not due to the slowness of assignment calculation. :)

Please be careful to make HTMLSlotElement::AssignedNodeNextTo O(1).

If the size of assigned nodes is small, a simple linear search, O(n), is much faster than a hash map lookup, O(1) in theory.

So which is faster depends on the use case.
Oops, yeah. I didn't mean "assignment calculation" but traversing the nodes while recalculating styles/layout. TraverseSibling for V1 calls those functions. 

I've encountered this problem with virtual-scroller implementation that also assigns lots of children to the host, and we ended up solving by just wrapping in a container. But I don't know if we want to keep this as a O(N) function, which is called every time we want to traverse to a sibling, meaning it ends up as O(N^2), pretty far from normal sibling traversal. I think it's worth it to have the small use case slightly slower.
Yup, I am not objecting to making it O(1). Shadow DOM v0 is actually doing that.

As you might see it in our backlog, I have another idea, stateful flat tree traversal, so that it would make traversal much faster, however, that was not realized due to my bandwidth problem. :(
Project Member

Comment 12 by bugdroid1@chromium.org, Nov 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eaddcb2899f1120b8a8d4d30d3c2a2911e65d049

commit eaddcb2899f1120b8a8d4d30d3c2a2911e65d049
Author: Rakina Zata Amni <rakina@chromium.org>
Date: Fri Nov 09 07:56:45 2018

Add assigned_nodes_index_ map for slot's assigned child

Currently when traversing to the next/prev sibling of a slot's assigned
child, we find the current node's position by looping through a list
of the slot's assigned child, taking O(N) time where N is the number of
assigned children in that slot. This means traversing through a slot's
children will take a total of O(N^2) time. This CL adds a node->index
map for slot's children so that we will instead take O(1) time to find
a node's position in the list.

The downside of this is in cases where there are only a small number
of children assigned to a slot, the performance will be slightly worse
because of the hashmap lookup.

Benchmarked with https://jsbin.com/ravolid/edit?html,output, comparing
shadow dom v0 vs v1, where v0 already uses maps.

For N = 500 (release build), v0 takes ~20ms, v1 without map takes ~200ms, so v0 is 10x
faster than v1.
For N = 500 (local build), v0 takes ~54ms, v1 with map takes ~90ms, so v0 is 1.8x
faster than v1.


Bug: 901063
Change-Id: Iec006fcafca1e368ab97f40d909362dbcf74a08e
Reviewed-on: https://chromium-review.googlesource.com/c/1328622
Reviewed-by: Hayato Ito <hayato@chromium.org>
Commit-Queue: Rakina Zata Amni <rakina@chromium.org>
Cr-Commit-Position: refs/heads/master@{#606764}
[modify] https://crrev.com/eaddcb2899f1120b8a8d4d30d3c2a2911e65d049/third_party/blink/renderer/core/html/html_slot_element.cc
[modify] https://crrev.com/eaddcb2899f1120b8a8d4d30d3c2a2911e65d049/third_party/blink/renderer/core/html/html_slot_element.h

Labels: TE-Verified-M72 TE-Verified-72.0.3608.0
Able to reproduce this issue on Windows 10, Mac OS 10.13.6 and Ubuntu 17.10 on the reported version 70.0.3538.77 and the issue is fixed on the latest M-72 build 72.0.3608.0.

1. Launched Chrome and navigated to https://jsbin.com/ravolid/edit?html,output.
2. Could observe that the v1 time measurement is on par with v0.
Attached is the screen shot for reference.

Hence adding TE verified labels as the fix is working as intended.

Thanks..
901063-M72.PNG
121 KB View Download
@rakina: Do you consider this fixed? Is the improvement satisfactory? Is it possible to match Shadow DOM v0 performance? Or is there a reason that justifies that v1 will always be slower than v0 in this case?

For 5k or 10k elements the difference still seems significant

5k elements: v0: 649ms, v1: 3879ms, 6x slower
10k elements: v0: 1129ms, v1: 14018ms, 12x slower

@dpapad,
This definitely needs some more investigation on what else is slowing it down. As I'm not familiar with v0 vs v1 implementation, maybe hayato@ can give some possible causes, and I'm happy to explore more (or maybe some other people who's interested)

However, I'm not quite sure how common cases with thousands of slot children are in the web. Are you using that much, and is it possible to actually contain the children so that it doesn't get slotted directly?
@rakina

My advise is:

1. Your test, https://jsbin.com/ravolid/edit?html,output, in the CL is measuring the cost of layout, as well as the cost of offsetHeight().  It would be better to separate it out.

2. It looks v1's performance test includes the cost of v0host's layout. It would be better to 'v0host.remove()' before measuring v1.
@rakina: Any updates here?
So recently hayato@ landed 2 CLs that looks like it's helping this (crrev.com/c/1337225, crrev.com/c/1341744). Can you try with the latest canary?

Sign in to add a comment