[Shadow DOM] Element projection performance regression in v1 (vs. v0) |
|||||
Issue descriptionChrome Version: 70.0.3538.77 OS: macOS 10.13.6 (17G65) With Shadow DOM v1 (.attachShadow()/<slot>), measuring the offsetHeight of projected child elements is much slower than with Shadow DOM v0 (.createShadowRoot()/<content>). What steps will reproduce the problem? (1) Open https://jsbin.com/ravolid/edit?html,output (2) Observe v1 time measurement is much greater than v0 What is the expected result? v1 time measurement should be at least on par with v0 What happens instead? v1 performance seems to be regressed.
,
Nov 1
See attached DevTools screenshot, showing the main difference being time spent in style recalc. I made a few changes to the repro to fix bias against v1 - namely clearing the v0 DOM entries so they don't affect the number of affected nodes in v1's style recalc. It shaves 10% or so off the performance degradation, but the results are still almost 10x worse: https://winter-option.glitch.me
,
Nov 2
rakina@, could you have a chance to take a look? If you need my help, please feel free to ask me.
,
Nov 2
yoichio@, yosin@, if you are interested in, please feel free to investigate this. I'm now in delegating mode.
,
Nov 7
@rakina: Any updates? This is blocking WebUI's Shadow DOM v1 migration.
,
Nov 8
,
Nov 8
I think this is due to slot assignment calculation being slow. HTMLSlotElement::AssignedNodeNextTo and HTMLSlotElement::AssignedNodePreviousTo are both O(N). I think it's time to modify it to use maps. I'll send a CL soon.
,
Nov 8
It sounds that that this is not due to the slowness of assignment calculation. :) Please be careful to make HTMLSlotElement::AssignedNodeNextTo O(1). If the size of assigned nodes is small, a simple linear search, O(n), is much faster than a hash map lookup, O(1) in theory. So which is faster depends on the use case.
,
Nov 8
Oops, yeah. I didn't mean "assignment calculation" but traversing the nodes while recalculating styles/layout. TraverseSibling for V1 calls those functions. I've encountered this problem with virtual-scroller implementation that also assigns lots of children to the host, and we ended up solving by just wrapping in a container. But I don't know if we want to keep this as a O(N) function, which is called every time we want to traverse to a sibling, meaning it ends up as O(N^2), pretty far from normal sibling traversal. I think it's worth it to have the small use case slightly slower.
,
Nov 8
Yup, I am not objecting to making it O(1). Shadow DOM v0 is actually doing that. As you might see it in our backlog, I have another idea, stateful flat tree traversal, so that it would make traversal much faster, however, that was not realized due to my bandwidth problem. :(
,
Nov 9
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/eaddcb2899f1120b8a8d4d30d3c2a2911e65d049 commit eaddcb2899f1120b8a8d4d30d3c2a2911e65d049 Author: Rakina Zata Amni <rakina@chromium.org> Date: Fri Nov 09 07:56:45 2018 Add assigned_nodes_index_ map for slot's assigned child Currently when traversing to the next/prev sibling of a slot's assigned child, we find the current node's position by looping through a list of the slot's assigned child, taking O(N) time where N is the number of assigned children in that slot. This means traversing through a slot's children will take a total of O(N^2) time. This CL adds a node->index map for slot's children so that we will instead take O(1) time to find a node's position in the list. The downside of this is in cases where there are only a small number of children assigned to a slot, the performance will be slightly worse because of the hashmap lookup. Benchmarked with https://jsbin.com/ravolid/edit?html,output, comparing shadow dom v0 vs v1, where v0 already uses maps. For N = 500 (release build), v0 takes ~20ms, v1 without map takes ~200ms, so v0 is 10x faster than v1. For N = 500 (local build), v0 takes ~54ms, v1 with map takes ~90ms, so v0 is 1.8x faster than v1. Bug: 901063 Change-Id: Iec006fcafca1e368ab97f40d909362dbcf74a08e Reviewed-on: https://chromium-review.googlesource.com/c/1328622 Reviewed-by: Hayato Ito <hayato@chromium.org> Commit-Queue: Rakina Zata Amni <rakina@chromium.org> Cr-Commit-Position: refs/heads/master@{#606764} [modify] https://crrev.com/eaddcb2899f1120b8a8d4d30d3c2a2911e65d049/third_party/blink/renderer/core/html/html_slot_element.cc [modify] https://crrev.com/eaddcb2899f1120b8a8d4d30d3c2a2911e65d049/third_party/blink/renderer/core/html/html_slot_element.h
,
Nov 12
Able to reproduce this issue on Windows 10, Mac OS 10.13.6 and Ubuntu 17.10 on the reported version 70.0.3538.77 and the issue is fixed on the latest M-72 build 72.0.3608.0. 1. Launched Chrome and navigated to https://jsbin.com/ravolid/edit?html,output. 2. Could observe that the v1 time measurement is on par with v0. Attached is the screen shot for reference. Hence adding TE verified labels as the fix is working as intended. Thanks..
,
Nov 13
@rakina: Do you consider this fixed? Is the improvement satisfactory? Is it possible to match Shadow DOM v0 performance? Or is there a reason that justifies that v1 will always be slower than v0 in this case? For 5k or 10k elements the difference still seems significant 5k elements: v0: 649ms, v1: 3879ms, 6x slower 10k elements: v0: 1129ms, v1: 14018ms, 12x slower
,
Nov 13
@dpapad, This definitely needs some more investigation on what else is slowing it down. As I'm not familiar with v0 vs v1 implementation, maybe hayato@ can give some possible causes, and I'm happy to explore more (or maybe some other people who's interested) However, I'm not quite sure how common cases with thousands of slot children are in the web. Are you using that much, and is it possible to actually contain the children so that it doesn't get slotted directly?
,
Nov 13
@rakina My advise is: 1. Your test, https://jsbin.com/ravolid/edit?html,output, in the CL is measuring the cost of layout, as well as the cost of offsetHeight(). It would be better to separate it out. 2. It looks v1's performance test includes the cost of v0host's layout. It would be better to 'v0host.remove()' before measuring v1.
,
Nov 29
@rakina: Any updates here?
,
Nov 29
So recently hayato@ landed 2 CLs that looks like it's helping this (crrev.com/c/1337225, crrev.com/c/1341744). Can you try with the latest canary? |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by dpa...@chromium.org
, Nov 1Labels: -Pri-3 OS-Chrome OS-Linux OS-Mac OS-Windows Pri-2