Investigate Safari tech preview improvements to Animometer suite > Images and Animometer suite > Suits |
||||||
Issue descriptionThe Safari Tech Preview has made significant gains on the Images and Suits benchmarks (~50%) where M53 seems to have little improvement. We should look into these changes and see if there are any general purpose improvements to be made.
,
May 31 2016
This is what I see profiles, they've vastly changed how recalcStyle works recently, but it appears slower in my testing. The recording step seems to have become 2x faster which I think is what we're seeing. Safari 9.1.1 (11601.6.17) Running Time Self (ms) Symbol Name 44636.0ms 33.1% 9.0 WebCore::TileGrid::platformCALayerPaintContents(WebCore::PlatformCALayer*, WebCore::GraphicsContext&, WebCore::FloatRect const&) 28321.0ms 21.0% 6.0 WebCore::Document::recalcStyle(WebCore::Style::Change) 19913.0ms 14.7% 5.0 WebCore::ScriptedAnimationController::serviceScriptedAnimations(double) 8043.0ms 5.9% 1.0 WebCore::FrameView::updateLayoutAndStyleIfNeededRecursive() 5185.0ms 3.8% 0.0 WebCore::FrameView::layout(bool) WebKit nightly r201450 Running Time Self (ms) Symbol Name 38672.0ms 29.2% 2.0 WebCore::Document::recalcStyle(WebCore::Style::Change) 28128.0ms 21.2% 4.0 WebCore::ScriptedAnimationController::serviceScriptedAnimations(double) 19650.0ms 14.8% 7.0 WebCore::TileGrid::platformCALayerPaintContents(WebCore::PlatformCALayer*, WebCore::GraphicsContext&, WebCore::FloatRect const&) 13113.0ms 9.9% 0.0 WebCore::FrameView::updateLayoutAndStyleIfNeededRecursive() 3835.0ms 2.8% 1.0 WebCore::FrameView::layout(bool)
,
May 31 2016
Looking at the call graphs the code itself doesn't seem to do much of anything different, maybe they fixed some n^2, or maybe they changed how data is passed through paint. Assuming my profiles are correct, it seems like what changed is some kind of overhead inside the painting code was removed since RenderReplaced::paint spends a similar percent in both profiles, but the rest of the system is spending much less time in the nightly build. Safari 9.1.1 (11601.6.17) Total % Symbol Name 33.1 WebCore::TileGrid::platformCALayerPaintContents 33 WebCore::PlatformCALayer::drawLayerContents 32.9 WebCore::GraphicsLayer::paintGraphicsLayerContents 32.9 WebCore::RenderLayerBacking::paintContents 32.9 WebCore::RenderLayerBacking::paintIntoLayer 32.8 WebCore::RenderLayer::paintLayerContents 32.7 WebCore::RenderLayer::paintLayerContents 32.5 WebCore::RenderLayer::paintLayerContents 32.5 WebCore::RenderLayer::paintForegroundForFragments 32.5 WebCore::RenderLayer::paintForegroundForFragmentsWithPhase 32.5 WebCore::RenderBlock::paint 32.5 WebCore::RenderBlock::paintObject 32.5 WebCore::RenderBlock::paintContents 32.5 WebCore::RenderFlexibleBox::paintChildren (... lots of recursion through various paint methods ...) 32.2 WebCore::RenderLayer::paintLayerContents 31.6 WebCore::RenderLayer::paintLayer 28.7 WebCore::RenderLayer::paintLayerByApplyingTransform 18.5 WebCore::RenderLayer::paintLayerContents 11.8 WebCore::RenderLayer::paintForegroundForFragments 11.4 WebCore::RenderLayer::paintForegroundForFragmentsWithPhase 11 WebCore::RenderImage::paint 10.9 WebCore::RenderReplaced::paint 10.4 WebCore::RenderImage::paintReplaced 8.4 WebCore::RenderImage::paintIntoRect 6.5 WebCore::GraphicsContext::drawImage 1.2 WebCore::ImageQualityController::shouldPaintAtLowQuality WebKit nightly r201450 Total % Symbol Name 14.8 WebCore::TileGrid::platformCALayerPaintContents 14.8 WebCore::PlatformCALayer::drawLayerContents 14.7 WebCore::GraphicsLayer::paintGraphicsLayerContents 14.7 WebCore::RenderLayerBacking::paintContents 14.7 WebCore::RenderLayerBacking::paintIntoLayer 14.7 WebCore::RenderLayer::paintLayerContents 14.6 WebCore::RenderLayer::paintLayerContents 14.5 WebCore::RenderLayer::paintLayerContents 14.5 WebCore::RenderLayer::paintForegroundForFragments 14.5 WebCore::RenderLayer::paintForegroundForFragmentsWithPhase 14.5 WebCore::RenderBlock::paint 14.5 WebCore::RenderBlock::paintObject 14.5 WebCore::RenderBlock::paintContents 14.5 WebCore::RenderFlexibleBox::paintChildren (... lots of recursion through various paint methods ...) 14.4 WebCore::RenderLayer::paintLayerContents 14.4 WebCore::RenderLayer::paintLayerContents 14.1 WebCore::RenderLayer::paintLayer 13.1 WebCore::RenderLayer::paintLayerByApplyingTransform 11.3 WebCore::RenderLayer::paintLayerContents 9 WebCore::RenderLayer::paintForegroundForFragments 8.8 WebCore::RenderLayer::paintForegroundForFragmentsWithPhase 8.6 WebCore::RenderImage::paint 8.5 WebCore::RenderReplaced::paint 8 WebCore::RenderImage::paintReplaced 6.1 WebCore::RenderImage::paintIntoRect 4.5 WebCore::GraphicsContext::drawImage 1.1 WebCore::ImageQualityController::chooseInterpolationQuality
,
Jun 2 2016
Testing with builds downloaded from: https://webkit.org/nightly/archives/ version - Leaves score (complexity) r194967 Jan 13 - 250 r195979 Feb 2 - 500 r198971 April 2 - 620 r200103 April 26 - 700 r200505 May 6 - 700 r201450 May 27 - 700 r201559 June 1 - 700 Chrome 53.0.2754.0 (canary) - 500 w/ --force-gpu-rasterization --enable-main-frame-before-activation
,
Jun 2 2016
Hmm, testing again on r194967 Jan 13 and I see 500, not sure why I got 250 before.
,
Jun 2 2016
I bisected longer, there's a lot of noise in Leaves benchmark, but this is what I see: r197566 => ~500 r197613 => ~600 (sometimes it's a lot higher, but usually it's about this). I then looked at all of the changes between those two revisions, the one that looks most promising is: r197594 Use larger tiles when possible to reduce per-tile painting overhead https://bugs.webkit.org/show_bug.cgi?id=154985 (see the dependent patch as well: http://trac.webkit.org/changeset/197541) They now use "Giant Tiles" for layers which aren't scrollable, which seems to be about the size of the entire layer (with a maximum). They also switched to viewport width tiles (like Ganesh) for scrollable layers). The original patch hard coded giant tiles as 4096, but ToT has changed it to respect the maximum IOSurface size: https://github.com/WebKit/webkit/blob/e518eb57f68c3ded22d48c6aed1a731bcf120a19/Source/WebCore/platform/graphics/ca/TileController.cpp#L496 With Ganesh turned on the Animometer benchmarks seem to get the stage cut into 4 bands for me since our tiles are 1/4 the viewport height. I wonder if we should also use giant tiles for non-scrollable layers?
,
Jun 2 2016
I tried hacking up kMinHeightForGpuRasteredTile = 4096 locally which does show one huge tile with debug borders turned on. It doesn't seem to change my score though, someone else should try as well, maybe I'm missing something.
Animometer > Leaves, Fixed complexity = 500
Canary 53.0.2754.0 as observed by tracing:
BeginMainFrame 20.634
synchronizedPaint 8.985 = 43.5%
FrameView::invalidateTreeIfNeededRecursive 1.485 = 7.2%
PaintLayerCompositor::updateIfNeededRecursive 0.374 = 1.8%
updateStyle 5.405 = 26.1%
serviceScriptedAnimations (raf) 3.282 = 15.9%
PictureLayer::Update 0.806 = 3.9%
Note that this shows synchronizedPaint as being much slower than what I see in Instruments, I suspect this is tracing overhead, perhaps because we instrument BitmapImage::draw and we're calling it 500 times per frame?
Content Shell @ 033172cfd48f0c1d634324a38bf75711572cfb13 as observed by Instruments:
Total % Symbol Name
100 cc::ProxyMain::BeginMainFrame
31.2 blink::Document::updateStyle
23.6 blink::PageAnimator::serviceScriptedAnimations
19.7 blink::FrameView::synchronizedPaint
18.2 blink::FrameView::invalidateTreeIfNeededRecursive
3.4 cc::LayerTreeHost::UpdateLayers
2.8 blink::PaintLayerCompositor::updateIfNeededRecursive
WebKit r197566 as observed by Instruments:
Total % Symbol Name
100 main
35.2 WebCore::TileGrid::platformCALayerPaintContents
32.2 WebCore::Document::recalcStyle
5.1 WebCore::RenderLayer::repaintIncludingDescendants
18 WebCore::ScriptedAnimationController::serviceScriptedAnimations
4.8 WebCore::RenderLayer::updateLayerPositionsAfterLayout
3.9 WebCore::RenderView::layout
WebKit r197613 as observed by Instruments:
Total % Symbol Name
100 main
40.5 WebCore::Document::recalcStyle
8 RenderLayer::repaintIncludingDescendants
23.7 WebCore::ScriptedAnimationController::serviceScriptedAnimations
18.8 WebCore::TileGrid::platformCALayerPaintContents
6.2 WebCore::RenderLayer::updateLayerPositionsAfterLayout
5.8 WebCore::RenderView::layout
They're doing layout and updateLayerPositionsAfterLayout, we're smart enough to avoid it apparently.
The breakdown seems to be:
WebKit:
32% style
24% script
19% record
12% layout (layout, layer pos update)
8% paint invalidation
Chrome:
31% style
24% script
19% record
18% paint invalidation (invalidateTreeIfNeededRecursive)
6% compositing (UpdateLayers, updateIfNeededRecursive)
,
Jun 2 2016
> I tried hacking up kMinHeightForGpuRasteredTile = 4096 locally which does show one > huge tile with debug borders turned on. It doesn't seem to change my score though, > someone else should try as well, maybe I'm missing something. We're main thread bound on Leaves so tile sizes shouldn't really make a difference to our critical path. If we were raster task bound or GPU thread bound then larger tiles could reduce overheads. It probably improves power usage.
,
Jun 2 2016
Yeah, we already record at one size (an skp for the whole layer?) and then raster at another (per tile). Since their recordings are hidden inside the calls to CG, their tile size maps directly to the recording size so they're also recording one big command list inside CG when they use giant tiles which is the same as synchronizedPaint. Since the style and script steps are the same percentage wise, but they're faster overall (500 vs 600) does that mean they're 20% faster at everything?
,
Jun 2 2016
> Since the style and script steps are the same percentage wise, but they're faster > overall (500 vs 600) does that mean they're 20% faster at everything? At script I guess. Our style includes layout doesn't it? If I change kMaxInvalidationRectCount to 1 for cc::InvalidationRegion we get another 8% improvement on Leaves.
,
Jun 2 2016
First comment: As mentioned in ealier comments, larger tiles don't help the SPv1 architecture, because we always "re-record" the entire interest rect area, which is not related to the size of a tile. I put "re-record" in quotes because there is actually caching in place that avoids some/most of the recording cost.
,
Jun 2 2016
Discussed with Enne. We think it's worth trying converting InvalidationRegion to be backed by a vector rather than a Region (which is in turn backed by SkRegion, which has semantics that are fancier than we seem to need). Implementing it would be the way to see if it makes a difference.
,
Jun 10 2016
The task in item 12 was done by vmiura.
,
Jul 11 2016
,
Oct 14 2016
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by vmi...@chromium.org
, May 27 2016