New issue
Advanced search Search tips

Issue 615496 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Investigate Safari tech preview improvements to Animometer suite > Images and Animometer suite > Suits

Project Member Reported by dk...@chromium.org, May 27 2016

Issue description

The Safari Tech Preview has made significant gains on the Images and Suits benchmarks (~50%) where M53 seems to have little improvement. We should look into these changes and see if there are any general purpose improvements to be made.
 

Comment 1 by vmi...@chromium.org, May 27 2016

I would like to see what has changed on Leaves and Design tests.  These are cases where Chrome was faster than Safari before it's recent improvements.
Cc: esprehn@chromium.org
Components: Blink>Paint
Owner: chrishtr@chromium.org
This is what I see profiles, they've vastly changed how recalcStyle works recently, but it appears slower in my testing. The recording step seems to have become 2x faster which I think is what we're seeing.

Safari 9.1.1 (11601.6.17)

Running Time    Self (ms)       Symbol Name
44636.0ms   33.1%   9.0      WebCore::TileGrid::platformCALayerPaintContents(WebCore::PlatformCALayer*, WebCore::GraphicsContext&, WebCore::FloatRect const&)
28321.0ms   21.0%   6.0      WebCore::Document::recalcStyle(WebCore::Style::Change)
19913.0ms   14.7%   5.0      WebCore::ScriptedAnimationController::serviceScriptedAnimations(double)
8043.0ms    5.9%    1.0      WebCore::FrameView::updateLayoutAndStyleIfNeededRecursive()
5185.0ms    3.8%    0.0      WebCore::FrameView::layout(bool)

WebKit nightly r201450

Running Time    Self (ms)       Symbol Name
38672.0ms   29.2%   2.0      WebCore::Document::recalcStyle(WebCore::Style::Change)
28128.0ms   21.2%   4.0      WebCore::ScriptedAnimationController::serviceScriptedAnimations(double)
19650.0ms   14.8%   7.0      WebCore::TileGrid::platformCALayerPaintContents(WebCore::PlatformCALayer*, WebCore::GraphicsContext&, WebCore::FloatRect const&)
13113.0ms    9.9%   0.0      WebCore::FrameView::updateLayoutAndStyleIfNeededRecursive()
3835.0ms    2.8%    1.0      WebCore::FrameView::layout(bool)

Looking at the call graphs the code itself doesn't seem to do much of anything different, maybe they fixed some n^2, or maybe they changed how data is passed through paint. Assuming my profiles are correct, it seems like what changed is some kind of overhead inside the painting code was removed since RenderReplaced::paint spends a similar percent in both profiles, but the rest of the system is spending much less time in the nightly build.

Safari 9.1.1 (11601.6.17)

Total %     Symbol Name
33.1         WebCore::TileGrid::platformCALayerPaintContents
33        WebCore::PlatformCALayer::drawLayerContents
32.9           WebCore::GraphicsLayer::paintGraphicsLayerContents
32.9            WebCore::RenderLayerBacking::paintContents
32.9             WebCore::RenderLayerBacking::paintIntoLayer
32.8              WebCore::RenderLayer::paintLayerContents
32.7               WebCore::RenderLayer::paintLayerContents
32.5                WebCore::RenderLayer::paintLayerContents
32.5                 WebCore::RenderLayer::paintForegroundForFragments
32.5                  WebCore::RenderLayer::paintForegroundForFragmentsWithPhase
32.5                   WebCore::RenderBlock::paint
32.5                    WebCore::RenderBlock::paintObject
32.5                     WebCore::RenderBlock::paintContents
32.5                      WebCore::RenderFlexibleBox::paintChildren
(... lots of recursion through various paint methods ...)
32.2                                     WebCore::RenderLayer::paintLayerContents
31.6                                      WebCore::RenderLayer::paintLayer
28.7                                       WebCore::RenderLayer::paintLayerByApplyingTransform
18.5                                        WebCore::RenderLayer::paintLayerContents
11.8                                         WebCore::RenderLayer::paintForegroundForFragments
11.4                                          WebCore::RenderLayer::paintForegroundForFragmentsWithPhase
11                                         WebCore::RenderImage::paint
10.9                                            WebCore::RenderReplaced::paint
10.4                                             WebCore::RenderImage::paintReplaced
8.4                                           WebCore::RenderImage::paintIntoRect
6.5                                            WebCore::GraphicsContext::drawImage
1.2                                            WebCore::ImageQualityController::shouldPaintAtLowQuality


WebKit nightly r201450

Total %     Symbol Name
14.8         WebCore::TileGrid::platformCALayerPaintContents
14.8          WebCore::PlatformCALayer::drawLayerContents
14.7           WebCore::GraphicsLayer::paintGraphicsLayerContents
14.7            WebCore::RenderLayerBacking::paintContents
14.7             WebCore::RenderLayerBacking::paintIntoLayer
14.7              WebCore::RenderLayer::paintLayerContents
14.6               WebCore::RenderLayer::paintLayerContents
14.5                WebCore::RenderLayer::paintLayerContents
14.5                 WebCore::RenderLayer::paintForegroundForFragments
14.5                  WebCore::RenderLayer::paintForegroundForFragmentsWithPhase
14.5                   WebCore::RenderBlock::paint
14.5                    WebCore::RenderBlock::paintObject
14.5                     WebCore::RenderBlock::paintContents
14.5                      WebCore::RenderFlexibleBox::paintChildren
(... lots of recursion through various paint methods ...)
14.4                                    WebCore::RenderLayer::paintLayerContents
14.4                                     WebCore::RenderLayer::paintLayerContents
14.1                                      WebCore::RenderLayer::paintLayer
13.1                                       WebCore::RenderLayer::paintLayerByApplyingTransform
11.3                                        WebCore::RenderLayer::paintLayerContents
9                                        WebCore::RenderLayer::paintForegroundForFragments
8.8                                       WebCore::RenderLayer::paintForegroundForFragmentsWithPhase
8.6                                        WebCore::RenderImage::paint
8.5                                         WebCore::RenderReplaced::paint
8                                            WebCore::RenderImage::paintReplaced
6.1                                           WebCore::RenderImage::paintIntoRect
4.5                                            WebCore::GraphicsContext::drawImage
1.1                                            WebCore::ImageQualityController::chooseInterpolationQuality

Testing with builds downloaded from: https://webkit.org/nightly/archives/

version - Leaves score (complexity)

r194967 Jan 13 - 250
r195979 Feb 2 - 500
r198971 April 2 - 620
r200103 April 26 - 700
r200505 May 6 - 700
r201450 May 27 - 700
r201559 June 1 - 700

Chrome 53.0.2754.0 (canary) - 500
w/  --force-gpu-rasterization --enable-main-frame-before-activation

Hmm, testing again on r194967 Jan 13 and I see 500, not sure why I got 250 before.
Cc: ccameron@chromium.org
I bisected longer, there's a lot of noise in Leaves benchmark, but this is what I see:

r197566 => ~500
r197613 => ~600 (sometimes it's a lot higher, but usually it's about this).

I then looked at all of the changes between those two revisions, the one that looks most promising is:

r197594 Use larger tiles when possible to reduce per-tile painting overhead
https://bugs.webkit.org/show_bug.cgi?id=154985
(see the dependent patch as well: http://trac.webkit.org/changeset/197541)

They now use "Giant Tiles" for layers which aren't scrollable, which seems to be about the size of the entire layer (with a maximum). They also switched to viewport width tiles (like Ganesh) for scrollable layers).

The original patch hard coded giant tiles as 4096, but ToT has changed it to respect the maximum IOSurface size:
https://github.com/WebKit/webkit/blob/e518eb57f68c3ded22d48c6aed1a731bcf120a19/Source/WebCore/platform/graphics/ca/TileController.cpp#L496

With Ganesh turned on the Animometer benchmarks seem to get the stage cut into 4 bands for me since our tiles are 1/4 the viewport height. I wonder if we should also use giant tiles for non-scrollable layers?
Cc: pdr@chromium.org
I tried hacking up kMinHeightForGpuRasteredTile = 4096 locally which does show one huge tile with debug borders turned on. It doesn't seem to change my score though, someone else should try as well, maybe I'm missing something.

Animometer > Leaves, Fixed complexity = 500

Canary 53.0.2754.0 as observed by tracing:

BeginMainFrame 20.634
    synchronizedPaint 8.985 = 43.5%
    FrameView::invalidateTreeIfNeededRecursive 1.485 = 7.2%
    PaintLayerCompositor::updateIfNeededRecursive 0.374 = 1.8%
    updateStyle 5.405 = 26.1%
    serviceScriptedAnimations (raf) 3.282 = 15.9%
    PictureLayer::Update 0.806 = 3.9%

Note that this shows synchronizedPaint as being much slower than what I see in Instruments, I suspect this is tracing overhead, perhaps because we instrument BitmapImage::draw and we're calling it 500 times per frame?

Content Shell @ 033172cfd48f0c1d634324a38bf75711572cfb13 as observed by Instruments:

Total %     Symbol Name
100         cc::ProxyMain::BeginMainFrame
31.2            blink::Document::updateStyle
23.6            blink::PageAnimator::serviceScriptedAnimations
19.7            blink::FrameView::synchronizedPaint
18.2            blink::FrameView::invalidateTreeIfNeededRecursive
3.4             cc::LayerTreeHost::UpdateLayers
2.8             blink::PaintLayerCompositor::updateIfNeededRecursive

WebKit r197566 as observed by Instruments:

Total %     Symbol Name
100         main
35.2            WebCore::TileGrid::platformCALayerPaintContents
32.2            WebCore::Document::recalcStyle
5.1                 WebCore::RenderLayer::repaintIncludingDescendants
18              WebCore::ScriptedAnimationController::serviceScriptedAnimations
4.8             WebCore::RenderLayer::updateLayerPositionsAfterLayout
3.9             WebCore::RenderView::layout

WebKit r197613 as observed by Instruments:

Total %     Symbol Name
100         main
40.5            WebCore::Document::recalcStyle
8	 	    RenderLayer::repaintIncludingDescendants
23.7            WebCore::ScriptedAnimationController::serviceScriptedAnimations
18.8            WebCore::TileGrid::platformCALayerPaintContents
6.2             WebCore::RenderLayer::updateLayerPositionsAfterLayout
5.8             WebCore::RenderView::layout

They're doing layout and updateLayerPositionsAfterLayout, we're smart enough to avoid it apparently.

The breakdown seems to be:
WebKit:
    32% style
    24% script
    19% record
    12% layout (layout, layer pos update)
     8% paint invalidation

Chrome:
    31% style
    24% script
    19% record
    18% paint invalidation (invalidateTreeIfNeededRecursive)
    6% compositing (UpdateLayers, updateIfNeededRecursive)

> I tried hacking up kMinHeightForGpuRasteredTile = 4096 locally which does show one
> huge tile with debug borders turned on. It doesn't seem to change my score though,
> someone else should try as well, maybe I'm missing something.

We're main thread bound on Leaves so tile sizes shouldn't really make a difference to our critical path.

If we were raster task bound or GPU thread bound then larger tiles could reduce overheads.  It probably improves power usage.
Yeah, we already record at one size (an skp for the whole layer?) and then raster at another (per tile). Since their recordings are hidden inside the calls to CG, their tile size maps directly to the recording size so they're also recording one big command list inside CG when they use giant tiles which is the same as synchronizedPaint.

Since the style and script steps are the same percentage wise, but they're faster overall (500 vs 600) does that mean they're 20% faster at everything?
> Since the style and script steps are the same percentage wise, but they're faster
> overall (500 vs 600) does that mean they're 20% faster at everything?

At script I guess.  Our style includes layout doesn't it?

If I change kMaxInvalidationRectCount to 1 for cc::InvalidationRegion we get another 8% improvement on Leaves.
First comment:

As mentioned in ealier comments, larger tiles don't help the SPv1 architecture, because
we always "re-record" the entire interest rect area, which is not related to the size
of a tile. I put "re-record" in quotes because there is actually caching in place that
avoids some/most of the recording cost.
Discussed with Enne. We think it's worth trying converting InvalidationRegion
to be backed by a vector rather than a Region (which is in turn backed by
SkRegion, which has semantics that are fancier than we seem to need).

Implementing it would be the way to see if it makes a difference.
The task in item 12 was done by vmiura.
Labels: -Pri-1 Pri-2
Status: Fixed (was: Assigned)

Sign in to add a comment