New issue
Advanced search Search tips

Issue 631032 link

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug



Sign in to add a comment

Investigate why SimpleFontData::width and ::bounds functions are so performance critical

Project Member Reported by drott@chromium.org, Jul 25 2016

Issue description

See discussion in 
https://codereview.chromium.org/1980913002

It's surprising that these functions are so performance critical once we have shaping results in the word cache. 

Especially the Range API and selection code seems to suffer. Speculating: Perhaps we can put that on a new foundation and adapt it to the wordcache instead of using coordinate to cursor position translation functions from Font.h?

 

Comment 1 by kojii@chromium.org, Jul 26 2016

Cc: yosin@chromium.org

Comment 2 by kojii@chromium.org, Aug 2 2016

From comment #79 of the CL: https://codereview.chromium.org/1980913002#msg79

I profiled (Win10, debug)
editing/selection/modify_move/move-by-word-visually-crash-test-5.html
and it says 43.94% of time is spent SkPaint::getTextWidths(). 

Function Name	Total CPU (%)	Self CPU (%)	Total CPU (ms)	Self CPU (ms)	Module
 - blink::DOMSelection::modify	94.04 %	0.00 %	49156	1	blink_core.dll
   - blink::FrameSelection::modify	94.03 %	0.00 %	49152	1	blink_core.dll
     - blink::SelectionModifier::modify	92.86 %	0.00 %	48543	2	blink_core.dll
       - blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation	85.12 %	0.00 %	44494	1	blink_core.dll
         - blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation	85.00 %	0.00 %	44434	2	blink_core.dll
           - blink::localCaretRectOfPosition	84.88 %	0.00 %	44368	0	blink_core.dll
             - blink::localCaretRectOfPositionTemplate<blink::EditingAlgorithm<blink::NodeTraversal> >	84.88 %	0.01 %	44368	3	blink_core.dll
               - blink::LayoutText::localCaretRect	84.86 %	0.00 %	44361	1	blink_core.dll
                 - blink::InlineTextBox::positionForOffset	84.80 %	0.00 %	44327	1	blink_core.dll
                   - blink::Font::selectionRectForText	84.74 %	0.00 %	44296	2	blink_platform.dll
                     - blink::Font::selectionRectForComplexText	84.72 %	0.00 %	44284	1	blink_platform.dll
                       - blink::CachingWordShaper::getCharacterRange	84.69 %	0.00 %	44272	0	blink_platform.dll
                         - WTF::VectorBuffer<WTF::RefPtr<blink::ShapeResult const >,64,WTF::PartitionAllocator>::resetBufferPointer	84.08 %	0.01 %	43951	5	blink_platform.dll
                             ?	0.01 %	0.01 %	4	4	blink_platform.dll
                           - blink::CachingWordShapeIterator::next	83.87 %	0.04 %	43841	19	blink_platform.dll
                             - blink::CachingWordShapeIterator::nextForAllowTabs	83.81 %	0.01 %	43810	6	blink_platform.dll
                               - blink::CachingWordShapeIterator::nextWord	83.68 %	0.01 %	43743	7	blink_platform.dll
                                 - blink::CachingWordShapeIterator::shapeToEndIndex	83.49 %	0.02 %	43641	8	blink_platform.dll
                                   - blink::CachingWordShapeIterator::shapeWord	83.28 %	0.02 %	43533	8	blink_platform.dll
                                     - blink::CachingWordShapeIterator::shapeWordWithoutSpacing	83.25 %	0.02 %	43519	13	blink_platform.dll
                                       - blink::HarfBuzzShaper::shapeResult	79.93 %	0.11 %	41781	56	blink_platform.dll
                                         + blink::HarfBuzzShaper::extractShapeResults	7.53 %	0.10 %	3936	52	blink_platform.dll
                                         - blink::HarfBuzzShaper::shapeRange	57.15 %	0.03 %	29877	14	blink_platform.dll
                                           - hb_shape	55.06 %	0.01 %	28784	7	blink_platform.dll
                                             - hb_shape_full	55.05 %	0.02 %	28777	8	blink_platform.dll
                                               - hb_shape_plan_execute	54.80 %	0.04 %	28645	20	blink_platform.dll
                                                   - hb_font_t::has_glyph_h_origin_func	54.72 %	0.05 %	28604	26	blink_platform.dll
                                                     - hb_font_t::has_glyph_h_origin_func	54.60 %	0.08 %	28541	40	blink_platform.dll
                                                       - hb_font_t::has_glyph_h_origin_func	54.13 %	0.30 %	28294	156	blink_platform.dll
                                                         - hb_font_t::get_glyph_h_advance	48.85 %	0.30 %	25538	155	blink_platform.dll
                                                           - std::_Unique_ptr_base<hb_font_t,blink::HbFontDeleter>::get_deleter	48.56 %	0.30 %	25383	157	blink_platform.dll
                                                             - blink::SkiaTextMetrics::getGlyphWidthAndExtentsForHarfBuzz	48.03 %	0.35 %	25107	183	blink_platform.dll
                                                               - SkPaint::getTextWidths	43.94 %	0.70 %	22969	364	skia.dll
                                                                 - SkAutoGlyphCache::SkAutoGlyphCache	30.11 %	0.20 %	15740	105	skia.dll
                                                                   - SkPaint::detachCache	29.29 %	0.20 %	15310	102	skia.dll
                                                                     - SkPaint::descriptorProc	29.09 %	0.34 %	15207	177	skia.dll

Comment 3 by drott@chromium.org, Aug 2 2016

This also shows that we should definitely do, what eae@ suggested: untangle the getGlyphWidthsAndExtentsForHarfBuzz function to only retrieve width and extents separately. Retrieving the extents when we only need the advance with causes extra overhead.


Comment 4 by kojii@chromium.org, Aug 2 2016

#3: agreed.

Found why we shape so much in this test. Count of shapeWord():
A space, from cache: 123,822
From word cache:       4,283
Not from cache:      123,927

Examples of "not from cache", mostly because len>15:
"position:caption;" len=17
"max-height:ne-resize;" len=21

If we add ':' and '-' as word delimiters, the number of uncached shaping goes down to 4,464; i.e., can eliminate 97% cases to reshape.

There's a long word in this test (len=243) and it drags performance of selection quite badly, so additional word delimiters improves the perf only by 25%. I have an idea for really-long-word for break-all/break-word, but haven't got bad enough cases to need to think about it further.

To summary:
1. Our word delimiters and cache max len is tuned for human writing systems and does not perform great for programing languages.
2. Selection/caret relies on the perf of Font::selectionRectForText() a lot.
3. SkPaint::getTextWidths() consumes ~77% of shapeRange(). Not sure if this is normal or specific to this case though.
4. In this case most of the time was spent in shapeRange, not in extractShapeResults, so the drott@'s idea to tune bounds may not help.

Questions to us:
1. Do we want to tune word caching more to non-human writing systems?
2. Can selection/caret reduce the number of calls to Font, to improve when cache doesn't help?
3. SkPaint::getTextWidths() looks too heavy. All text in this test is ASCII, so they should be served from cache in Skia.
  a. Maybe cache isn't hitting, should we look into inside Skia more?
  b. If cache doesn't give us enough perf because of layering problem:
    b.i. Consider changing hb API to allow batch queries?
    b.ii. Is it better to disable cache in Skia and have/keep our own instead to avoid layering overhead?
    b.iii. Other possibilities?

WDYT?
Project Member

Comment 5 by bugdroid1@chromium.org, Aug 2 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd

commit 994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd
Author: drott <drott@chromium.org>
Date: Tue Aug 02 20:11:12 2016

Untangle HarfBuzz callbacks for glyph width and extents

Under the hood, SkPaint uses "needsFullMetrics" or just advance
callbacks into the glyph cache. Untangling the calls and only retrieving
either advance or bounds should speed up these HarfBuzz callbacks.

Measured locally I am seeing ~10% improvements on
blink_pref.layout:chapter-reflow and
~blink_pref.layout:ArabicLineLayout.

BUG= 631032 
R=kojii,tzik

Review-Url: https://codereview.chromium.org/2204923002
Cr-Commit-Position: refs/heads/master@{#409295}

[modify] https://crrev.com/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd/third_party/WebKit/Source/platform/fonts/shaping/HarfBuzzFace.cpp
[modify] https://crrev.com/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd/third_party/WebKit/Source/platform/fonts/skia/SkiaTextMetrics.cpp
[modify] https://crrev.com/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd/third_party/WebKit/Source/platform/fonts/skia/SkiaTextMetrics.h

Comment 6 by drott@chromium.org, Aug 2 2016

Thanks for your analysis, Koji, very good to have the word cache hit rate dissected. I'll reply in more detail tomorrow - but if you have a chance to look at it when you get to the office tomorrow - could you profile this after the Untangling CL landed and perhaps get an impression of how it changed things?

Comment 7 by kojii@chromium.org, Aug 3 2016

Cc: kouhei@chromium.org
We have less overhead than before, great, but still:
* bounds is faster, but width is slow.
* SkPaint::getTextWidths() consumes majority of the time.

kouhei@ and twik@ thinks SkPaint::getTextWidths() on Windows is considerably slower than on Linux, they're looking into it.

 - blink::DOMSelection::modify	93.94 %
   - blink::FrameSelection::modify	93.93 %
     - blink::SelectionModifier::modify	92.77 %
       - blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation	84.84 %
         - blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation	84.73 %
           - blink::localCaretRectOfPosition	84.58 %
             - blink::localCaretRectOfPositionTemplate<blink::EditingAlgorithm<blink::NodeTraversal> >	84.58 %
               - blink::LayoutText::localCaretRect	84.57 %
                 - blink::InlineTextBox::positionForOffset	84.49 %
                   - blink::Font::selectionRectForText	84.44 %
                     - blink::Font::selectionRectForComplexText	84.41 %
                       - blink::CachingWordShaper::getCharacterRange	84.40 %
                         - WTF::VectorBuffer<WTF::RefPtr<blink::ShapeResult const >,64,WTF::PartitionAllocator>::resetBufferPointer	83.76 %
                           - blink::CachingWordShapeIterator::next	83.50 %
                             - blink::CachingWordShapeIterator::nextForAllowTabs	83.46 %
                               - blink::CachingWordShapeIterator::nextWord	83.38 %
                                 - blink::CachingWordShapeIterator::shapeToEndIndex	83.19 %
                                   - blink::CachingWordShapeIterator::shapeWord	83.05 %
                                     - blink::CachingWordShapeIterator::shapeWordWithoutSpacing	83.03 %
                                       - blink::HarfBuzzShaper::shapeResult	79.72 %
                                         - blink::HarfBuzzShaper::shapeRange	55.60 %
                                           - hb_shape	53.53 %
                                             - hb_shape_full	53.51 %
                                               - hb_shape_plan_execute	53.24 %
                                                 - _hb_ot_shape	53.19 %
                                                   - hb_font_t::has_glyph_h_origin_func	53.17 %
                                                     - hb_font_t::has_glyph_h_origin_func	53.03 %
                                                       - hb_font_t::has_glyph_h_origin_func	52.52 %
                                                         - hb_font_t::get_glyph_h_advance	46.70 %
                                                           - std::_Unique_ptr_base<hb_font_t,blink::HbFontDeleter>::get_deleter	46.37 %
                                                             - blink::SkiaTextMetrics::getGlyphWidthForHarfBuzz	45.87 %
                                                               - SkPaint::getTextWidths	42.18 %
                                                                 - SkAutoGlyphCache::SkAutoGlyphCache	28.61 %
                                                                   - SkPaint::detachCache	27.99 %
                                                                     - SkPaint::descriptorProc	27.84 %
                                                                       + SkPaint::doComputeFastBounds	7.20 %
                                                                       + SkPaint::setupForAsPaths	5.20 %
                                                                       + SkGlyphCache::DetachCache	4.17 %
                                                                       + SkPaint::unflatten	4.12 %
                                                                       + SkBinaryWriteBuffer::SkBinaryWriteBuffer	2.62 %
                                                                       + SkBinaryWriteBuffer::~SkBinaryWriteBuffer	2.25 %
                                                                   + std::unique_ptr<SkGlyphCache,SkFunctionWrapper<void,SkGlyphCache,&SkGlyphCache::AttachCache> >::unique_ptr<SkGlyphCache,SkFunctionWrapper<void,SkGlyphCache,&SkGlyphCache::AttachCache> >	0.51 %
                                                                 + SkCanonicalizePaint::SkCanonicalizePaint	5.58 %
                                                                 + SkAutoGlyphCache::~SkAutoGlyphCache	3.23 %
                                                                 + SkPaint::setupForAsPaths	2.24 %
                                                               + blink::SkiaTextMetrics::SkiaTextMetrics	2.28 %
                                                         + _hb_ot_shape_normalize	1.77 %
                                                         + hb_ot_shape_plan_t::position	1.76 %
                                                         + hb_ot_layout_substitute_start	1.28 %
                                           + blink::HarfBuzzFace::getScaledFont	1.72 %
                                         - blink::HarfBuzzShaper::extractShapeResults	7.64 %
                                           - blink::ShapeResult::insertRun	5.83 %
                                             + blink::FloatRect::unite	1.77 %
                                             + blink::SimpleFontData::boundsForGlyph	1.45 %
                                         + blink::RunSegmenter::consume	6.50 %

Data in sheets:
https://docs.google.com/a/chromium.org/spreadsheets/d/1W5t0nOSKgs1lo3YtKXCZpJn4v1NjRK9zkBn4YTOnKwc/edit?usp=sharing

Comment 8 by drott@chromium.org, Aug 3 2016

I benchmarked on Linux two possible changes against current master (i.e. the master which has the glyph bounds unification CL):

http://roettsch.es/results_glyphBounds.html

master = current master, glyph bounds function unified to metrics based bounds (except mac)
deactivateMap = removing glyphToBounds Map
batchedBounds = a change that uses one call to SkPaint::getTextWidths from ShapeResult::insertRun

I see improvements in a couple of layout related tests: ArabicLineLayout, chapter-reflow-once. While a few of them, like japanese-kokoro-insert and character-fallback have slightly lower values. However, looking at html5-full-render and html-parser-threaded from the blink_perf.parser set, I see nice improvements there.

Perhaps you could review those numbers, kojii@ and kouhei@ and tell me what you think?

I am leaning towards trying this change and removing the glyph to bounds map at least on non Mac platforms and see what performance results we see on the bots, and what memory improvements we see on the Android memory health dashboards, wdyt?

My other conclusion is that batched access to Skia (getTextWidths for the whole run) is not a clear winner over removing the glyph to bounds map, it seems to profit in some benchmarks but not in others. The microbenchmark results are hard for me to interpret. 

If there are no objections, I'll prepare a CL for removing the glyphToBoundsMap everywhere except Mac, where we have performance problems with the editing test when removing it.

Comment 9 by drott@chromium.org, Aug 3 2016

On Mac, removing the glyphToBounds Mac has negative performance effects on html5-full-render and parser threaded as well as on a couple of blink_perf.layout tests, so it seems to me we should keep it on Mac, results are here:
https://roettsch.es/glyphBoundsMac/results.html

Project Member

Comment 10 by bugdroid1@chromium.org, Aug 3 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3af81d9791986c5d47b261dda6f2a672e65ae313

commit 3af81d9791986c5d47b261dda6f2a672e65ae313
Author: drott <drott@chromium.org>
Date: Wed Aug 03 17:52:45 2016

Remove glyph metrics caching on non-Mac platforms

As shown in the benchmarking results discussed in  crbug.com/631032 
removing the glyph metrics caching in Blink on non Mac platforms and
only using the lower level Skia caches instead should not affect overall
layout and text performance negatively. In fact, some performance tests
such as html5-full-render and html-parser-threaded are improving in
local measurements on linux.

This should give us improvements in memory consumption ranging from
tens of kilobytes in a single-origin use renderer up to over a megabyte
on longer lifespan renderers such as in the Android WebView and help
with crbug.com/577306.

On Mac platforms we are still using path based glyph metrics, which seem
to be too slow. Here, the glyph metrics caching in Blink cannot yet be
removed, compare
https://bugs.chromium.org/p/chromium/issues/detail?id=608338#c9

BUG= 589462 , 631032 
R=kojii,tzik,eae

Review-Url: https://codereview.chromium.org/2199333008
Cr-Commit-Position: refs/heads/master@{#409564}

[modify] https://crrev.com/3af81d9791986c5d47b261dda6f2a672e65ae313/third_party/WebKit/Source/platform/fonts/SimpleFontData.h

> Perhaps you could review those numbers, kojii@ and kouhei@ and tell me what you think?
As a HTML parser OWNER, I think we should ignore html-parser{,-threaded} results (both win/loss). Those benchmarks shouldn't create LayoutTree for the target docs, and thus should not be affected by these changes.

html5-full-render does force layouts, so I think this win is actually relevant :)
Can you help me, kouhei@, which perf bots for windows are up to date and which run blink_perf.parser and blink_perf.layout? 

Or in general, could you help analyze the results of the CL landed above, the glyphToBoundsMap removal? I am having trouble figuring out the right graphs from the perf dashboard.
Hmm. The CL landed w/ rev 409564 didn't show any perf improvements nor regressions on the dashboard.
I manually checked blink_perf.{layout, parser} on win bots, but no change.
Sorry let me retract c#13. The CL did have a huge impact memory-wise:
https://chromeperf.appspot.com/group_report?rev=409564

However, it is showing regression on Win bots. see  crbug.com/634477 
tzik,kojii: Given that this is Win specific, is this related to the issue we chatted yesterday?

Comment 15 by kojii@chromium.org, Aug 11 2016

Components: Blink>Fonts
Labels: OS-All

Comment 16 by e...@chromium.org, Nov 30 2017

Status: WontFix (was: Assigned)
No longer an issue in NG.

Sign in to add a comment