Investigate why SimpleFontData::width and ::bounds functions are so performance critical |
||||
Issue descriptionSee discussion in https://codereview.chromium.org/1980913002 It's surprising that these functions are so performance critical once we have shaping results in the word cache. Especially the Range API and selection code seems to suffer. Speculating: Perhaps we can put that on a new foundation and adapt it to the wordcache instead of using coordinate to cursor position translation functions from Font.h?
,
Aug 2 2016
From comment #79 of the CL: https://codereview.chromium.org/1980913002#msg79 I profiled (Win10, debug) editing/selection/modify_move/move-by-word-visually-crash-test-5.html and it says 43.94% of time is spent SkPaint::getTextWidths(). Function Name Total CPU (%) Self CPU (%) Total CPU (ms) Self CPU (ms) Module - blink::DOMSelection::modify 94.04 % 0.00 % 49156 1 blink_core.dll - blink::FrameSelection::modify 94.03 % 0.00 % 49152 1 blink_core.dll - blink::SelectionModifier::modify 92.86 % 0.00 % 48543 2 blink_core.dll - blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation 85.12 % 0.00 % 44494 1 blink_core.dll - blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation 85.00 % 0.00 % 44434 2 blink_core.dll - blink::localCaretRectOfPosition 84.88 % 0.00 % 44368 0 blink_core.dll - blink::localCaretRectOfPositionTemplate<blink::EditingAlgorithm<blink::NodeTraversal> > 84.88 % 0.01 % 44368 3 blink_core.dll - blink::LayoutText::localCaretRect 84.86 % 0.00 % 44361 1 blink_core.dll - blink::InlineTextBox::positionForOffset 84.80 % 0.00 % 44327 1 blink_core.dll - blink::Font::selectionRectForText 84.74 % 0.00 % 44296 2 blink_platform.dll - blink::Font::selectionRectForComplexText 84.72 % 0.00 % 44284 1 blink_platform.dll - blink::CachingWordShaper::getCharacterRange 84.69 % 0.00 % 44272 0 blink_platform.dll - WTF::VectorBuffer<WTF::RefPtr<blink::ShapeResult const >,64,WTF::PartitionAllocator>::resetBufferPointer 84.08 % 0.01 % 43951 5 blink_platform.dll ? 0.01 % 0.01 % 4 4 blink_platform.dll - blink::CachingWordShapeIterator::next 83.87 % 0.04 % 43841 19 blink_platform.dll - blink::CachingWordShapeIterator::nextForAllowTabs 83.81 % 0.01 % 43810 6 blink_platform.dll - blink::CachingWordShapeIterator::nextWord 83.68 % 0.01 % 43743 7 blink_platform.dll - blink::CachingWordShapeIterator::shapeToEndIndex 83.49 % 0.02 % 43641 8 blink_platform.dll - blink::CachingWordShapeIterator::shapeWord 83.28 % 0.02 % 43533 8 blink_platform.dll - blink::CachingWordShapeIterator::shapeWordWithoutSpacing 83.25 % 0.02 % 43519 13 blink_platform.dll - blink::HarfBuzzShaper::shapeResult 79.93 % 0.11 % 41781 56 blink_platform.dll + blink::HarfBuzzShaper::extractShapeResults 7.53 % 0.10 % 3936 52 blink_platform.dll - blink::HarfBuzzShaper::shapeRange 57.15 % 0.03 % 29877 14 blink_platform.dll - hb_shape 55.06 % 0.01 % 28784 7 blink_platform.dll - hb_shape_full 55.05 % 0.02 % 28777 8 blink_platform.dll - hb_shape_plan_execute 54.80 % 0.04 % 28645 20 blink_platform.dll - hb_font_t::has_glyph_h_origin_func 54.72 % 0.05 % 28604 26 blink_platform.dll - hb_font_t::has_glyph_h_origin_func 54.60 % 0.08 % 28541 40 blink_platform.dll - hb_font_t::has_glyph_h_origin_func 54.13 % 0.30 % 28294 156 blink_platform.dll - hb_font_t::get_glyph_h_advance 48.85 % 0.30 % 25538 155 blink_platform.dll - std::_Unique_ptr_base<hb_font_t,blink::HbFontDeleter>::get_deleter 48.56 % 0.30 % 25383 157 blink_platform.dll - blink::SkiaTextMetrics::getGlyphWidthAndExtentsForHarfBuzz 48.03 % 0.35 % 25107 183 blink_platform.dll - SkPaint::getTextWidths 43.94 % 0.70 % 22969 364 skia.dll - SkAutoGlyphCache::SkAutoGlyphCache 30.11 % 0.20 % 15740 105 skia.dll - SkPaint::detachCache 29.29 % 0.20 % 15310 102 skia.dll - SkPaint::descriptorProc 29.09 % 0.34 % 15207 177 skia.dll
,
Aug 2 2016
This also shows that we should definitely do, what eae@ suggested: untangle the getGlyphWidthsAndExtentsForHarfBuzz function to only retrieve width and extents separately. Retrieving the extents when we only need the advance with causes extra overhead.
,
Aug 2 2016
#3: agreed.
Found why we shape so much in this test. Count of shapeWord():
A space, from cache: 123,822
From word cache: 4,283
Not from cache: 123,927
Examples of "not from cache", mostly because len>15:
"position:caption;" len=17
"max-height:ne-resize;" len=21
If we add ':' and '-' as word delimiters, the number of uncached shaping goes down to 4,464; i.e., can eliminate 97% cases to reshape.
There's a long word in this test (len=243) and it drags performance of selection quite badly, so additional word delimiters improves the perf only by 25%. I have an idea for really-long-word for break-all/break-word, but haven't got bad enough cases to need to think about it further.
To summary:
1. Our word delimiters and cache max len is tuned for human writing systems and does not perform great for programing languages.
2. Selection/caret relies on the perf of Font::selectionRectForText() a lot.
3. SkPaint::getTextWidths() consumes ~77% of shapeRange(). Not sure if this is normal or specific to this case though.
4. In this case most of the time was spent in shapeRange, not in extractShapeResults, so the drott@'s idea to tune bounds may not help.
Questions to us:
1. Do we want to tune word caching more to non-human writing systems?
2. Can selection/caret reduce the number of calls to Font, to improve when cache doesn't help?
3. SkPaint::getTextWidths() looks too heavy. All text in this test is ASCII, so they should be served from cache in Skia.
a. Maybe cache isn't hitting, should we look into inside Skia more?
b. If cache doesn't give us enough perf because of layering problem:
b.i. Consider changing hb API to allow batch queries?
b.ii. Is it better to disable cache in Skia and have/keep our own instead to avoid layering overhead?
b.iii. Other possibilities?
WDYT?
,
Aug 2 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd commit 994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd Author: drott <drott@chromium.org> Date: Tue Aug 02 20:11:12 2016 Untangle HarfBuzz callbacks for glyph width and extents Under the hood, SkPaint uses "needsFullMetrics" or just advance callbacks into the glyph cache. Untangling the calls and only retrieving either advance or bounds should speed up these HarfBuzz callbacks. Measured locally I am seeing ~10% improvements on blink_pref.layout:chapter-reflow and ~blink_pref.layout:ArabicLineLayout. BUG= 631032 R=kojii,tzik Review-Url: https://codereview.chromium.org/2204923002 Cr-Commit-Position: refs/heads/master@{#409295} [modify] https://crrev.com/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd/third_party/WebKit/Source/platform/fonts/shaping/HarfBuzzFace.cpp [modify] https://crrev.com/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd/third_party/WebKit/Source/platform/fonts/skia/SkiaTextMetrics.cpp [modify] https://crrev.com/994c65ef57ec7c84b18fd66d7b8b85e5a2dc29bd/third_party/WebKit/Source/platform/fonts/skia/SkiaTextMetrics.h
,
Aug 2 2016
Thanks for your analysis, Koji, very good to have the word cache hit rate dissected. I'll reply in more detail tomorrow - but if you have a chance to look at it when you get to the office tomorrow - could you profile this after the Untangling CL landed and perhaps get an impression of how it changed things?
,
Aug 3 2016
We have less overhead than before, great, but still:
* bounds is faster, but width is slow.
* SkPaint::getTextWidths() consumes majority of the time.
kouhei@ and twik@ thinks SkPaint::getTextWidths() on Windows is considerably slower than on Linux, they're looking into it.
- blink::DOMSelection::modify 93.94 %
- blink::FrameSelection::modify 93.93 %
- blink::SelectionModifier::modify 92.77 %
- blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation 84.84 %
- blink::SelectionModifier::lineDirectionPointForBlockDirectionNavigation 84.73 %
- blink::localCaretRectOfPosition 84.58 %
- blink::localCaretRectOfPositionTemplate<blink::EditingAlgorithm<blink::NodeTraversal> > 84.58 %
- blink::LayoutText::localCaretRect 84.57 %
- blink::InlineTextBox::positionForOffset 84.49 %
- blink::Font::selectionRectForText 84.44 %
- blink::Font::selectionRectForComplexText 84.41 %
- blink::CachingWordShaper::getCharacterRange 84.40 %
- WTF::VectorBuffer<WTF::RefPtr<blink::ShapeResult const >,64,WTF::PartitionAllocator>::resetBufferPointer 83.76 %
- blink::CachingWordShapeIterator::next 83.50 %
- blink::CachingWordShapeIterator::nextForAllowTabs 83.46 %
- blink::CachingWordShapeIterator::nextWord 83.38 %
- blink::CachingWordShapeIterator::shapeToEndIndex 83.19 %
- blink::CachingWordShapeIterator::shapeWord 83.05 %
- blink::CachingWordShapeIterator::shapeWordWithoutSpacing 83.03 %
- blink::HarfBuzzShaper::shapeResult 79.72 %
- blink::HarfBuzzShaper::shapeRange 55.60 %
- hb_shape 53.53 %
- hb_shape_full 53.51 %
- hb_shape_plan_execute 53.24 %
- _hb_ot_shape 53.19 %
- hb_font_t::has_glyph_h_origin_func 53.17 %
- hb_font_t::has_glyph_h_origin_func 53.03 %
- hb_font_t::has_glyph_h_origin_func 52.52 %
- hb_font_t::get_glyph_h_advance 46.70 %
- std::_Unique_ptr_base<hb_font_t,blink::HbFontDeleter>::get_deleter 46.37 %
- blink::SkiaTextMetrics::getGlyphWidthForHarfBuzz 45.87 %
- SkPaint::getTextWidths 42.18 %
- SkAutoGlyphCache::SkAutoGlyphCache 28.61 %
- SkPaint::detachCache 27.99 %
- SkPaint::descriptorProc 27.84 %
+ SkPaint::doComputeFastBounds 7.20 %
+ SkPaint::setupForAsPaths 5.20 %
+ SkGlyphCache::DetachCache 4.17 %
+ SkPaint::unflatten 4.12 %
+ SkBinaryWriteBuffer::SkBinaryWriteBuffer 2.62 %
+ SkBinaryWriteBuffer::~SkBinaryWriteBuffer 2.25 %
+ std::unique_ptr<SkGlyphCache,SkFunctionWrapper<void,SkGlyphCache,&SkGlyphCache::AttachCache> >::unique_ptr<SkGlyphCache,SkFunctionWrapper<void,SkGlyphCache,&SkGlyphCache::AttachCache> > 0.51 %
+ SkCanonicalizePaint::SkCanonicalizePaint 5.58 %
+ SkAutoGlyphCache::~SkAutoGlyphCache 3.23 %
+ SkPaint::setupForAsPaths 2.24 %
+ blink::SkiaTextMetrics::SkiaTextMetrics 2.28 %
+ _hb_ot_shape_normalize 1.77 %
+ hb_ot_shape_plan_t::position 1.76 %
+ hb_ot_layout_substitute_start 1.28 %
+ blink::HarfBuzzFace::getScaledFont 1.72 %
- blink::HarfBuzzShaper::extractShapeResults 7.64 %
- blink::ShapeResult::insertRun 5.83 %
+ blink::FloatRect::unite 1.77 %
+ blink::SimpleFontData::boundsForGlyph 1.45 %
+ blink::RunSegmenter::consume 6.50 %
Data in sheets:
https://docs.google.com/a/chromium.org/spreadsheets/d/1W5t0nOSKgs1lo3YtKXCZpJn4v1NjRK9zkBn4YTOnKwc/edit?usp=sharing
,
Aug 3 2016
I benchmarked on Linux two possible changes against current master (i.e. the master which has the glyph bounds unification CL): http://roettsch.es/results_glyphBounds.html master = current master, glyph bounds function unified to metrics based bounds (except mac) deactivateMap = removing glyphToBounds Map batchedBounds = a change that uses one call to SkPaint::getTextWidths from ShapeResult::insertRun I see improvements in a couple of layout related tests: ArabicLineLayout, chapter-reflow-once. While a few of them, like japanese-kokoro-insert and character-fallback have slightly lower values. However, looking at html5-full-render and html-parser-threaded from the blink_perf.parser set, I see nice improvements there. Perhaps you could review those numbers, kojii@ and kouhei@ and tell me what you think? I am leaning towards trying this change and removing the glyph to bounds map at least on non Mac platforms and see what performance results we see on the bots, and what memory improvements we see on the Android memory health dashboards, wdyt? My other conclusion is that batched access to Skia (getTextWidths for the whole run) is not a clear winner over removing the glyph to bounds map, it seems to profit in some benchmarks but not in others. The microbenchmark results are hard for me to interpret. If there are no objections, I'll prepare a CL for removing the glyphToBoundsMap everywhere except Mac, where we have performance problems with the editing test when removing it.
,
Aug 3 2016
On Mac, removing the glyphToBounds Mac has negative performance effects on html5-full-render and parser threaded as well as on a couple of blink_perf.layout tests, so it seems to me we should keep it on Mac, results are here: https://roettsch.es/glyphBoundsMac/results.html
,
Aug 3 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3af81d9791986c5d47b261dda6f2a672e65ae313 commit 3af81d9791986c5d47b261dda6f2a672e65ae313 Author: drott <drott@chromium.org> Date: Wed Aug 03 17:52:45 2016 Remove glyph metrics caching on non-Mac platforms As shown in the benchmarking results discussed in crbug.com/631032 removing the glyph metrics caching in Blink on non Mac platforms and only using the lower level Skia caches instead should not affect overall layout and text performance negatively. In fact, some performance tests such as html5-full-render and html-parser-threaded are improving in local measurements on linux. This should give us improvements in memory consumption ranging from tens of kilobytes in a single-origin use renderer up to over a megabyte on longer lifespan renderers such as in the Android WebView and help with crbug.com/577306. On Mac platforms we are still using path based glyph metrics, which seem to be too slow. Here, the glyph metrics caching in Blink cannot yet be removed, compare https://bugs.chromium.org/p/chromium/issues/detail?id=608338#c9 BUG= 589462 , 631032 R=kojii,tzik,eae Review-Url: https://codereview.chromium.org/2199333008 Cr-Commit-Position: refs/heads/master@{#409564} [modify] https://crrev.com/3af81d9791986c5d47b261dda6f2a672e65ae313/third_party/WebKit/Source/platform/fonts/SimpleFontData.h
,
Aug 4 2016
> Perhaps you could review those numbers, kojii@ and kouhei@ and tell me what you think?
As a HTML parser OWNER, I think we should ignore html-parser{,-threaded} results (both win/loss). Those benchmarks shouldn't create LayoutTree for the target docs, and thus should not be affected by these changes.
html5-full-render does force layouts, so I think this win is actually relevant :)
,
Aug 4 2016
Can you help me, kouhei@, which perf bots for windows are up to date and which run blink_perf.parser and blink_perf.layout? Or in general, could you help analyze the results of the CL landed above, the glyphToBoundsMap removal? I am having trouble figuring out the right graphs from the perf dashboard.
,
Aug 7 2016
Hmm. The CL landed w/ rev 409564 didn't show any perf improvements nor regressions on the dashboard.
I manually checked blink_perf.{layout, parser} on win bots, but no change.
,
Aug 9 2016
Sorry let me retract c#13. The CL did have a huge impact memory-wise: https://chromeperf.appspot.com/group_report?rev=409564 However, it is showing regression on Win bots. see crbug.com/634477 tzik,kojii: Given that this is Win specific, is this related to the issue we chatted yesterday?
,
Aug 11 2016
,
Nov 30 2017
No longer an issue in NG. |
||||
►
Sign in to add a comment |
||||
Comment 1 by kojii@chromium.org
, Jul 26 2016