New issue
Advanced search Search tips

Issue 862773 link

Starred by 4 users

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug



Sign in to add a comment

RenderTextHarfBuzz is slow on macOS

Project Member Reported by ccameron@chromium.org, Jul 11

Issue description

RenderTextHarfBuzz is causing significant UI-thread jank in MacViews.

Examine the following trace -- this shows a 150 msec delay between pressing a key in the omnibox, and anything actually showing up.

Calls to GetFallbackFont and GetFallbackFonts account for a huge delay, more than once.

We also make several hundred calls to RenderTextHarfBuzz::ShapeRunWithFontInternal. We added a cache for the results of this call in  issue 826265 , but the calls are still far too slow.

After this massive 100-msec computation, we display the omnibox contents that are attached.
 
trace.png
123 KB View Download
omnibox.png
148 KB View Download
I instrumented a unit test to just call ShapeRunWithFontInternal in a loop, and got a good profile of it. Attached a picture.

We're spending nearly all of our time on the call to CTTypesetterCreateWithAttributedStringAndOptions at [1]. We then call CTTypesetterCreateLine specifying a range of [0, 0], indicating "use the full string we just created".

Now let's pop up a bit and look at the sequences of calls we're making to ShapeRunWithFontInternal. When I type in "n", one of the options that pops up is news.ycombinator.com. We have a flurry of calls being made to ShapeRunWithFontInternal, with the text and range parameters as follows.
  text:"news.ycombinator.com" range:{0,4}
  text:"news.ycombinator.com" range:{4,5}
  text:"news.ycombinator.com" range:{5,16}
  text:"news.ycombinator.com" range:{16,17}
  text:"news.ycombinator.com" range:{17,20}
This is actually a very mild example -- for the text "Extensions running in developer mode can harm your computer. If you're not a developer, you should disable these extensions running in developer mode to stay safe.", we have a call for every single word(!!!). Similar disasters happen for URLs.

The more efficient way to go about this would be to
- call CTTypesetterCreateWithAttributedStringAndOptions for the full string (and, you know, cache it)
- have the individual calls to ShapeRunWithFontInternal use that cached string
- and specify their sub-range with CTTypesetterCreateLine

[1]
https://cs.chromium.org/chromium/src/third_party/harfbuzz-ng/src/src/hb-coretext.cc?rcl=058708a665cdd9e796581dbcf60a5778d3f5e240&l=971
sample.png
291 KB View Download
Cc: drott@chromium.org behdad@chromium.org
+drott/behdad (harfbuzz experts) may have some ideas/comments too. (Note we've added a text run cache, but it's not great for omnibox results, which need to be low-latency and often need to typeset URLs.)

ccameron: Note in r480350 I tweaked the text run logic to always break at whitespace. I guess the URL example (with ".") wouldn't have been affected. But, generally, this is necessary for font fallbacks in i18n. E.g. the font that provides Hebrew glyphs on Mac (e.g. סיבית ) doesn't also provide a glyph for an ASCII space. Maybe we can introduce some special-case breaking rules for ASCII urls so we don't break at ".", and limit cases we break at " " as well. But this would (in theory) work against the text run cache. ( Issue 797215  captures some performance degradation around this too).

That is, breaking at "words" should provide better cache hits, but that may fail since we're taking context into account. (Does Blink? Some of the comments about matching "words" in its cache suggest to me it does not). Maybe there are some simple rules we can use to limit the amount of unicode context we send to harfbuzz to get more cache hits.
Labels: Group-Performance
Labels: M-69
Owner: ccameron@chromium.org
Status: Assigned (was: Available)
Labels: Target-69
Labels: ReleaseBlock-Stable
We only use the adjacent context in the function arabic_joining, so the following patch disables context for non-RTL languages (super-hacky, but should help some):
https://chromium-review.googlesource.com/c/chromium/src/+/1135723
I think we can do the suggested comment in #1 without any changes to the HB interface. In particular, we can find the RunLists that share all parameters except for range in ShapeRunLists at [1], and then we can parse all of the results together at [2].

[1]
https://cs.chromium.org/chromium/src/ui/gfx/render_text_harfbuzz.cc?rcl=356993f45e105380f1ac1db6894e6038830d8569&l=1690

[2]
https://cs.chromium.org/chromium/src/ui/gfx/render_text_harfbuzz.cc?rcl=356993f45e105380f1ac1db6894e6038830d8569&l=1063
Project Member

Comment 9 by bugdroid1@chromium.org, Jul 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/791f1c19acbf159aab8fb9652e95b77222075239

commit 791f1c19acbf159aab8fb9652e95b77222075239
Author: Christopher Cameron <ccameron@chromium.org>
Date: Fri Jul 13 03:58:48 2018

MacViews: Only use context for text shape when using RTL languages

The call to hb_buffer_add_utf16 will read up to 5 characters adjacent
to the specified range. This context is only used in the function
arabic_joining.

We can get much better caching if we don't include this context in our
keys. Arabic is an RTL language, so we can omit this context in the key
for all non-RTL languages.

Bug:  860612 , 862773
Change-Id: Icc11059886ab835032141ad25b966d358f7177b3
Reviewed-on: https://chromium-review.googlesource.com/1135723
Commit-Queue: ccameron <ccameron@chromium.org>
Reviewed-by: Sidney San Martín <sdy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#574834}
[modify] https://crrev.com/791f1c19acbf159aab8fb9652e95b77222075239/ui/gfx/render_text_harfbuzz.cc

I have a prototype of the idea in #8 at
https://chromium-review.googlesource.com/c/chromium/src/+/1136042

Doing a release build to see if it actually helps.
I gathered data for the time it takes for the calls to ShapeRunList that result from me pressing the "t" key in the omnibox. This is the average over 50 runs, with caching disabled (with caching enabled, all but the first run are instantaneous).

This gets us a pretty decent speedup, ranging from none (or slightly negative) for small strings to 5-10x for the longer runs.

    text |  num |    original |         new
  length | runs | time (msec) | time (msec)
  -------+------+-------------+------------
       1 |    1 |      0.0969 |       0.107
      13 |    3 |      0.1928 |       0.151
      20 |    5 |      0.3478 |       0.206
      38 |   12 |      0.6326 |       0.270
      59 |   17 |      0.8375 |       0.281
      60 |   17 |      0.8399 |       0.281
      95 |   38 |      1.9517 |       0.339
      96 |   38 |      1.8519 |       0.328
      97 |   38 |      1.8781 |       0.337
      98 |   39 |      2.1015 |       0.392
     179 |   54 |      3.3648 |       0.566
     197 |   67 |     10.5980 |       1.160

The patch here is pretty messy (and doesn't handle all cases, but those cases aren't hit by this data). I'll try to create a cleaner version of it. The patch will probably split TextRunHarfBuzz into a "common parameters to multiple runs" and "run-specific data" section, and add hashing there.
FWIW, I investigated RenderTextMac a little this morning, but I'm stopping since you have an (easier) fix. Just a couple of data points of the omnibox's calls to RenderText::Elide:

text length | HarfBuzz (ms) | Mac (ms)
------------+---------------+---------
         33 |       0.14885 |  0.18692
        430 |       2.39401 |  0.22898

So the speedup is similar, which is good.
Cc: jmukthavaram@chromium.org
Labels: Needs-Feedback
ccameron@, Please let us know clear steps to verify this issue from TE end.
Thanks..!
Project Member

Comment 14 by bugdroid1@chromium.org, Jul 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/102c5ce5dbc8e4823a1719b82bc790d328d0c77c

commit 102c5ce5dbc8e4823a1719b82bc790d328d0c77c
Author: Christopher Cameron <ccameron@chromium.org>
Date: Mon Jul 16 16:52:29 2018

HarfBuzz: Separate common parameters and shape output

Optimization part 1 of ~3

Separate TextRunHarfBuzz into three structures
 - the parameters that are common to multiple text runs
 - the output from shaping
 - everything else (the range and the post-shaping output)

Replace the ShapeRunWithFontOutput structure with the
TextRunHarfBuzz ShapeOutput structure.

This is a step towards merging the HarfBuzz calls runs that share
the same CommonParams.

Bug: 862773
Change-Id: I940af96e0c105b079baef24c97034f24f05ef56a
Reviewed-on: https://chromium-review.googlesource.com/1137852
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Commit-Queue: ccameron <ccameron@chromium.org>
Cr-Commit-Position: refs/heads/master@{#575311}
[modify] https://crrev.com/102c5ce5dbc8e4823a1719b82bc790d328d0c77c/ui/gfx/render_text_harfbuzz.cc
[modify] https://crrev.com/102c5ce5dbc8e4823a1719b82bc790d328d0c77c/ui/gfx/render_text_harfbuzz.h
[modify] https://crrev.com/102c5ce5dbc8e4823a1719b82bc790d328d0c77c/ui/gfx/render_text_unittest.cc

Project Member

Comment 15 by bugdroid1@chromium.org, Jul 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/52d3b85518c057c8ff55ab2f1c21ff17afc68576

commit 52d3b85518c057c8ff55ab2f1c21ff17afc68576
Author: Christopher Cameron <ccameron@chromium.org>
Date: Mon Jul 16 19:55:10 2018

HarfBuzz: Adjust parameterization of shape functions

Optimization part 2 of ~3

This is in preparation for a follow-on change that will change
RenderTextHarfBuzz::ShapeRun to take a single CommonParams, and a list
of all runs that share these parameters.

To that end, pull out explicit use of RenderTextHarfBuzz::common, and
pass the CommonParams explicitly, wherever possible.

Bug: 862773
Change-Id: Ia349c23ad58ab590c8ec6f05c6a9f1dcc253ab57
Reviewed-on: https://chromium-review.googlesource.com/1137891
Commit-Queue: ccameron <ccameron@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#575392}
[modify] https://crrev.com/52d3b85518c057c8ff55ab2f1c21ff17afc68576/ui/gfx/render_text_harfbuzz.cc
[modify] https://crrev.com/52d3b85518c057c8ff55ab2f1c21ff17afc68576/ui/gfx/render_text_harfbuzz.h
[modify] https://crrev.com/52d3b85518c057c8ff55ab2f1c21ff17afc68576/ui/gfx/render_text_unittest.cc

Ugh, this is turning out to be more complicated than I'd thought. I have a patch that enables merging calls into CoreText but ... there are hacks that I think go too far.

For the curious, it's at
https://chromium-review.googlesource.com/c/chromium/src/+/1139891

Beyond this, I don't see any reasonable-scope fixes to our use of HarfBuzz, or to the HarfBuzz API, which would fix this.

Which is lower risk: That patch, or switching to RenderTextMac?
RenderTextMac is not feature-complete. It has no text-editing, selection ( Issue 722217 ), nor multiline support. It was never able to be cached efficiently by views::Label (see e.g.,  Issue 665280 , which led to r522617 and a saga in  Issue 798927 ).

RenderTextMac is a non-starter, and I don't think it's worth comparing performance of its ShapeRunList() implementation, since it lacks so much functionality we need, and has worse "real" performance when RenderText objects are cached inside toolkit-views.
Labels: -ReleaseBlock-Stable -M-69 -Target-69 Target-70 M-70
MacViews triage: I'm stripping RBS from this and moving the target to M70. This is still important to fix - RTHB is one of our major perf bottlenecks - but it is no longer impacting enough to prevent ship and I don't think we'll have a solid answer by M69.
Project Member

Comment 19 by bugdroid1@chromium.org, Jul 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9801165f33842fc79b27b2cf0f0f30681cc5f628

commit 9801165f33842fc79b27b2cf0f0f30681cc5f628
Author: Christopher Cameron <ccameron@chromium.org>
Date: Mon Jul 23 23:42:01 2018

HarfBuzz: Identify runs with common font parameters

Optimization part 2.5 of ~3

In ShapeRunsWithFont, identify runs that the same initial CommonParams
and insert them into a RunVector. Change the rest of the shaping
pipeline to operate on these RunVectors, and move most of the
processing of CommonParams to be done once per RunVector instead of
once per run.

This change was intended to be part of the change to allow coalescing
of similar calls into HarfBuzz (which may or may not land), but it was
found to improve the performance in the cache hit case by a factor of
almost 2.

Bug: 862773
Change-Id: I8df02c2a8622eb9be74a65191fad79613ff1da2d
Reviewed-on: https://chromium-review.googlesource.com/1145904
Commit-Queue: ccameron <ccameron@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#577330}
[modify] https://crrev.com/9801165f33842fc79b27b2cf0f0f30681cc5f628/ui/gfx/render_text_harfbuzz.cc
[modify] https://crrev.com/9801165f33842fc79b27b2cf0f0f30681cc5f628/ui/gfx/render_text_harfbuzz.h
[modify] https://crrev.com/9801165f33842fc79b27b2cf0f0f30681cc5f628/ui/gfx/render_text_unittest.cc

Cc: e...@chromium.org
+eae, +behdad

I took a look at this with eae@.

The font that is slow is .AppleSystemUIFont, and the function that is slow is _hb_coretext_shape.

This appears not to be an AAT font (because _hb_coretext_aat_shape isn't on the stack in #1), but the shape function is shared between _hb_coretext_aat_shape and _hb_coretext_shape (see [1]), and takes about 0.5 msec per instantiation (so, 0.5 msec per word in Chrome's UI).

[1] https://cs.chromium.org/chromium/src/third_party/harfbuzz-ng/src/src/hb-coretext.cc?rcl=2b76767bf572364d3d647cdd139f2044a7ad06b2&l=1410
My guess is that we spend time recreating CTFont over and over again whereas it should be cached.  Are you reusing the same hb_font_t?
It was *much* worse when we were recreating the CTFont each time :). Full beachball territory --  Issue 785522 .

https://chromium-review.googlesource.com/851293 switched to using `hb_coretext_font_create(ct_font);` so that the CTFont is not recreated, just wrapped in the hb_font_t. I think we _could_ (and should) cache at a higher level than what gfx::CreateHarfBuzzFont [1] currently does, but that doesn't appear to be the main bottleneck.

[1] https://cs.chromium.org/chromium/src/ui/gfx/harfbuzz_font_skia.cc?type=cs&q=gfx::CreateHarfBuzzFont&sq=package:chromium&g=0&l=284
Can you do some debugging, to see if we are recreating hb_face_t / hb_font_t for each word?  That will help guide where I we should look next. Thanks.
Project Member

Comment 24 by bugdroid1@chromium.org, Jul 25

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4e2ba897d5fb48e2c5a08dcde8a353432c9a2558

commit 4e2ba897d5fb48e2c5a08dcde8a353432c9a2558
Author: Christopher Cameron <ccameron@chromium.org>
Date: Wed Jul 25 20:41:42 2018

HarfBuzz: Rename CommonParams to FontParams

Bug: 862773
Change-Id: Idca82f1c4da83a5cff57f77a66bd2706e842f619
Reviewed-on: https://chromium-review.googlesource.com/1149277
Commit-Queue: ccameron <ccameron@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#578043}
[modify] https://crrev.com/4e2ba897d5fb48e2c5a08dcde8a353432c9a2558/ui/gfx/render_text_harfbuzz.cc
[modify] https://crrev.com/4e2ba897d5fb48e2c5a08dcde8a353432c9a2558/ui/gfx/render_text_harfbuzz.h
[modify] https://crrev.com/4e2ba897d5fb48e2c5a08dcde8a353432c9a2558/ui/gfx/render_text_unittest.cc

Labels: Merge-Request-69
Requesting merge of the patches so far, to keep M69 in sync.

There may be one more patch to improve performance, but that will probably be left until M70.
Project Member

Comment 26 by sheriffbot@chromium.org, Jul 27

Labels: -Merge-Request-69 Hotlist-Merge-Approved Merge-Approved-69
Your change meets the bar and is auto-approved for M69. Please go ahead and merge the CL to branch 3497 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 27 by bugdroid1@chromium.org, Jul 27

Labels: -merge-approved-69 merge-merged-3497
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e4059f4faa614f3c5da7ca8bf55626bd2e058a97

commit e4059f4faa614f3c5da7ca8bf55626bd2e058a97
Author: Christopher Cameron <ccameron@chromium.org>
Date: Fri Jul 27 21:14:31 2018

HarfBuzz: Identify runs with common font parameters

Optimization part 2.5 of ~3

In ShapeRunsWithFont, identify runs that the same initial CommonParams
and insert them into a RunVector. Change the rest of the shaping
pipeline to operate on these RunVectors, and move most of the
processing of CommonParams to be done once per RunVector instead of
once per run.

This change was intended to be part of the change to allow coalescing
of similar calls into HarfBuzz (which may or may not land), but it was
found to improve the performance in the cache hit case by a factor of
almost 2.

TBR=ccameron@chromium.org

(cherry picked from commit 9801165f33842fc79b27b2cf0f0f30681cc5f628)

Bug: 862773
Change-Id: I8df02c2a8622eb9be74a65191fad79613ff1da2d
Reviewed-on: https://chromium-review.googlesource.com/1145904
Commit-Queue: ccameron <ccameron@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#577330}
Reviewed-on: https://chromium-review.googlesource.com/1153655
Reviewed-by: ccameron <ccameron@chromium.org>
Cr-Commit-Position: refs/branch-heads/3497@{#174}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/e4059f4faa614f3c5da7ca8bf55626bd2e058a97/ui/gfx/render_text_harfbuzz.cc
[modify] https://crrev.com/e4059f4faa614f3c5da7ca8bf55626bd2e058a97/ui/gfx/render_text_harfbuzz.h
[modify] https://crrev.com/e4059f4faa614f3c5da7ca8bf55626bd2e058a97/ui/gfx/render_text_unittest.cc

Project Member

Comment 28 by bugdroid1@chromium.org, Jul 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a93590b5b30cb74c8d92222dfc637986053f9ee4

commit a93590b5b30cb74c8d92222dfc637986053f9ee4
Author: Christopher Cameron <ccameron@chromium.org>
Date: Fri Jul 27 21:50:25 2018

HarfBuzz: Rename CommonParams to FontParams

TBR=ccameron@chromium.org

(cherry picked from commit 4e2ba897d5fb48e2c5a08dcde8a353432c9a2558)

Bug: 862773
Change-Id: Idca82f1c4da83a5cff57f77a66bd2706e842f619
Reviewed-on: https://chromium-review.googlesource.com/1149277
Commit-Queue: ccameron <ccameron@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#578043}
Reviewed-on: https://chromium-review.googlesource.com/1153636
Reviewed-by: ccameron <ccameron@chromium.org>
Cr-Commit-Position: refs/branch-heads/3497@{#179}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/a93590b5b30cb74c8d92222dfc637986053f9ee4/ui/gfx/render_text_harfbuzz.cc
[modify] https://crrev.com/a93590b5b30cb74c8d92222dfc637986053f9ee4/ui/gfx/render_text_harfbuzz.h
[modify] https://crrev.com/a93590b5b30cb74c8d92222dfc637986053f9ee4/ui/gfx/render_text_unittest.cc

Re #23: Yes, we are indeed re-creating the hb_font_t for every word at [1]. It would be very easy for us not to do that, but, like tapted@ mentioned in #22, that's not where the slowness is.

The root of the slowness is the call to CTTypesetterCreateWithAttributedStringAndOptions at [2].

When I discussed this with eae@, it sounded as though it could be possible to shape these fonts without going through CoreText in this way -- did I understand that correctly? (this is a path that is shared with AAT fonts)

[1] https://cs.chromium.org/chromium/src/ui/gfx/render_text_harfbuzz.cc?rcl=ec4400b28a913290681684212c9dfccb5ea103ed&l=1091

[2] https://cs.chromium.org/chromium/src/third_party/harfbuzz-ng/src/src/hb-coretext.cc?rcl=2b76767bf572364d3d647cdd139f2044a7ad06b2&l=970

If it's really easy, please fix the hb_font_t recreation first and then we'll look at what's slow.  It doesn't currently make sense to me, as Blink is going through the same code path and does not have this problem.
Re #20:

> This appears not to be an AAT font [...]

If you see slowness through CoreText calls, such as CTTypesetterCreateWithAttributedStringAndOptions, then you're reaching the hb-coretext code path, which should only be taken for AAT fonts. On Mac OS from 10.12. San Francisco is the system UI font, which does make use of AAT features.

Is there a plan to improve shaping of AAT fonts in HarfBuzz?
(or alternatively, is there a plan to move the text shaping from blink into ui/, since it is much more effective at minimizing calls to CT?)
> Is there a plan to improve shaping of AAT fonts in HarfBuzz?

Yes. We like to implement AAT shaping natively in HarfBuzz.  Here's the issue tracking that:

  https://github.com/harfbuzz/harfbuzz/issues/322

This is about 50% implemented already.  I hope to finish it later this year.  That should give up huge speedup.
Cc: etienneb@chromium.org
 Issue 619684  was a mac-specific bug relating to font fallbacks, which seems to be the bottleneck for Issue 894459 on Windows. See comments in  Issue 619684  for more info.

We should call gfx::GetFallbackFonts (plural) only in very rare cases on Mac. Likely it will _also_ be drawing the unicode string incorrectly (e.g. with tofu glyphs), and a GetFallbackFonts call is a last-ditch attempt to replace some of the tofu.

On Mac, the API we call from

bool GetFallbackFont(const Font& font,
                     const base::char16* text,
                     int text_length,
                     Font* result);

(i.e. CTFontCreateForString()), is likely to always give the optimal font for a given text run.


(We should probably make GetFallbackFonts a member function on gfx::Font, which atomically caches the result after generating the vector the first time).
I'm adding UMA metrics to get more information on this:
  https://chromium-review.googlesource.com/c/chromium/src/+/1281227
ccameron@, do you have any reproducible benchmark steps or measurements that I could try out? Working together with behdad@, we are very close to merging native AAT support in HarfBuzz to work in Chrome, so that we do not need to go through CoreText anymore. It would be great to collect some numbers: If you have any steps, I could give it a go. Thanks.

https://chromium-review.googlesource.com/c/chromium/src/+/1275945
That's great news!

The benchmarks that I had were hacky things that I'd thrown into unittests (perhaps I should have committed them).

Probably a good test would be to try using a bookmark folder (as in the video in  issue 826265 ), with can_use_cache forced to false at [0]. I suspect that with the HB roll, the performance will be okay without the cache (though we can leave the cache for now).

[0] https://cs.chromium.org/chromium/src/ui/gfx/render_text_harfbuzz.cc?rcl=684a87668622e1ae8e85bb1bb514ce6a42b43565&l=1909

Sign in to add a comment