RenderTextHarfBuzz: run breaking breaks at spaces in ASCII text |
||
Issue description
If the input text is a single line containing "abc def", RenderTextHarfBuzz::GetSubstringBounds() will return a vector of length 3, even though all the text is one contiguous "chunk". This is because of this logic in ScriptInterval():
while (char_iterator.Advance()) {
ScriptSetIntersect(char_iterator.get(), scripts, &scripts_size);
if (scripts_size == 0U)
return char_iterator.array_pos();
*script = scripts[0];
}
However, the script set for 'a', 'b', 'c', 'd', 'e', and 'f' is { USCRIPT_LATIN } and the script set for ' ' is { USCRIPT_COMMON }, which don't intersect, so any space is treated as a script change for HarfBuzz's purposes.
This probably causes a related issue where we separately shape each space-separated component of text.
,
Aug 1
On the plus side, by breaking at spaces, we do end up hitting the cache more.
,
Aug 1
Good point. That said, ideally, we'd be caching by character (or similar, like Blink) instead of run. I suppose we should only include USCRIPT_COMMON in surrounding ranges if it's a performance win.
,
Aug 1
The breaking for Latin scripts was added in r480350 to: - fix Issue 731563 - reach towards parity with Blink behaviour, which is to break runs at whitespace - improve hits in a hypothetical text-run cache (which ccameron since added \o/) The problem with USCRIPT_COMMON is that not all typefaces provide glyphs for all characters in it, resulting in badness (See the screenshot in https://crbug.com/731563 ). E.g. the Hebrew typeface on mac doesn't have an ASCII space, so if we don't break the ASCII space into its own run we render garbage. Long-term, we should completely merge Blink's and src/ui's run-breaking algorithm (and font-fallback mechanisms, and caching, etc.).
,
Aug 1
Okay, sounds like this is a won't fix (working as intended) then. |
||
►
Sign in to add a comment |
||
Comment 1 by msw@chromium.org
, Aug 1