RenderTextHarfbuzz doesn't break runs when switching from ascii to non-ascii when whitespace is involved. |
|||
Issue descriptionChrome Version : 60.0.3112.7 OS Version: OS X 10.12.5 So... I'm trying to track down some odd behaviour. Consider the text "סיבית – ויקיפדיה". This is the title of the Wikipedia page at https://he.wikipedia.org/wiki/%D7%A1%D7%99%D7%91%D7%99%D7%AA 16 characters. RTHB currently breaks at 6, 7, 8 and 16. That's from the right, so the runs are: "סיבית " "–" " " "ויקיפדיה" This causes the first run to have a mix of ascii and non-ascii glyphs. On Mac, there is no fallback from a system UI font that has all those glyphs. The result is icky (see attached mixed.png) Basically, putting a space *after* RTL text does not break. But putting a space *before* RTL text does break. I don't think this is a unicode/bidi thing. The same thing happens with Japanese. E.g. " ありがと" breaks the run after the first character and renders fine. But "ありがと " has no breaks and struggles. Note these are ascii spaces, not full-width (?) spaces. (Typing a space in Japanese IME doesn't cause the same problems). Starting to trace... I think the problem is some code in RTHB's int ScriptInterval(const base::string16& text, size_t start, size_t length, UScriptCode* script) specifically while (char_iterator.Advance()) { // Special handling to merge white space into the previous run. if (u_isUWhiteSpace(char_iterator.get())) continue; Removing that special handling seems to make these problems go away. Runs of whitespace are still merged, but only if they are runs of whitespace in the same script. Do we need that special handling for anything? (Can I kill it?)
,
Jun 9 2017
That logic was added in https://codereview.chromium.org/1070223004 Didn't dive too much into the context for that. But if we can kill it without breaking anything else, we should.
,
Jun 9 2017
[mac triage] Getting this out of the unconfirmed queue
,
Jun 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/141af5c41b4e6a777e5c6d53575f124bc838865e commit 141af5c41b4e6a777e5c6d53575f124bc838865e Author: tapted <tapted@chromium.org> Date: Mon Jun 19 05:07:23 2017 RenderText: always break runs at whitespace RenderTextHarfBuzz is currently inconsistent with where it chooses to break runs around whitespace whenever non-ascii characters are involved. Currently it attaches runs of whitespace _after_ characters to the preceding run (ignoring whether there is a script/ascii/non-ascii mismatch between the run and the kind of whitespace). However, it does not always attach runs of whitespace that occur _before_ runs (even if they are both ascii). Attaching whitespace while ignoring what script that whitespace is from can cause problems. A font is not obligated to provide whitespace glyphs for all scripts. To fix this, always break at whitespace. This is closer to Blink behaviour - shorter runs without whitespace makes better use of the cache of text runs kept in blink's Font class; indexed by string (typically, a dictionary word). BUG= 731563 Review-Url: https://codereview.chromium.org/2942843002 Cr-Commit-Position: refs/heads/master@{#480350} [modify] https://crrev.com/141af5c41b4e6a777e5c6d53575f124bc838865e/ui/gfx/render_text_harfbuzz.cc [modify] https://crrev.com/141af5c41b4e6a777e5c6d53575f124bc838865e/ui/gfx/render_text_unittest.cc
,
Jun 27 2017
|
|||
►
Sign in to add a comment |
|||
Comment 1 by tapted@chromium.org
, Jun 9 2017