Break-word splits emoji family sequence under space constraints |
||||||||||
Issue descriptionChrome Version: ToT OS: Linux What steps will reproduce the problem? (1) Open http://roettsch.es/emojifam.html What is the expected result? Emoji shows up as family using NotoColorEmoji What happens instead? Emoji is broken up into family members, displayed vertically. Luckily we do have a test case covering our LineBreakType::kBreakCharacter behavior and emoji family sequences, but somehow in Layout, the sequence still gets split up: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/text/TextBreakIteratorTest.cpp?q=TextBreakIt&sq=package:chromium&l=108 Removing the display: inline-block; property from body and adding a width: property, then shrinking the width will lead to similar results. Under size constraints, layout will at some point break the emoji sequence if break-word is specified. Easiest to reproduce on Linux, since NotoColorEmoji can be instantiated using FreeType there. Possible to reproduce on other OSes by patching WebFontDecoder: https://chromium-review.googlesource.com/575064 Koji, any ideas what could be going wrong here? Thanks to Nicholas for the reduction.
,
Jul 20 2017
> Easiest to reproduce on Linux... Oh, I missed this. I see.
,
Jul 20 2017
On second thought, I wonder maybe we should fix OffsetForPosition rather than line breaker, since there are a lot more users of OffsetForPosition, like hit-testing, ellipsis (text-overflow), etc.
,
Jul 20 2017
Just to clarify the windows behavior a bit for anyone stumbling across this, the NotoColorEmoji font is cbdt/cblc for the color emojis. Edge now correctly renders this font type, while chrome currently only supports colr/cpal on windows (see https://bugs.chromium.org/p/chromium/issues/detail?id=703332). Chrome seems to try to load the font, but displays blank glyphs, causing the flicker as the emoji initially renders with the default style. Gecko likely ignores the font type at the moment completely.
,
Jul 20 2017
Thank you for taking a look, Koji. Could you give some context regarding what is wrong with OffsetForPosition or how the latter interacts with/causes the line breaking to go wrong? I can't make the connection between your two statements. Which OffsetForPosition are you talking about - the on in Font.h? Thanks again.
,
Jul 21 2017
OffsetForPosition() computes "cluster" using the data returned from HB, which accommodates shaping cluster but not full grapheme cluster as defined in UAX#29. So ShapingLineBreaker uses OffsetForPosition() to compute candidate position, then use break iterator to determine the full grapheme cluster boundaries. In current line breaker, we normally go through all break opportunities using break iterator, but when mid-word-break (break-all, break-word, etc.) is needed, it computes like ShapingLineBreaker; use OffsetForPosition() to find candidate position, then use break iterator only if break-all. So one fix is for the current line breaker to use break iterator for break-word too. But I searched for how OffsetForPostion() is used, hit-test, ellipsis...all these need the full grapheme cluster as defined in UAX#29, so if we go that route, we'll need to patch them all. On the other hand, OffsetForPostion() is rather low-level, should be fast. I wonder using break iterator may hit some perf problem, especially when caller already has code to adjust grapheme cluster boundaries to the return value of OffsetForPosition(). +yosin@ > Which OffsetForPosition are you talking about - the on in Font.h? Font.h one calls ShapeResult, so the impls are the same. At which level we'd like to change (if we want that route) is still to discuss though.
,
Jul 21 2017
Thanks for the detailed explanations, now I understand. Maybe we need something on top of OffsetForPosition? I agree with you that this function is rather low leve. We should describe its semantics better, being that it's breaking on HB cluster information, not on full grapheme clusters. If we would fix it, I suppose we would fix it by combining it with a ICU break iterator. So I am not sure there is a benefit in putting it below the OffsetForPosition interface - for example if OffsetForPosition is used in contexts where the smaller HB clusters are needed. So we could have an API on top of it, either combining it with an ICU break iterator, or only using a break iterator. I am trying to understand the combining better: > So ShapingLineBreaker uses OffsetForPosition() to compute candidate position, then use break iterator to determine the full grapheme cluster boundaries. So OffsetForPosition is used for translating from coordinate to character index - in which direction searches ShapingLineBreaker from there, only forward, or only backwards? Or depending on RTL?
,
Jul 21 2017
On third thoughts, maybe it's not. I mean, OffsetForPosition() and grapheme cluster is an issue we'll need to look into, but this case might hit other problems. The family emoji is single glyph, correct? OffsetForPosition() shouldn't return between glyphs. I think I have some idea of possible cause, let me look into it first.
,
Jul 21 2017
,
Jul 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/14ec221a9013017a9007181bb7414464ee63f1b4 commit 14ec221a9013017a9007181bb7414464ee63f1b4 Author: Koji Ishii <kojii@chromium.org> Date: Fri Jul 21 23:54:45 2017 Adjust the mid-word-break heuristic limit to 4em for Emoji ZWJ sequence Mid-word-break algorithm kicks in when CSS properties such as 'word-wrap: break-word' is used. Logically speaking, we should layout as normal, and break words if overflow occurs. However, Blink has 2em heuristic limit because of the performance problem in certain cases. This means Blink measures characters that go beyond the right edge only up to 2em before it starts mid-word-break. See the previous CL[1] for more details. This patch adjusts it to 4em. The longest common ligature is Emoji ZWJ sequence, and its v5.0[2] can ligate 4 Emoji into 1 at maximum. To handle this, 3em overflow is needed, and 1em for a rainy day fund. [1] r403830, https://codereview.chromium.org/2077313002 [2] http://unicode.org/emoji/charts/emoji-zwj-sequences.html Bug: 746222 Change-Id: Ic58b58716502f4ae9a06c1e91d2f5de944bc7978 Reviewed-on: https://chromium-review.googlesource.com/581249 Reviewed-by: Dominik Röttsches <drott@chromium.org> Commit-Queue: Koji Ishii <kojii@chromium.org> Cr-Commit-Position: refs/heads/master@{#488813} [modify] https://crrev.com/14ec221a9013017a9007181bb7414464ee63f1b4/third_party/WebKit/Source/core/layout/line/BreakingContextInlineHeaders.h
,
Jul 23 2017
,
Jul 24 2017
Great, thanks! Perhaps we can merge this to 61?
,
Jul 25 2017
Request merge to M61, as the fix is desired. The fix is safe, change line break constant from 2 to 4, and is now in a few days in Canary.
,
Jul 26 2017
Your change meets the bar and is auto-approved for M61. Please go ahead and merge the CL to branch 3163 manually. Please contact milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), ketakid @(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 26 2017
Pls merge you change to M61 branch 3163 by 5:00 PM today, Wednesday if possible so we can take it in for next week M61 last dev release. Thank you.
,
Jul 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d2c58dc97bb1e3675d23402fd08399f290367c3e commit d2c58dc97bb1e3675d23402fd08399f290367c3e Author: Koji Ishii <kojii@chromium.org> Date: Thu Jul 27 05:55:27 2017 Merge 3163: Adjust the mid-word-break heuristic limit to 4em for Emoji ZWJ sequence Mid-word-break algorithm kicks in when CSS properties such as 'word-wrap: break-word' is used. Logically speaking, we should layout as normal, and break words if overflow occurs. However, Blink has 2em heuristic limit because of the performance problem in certain cases. This means Blink measures characters that go beyond the right edge only up to 2em before it starts mid-word-break. See the previous CL[1] for more details. This patch adjusts it to 4em. The longest common ligature is Emoji ZWJ sequence, and its v5.0[2] can ligate 4 Emoji into 1 at maximum. To handle this, 3em overflow is needed, and 1em for a rainy day fund. [1] r403830, https://codereview.chromium.org/2077313002 [2] http://unicode.org/emoji/charts/emoji-zwj-sequences.html TBR=kojii@chromium.org (cherry picked from commit 14ec221a9013017a9007181bb7414464ee63f1b4) Bug: 746222 Change-Id: Ic58b58716502f4ae9a06c1e91d2f5de944bc7978 Reviewed-on: https://chromium-review.googlesource.com/581249 Reviewed-by: Dominik Röttsches <drott@chromium.org> Commit-Queue: Koji Ishii <kojii@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#488813} Reviewed-on: https://chromium-review.googlesource.com/587705 Reviewed-by: Koji Ishii <kojii@chromium.org> Cr-Commit-Position: refs/branch-heads/3163@{#74} Cr-Branched-From: ff259bab28b35d242e10186cd63af7ed404fae0d-refs/heads/master@{#488528} [modify] https://crrev.com/d2c58dc97bb1e3675d23402fd08399f290367c3e/third_party/WebKit/Source/core/layout/line/BreakingContextInlineHeaders.h
,
Jul 28 2017
,
Aug 1 2017
Verified this issue on Ubuntu 14.04 using chrome latest Dev #61.0.3163.25 by following test case provided in the original comment. Observed emoji shows up as family using NotoColorEmoji without breaking. Hence adding TE-Verified label for M-61. Thanks!
,
Aug 30 2017
Reopened due to revert in https://chromium.googlesource.com/chromium/src.git/+/01e06953c7cf0b0db8f69b145a424efafee91aeb
,
Aug 31 2017
The fix for issue 749319, issue 757446 should have fixed this case in a different way. M62 has the fix. |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by kojii@chromium.org
, Jul 20 2017