New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 746222 link

Starred by 1 user

Issue metadata

Status: Duplicate
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 3
Type: Bug



Sign in to add a comment

Break-word splits emoji family sequence under space constraints

Project Member Reported by drott@chromium.org, Jul 19 2017

Issue description

Chrome Version: ToT
OS: Linux

What steps will reproduce the problem?
(1) Open http://roettsch.es/emojifam.html

What is the expected result?
Emoji shows up as family using NotoColorEmoji

What happens instead?
Emoji is broken up into family members, displayed vertically.

Luckily we do have a test case covering our LineBreakType::kBreakCharacter behavior and emoji family sequences, but somehow in Layout, the sequence still gets split up:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/text/TextBreakIteratorTest.cpp?q=TextBreakIt&sq=package:chromium&l=108

Removing the display: inline-block; property from body and adding a width: property, then shrinking the width will lead to similar results. Under size constraints, layout will at some point break the emoji sequence if break-word is specified.

Easiest to reproduce on Linux, since NotoColorEmoji can be instantiated using FreeType there. Possible to reproduce on other OSes by patching WebFontDecoder:
https://chromium-review.googlesource.com/575064

Koji, any ideas what could be going wrong here?

Thanks to Nicholas for the reduction.


 

Comment 1 by kojii@chromium.org, Jul 20 2017

Yeah, the current layout engine doesn't use LazyLineBreakIterator for mid-word-break except when break-all. I didn't like that and that's why I implemented kBreakCharacter, but currently it's used only in LayoutNG.

I tested the page on Windows, and seeing something strange:
* In Chrome, family appears and then disappears (blank screen), probably when Noto is loaded.
* In Edge, family appears and then replaced with different family.
* In Gecko, family appears but keep it (failed to load web fonts?)

I guess the "different family" Edge renders later is the family of NotoColorEmoji?

Comment 2 by kojii@chromium.org, Jul 20 2017

> Easiest to reproduce on Linux...
Oh, I missed this. I see.

Comment 3 by kojii@chromium.org, Jul 20 2017

On second thought, I wonder maybe we should fix OffsetForPosition rather than line breaker, since there are a lot more users of OffsetForPosition, like hit-testing, ellipsis (text-overflow), etc.

Comment 4 by nreiter@google.com, Jul 20 2017

Just to clarify the windows behavior a bit for anyone stumbling across this, the NotoColorEmoji font is cbdt/cblc for the color emojis. Edge now correctly renders this font type, while chrome currently only supports colr/cpal on windows (see https://bugs.chromium.org/p/chromium/issues/detail?id=703332). Chrome seems to try to load the font, but displays blank glyphs, causing the flicker as the emoji initially renders with the default style. Gecko likely ignores the font type at the moment completely.

Comment 5 by drott@google.com, Jul 20 2017

Thank you for taking a look, Koji. Could you give some context regarding what is wrong with OffsetForPosition or how the latter interacts with/causes the line breaking to go wrong? I can't make the connection between your two statements. Which OffsetForPosition are you talking about - the on in Font.h? Thanks again.

Comment 6 by kojii@chromium.org, Jul 21 2017

Cc: yosin@chromium.org
OffsetForPosition() computes "cluster" using the data returned from HB, which accommodates shaping cluster but not full grapheme cluster as defined in UAX#29. So ShapingLineBreaker uses OffsetForPosition() to compute candidate position, then use break iterator to determine the full grapheme cluster boundaries.

In current line breaker, we normally go through all break opportunities using break iterator, but when mid-word-break (break-all, break-word, etc.) is needed, it computes like ShapingLineBreaker; use OffsetForPosition() to find candidate position, then use break iterator only if break-all.

So one fix is for the current line breaker to use break iterator for break-word too.

But I searched for how OffsetForPostion() is used, hit-test, ellipsis...all these need the full grapheme cluster as defined in UAX#29, so if we go that route, we'll need to patch them all.

On the other hand, OffsetForPostion() is rather low-level, should be fast. I wonder using break iterator may hit some perf problem, especially when caller already has code to adjust grapheme cluster boundaries to the return value of OffsetForPosition(). +yosin@

> Which OffsetForPosition are you talking about - the on in Font.h?

Font.h one calls ShapeResult, so the impls are the same. At which level we'd like to change (if we want that route) is still to discuss though.

Comment 7 by drott@chromium.org, Jul 21 2017

Thanks for the detailed explanations, now I understand. Maybe we need something on top of OffsetForPosition? I agree with you that this function is rather low leve. We should describe its semantics better, being that it's breaking on HB cluster information, not on full grapheme clusters.

If we would fix it, I suppose we would fix it by combining it with a ICU break iterator. So I am not sure there is a benefit in putting it below the OffsetForPosition interface - for example if OffsetForPosition is used in contexts where the smaller HB clusters are needed.

So we could have an API on top of it, either combining it with an ICU break iterator, or only using a break iterator. 

I am trying to understand the combining better:

> So ShapingLineBreaker uses OffsetForPosition() to compute candidate position, then use break iterator to determine the full grapheme cluster boundaries.

So OffsetForPosition is used for translating from coordinate to character index - in which direction searches ShapingLineBreaker from there, only forward, or only backwards? Or depending on RTL?

Comment 8 by kojii@chromium.org, Jul 21 2017

On third thoughts, maybe it's not. I mean, OffsetForPosition() and grapheme cluster is an issue we'll need to look into, but this case might hit other problems. The family emoji is single glyph, correct? OffsetForPosition() shouldn't return between glyphs.

I think I have some idea of possible cause, let me look into it first.
Project Member

Comment 10 by bugdroid1@chromium.org, Jul 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/14ec221a9013017a9007181bb7414464ee63f1b4

commit 14ec221a9013017a9007181bb7414464ee63f1b4
Author: Koji Ishii <kojii@chromium.org>
Date: Fri Jul 21 23:54:45 2017

Adjust the mid-word-break heuristic limit to 4em for Emoji ZWJ sequence

Mid-word-break algorithm kicks in when CSS properties such as
'word-wrap: break-word' is used. Logically speaking, we should layout
as normal, and break words if overflow occurs.

However, Blink has 2em heuristic limit because of the performance
problem in certain cases. This means Blink measures characters that go
beyond the right edge only up to 2em before it starts mid-word-break.
See the previous CL[1] for more details.

This patch adjusts it to 4em. The longest common ligature is Emoji ZWJ
sequence, and its v5.0[2] can ligate 4 Emoji into 1 at maximum. To
handle this, 3em overflow is needed, and 1em for a rainy day fund.

[1] r403830, https://codereview.chromium.org/2077313002
[2] http://unicode.org/emoji/charts/emoji-zwj-sequences.html

Bug:  746222 
Change-Id: Ic58b58716502f4ae9a06c1e91d2f5de944bc7978
Reviewed-on: https://chromium-review.googlesource.com/581249
Reviewed-by: Dominik Röttsches <drott@chromium.org>
Commit-Queue: Koji Ishii <kojii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488813}
[modify] https://crrev.com/14ec221a9013017a9007181bb7414464ee63f1b4/third_party/WebKit/Source/core/layout/line/BreakingContextInlineHeaders.h

Comment 11 by kojii@chromium.org, Jul 23 2017

Status: Fixed (was: Available)

Comment 12 by drott@chromium.org, Jul 24 2017

Great, thanks! Perhaps we can merge this to 61?

Comment 13 by kojii@chromium.org, Jul 25 2017

Labels: Merge-Request-61
Request merge to M61, as the fix is desired. The fix is safe, change line break constant from 2 to 4, and is now in a few days in Canary.
Project Member

Comment 14 by sheriffbot@chromium.org, Jul 26 2017

Labels: -Merge-Request-61 Hotlist-Merge-Approved Merge-Approved-61
Your change meets the bar and is auto-approved for M61. Please go ahead and merge the CL to branch 3163 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), ketakid @(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Pls merge you change to M61 branch 3163 by 5:00 PM today, Wednesday if possible so we can take it in for next week M61 last dev release. Thank you.
Project Member

Comment 16 by bugdroid1@chromium.org, Jul 27 2017

Labels: -merge-approved-61 merge-merged-3163
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d2c58dc97bb1e3675d23402fd08399f290367c3e

commit d2c58dc97bb1e3675d23402fd08399f290367c3e
Author: Koji Ishii <kojii@chromium.org>
Date: Thu Jul 27 05:55:27 2017

Merge 3163: Adjust the mid-word-break heuristic limit to 4em for Emoji ZWJ sequence

Mid-word-break algorithm kicks in when CSS properties such as
'word-wrap: break-word' is used. Logically speaking, we should layout
as normal, and break words if overflow occurs.

However, Blink has 2em heuristic limit because of the performance
problem in certain cases. This means Blink measures characters that go
beyond the right edge only up to 2em before it starts mid-word-break.
See the previous CL[1] for more details.

This patch adjusts it to 4em. The longest common ligature is Emoji ZWJ
sequence, and its v5.0[2] can ligate 4 Emoji into 1 at maximum. To
handle this, 3em overflow is needed, and 1em for a rainy day fund.

[1] r403830, https://codereview.chromium.org/2077313002
[2] http://unicode.org/emoji/charts/emoji-zwj-sequences.html

TBR=kojii@chromium.org

(cherry picked from commit 14ec221a9013017a9007181bb7414464ee63f1b4)

Bug:  746222 
Change-Id: Ic58b58716502f4ae9a06c1e91d2f5de944bc7978
Reviewed-on: https://chromium-review.googlesource.com/581249
Reviewed-by: Dominik Röttsches <drott@chromium.org>
Commit-Queue: Koji Ishii <kojii@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#488813}
Reviewed-on: https://chromium-review.googlesource.com/587705
Reviewed-by: Koji Ishii <kojii@chromium.org>
Cr-Commit-Position: refs/branch-heads/3163@{#74}
Cr-Branched-From: ff259bab28b35d242e10186cd63af7ed404fae0d-refs/heads/master@{#488528}
[modify] https://crrev.com/d2c58dc97bb1e3675d23402fd08399f290367c3e/third_party/WebKit/Source/core/layout/line/BreakingContextInlineHeaders.h

Comment 17 by kojii@chromium.org, Jul 28 2017

Owner: kojii@chromium.org
Labels: TE-Verified-M61 TE-Verified-61.0.3163.25
Verified this issue on Ubuntu 14.04 using chrome latest Dev #61.0.3163.25 by following test case provided in the original comment. Observed emoji shows up as family using NotoColorEmoji without breaking. Hence adding TE-Verified label for M-61.

Thanks!
746222.png
11.3 KB View Download

Comment 19 by drott@chromium.org, Aug 30 2017

Status: Assigned (was: Fixed)
Reopened due to revert in https://chromium.googlesource.com/chromium/src.git/+/01e06953c7cf0b0db8f69b145a424efafee91aeb

Comment 20 by kojii@chromium.org, Aug 31 2017

Mergedinto: 749319
Status: Duplicate (was: Assigned)
The fix for issue 749319,  issue 757446  should have fixed this case in a different way. M62 has the fix.

Sign in to add a comment