Add a test string that would overflow even with word-break:break-all to f/t/midword-break-before-surrogate-pair
Issue description

Spun off from https://codereview.chromium.org/2447513002/

With the update to ICU 58, regional indicator pairs are treated like ID for line breaking (LB). That is, there is a line breaking opportunity between regional indicator pairs whether or not 'word-break: break-all' is applied.

midword-break-before-surrogate-pair assumes that there is no line breaking opportunity there and uses them as a test string that would overflow even when 'word-break' is set to 'break-all'.

LB=IS, LB=IN, and LB=OP have some non-BMP characters that can be used for testing instead (there might be other categories):
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Line_Break=Inseparable:]
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Line_Break=Inseparable:]
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Line_Break=Open_Punctuation:]

See https://drafts.csswg.org/css-text/#valdef-word-break-break-all
Nov 2 2016
BTW, we're trying to solve several types of "shape-across-xxx", such as shape-across-element-boundaries, and many important cases are on our radar, but shape-across-break-opportunities is even more challenging, and I can't tell how well we can support it at the moment. Even if we managed to support it, it's likely to hurt layout performance, since we would need to re-shape after line breaking. Issue 479370 and issue 601694 have a similar unresolved technical challenge.

I don't know the motivation behind the change in ICU, but I have a mild preference to tailor the rules for Blink and avoid hitting the technical challenge for regional indicators.
Nov 17 2016
> I don't know the motivation behind the change in ICU, but I have mild preference
> to tailor the rules for Blink and avoid hitting the technical challenge for
> regional indicators.

Sorry, but I don't understand what this issue has to do with what you wrote about 'shape-across-lb-opportunity'. A pair of RI code points will always stay together, so I have little clue why you brought up 'shape-across-lb-opportunity'.

> Happen to know reasons why ICU decided to handle RI differently from UAX#14,
> or is UAX#14 PU behind ICU 58?
Because UAX#14 PU makes a lot more sense than otherwise. :-)

ICU 58 implemented draft Unicode standards in a few places (Emoji 4.0 beta instead of Emoji 3.0, handling of confusable characters, etc.).
Nov 17 2016
Well, in this case, UAX 14 (the latest version) specifies the same behavior regarding RI pairs. So, it's not a matter of UAX 14 vs UAX 14 PU.
Nov 17 2016
OK, it appears that you misunderstood what I wrote (sorry if it wasn't clear). There is NO LB opportunity between one RI and the other RI as long as both of them belong to a matching pair. There is an LB opportunity between two adjacent pairs of RIs.
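To make the pairing rule concrete, here is a minimal sketch that covers only runs of regional indicators (this is the behavior rule LB30a of UAX #14 describes) and ignores every other line breaking rule. The function names are hypothetical, and offsets are counted in code points, not UTF-16 code units.

```python
# Hypothetical illustration of the RI pairing rule only; not ICU's or Blink's code.
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def is_regional_indicator(ch: str) -> bool:
    return RI_FIRST <= ord(ch) <= RI_LAST


def ri_break_offsets(text: str) -> list:
    """Return code point offsets where a break is allowed between two RIs.

    A break is allowed before an RI only when an even, non-zero number of
    RIs immediately precedes it, i.e. between complete pairs.
    """
    offsets = []
    run = 0  # consecutive RIs seen immediately before position i
    for i, ch in enumerate(text):
        if is_regional_indicator(ch):
            if run > 0 and run % 2 == 0:
                offsets.append(i)
            run += 1
        else:
            run = 0
    return offsets


# Two pairs: J+P followed by U+S.
flags = "\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1F8"
print(ri_break_offsets(flags))  # [2]
```

The only opportunity reported is at offset 2, between the two pairs; nothing is reported at offsets 1 or 3, which fall inside a pair.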
Nov 17 2016
Ah, got it. Yes, I misunderstood it that way; thank you for pointing it out.
Aug 13
Sep 7
Comment 1 by kojii@chromium.org, Nov 2 2016