Update CursorMovementIterator while getting along with ICU updating. |
|||||||||||
Issue descriptionDiscovered while investigating webkit unit test failure for ICU 62 update ( bug 850334 ) third_party/blink/renderer/core/editing/state_machines/state_machine_util.cc has various Unicode character properties hard-coded, but it's not updated for a while. Some character properties such as EmojiGAZ are obsolete (as of Unicode 11). Indic syllable properties have undergone quite extensive changes. Moreover, IsGraphemeBreak() implemented a version of UAX 29 (Unicode Text Segmentation), but UAX 29 tends to be updated every Unicode release (every year). It's prone to become obsolete. I wonder why these character lists were hard-coded and IsGraphemeBreak() were implemented in Blink. ( https://codereview.chromium.org/1856603002 ) IMHO, those things had better be delegated to ICU's BreakIterator. If necessary, BreakIterator can be tailored. Blink layout uses ICU's break iterator. Apparently, IsGraphemeBreak is used is used in editing.
,
Jun 27 2018
,
Jun 27 2018
,
Jun 27 2018
Cursor movement needs to be different from UAX#29. WebKit and old Blink use a forked copy of ICU rules for cursor movement. When the forked rules weren't maintained very well, editing team decided to move to is own code by leaning Android doing so, but now the state machine is no longer maintained, so we're back to the original problem. nona@, what Android team does to keep it up-to-date? Is the effort sharable?
,
Jun 27 2018
>#4 I haven't changed any rules in Android in P. Only emoji set was updated.
,
Jun 27 2018
Sorry, I removed comment 5 due to that has internal only link and I missed #1 comment. At that time, the emoji implementation in ICU was buggy and far from the latest spec. So we decided to implement our own. I believe this has now solved. So, once the 2nd item is addressed by ICU, I think we can remove our state machine and switch back to the ICU implementation. I'll submit a feature request to ICU for 2nd item.
,
Jun 27 2018
For the Android stuff, almost the same reason as WebKit, we have our implementation of grapheme breaker. https://android.googlesource.com/platform/frameworks/minikin/+/master/libs/minikin/GraphemeBreak.cpp I haven't changed anything in this code in P.
,
Jun 27 2018
Wait, I just looked the BreakIterator API list and found BreakIterator::adoptText which accepts CharacterIterator instead of UChar* or UnicodeString. http://icu-project.org/apiref/icu4c/classicu_1_1BreakIterator.html#a1fd72184b59936bb62b07af071e2b0c1 The API doc is not clear but if this works like setText, this could be an solution for the 2nd item.
,
Jun 28 2018
Is the code you mentioned a grapheme cluster or a caret boundary?
,
Jun 28 2018
The code is a grapheme cluster and Android use that as a caret boundary.
,
Jun 28 2018
,
Jul 3
> Cursor movement needs to be different from UAX#29. WebKit and old Blink use a forked copy of ICU rules for cursor movement Yeah, I remember that. The cursor movement is tricky and a simple break-iterator-based method may not work in some situations without an extra-layer of code, but I'm digressing. Back to the topic: A rule tailored from the latest ICU would be still easier to maintain than a completely different custom code because we can compare the tailored rule against the original and port the difference on top of the latest CLDR/ICU. > At that time, the emoji implementation in ICU was buggy and far from the latest spec IIRC, in Q2 2016 (when this custom code was added to Blink), we could have cherry-picked ICU data/ breakiterator rules from ICU 58-to-be to support the latest Emoji spec available back then. :-) (ICU 58 was released in October 2016). Anyway, putting aside the past, it'd be great if what Nona mentioned in comment 9 could work for the 2nd point in comment 1. In the meantime, I'm marking this as a blocker to bug 850334 (ICU update to 62) although I may be able to find a temporary work around to update ICU 'now' than later.
,
Jul 4
> The code is a grapheme cluster and Android use that as a caret boundary. If this is UAX#29, then we're degrading in some i18n scripts. The value is unclear if Android doesn't support it, but I saw WebKit added some more rules for i18n cursor movements last year, so we're not as great as WebKit in this regard.
,
Jul 4
> The value is unclear if Android doesn't support it Even if a given Android device does not have a font for a new script, one can always use a web font. The failing tests with ICU 62.1 are as following. One is clearly about grapheme cluster, but the other two seem to be about editing. blink_platform_unittests (retry summary) failures: TextBoundariesTest.ForwardWhitespaces webkit_unit_tests (retry summary) failures: StateMachineUtilTest.IsGraphmeBreak_EmojiModifier EditingUtilitiesTest.uncheckedPreviousNextOffset
,
Jul 5
> The failing tests with ICU 62.1 are as following If this is blocking, or important, please assign to yosin@. I think I'm with you, but also we need his consensus. > Even if a given Android device does not have a font for a new script, one can always use a web font. True, but the point is, it doesn't work outside Blink; e.g., Android text edit.
,
Jul 7
I'm down to this test and going to take a look. EditingUtilitiesTest.uncheckedPreviousNextOffset I fixed the other two tests: Whitespaces: expected results were wrong per the latest UAX 29. Statemachine.IsGraph*: again expected results were wrong. I also removed unnecessary codes per the latest UAX 29).
,
Jul 7
EditingUtilitiesTest.uncheckedPreviousNextOffset has the same issue (the latest UAX 29 does not have GB 10 any more) and I fixed the test as well.
,
Jul 8
https://chromium-review.googlesource.com/c/chromium/src/+/1111818 has a partial fix for Blink's implementation of GB rules. I didn't venture to add GB12 (RI sequence handling). It may be done somewhere else, but I haven't looked for it). Moving Grapheme BI back to ICU may be better (if possible, which seems to be the case) in terms of the maintenance.
,
Jul 9
nona@, will you update Blink's state machine to adopt new Unicode emoji?
,
Jul 9
Talked with yosin@, he's ok either way; i.e., nona@ to fix IsGraphemeBreak, or going back to CursorMovementIterator.
,
Jul 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e9126f1d03725c2ae97d524985971d66089eede3 commit e9126f1d03725c2ae97d524985971d66089eede3 Author: Jungshik Shin <jshin@chromium.org> Date: Tue Jul 10 08:28:18 2018 Update ICU to 62.1 For the actual changes, see https://chromium-review.googlesource.com/c/chromium/deps/icu/+/1128503 Update a few tests per CLDR 33-1/ICU 62/Unicode 11 data changes. * Arabic number format expected results are also updated because Arabic locales now use "European" digits by default in Chromium's ICU to match the behavior of other Google products. The CLDR upstream is expcted to follow suit soon. * Blink's homegrown grapheme cluster breakiterator is partly updated per UAX 29 (text segmentation) in Unicode 11. (see crbug.com/856947) * Whitespaces are kept together by a new word breaking rule ( http://unicode.org/reports/tr29/#WB3d ). Update the test to reflect that change. * Fix the currency formatting code in payment to handle better cases with a space (U+00A0 / U+0020) between a currency code and a numeric value (e.g. "-USD 47.37" in en-AU for USD). A test was added to CurrencyAmounts/PaymentsCurrencyFormatterTest.IsValidCurrencyFormat for a negative value. Cq_Include_Trybots: master.tryserver.blink:linux_trusty_blink_rel,mac10.12_blink_rel,win10_blink_rel;master.tryserver.chromium.android:android_blink_rel Bug: 850334 ,856947 Test: base_unittests, and all other tests Cq-Include-Trybots: luci.chromium.try:linux_layout_tests_layout_ng Change-Id: I6e7e76df358fc1e04c9eb95422b3ab7bfb8f78fd Reviewed-on: https://chromium-review.googlesource.com/1111818 Commit-Queue: Jungshik Shin <jshin@chromium.org> Reviewed-by: Seigo Nonaka <nona@chromium.org> Reviewed-by: Kent Tamura <tkent@chromium.org> Reviewed-by: Yoshifumi Inoue <yosin@chromium.org> Reviewed-by: Mathieu Perreault <mathp@chromium.org> Reviewed-by: Avi Drissman <avi@chromium.org> Cr-Commit-Position: refs/heads/master@{#573650} [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/DEPS [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/base/i18n/number_formatting_unittest.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/base/i18n/time_formatting_unittest.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/components/payments/core/currency_formatter.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/components/payments/core/currency_formatter_unittest.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/WebKit/LayoutTests/fast/forms/time-multiple-fields/time-multiple-fields-localization.html [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/WebKit/LayoutTests/platform/linux/fast/forms/time-multiple-fields/time-multiple-fields-localization-expected.txt [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/WebKit/LayoutTests/platform/mac-mac10.12/fast/forms/time-multiple-fields/time-multiple-fields-localization-expected.txt [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/WebKit/LayoutTests/platform/mac/fast/forms/time-multiple-fields/time-multiple-fields-localization-expected.txt [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/WebKit/LayoutTests/platform/win/fast/forms/time-multiple-fields/time-multiple-fields-localization-expected.txt [delete] https://crrev.com/4ea4214a59dfcd1fe8f2cda5b435b96a118ee1c9/third_party/WebKit/LayoutTests/platform/win7/fast/forms/time-multiple-fields/time-multiple-fields-localization-expected.txt [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/blink/renderer/core/editing/editing_utilities_test.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/blink/renderer/core/editing/state_machines/state_machine_util.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/blink/renderer/core/editing/state_machines/state_machine_util_test.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/blink/renderer/platform/text/platform_locale_test.cc [modify] https://crrev.com/e9126f1d03725c2ae97d524985971d66089eede3/third_party/blink/renderer/platform/text/text_boundaries_test.cc
,
Jul 10
,
Jul 13
,
Aug 2
,
Sep 25
Design doc: https://goo.gl/HyigTY
,
Jan 4
|
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by js...@chromium.org
, Jun 27 2018