Using ICU tokenization for East Asian phrase selection on Clank
Issue description

Chrome Version: M59
OS: Android

What steps will reproduce the problem?
(1) Visit a page in an East Asian language, such as https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC
(2) Tap a term, such as 憲法 or 日本国

What is the expected result?
The whole term should be selected immediately.

What happens instead?
Only one character is selected. If Contextual Search is enabled, it kicks in and expands the selection after the server response. However, Contextual Search is not always enabled, and the round trip usually takes 1 to 2 seconds.

On desktop, double-clicking the term works. It would be nice if this worked on Android as well.
May 7 2018
Jul 18
More context below. In some languages, words are not separated by spaces. ICU provides large dictionaries to detect word boundaries in Thai, Chinese, Japanese, Burmese, Lao, and Khmer. Because of its size, the dictionary for Chinese and Japanese is not shipped on mobile: the CJ dictionary would add about 2MB to the binary size. For the other languages, the dictionary size ranges from 50KB to 500KB each. This decision was made in the early Clank days, around 2011.
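To illustrate why the dictionary matters for tap selection, here is a minimal sketch of dictionary-based segmentation. It uses greedy longest-match, which is a simplification chosen for brevity and is not claimed to be ICU's actual algorithm. `TOY_DICT`, `segment`, and `word_at` are hypothetical names, and the four-entry dictionary is made up for the example.

```python
# Hypothetical toy dictionary; a real CJ dictionary holds vastly more entries.
TOY_DICT = {"日本国", "日本", "憲法", "国民"}


def segment(text, dictionary=TOY_DICT):
    """Split text into (start, end) spans using greedy longest match.

    Any character that does not start a dictionary word becomes a
    one-character span, which mirrors the single-character selection
    described in this issue when no dictionary is available.
    """
    max_len = max(map(len, dictionary), default=1)
    spans = []
    i = 0
    while i < len(text):
        end = i + 1  # fallback: a single character
        # Try the longest candidate first, down to length 2.
        for length in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + length] in dictionary:
                end = i + length
                break
        spans.append((i, end))
        i = end
    return spans


def word_at(text, offset, dictionary=TOY_DICT):
    """Return the segment that contains the tapped character offset."""
    for start, end in segment(text, dictionary):
        if start <= offset < end:
            return text[start:end]
    raise IndexError("offset is outside the text")
```

With the toy dictionary, tapping any character of 日本国 in the string 日本国憲法 selects the whole term. Passing an empty dictionary falls back to one character per segment, which is roughly the behavior reported above.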
Jul 25
Comment 1 by js...@chromium.org, Apr 30 2018