
Issue 719045


Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug

Blocking:
issue 865527




Using ICU tokenization for East Asian phrase selection on Clank

Project Member Reported by wychen@chromium.org, May 5 2017

Issue description

Chrome Version: M59
OS: Android

What steps will reproduce the problem?
(1) Visit a page in an East Asian language, such as https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC
(2) Tap a term, such as 憲法 or 日本国

What is the expected result?
The whole term should be selected immediately.

What happens instead?
Only one character is selected. If Contextual Search is enabled, it kicks in and expands the selection after the server responds. However, Contextual Search is not always enabled, and the round trip usually takes 1 to 2 seconds.

On desktop, double-clicking the term works. It would be nice if this worked on Android as well.
 

Comment 1 by js...@chromium.org, Apr 30 2018

Cc: js...@chromium.org
That's because Chrome does not include the CJ (Chinese/Japanese) dictionary for word segmentation, in order to save space.




Cc: -yoichio@chromium.org
More context below.

In some languages, words are not separated by spaces. ICU provides large dictionaries to detect word boundaries in Thai, Chinese, Japanese, Burmese, Lao, and Khmer. Because of its size, the dictionary for Chinese and Japanese is not shipped on mobile: the CJ dictionary would add about 2MB to the binary size. For the other languages, the table size ranges from 50KB to 500KB each.
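
For illustration, here is a minimal sketch of how a tap offset could be expanded to the enclosing word using a word-boundary iterator. It uses the JDK's java.text.BreakIterator as a stand-in, not the ICU4C code path Chrome actually uses. The class and method names (WordAtOffset, wordRangeAt) are hypothetical. Whether CJK text is split into real words or into coarser or finer segments depends on the dictionary data available in the runtime, which is exactly the limitation described above.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class WordAtOffset {
    /**
     * Returns {start, end} (end exclusive) of the segment that contains
     * the character at the given offset, according to the word-boundary
     * rules of the given locale.
     */
    static int[] wordRangeAt(String text, int offset, Locale locale) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        // First boundary strictly after the tapped character.
        int end = it.following(offset);
        // Step back one boundary to find where the segment starts.
        int start = it.previous();
        return new int[] {start, end};
    }
}
```

Calling wordRangeAt("select whole word", 8, Locale.ENGLISH) yields the range covering "whole". For a string such as 日本国憲法, the returned range is only as good as the segmentation data behind the iterator; without a CJ dictionary, a boundary iterator cannot be expected to isolate 憲法 as a word.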

This decision was made in the early Clank days, around 2011.
Blocking: 865527
