New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 684609 link

Starred by 6 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocking:
issue 689620
issue 699370



Sign in to add a comment

Distinguish zh-Hans vs zh-Hant

Project Member Reported by riesa@chromium.org, Jan 24 2017

Issue description

The CLD3 LangID model replacing CLD2 recognizes Chinese text but does not distinguish between Simplified and Traditional text. 

This does not affect translation results, but does affect source language reporting in the UI since there is no "Chinese" option just "Chinese (Simplified)" and "Chinese (Traditional)".

Since changing the UI and related abstractions is a rather high-complexity project touching code in C++, Java (Android), and Objective C (iOS), we opt instead for a simpler solution:

1. Include the Chinese Hans-Hant transliteration data shipped with the standard ICU distribution, but not currently included in Chromium.
2. Using the data from (1), run a deterministic Chinese script classifier to determine Simplified or Traditional for correct reporting in the UI.
 

Comment 1 by riesa@chromium.org, Jan 24 2017

Description: Show this description
Blocking: 689620

Comment 3 by js...@chromium.org, Feb 15 2017

Cc: -jungshik@google.com js...@chromium.org
Project Member

Comment 5 by bugdroid1@chromium.org, Feb 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/af9b038a6bf7758c74df279bc15a77a907ee61af

commit af9b038a6bf7758c74df279bc15a77a907ee61af
Author: riesa <riesa@chromium.org>
Date: Wed Feb 22 19:25:05 2017

Roll third_party/icu from 9cd2828 to 450be73

http://chromium.googlesource.com/chromium/deps/icu.git/+log/9cd2828..450be73

Two changes:
1. 450be73 Adds Hans-Hant transliterators and drops el-Upper from ICU data.
2. ec5152f Make two icu fuzz targets more useful.

BUG= 684609 

Review-Url: https://codereview.chromium.org/2705303003
Cr-Commit-Position: refs/heads/master@{#452162}

[modify] https://crrev.com/af9b038a6bf7758c74df279bc15a77a907ee61af/DEPS

Comment 6 by riesa@chromium.org, Mar 8 2017

Blocking: 699370

Comment 8 by riesa@chromium.org, Mar 8 2017

Cc: groby@chromium.org
Labels: Merge-Request-58
Project Member

Comment 9 by sheriffbot@chromium.org, Mar 9 2017

Labels: -Merge-Request-58 Merge-Review-58 Hotlist-Merge-Review
This bug requires manual review: DEPS changes referenced in bugdroid comments.
Please contact the milestone owner if you have questions.
Owners: amineer@(clank), cmasso@(bling), bhthompson@(cros), govind@(desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Before we approve merge to M58, could you please confirm change is well baked/verified in Canary, having enough automation coverage and will be a safe merge? Thank you.
Cc: abdulsyed@chromium.org

Comment 13 by riesa@google.com, Mar 11 2017

Latest patch addresses a memory regression reported on Android when visiting Chinese websites. We can and should wait a few days to ensure the perf bots are happy now.
Re #13:

Sure, please update the bug with perf bots result.

Also before we approve merge to M58, could you please confirm change is well baked/verified in Canary, having enough automation coverage and will be a safe merge? Thank you
Please look comments #14 - we're promoting M58 to Beta on Thursday, so Merge will have to be in by Wednesday 4PM. 
Labels: -Merge-Review-58 Merge-Approved-58
Approved - please merge to M58 Branch 3029
Project Member

Comment 18 by bugdroid1@chromium.org, Mar 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/deps/icu.git/+/c781b5f673cb9f87cfaf0433af9dba54948dd9b3

commit c781b5f673cb9f87cfaf0433af9dba54948dd9b3
Author: Jungshik Shin <jshin@chromium.org>
Date: Wed Mar 15 23:28:57 2017

Drops unused Hans-Hant ICU transliteration data.

Summary of data size decrease:
 android/icudtl.dat   6610128 ->  6573024 bytes
 common/icudtb.dat   10166816 -> 10129712 bytes
 common/icudtl.dat   10166816 -> 10129712 bytes

Patch by riesa@chromium.org

BUG= 684609 
R=jshin@chromium.org

Review-Url: https://codereview.chromium.org/2747173004 .

[modify] https://crrev.com/c781b5f673cb9f87cfaf0433af9dba54948dd9b3/android/icudtl.dat
[modify] https://crrev.com/c781b5f673cb9f87cfaf0433af9dba54948dd9b3/common/icudtb.dat
[modify] https://crrev.com/c781b5f673cb9f87cfaf0433af9dba54948dd9b3/common/icudtl.dat

Comment 20 by zkoch@chromium.org, Mar 16 2017

Since this merge missed the beta cut on Wednesday, is there a new branch # to use for next beta?
Cc: yyushkina@chromium.org
Your change is approved for M58. Please merge( Branch : 3029) ASAP so that it will be picked up for next Beta Release, RC cut on (Monday-03/20) at 5.00 PM PST.
Project Member

Comment 23 by bugdroid1@chromium.org, Mar 17 2017

Project Member

Comment 24 by bugdroid1@chromium.org, Mar 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/90d81c5653dc958f3f9699da7e555090ee8612ce

commit 90d81c5653dc958f3f9699da7e555090ee8612ce
Author: Rouslan Solomakhin <rouslan@chromium.org>
Date: Fri Mar 17 23:06:29 2017

[Merge M-58] Implements ChineseScriptClassifier functionality without icu::Transliterator

BUG= 684609 

Review-Url: https://codereview.chromium.org/2743843002
Cr-Commit-Position: refs/heads/master@{#456243}
(cherry picked from commit 062df28033e325db34be74c68da1f2bec888229d)

Review-Url: https://codereview.chromium.org/2759813002 .
Cr-Commit-Position: refs/branch-heads/3029@{#281}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}

[modify] https://crrev.com/90d81c5653dc958f3f9699da7e555090ee8612ce/components/translate/core/language_detection/chinese_script_classifier.cc
[modify] https://crrev.com/90d81c5653dc958f3f9699da7e555090ee8612ce/components/translate/core/language_detection/chinese_script_classifier.h

Issue 702812 has been merged into this issue.
Labels: Hotlist-ConOps

Comment 27 by riesa@chromium.org, Apr 10 2017

Status: Fixed (was: Started)
Project Member

Comment 28 by bugdroid1@chromium.org, Apr 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/deps/icu.git/+/d5c238dcc2e801210749f3bb421b9227fe1c0948

commit d5c238dcc2e801210749f3bb421b9227fe1c0948
Author: Jungshik Shin <jshin@chromium.org>
Date: Tue Apr 11 21:58:59 2017

Update trim_data to deal with locale fallback failure for units

Delete empty units,units{Narrow,Short} blocks after trimming units data.
Empty units* blocks in en_GB and a few other locales after trimming
causes ICU to fail to fall back to get the duration data for those
locales.

In addition, fix source/data/translit/root_subset.txt. Rule*Ids block has
to be present even though it's empty. When dropping Hans-Hant transform
rules, root_subset.txt was changed to be completely empty, which broke
"components_unittests --g_test_filter=AutofillProfileComparato*" .

With these changes, regenerate ICU data files. The size is slightly smaller.

android/icudtl.dat  6573872 => 6573792
common/icudt*dat    10130560 => 10130480

BUG= 707515 , 677043 , 684609 
TEST=components_unittests --gtest_filter=AutofillProfileComparato*
TEST=ui_base_unittests --gtest_filter=L10nUtilTest.TimeDurationForm*
R=derat@chromium.org

Review-Url: https://codereview.chromium.org/2812943003 .

[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/android/icudtl.dat
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/common/icudtb.dat
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/common/icudtl.dat
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/scripts/trim_data.sh
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/source/data/translit/root_subset.txt

Project Member

Comment 29 by bugdroid1@chromium.org, Apr 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cf65b91bc7496b3dd2c958c890984e73c26ad22c

commit cf65b91bc7496b3dd2c958c890984e73c26ad22c
Author: jshin <jshin@chromium.org>
Date: Wed Apr 12 00:34:12 2017

Roll third_party/icu from 450be73 to b34251f

http://chromium.googlesource.com/chromium/deps/icu.git/+log/450be73..b34251f

Changes include (along with minor data build script updates):

  1. f0449ad Update IANA timezone db to 2017a from 2016i
  2. c781b5f Drops unused Hans-Hant ICU transliteration data.
     ( cuts down the ICU data file size by ~25kB.)
  3. d5c238d Update trim_data to deal with locale fallback failure for units
     and fix root_subset.txt in ICU's transliteration configuration.
  4. b34251f Update timezone db to 2017b

BUG= 684609 ,473288, 707515 , 677043 
TEST=components_unittests --gtest_filter=AutofillProfileComparato*
TEST=ui_base_unittests --gtest_filter=L10nUtilTest.TimeDurationForm*
TEST=For timezone change test, see CL #1 and #4 TEST lines.
TBR=riesa@chromium.org,derat@chromium.org

Review-Url: https://codereview.chromium.org/2755963002
Cr-Commit-Position: refs/heads/master@{#463856}

[modify] https://crrev.com/cf65b91bc7496b3dd2c958c890984e73c26ad22c/DEPS
[modify] https://crrev.com/cf65b91bc7496b3dd2c958c890984e73c26ad22c/ui/base/l10n/l10n_util_unittest.cc

Project Member

Comment 30 by bugdroid1@chromium.org, Apr 20 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chrome/tools/buildspec/+/cd0a1e71a466bf114861203e987fb7ed07e5efd6

commit cd0a1e71a466bf114861203e987fb7ed07e5efd6
Author: Jungshik Shin <jungshik@google.com>
Date: Thu Apr 20 19:47:02 2017

Components: -UI>Browser>Translate UI>Browser>Language>Translate

Sign in to add a comment