Distinguish zh-Hans vs zh-Hant
Issue description
The CLD3 LangID model that replaces CLD2 recognizes Chinese text but does not distinguish between Simplified and Traditional script. This does not affect translation results, but it does affect source-language reporting in the UI, which has no plain "Chinese" option, only "Chinese (Simplified)" and "Chinese (Traditional)". Changing the UI and related abstractions would be a fairly high-complexity project touching code in C++, Java (Android), and Objective-C (iOS), so we opt instead for a simpler solution:
1. Include the Chinese Hans-Hant transliteration data that ships with the standard ICU distribution but is not currently included in Chromium.
2. Using the data from (1), run a deterministic Chinese script classifier to determine Simplified or Traditional for correct reporting in the UI.
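The two-step plan in the issue description could be sketched roughly as follows. This is a minimal illustration, not the code from any CL on this bug: the two small dictionaries are hypothetical stand-ins for the ICU Hans-Hant transliteration data, and the function name and the tie-break toward zh-Hans are assumptions.

```python
# Hypothetical stand-in for ICU Hans-Hant transliteration data.
# A real table would cover thousands of characters.
HANS_TO_HANT = {
    "汉": "漢", "语": "語", "书": "書", "国": "國",
    "这": "這", "说": "說", "们": "們", "简": "簡",
}
HANT_TO_HANS = {hant: hans for hans, hant in HANS_TO_HANT.items()}


def classify_by_transliteration(text: str) -> str:
    """Return "zh-Hans" or "zh-Hant" for text already detected as zh."""
    # Transliterate in both directions, one character at a time.
    to_hant = "".join(HANS_TO_HANT.get(ch, ch) for ch in text)
    to_hans = "".join(HANT_TO_HANS.get(ch, ch) for ch in text)
    # A character changed by Hans->Hant is evidence of Simplified input;
    # a character changed by Hant->Hans is evidence of Traditional input.
    hans_evidence = sum(a != b for a, b in zip(text, to_hant))
    hant_evidence = sum(a != b for a, b in zip(text, to_hans))
    # Assumption: ties (including no evidence at all) fall back to zh-Hans.
    return "zh-Hant" if hant_evidence > hans_evidence else "zh-Hans"
```

Because the result depends only on the input text and fixed tables, the same input always yields the same label, which is what "deterministic" asks for here.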
,
Feb 13 2017
,
Feb 15 2017
,
Feb 21 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/deps/icu.git/+/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5
commit 450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5
Author: Jungshik Shin <jshin@chromium.org>
Date: Tue Feb 21 18:48:21 2017
Adds Hans-Hant transliterators and drops el-Upper from ICU data.
Size increase for affected data files is as follows:
android/icudtl.dat 6573776 -> 6610128 bytes
common/icudtb.dat 10130464 -> 10166816 bytes
common/icudtl.dat 10130464 -> 10166816 bytes
This CL supercedes CL 2328013002.
CL by riesa@chromium.org.
BUG= 684609
R=jshin@chromium.org
Review-Url: https://codereview.chromium.org/2652023002 .
[modify] https://crrev.com/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5/README.chromium
[modify] https://crrev.com/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5/android/icudtl.dat
[modify] https://crrev.com/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5/common/icudtb.dat
[modify] https://crrev.com/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5/common/icudtl.dat
[modify] https://crrev.com/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5/scripts/data_files_to_preserve.txt
[modify] https://crrev.com/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5/scripts/make_data.sh
[delete] https://crrev.com/ec5152fccfdf72281af53f863e3859c20f409153/source/data/translit/css3transform.txt
[add] https://crrev.com/450be73c9ee8ae29d43d4fdc82febb2a5f62bfb5/source/data/translit/root_subset.txt
,
Feb 22 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/af9b038a6bf7758c74df279bc15a77a907ee61af
commit af9b038a6bf7758c74df279bc15a77a907ee61af
Author: riesa <riesa@chromium.org>
Date: Wed Feb 22 19:25:05 2017
Roll third_party/icu from 9cd2828 to 450be73
http://chromium.googlesource.com/chromium/deps/icu.git/+log/9cd2828..450be73
Two changes:
1. 450be73 Adds Hans-Hant transliterators and drops el-Upper from ICU data.
2. ec5152f Make two icu fuzz targets more useful.
BUG= 684609
Review-Url: https://codereview.chromium.org/2705303003
Cr-Commit-Position: refs/heads/master@{#452162}
[modify] https://crrev.com/af9b038a6bf7758c74df279bc15a77a907ee61af/DEPS
,
Mar 8 2017
,
Mar 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/87c6f931ecc4e25ce694d0c4039d4acdae572b60
commit 87c6f931ecc4e25ce694d0c4039d4acdae572b60
Author: riesa <riesa@chromium.org>
Date: Wed Mar 08 05:04:13 2017
Adds ChineseScriptClassifier to predict zh-Hant or zh-Hans for input detected as zh.
BUG= 684609
Review-Url: https://codereview.chromium.org/2732023003
Cr-Commit-Position: refs/heads/master@{#455383}
[modify] https://crrev.com/87c6f931ecc4e25ce694d0c4039d4acdae572b60/components/translate/core/language_detection/BUILD.gn
[add] https://crrev.com/87c6f931ecc4e25ce694d0c4039d4acdae572b60/components/translate/core/language_detection/chinese_script_classifier.cc
[add] https://crrev.com/87c6f931ecc4e25ce694d0c4039d4acdae572b60/components/translate/core/language_detection/chinese_script_classifier.h
[add] https://crrev.com/87c6f931ecc4e25ce694d0c4039d4acdae572b60/components/translate/core/language_detection/chinese_script_classifier_test.cc
[modify] https://crrev.com/87c6f931ecc4e25ce694d0c4039d4acdae572b60/components/translate/core/language_detection/language_detection_util.cc
,
Mar 8 2017
,
Mar 9 2017
This bug requires manual review: DEPS changes referenced in bugdroid comments. Please contact the milestone owner if you have questions. Owners: amineer@(clank), cmasso@(bling), bhthompson@(cros), govind@(desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Mar 10 2017
Before we approve the merge to M58, could you please confirm that the change is well baked/verified in Canary, has enough automation coverage, and will be a safe merge? Thank you.
,
Mar 10 2017
,
Mar 11 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/062df28033e325db34be74c68da1f2bec888229d
commit 062df28033e325db34be74c68da1f2bec888229d
Author: riesa <riesa@chromium.org>
Date: Sat Mar 11 01:12:57 2017
Implements ChineseScriptClassifier functionality without icu::Transliterator
BUG= 684609
Review-Url: https://codereview.chromium.org/2743843002
Cr-Commit-Position: refs/heads/master@{#456243}
[modify] https://crrev.com/062df28033e325db34be74c68da1f2bec888229d/components/translate/core/language_detection/chinese_script_classifier.cc
[modify] https://crrev.com/062df28033e325db34be74c68da1f2bec888229d/components/translate/core/language_detection/chinese_script_classifier.h
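One way to classify script without a transliterator is a plain membership test against sets of script-specific characters. The sketch below illustrates that general technique only and is not taken from the CL above; the function name, the tiny character sets, and the tie-break toward zh-Hans are all hypothetical.

```python
# Hypothetical, deliberately tiny sets of script-specific characters.
# A usable classifier would need much larger sets.
SIMPLIFIED_ONLY = frozenset("汉语书国这说们简")
TRADITIONAL_ONLY = frozenset("漢語書國這說們簡")


def classify_by_char_sets(text: str) -> str:
    """Return "zh-Hans" or "zh-Hant" using set membership only."""
    hans_count = sum(ch in SIMPLIFIED_ONLY for ch in text)
    hant_count = sum(ch in TRADITIONAL_ONLY for ch in text)
    # Assumption: ties (including no script-specific characters) give zh-Hans.
    return "zh-Hant" if hant_count > hans_count else "zh-Hans"
```

Compared with transliterating the whole input, this keeps only two fixed lookup sets in memory and allocates no converted copies of the text.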
,
Mar 11 2017
Latest patch addresses a memory regression reported on Android when visiting Chinese websites. We can and should wait a few days to ensure the perf bots are happy now.
,
Mar 12 2017
Re #13: Sure, please update the bug with the perf bot results. Also, before we approve the merge to M58, could you please confirm that the change is well baked/verified in Canary, has enough automation coverage, and will be a safe merge? Thank you.
,
Mar 15 2017
Please look at comment #14. We're promoting M58 to Beta on Thursday, so the merge will have to be in by Wednesday 4 PM.
,
Mar 15 2017
Android memory regression fixed and verified: https://chromeperf.appspot.com/report?sid=9d873e3a7bfcae3cc105713f6fcecc35af5f5dcdb34458491991b852fe682cf8&rev=455400
,
Mar 15 2017
Approved - please merge to M58 Branch 3029
,
Mar 15 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/deps/icu.git/+/c781b5f673cb9f87cfaf0433af9dba54948dd9b3
commit c781b5f673cb9f87cfaf0433af9dba54948dd9b3
Author: Jungshik Shin <jshin@chromium.org>
Date: Wed Mar 15 23:28:57 2017
Drops unused Hans-Hant ICU transliteration data.
Summary of data size decrease:
android/icudtl.dat 6610128 -> 6573024 bytes
common/icudtb.dat 10166816 -> 10129712 bytes
common/icudtl.dat 10166816 -> 10129712 bytes
Patch by riesa@chromium.org
BUG= 684609
R=jshin@chromium.org
Review-Url: https://codereview.chromium.org/2747173004 .
[modify] https://crrev.com/c781b5f673cb9f87cfaf0433af9dba54948dd9b3/android/icudtl.dat
[modify] https://crrev.com/c781b5f673cb9f87cfaf0433af9dba54948dd9b3/common/icudtb.dat
[modify] https://crrev.com/c781b5f673cb9f87cfaf0433af9dba54948dd9b3/common/icudtl.dat
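For reference, the byte counts quoted in the two ICU commits on this bug (450be73, which added the data, and c781b5f, which dropped it) can be turned into per-file deltas. The snippet is only arithmetic on those quoted numbers; the names `SIZES` and `size_deltas` are made up for the example.

```python
# (before 450be73, after 450be73, after c781b5f), in bytes,
# copied from the two commit messages in this thread.
SIZES = {
    "android/icudtl.dat": (6573776, 6610128, 6573024),
    "common/icudtb.dat": (10130464, 10166816, 10129712),
    "common/icudtl.dat": (10130464, 10166816, 10129712),
}


def size_deltas(sizes):
    """Map each file to (bytes added, bytes dropped, net change)."""
    return {
        name: (with_data - before, after - with_data, after - before)
        for name, (before, with_data, after) in sizes.items()
    }


for name, (added, dropped, net) in size_deltas(SIZES).items():
    print(f"{name}: {added:+d} added, {dropped:+d} dropped, {net:+d} net")
```

Each file grew by 36352 bytes in the first commit and shrank by 37104 bytes in the second, a net change of -752 bytes relative to the starting size.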
,
Mar 16 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/deps/icu.git/+/48741911c5ece422d87d44286bbbf7123bcfe5e5
commit 48741911c5ece422d87d44286bbbf7123bcfe5e5
Author: Jungshik Shin <jshin@chromium.org>
Date: Thu Mar 16 00:23:23 2017
Update root_subset.txt
The file was mistakenly dropped from https://codereview.chromium.org/2747173004/
BUG= 684609
TBR=riesa@chromium.org
Review-Url: https://codereview.chromium.org/2750943005 .
[modify] https://crrev.com/48741911c5ece422d87d44286bbbf7123bcfe5e5/source/data/translit/root_subset.txt
,
Mar 16 2017
Since this merge missed the beta cut on Wednesday, is there a new branch number to use for the next beta?
,
Mar 17 2017
,
Mar 17 2017
Your change is approved for M58. Please merge (branch 3029) ASAP so that it will be picked up for the next Beta release; the RC cut is on Monday (03/20) at 5:00 PM PST.
,
Mar 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c5c460b93c3ec0575255bcd9148787901c2e9617
commit c5c460b93c3ec0575255bcd9148787901c2e9617
Author: Rouslan Solomakhin <rouslan@chromium.org>
Date: Fri Mar 17 23:05:01 2017
[Merge M-58] Adds ChineseScriptClassifier to predict zh-Hant or zh-Hans for input detected as zh.
BUG= 684609
Review-Url: https://codereview.chromium.org/2732023003
Cr-Commit-Position: refs/heads/master@{#455383}
(cherry picked from commit 87c6f931ecc4e25ce694d0c4039d4acdae572b60)
Review-Url: https://codereview.chromium.org/2756313002 .
Cr-Commit-Position: refs/branch-heads/3029@{#280}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}
[modify] https://crrev.com/c5c460b93c3ec0575255bcd9148787901c2e9617/components/translate/core/language_detection/BUILD.gn
[add] https://crrev.com/c5c460b93c3ec0575255bcd9148787901c2e9617/components/translate/core/language_detection/chinese_script_classifier.cc
[add] https://crrev.com/c5c460b93c3ec0575255bcd9148787901c2e9617/components/translate/core/language_detection/chinese_script_classifier.h
[add] https://crrev.com/c5c460b93c3ec0575255bcd9148787901c2e9617/components/translate/core/language_detection/chinese_script_classifier_test.cc
[modify] https://crrev.com/c5c460b93c3ec0575255bcd9148787901c2e9617/components/translate/core/language_detection/language_detection_util.cc
,
Mar 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/90d81c5653dc958f3f9699da7e555090ee8612ce
commit 90d81c5653dc958f3f9699da7e555090ee8612ce
Author: Rouslan Solomakhin <rouslan@chromium.org>
Date: Fri Mar 17 23:06:29 2017
[Merge M-58] Implements ChineseScriptClassifier functionality without icu::Transliterator
BUG= 684609
Review-Url: https://codereview.chromium.org/2743843002
Cr-Commit-Position: refs/heads/master@{#456243}
(cherry picked from commit 062df28033e325db34be74c68da1f2bec888229d)
Review-Url: https://codereview.chromium.org/2759813002 .
Cr-Commit-Position: refs/branch-heads/3029@{#281}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}
[modify] https://crrev.com/90d81c5653dc958f3f9699da7e555090ee8612ce/components/translate/core/language_detection/chinese_script_classifier.cc
[modify] https://crrev.com/90d81c5653dc958f3f9699da7e555090ee8612ce/components/translate/core/language_detection/chinese_script_classifier.h
,
Apr 4 2017
Issue 702812 has been merged into this issue.
,
Apr 4 2017
,
Apr 10 2017
,
Apr 11 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/deps/icu.git/+/d5c238dcc2e801210749f3bb421b9227fe1c0948
commit d5c238dcc2e801210749f3bb421b9227fe1c0948
Author: Jungshik Shin <jshin@chromium.org>
Date: Tue Apr 11 21:58:59 2017
Update trim_data to deal with locale fallback failure for units
Delete empty units,units{Narrow,Short} blocks after trimming units data. Empty units* blocks in en_GB and a few other locales after trimming causes ICU to fail to fall back to get the duration data for those locales.
In addition, fix source/data/translit/root_subset.txt. Rule*Ids block has to be present even though it's empty. When dropping Hans-Hant transform rules, root_subset.txt was changed to be completely empty, which broke "components_unittests --g_test_filter=AutofillProfileComparato*" .
With these changes, regenerate ICU data files. The size is slightly smaller.
android/icudtl.dat 6573872 => 6573792
common/icudt*dat 10130560 => 10130480
BUG= 707515 , 677043 , 684609
TEST=components_unittests --gtest_filter=AutofillProfileComparato*
TEST=ui_base_unittests --gtest_filter=L10nUtilTest.TimeDurationForm*
R=derat@chromium.org
Review-Url: https://codereview.chromium.org/2812943003 .
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/android/icudtl.dat
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/common/icudtb.dat
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/common/icudtl.dat
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/scripts/trim_data.sh
[modify] https://crrev.com/d5c238dcc2e801210749f3bb421b9227fe1c0948/source/data/translit/root_subset.txt
,
Apr 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/cf65b91bc7496b3dd2c958c890984e73c26ad22c
commit cf65b91bc7496b3dd2c958c890984e73c26ad22c
Author: jshin <jshin@chromium.org>
Date: Wed Apr 12 00:34:12 2017
Roll third_party/icu from 450be73 to b34251f
http://chromium.googlesource.com/chromium/deps/icu.git/+log/450be73..b34251f
Changes include (along with minor data build script updates):
1. f0449ad Update IANA timezone db to 2017a from 2016i
2. c781b5f Drops unused Hans-Hant ICU transliteration data. ( cuts down the ICU data file size by ~25kB.)
3. d5c238d Update trim_data to deal with locale fallback failure for units and fix root_subset.txt in ICU's transliteration configuration.
4. b34251f Update timezone db to 2017b
BUG= 684609 ,473288, 707515 , 677043
TEST=components_unittests --gtest_filter=AutofillProfileComparato*
TEST=ui_base_unittests --gtest_filter=L10nUtilTest.TimeDurationForm*
TEST=For timezone change test, see CL #1 and #4 TEST lines.
TBR=riesa@chromium.org,derat@chromium.org
Review-Url: https://codereview.chromium.org/2755963002
Cr-Commit-Position: refs/heads/master@{#463856}
[modify] https://crrev.com/cf65b91bc7496b3dd2c958c890984e73c26ad22c/DEPS
[modify] https://crrev.com/cf65b91bc7496b3dd2c958c890984e73c26ad22c/ui/base/l10n/l10n_util_unittest.cc
,
Apr 20 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/chrome/tools/buildspec/+/cd0a1e71a466bf114861203e987fb7ed07e5efd6 commit cd0a1e71a466bf114861203e987fb7ed07e5efd6 Author: Jungshik Shin <jungshik@google.com> Date: Thu Apr 20 19:47:02 2017
,
Apr 27 2017
Comment 1 by riesa@chromium.org
, Jan 24 2017