
Issue 863749

Starred by 2 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Chrome, Mac, Fuchsia
Pri: 3
Type: Bug




icu: sync up included locale variants across locale categories as much as possible (e.g. currency, locale, lang)

Project Member Reported by js...@chromium.org, Jul 15

Issue description

To keep the overall binary size under control, Chrome's ICU data is heavily customized to drop data deemed unnecessary.

Some of the dropped entries have been added back over time (often without any net size impact, because other parts of the ICU data got smaller or other unnecessary entries were discovered and dropped).

When more supported locales were added, not all locale categories got the same treatment.

For instance, the 'locales' category has all the pt-* variants, but the currency category does not.

Compare these two files:

https://cs.chromium.org/chromium/src/third_party/icu/source/data/locales/reslocal.mk?g=0

https://cs.chromium.org/chromium/src/third_party/icu/source/data/curr/reslocal.mk  

The former lists many pt-* variants, but the latter lists only a couple. The same is true of en-*, fr-*, etc.

A recent update to ICU 62.1 cut the ICU data size by ~50 KiB. Adding the missing locales to curr is not likely to increase the data size relative to Chrome's ICU 61.1 data, because most of the locale variants missing from curr are very small.
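The comparison between the two reslocal.mk files can be sketched as a set difference. This is only an illustration: the helper names, the variable names `GENRB_SOURCE` and `CURR_SOURCE`, and the sample file contents are assumptions, not the actual contents of the Chromium files. It assumes each locale appears in the makefile as a `<locale>.txt` token.

```python
import re
from collections import defaultdict

# Matches tokens such as "pt.txt" or "pt_AO.txt" (assumed makefile layout).
LOCALE_FILE = re.compile(r"\b([a-z]{2,3}(?:_[A-Za-z0-9]+)*)\.txt\b")


def locale_set(mk_text):
    """Return the set of locale names listed as <locale>.txt in a .mk file."""
    return set(LOCALE_FILE.findall(mk_text))


def missing_by_language(reference_mk, other_mk):
    """Group locales present in reference_mk but absent from other_mk by language."""
    missing = locale_set(reference_mk) - locale_set(other_mk)
    grouped = defaultdict(list)
    for loc in sorted(missing):
        grouped[loc.split("_")[0]].append(loc)
    return dict(grouped)


# Hypothetical excerpts, NOT the real file contents.
locales_mk = """GENRB_SOURCE = en.txt en_GB.txt \\
 pt.txt pt_AO.txt pt_BR.txt pt_PT.txt es.txt es_CO.txt"""
curr_mk = """CURR_SOURCE = en.txt en_GB.txt \\
 pt.txt pt_BR.txt pt_PT.txt es.txt"""

result = missing_by_language(locales_mk, curr_mk)
print(result)
```

Running the same functions over the real data/locales and data/curr files would list the variants that still need to be added to curr.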

 
Cc: ftang@chromium.org gsat...@chromium.org js...@chromium.org
Issue 900232 has been merged into this issue.
Bug 900232 is about currency data for es-* variants.

Status: Started (was: Assigned)
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/deps/icu.git/+/797b7c7359c0491c3d53a07f00d0815ca4247479

commit 797b7c7359c0491c3d53a07f00d0815ca4247479
Author: Jungshik Shin <jshin@chromium.org>
Date: Sat Nov 17 23:11:31 2018

Add more locale variants

Make the locale variant support more uniform across locale
categories. What's supported in the main locale categories (data/locales)
are added to data/{unit,curr} as well. The list of locales in data/zone
is not yet updated.

Moreover, sr and sw variants are added to data/{locales,unit,curr}.

The cast removal list is simplified using 'glob' pattern instead of
listing individual files for curr and unit categories.

The data size is still under control on most platforms. They're actually
smaller than the first ICU 63.1 update thanks to additional trimming in
zone/unit categories except on desktop (59kB increase).

 Initial 63.1  This CL   Platform
  6375056       6353648  android
  4916608       4745488  cast
 10268240      10324816  common
   880512        880928  flutter
  6361376       6313376  ios


TBR=almasrymina@chromium.org,ftang@chromium.org,gsathya@chromium.org
Change-Id: I21dc5ec752795f485cfeb64ab1eb7eb8b23f3991
Bug:   863739  
Test: {base,components,net}_unittests, blink, v8(intl/*,test262/intl402)
Reviewed-on: https://chromium-review.googlesource.com/c/1335789
Reviewed-by: Jungshik Shin <jshin@chromium.org>

[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/android/icudtl.dat
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/cast/cast-removed-resources.txt
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/cast/icudtl.dat
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/cast/patch_locale.sh
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/common/icudtb.dat
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/common/icudtl.dat
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/flutter/flutter-removed-resources.txt
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/flutter/icudtl.dat
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/ios/icudtl.dat
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/patches/locale_google.patch
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/scripts/trim_data.sh
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/source/data/curr/reslocal.mk
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/source/data/locales/reslocal.mk
[modify] https://crrev.com/797b7c7359c0491c3d53a07f00d0815ca4247479/source/data/unit/reslocal.mk
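
The size table in the commit message above can be checked with a few lines of arithmetic. The sketch below copies the byte counts from the table and computes the per-platform change. Reading "common" as the desktop data file is an assumption based on the commit message wording.

```python
# (initial 63.1 size, size after this CL) in bytes, copied from the table above.
sizes = {
    "android": (6375056, 6353648),
    "cast": (4916608, 4745488),
    "common": (10268240, 10324816),  # assumed to be the desktop data file
    "flutter": (880512, 880928),
    "ios": (6361376, 6313376),
}


def size_deltas(table):
    """Return the size change in bytes per platform (negative means smaller)."""
    return {name: after - before for name, (before, after) in table.items()}


deltas = size_deltas(sizes)
for name, delta in deltas.items():
    print(f"{name:8s} {delta:+d} bytes")
```

By these numbers, android, cast and ios shrink, common grows by 56,576 bytes, and flutter grows by 416 bytes.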

Project Member

Comment 4 by bugdroid1@chromium.org, Nov 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7344a24efd2a222d6d45b709767cd4d050af10a3

commit 7344a24efd2a222d6d45b709767cd4d050af10a3
Author: Jungshik Shin <jshin@chromium.org>
Date: Sun Nov 18 01:37:31 2018

Roll ICU to 407b393 from 45f655f

  https://chromium.googlesource.com/chromium/deps/icu.git/+log/45f655f..407b393

The following changes are included:

  407b393 Update IANA timezone db to 2018g
  797b7c7 Add more locale variants
  d13a96f Unicoset Perf fix #2
  ecae5c0 Adjust calendar locale data trimming on Andorid

The ICU data files for mobile platforms got smaller. For desktop,
the size got increased by ~59 kb for more locale variant support.

TBR=ftang@chromium.org

Bug: 473288,863749, 899983 , 901532 , v8:8432 
Test: v8: intl/regress-8413*
Test: Android webview start-up perf and Windows perf graph
Test: See  crbug.com/900232 
Test: See ICU 407b393 CL description for tz test.
Change-Id: Id3ec92cbbe82275b120ffa9747e110db598905dc
Reviewed-on: https://chromium-review.googlesource.com/c/1341468
Commit-Queue: Jungshik Shin <jshin@chromium.org>
Reviewed-by: Jungshik Shin <jshin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#609129}
[modify] https://crrev.com/7344a24efd2a222d6d45b709767cd4d050af10a3/DEPS

Hello, news about this?
Labels: OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac OS-Windows
See the submitted CLs above. Quite a lot of locale variants have been added.

One more item to consider:

There are quite a few locales for which collation support can be added relatively cheaply.

A lot of locales do not require any data at all. For instance, source/data/coll/nl.txt is just this:
// License & terms of use: http://www.unicode.org/copyright.html#License
nl{
    Version{"2.1.19.14"}
}
So I'll just add them, as long as they're in Chrome desktop's list of supported UI languages (this will increase the data size slightly).

As for 'no' in collation, I can add it with very little data impact. coll/no.txt is just an alias to nb:
// License & terms of use: http://www.unicode.org/copyright.html#License
no{
    "%%ALIAS"{"nb"}
}
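
To find which collation sources are as cheap as the two shown above, one could classify each coll/*.txt file as version-only, an alias, or carrying real data. The sketch below is a rough text-based check, not an ICU resource-bundle parser. The function name and the "has-data" sample are hypothetical.

```python
import re


def classify_coll_source(text):
    """Classify a collation source as 'alias:<target>', 'version-only' or 'has-data'."""
    # Drop // comment lines such as the license header.
    body = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("//")
    )
    alias = re.search(r'"%%ALIAS"\s*\{\s*"([^"]+)"\s*\}', body)
    if alias:
        return "alias:" + alias.group(1)
    # Remove the Version entry, then see whether the outer braces are empty.
    without_version = re.sub(r'Version\s*\{\s*"[^"]*"\s*\}', "", body)
    outer = re.fullmatch(r"\s*\w+\s*\{(.*)\}\s*", without_version, flags=re.S)
    if outer and outer.group(1).strip() == "":
        return "version-only"
    return "has-data"


nl_txt = '''// License & terms of use: http://www.unicode.org/copyright.html#License
nl{
    Version{"2.1.19.14"}
}'''
no_txt = '''// License & terms of use: http://www.unicode.org/copyright.html#License
no{
    "%%ALIAS"{"nb"}
}'''
# Hypothetical file with real tailoring data, for contrast.
xx_txt = 'xx{\n    collations{ standard{ Sequence{"&a<b"} } }\n    Version{"1"}\n}'

print(classify_coll_source(nl_txt), classify_coll_source(no_txt), classify_coll_source(xx_txt))
```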


> Hello, news about this?

Any particular locale variants and locale categories you're interested in?  

I'm interested in the es_CO locale.

Thank you!
