Prepare for ICU's switch to char16_t from uint16_t/WChar for UChar. |
|||||||||
Issue descriptionICU 59 is going to use char16_t for UChar. It used to be 'uint16_t' on Mac/Linux/Android and 'wchar_t/WChar' on Windows (the same as Chromium's char16). From http://source.icu-project.org/repos/icu/tags/milestone-59-0-1/icu4c/readme.html#News , <quote> The type of UChar is char16_t, on all platforms. This is no longer configurable. This change may affect Windows code with assumptions that UChar was wchar_t, and thus type compatible with other Windows code expecting wchar_t strings. </quote> In quite many places, we construct string16 out of a UChar* buffer (extracted from UnicodeString or other ways) without any casting (length out of UnicodeString is cast to size_t). Because char16_t and 'unsigned short' are distinct types, we need to reinterpret_cast one to the other. The same can be said of String16Piece16 and perhaps WebString. Instead of sprinkling our tree with reinterpret_cast, we'd better centralize the conversion (between icu::UnicodeString and string16/StringPiece16) by adding helper functions or something. I'm aware of a discussion about switching to char16_t (and u16string), but it's on hold partly because of WChar* in Windows API. ( https://goo.gl/p1cEBi ).
,
Feb 17 2017
And, there's base/third_party/icu/icu_utf.h
,
Feb 20 2017
,
Feb 21 2017
ICU will add function overloading to take 'const WChar*' (for UnicodeString) on Windows so that Windows is fine in 'to UnicodeString' direction (char16 in Chromium is WChar.) Perhaps, ICU can do the same for char16_t in a couple of ctors taking 'const UChar*'. Then, 'to UnicodeString' direction would be mostly fine.
,
Feb 28 2017
ICU is going to add a little adaptor class to make it simpler to call ICU's C++ API that takes UChar*. However, Blink uses C API in many places. Also, we still have to find a way to deal with the other direction (ICU => Blink or Chromium). And, casting between char16_t* and uint16_t* is 'undefined' in that they're unrelated types and a pointer to one can be optimzed 'away' independent of a pointer to the other at a complier's discretion. MSVC will never do that between char16_t* and WChar*, btw.
,
Mar 1 2017
There's an experimental ICU branch set up by Markus to ease the pain as much as possible (even for C API callers). Once it's ready, I'll that branch through trybots.
,
Mar 2 2017
I wrote what the current plan is and how to deal with breakages: http://site.icu-project.org/download/59#TOC-ICU4C-char16_t
,
Mar 8 2017
Thank you, Markus ! https://codereview.chromium.org/2740673002# : I send this CL to trybot dryrun to see how many failures we get. This uses http://source.icu-project.org/repos/icu/branches/markus/ucharptr2/ along with BUILD.gn and other files updated as necessary for Chromium/Blink/v8 builds.
,
Mar 8 2017
A couple of errors I got with UCHAR_TYPE defined to be uint16_t and wchar_t for 'outside ICU' C API usage are shown below.
-----------------------
In file included from ../../base/i18n/rtl.cc:27:
In file included from ../../third_party/icu/source/i18n/unicode/coll.h:61:
../../third_party/icu/source/common/unicode/normlzr.h:799:10: error: no matching
function for call to 'unorm_compare_59'
return unorm_compare(s1.getBuffer(), s1.length(),
^~~~~~~~~~~~~
unorm_compare_59
^~~~~~~~~~~~~~~~
../../third_party/icu/source/common/unicode/unorm2.h:620:1: note: candidate function not viable: no known conversion from 'const char16_t *' to 'const UChar *' (aka 'const unsigned short *') for 1st argument
unorm_compare(const UChar *s1, int32_t length1,
---------------------
Cases like this (from icu/source/common/unicode/unistr.h) break:
inline UBool
UnicodeString::startsWith(ConstChar16Ptr srcChars, int32_t srcLength) const {
if(srcLength < 0) {
srcLength = u_strlen(srcChars);
}
return doCompare(0, srcLength, srcChars, 0, srcLength) == 0;
}
u_strlen_59
^~~~~~~~~~~
../../third_party/icu/source/common/unicode/unistr.h:56:1: note: candidate function not viable: no known conversion from 'icu_59::ConstChar16Ptr' to 'const UChar *' (aka 'const unsigned short *') for 1st argument
u_strlen(const UChar *s);
----------------
Adding "operator uint16_t *() const" to Char16Ptr leads to another issue - ambiguity in UnicodeString ctor because there are ctor overloads with both uint16_t and char16_t.
,
Mar 8 2017
Thanks for testing Jungshik. I fixed the unistr.h/startsWith() issue this afternoon. I will fix the normlzr.h issue tomorrow. I had tried to change return pointers to pointer-wrapper values, with output conversions, but got the same ambiguity errors you found, and reinterpret_cast etc. of such values fail, so this "ucharptr2" branch retracts that.
,
Mar 8 2017
normlzr.h should be fixed too now. Please check for changes on my branch that you don't have yet. http://bugs.icu-project.org/trac/review/12992
,
Mar 8 2017
,
Mar 9 2017
When I tried Markus's latest branch from the upstream along with a relatively small number of changes (about two dozens of changes in Chrome, 1 change in Blink and 2 changes in v8), almost everything (v8, chrome, blink) seemed fine with UCHAR_TYPE set to uint16_t (non-Windows) and wchar_t (Windows) for *customers* of ICU (while UChar being char16_t inside ICU) until I hit upon an unexpected goma issue. With UCHAR_TYPE configured that way for non-ICU callers, somehow goma keeps failing. On my local machine, it keeps on going nonetheless and finishes a build (although extremely slower than usual). On trybots, that leads to a massive failure (some bots go through while most bots give up). See bug 683012 comment 49 https://bugs.chromium.org/p/chromium/issues/detail?id=683012#c49. So, I tried leaving UChar alone (i.e. it's char16_t everywhere). Then, goma is happy, but I hit upon an unexpected issue in Blink (wtf/...). In https://codereview.chromium.org/2500723002/, Blink's wtf::String et al got a new overload taking char16_t. When UChar is char16_t, there are two ctors taking the same type. Obviously, this can be resolved by a conditional (ICU-version-dependent) compilation. A real question is how many changes to be made in Blink with UChar being char16_t. We'll see.
,
Mar 9 2017
,
Mar 10 2017
> With UCHAR_TYPE configured that way for non-ICU callers, somehow goma keeps failing
Actually, goma "hates" ICU 59-to-be whether UChar is configured to be {wchar_t, uint16_t} or chart16_t for C API users of ICU (inside ICU, it's always char16_t). goma team has filed two bugs internally. (see bug 683012 comment 52).
,
Mar 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ba7249e429cce02e006b08b1298765699dd48ea2 commit ba7249e429cce02e006b08b1298765699dd48ea2 Author: thestig <thestig@chromium.org> Date: Tue Mar 14 22:59:34 2017 Roll DEPS for sfntly 64f7856..84f7108 84f7108 Fix 'INCLUDE WHAT YOU USE' violation (#73) ed46801 Merge pull request #69 from HalCanary/cpp_cleanup_part_1 5ae041c Sfntly C++ Cleanup b1fbadb Merge pull request #68 from jfkthame/patch-1 38cc8d2 Fix incorrect test reported in glyph_table.cc BUG= 693640 Review-Url: https://codereview.chromium.org/2745933002 Cr-Commit-Position: refs/heads/master@{#456878} [modify] https://crrev.com/ba7249e429cce02e006b08b1298765699dd48ea2/DEPS
,
Mar 15 2017
> goma "hates" ICU 59-to-be It turned out that 'goma' checks for the first 4 bytes of source files to see if there is any non-printable ASCII. ICU 59 source files have the copyright sign in the first line (the 4th character in the 1st line), which led goma to fail. ( b/36108313 ). With that taken care of, all bots are happy as long as compilation is concerned. https://codereview.chromium.org/2747973002/
,
Mar 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6d1b9d119afca4a0384aba439b398188c0f6827b commit 6d1b9d119afca4a0384aba439b398188c0f6827b Author: jshin <jshin@chromium.org> Date: Thu Mar 23 09:57:16 2017 Prepare Chromium and Blink for ICU 59 ICU 59 uses char16_t as UChar instead of {wchar_t, uint16_t}. As a result, char16_t is not compatible with char16 any more. When constructing string16 from UnicodeString/UChar buffer, we need to reinterpret_cast with a barrier (to avoid an anti-aliasing optimzation by some compilers). Add UnicodeStringToString16() to base/i18n that utilizes ICU 59-to-be's helper for the casting regardless of anti-aliasing optimization. And, refactor UnicodeString->string16->UTF8 string to UnicodeString->UTF8 in a few places. For ICU C API "clients", UChar will be configured to be {wchar_t, uint16_t} so that there's little to be changed. This was tested with an ICU branch with char16_t as UChar. http://source.icu-project.org/repos/icu/branches/markus/ucharptr2/ BUG= 693640 TEST=trybots are all green. Review-Url: https://codereview.chromium.org/2740673002 Cr-Commit-Position: refs/heads/master@{#459034} [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/ash/common/system/date/date_view.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/BUILD.gn [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/message_formatter.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/number_formatting.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/string_compare.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/time_formatting.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/time_formatting_unittest.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/timezone.cc [add] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/unicodestring.h [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/chrome/browser/chromeos/system/timezone_util.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/chrome/browser/ui/webui/md_downloads/downloads_list_tracker.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/chromeos/settings/timezone_settings.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/components/autofill/core/browser/autofill_profile_comparator.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/components/autofill/core/browser/credit_card.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/components/bookmarks/browser/titled_url_index.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/content/browser/android/date_time_chooser_android.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/content/renderer/android/email_detector.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/ios/chrome/browser/notification_promo_unittest.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/net/ftp/ftp_util.cc [modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/third_party/WebKit/Source/core/html/forms/EmailInputType.cpp
,
May 9 2017
This looks fixed, is it?
,
May 9 2017
,
May 9 2017
Yes, fixed as of early March. I may have to fix up a few more places that may have 'regressed' since. We'll see once v8 issue is fixed so that I can send it to trybots across platforms. |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by js...@chromium.org
, Feb 17 2017