New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 693640 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug

Blocking:
issue 699469



Sign in to add a comment

Prepare for ICU's switch to char16_t from uint16_t/WChar for UChar.

Project Member Reported by js...@chromium.org, Feb 17 2017

Issue description

ICU 59 is going to use char16_t for UChar. It used to be 'uint16_t' on Mac/Linux/Android and 'wchar_t/WChar' on Windows (the same as Chromium's char16). 

From http://source.icu-project.org/repos/icu/tags/milestone-59-0-1/icu4c/readme.html#News , 

<quote>
The type of UChar is char16_t, on all platforms. This is no longer configurable. This change may affect Windows code with assumptions that UChar was wchar_t, and thus type compatible with other Windows code expecting wchar_t strings.
</quote>

In quite many places, we construct string16 out of a UChar* buffer (extracted from UnicodeString or other ways) without any casting (length out of UnicodeString is cast to size_t).  Because char16_t and 'unsigned short' are distinct types, we need to reinterpret_cast one to the other.

The same can be said of String16Piece16 and perhaps WebString.  

Instead of sprinkling our tree with reinterpret_cast, we'd better centralize the conversion (between icu::UnicodeString and string16/StringPiece16) by adding helper functions or something. 


I'm aware of a discussion about switching to char16_t (and u16string), but it's on hold partly because of WChar* in Windows API. ( https://goo.gl/p1cEBi ). 


 

Comment 1 by js...@chromium.org, Feb 17 2017

Description: Show this description

Comment 2 by js...@chromium.org, Feb 17 2017

And, there's base/third_party/icu/icu_utf.h 


Comment 3 by js...@chromium.org, Feb 20 2017

Description: Show this description

Comment 4 by js...@chromium.org, Feb 21 2017

Summary: Prepare for ICU's switch to char16_t from uint16_t/WChar for UChar. (was: Prepare for ICU's switch to char16_t from unsigned short for UChar. )
ICU will add function overloading to take 'const WChar*' (for UnicodeString) on Windows so that Windows is fine in 'to UnicodeString' direction (char16 in Chromium is WChar.)

Perhaps, ICU can do the same for char16_t in a couple of ctors taking 'const UChar*'.  Then, 'to UnicodeString' direction would be mostly fine. 

Comment 5 by js...@chromium.org, Feb 28 2017

Cc: aheninger@google.com
ICU is going to add a little adaptor class to make it simpler to call ICU's C++ API that takes UChar*.   However, Blink uses C API in many places. 
Also, we still have to find a way to deal with the other direction (ICU => Blink or Chromium).

And, casting between char16_t* and uint16_t* is 'undefined' in that they're unrelated types and  a pointer to one can be optimzed 'away' independent of a pointer to the other at a complier's discretion. 

MSVC will never do that between char16_t* and WChar*, btw. 

Comment 6 by js...@chromium.org, Mar 1 2017

There's an experimental ICU branch set up by Markus to ease the pain as much as possible (even for C API callers). Once it's ready, I'll that branch through trybots.  
I wrote what the current plan is and how to deal with breakages: http://site.icu-project.org/download/59#TOC-ICU4C-char16_t

Comment 8 by js...@chromium.org, Mar 8 2017

Thank you, Markus !


https://codereview.chromium.org/2740673002# : I send this CL to trybot dryrun to see how many failures we get. This uses http://source.icu-project.org/repos/icu/branches/markus/ucharptr2/ along with BUILD.gn and other files updated as necessary for Chromium/Blink/v8 builds.

Comment 9 by js...@chromium.org, Mar 8 2017

A couple of errors I got with UCHAR_TYPE defined to be uint16_t and wchar_t for 'outside ICU' C API usage are shown below.

-----------------------

In file included from ../../base/i18n/rtl.cc:27:
In file included from ../../third_party/icu/source/i18n/unicode/coll.h:61:
../../third_party/icu/source/common/unicode/normlzr.h:799:10: error: no matching
 function for call to 'unorm_compare_59'
  return unorm_compare(s1.getBuffer(), s1.length(),
         ^~~~~~~~~~~~~

unorm_compare_59
^~~~~~~~~~~~~~~~
../../third_party/icu/source/common/unicode/unorm2.h:620:1: note: candidate function not viable: no known conversion from 'const char16_t *' to 'const UChar *' (aka 'const unsigned short *') for 1st argument
unorm_compare(const UChar *s1, int32_t length1,

---------------------

Cases like this (from icu/source/common/unicode/unistr.h) break: 

inline UBool
UnicodeString::startsWith(ConstChar16Ptr srcChars, int32_t srcLength) const {
  if(srcLength < 0) {
    srcLength = u_strlen(srcChars);
  }
  return doCompare(0, srcLength, srcChars, 0, srcLength) == 0;
}



u_strlen_59
^~~~~~~~~~~
../../third_party/icu/source/common/unicode/unistr.h:56:1: note: candidate function not viable: no known conversion from 'icu_59::ConstChar16Ptr' to 'const UChar *' (aka 'const unsigned short *') for 1st argument
u_strlen(const UChar *s);

----------------

Adding "operator uint16_t *() const" to Char16Ptr leads to another issue - ambiguity in UnicodeString ctor because there are ctor overloads with both uint16_t and char16_t. 

Thanks for testing Jungshik. I fixed the unistr.h/startsWith() issue this afternoon. I will fix the normlzr.h issue tomorrow.

I had tried to change return pointers to pointer-wrapper values, with output conversions, but got the same ambiguity errors you found, and reinterpret_cast etc. of such values fail, so this "ucharptr2" branch retracts that.
normlzr.h should be fixed too now. Please check for changes on my branch that you don't have yet. http://bugs.icu-project.org/trac/review/12992
Blocking: 699469
Cc: kojii@chromium.org
Labels: OS-All
When I tried Markus's latest branch from the upstream along with a relatively small number of changes (about two dozens of  changes in Chrome, 1 change in Blink and 2 changes in v8), almost everything (v8, chrome, blink) seemed fine with UCHAR_TYPE set to uint16_t (non-Windows) and wchar_t (Windows) for *customers* of ICU (while UChar being char16_t inside ICU) until I hit upon an unexpected goma issue. 

With UCHAR_TYPE configured that way for non-ICU callers, somehow goma keeps failing. On my local machine, it keeps on going nonetheless and finishes a build (although extremely slower than usual). On trybots, that leads to a massive failure (some bots go through while most bots give up). See bug 683012 comment 49 https://bugs.chromium.org/p/chromium/issues/detail?id=683012#c49. 

So, I tried leaving  UChar alone (i.e. it's char16_t everywhere). Then, goma is happy, but I hit upon an unexpected issue in Blink (wtf/...). 

In https://codereview.chromium.org/2500723002/,  Blink's wtf::String et al got a new overload taking char16_t. When UChar is char16_t, there are two ctors taking the same type.   Obviously, this can be resolved by a conditional (ICU-version-dependent) compilation.  A real question is how many changes to be made in Blink with UChar being char16_t. We'll see. 
Cc: mscherer@google.com

Comment 15 by js...@chromium.org, Mar 10 2017

> With UCHAR_TYPE configured that way for non-ICU callers, somehow goma keeps failing

Actually, goma "hates" ICU 59-to-be whether UChar is configured to be {wchar_t, uint16_t} or chart16_t for C API users of ICU (inside ICU, it's always char16_t).  goma team has filed two bugs internally. (see bug 683012 comment 52). 
Project Member

Comment 16 by bugdroid1@chromium.org, Mar 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ba7249e429cce02e006b08b1298765699dd48ea2

commit ba7249e429cce02e006b08b1298765699dd48ea2
Author: thestig <thestig@chromium.org>
Date: Tue Mar 14 22:59:34 2017

Roll DEPS for sfntly 64f7856..84f7108

84f7108 Fix 'INCLUDE WHAT YOU USE' violation (#73)
ed46801 Merge pull request #69 from HalCanary/cpp_cleanup_part_1
5ae041c Sfntly C++ Cleanup
b1fbadb Merge pull request #68 from jfkthame/patch-1
38cc8d2 Fix incorrect test reported in glyph_table.cc

BUG= 693640 

Review-Url: https://codereview.chromium.org/2745933002
Cr-Commit-Position: refs/heads/master@{#456878}

[modify] https://crrev.com/ba7249e429cce02e006b08b1298765699dd48ea2/DEPS

Comment 17 by js...@chromium.org, Mar 15 2017

>  goma "hates" ICU 59-to-be

It turned out that 'goma' checks for the first 4 bytes of source files to see if there is any non-printable ASCII. ICU 59 source files have the copyright sign in the first line (the 4th character in the 1st line), which led goma to fail. ( b/36108313 ). With that taken care of, all bots are happy as long as compilation is concerned. https://codereview.chromium.org/2747973002/
Project Member

Comment 18 by bugdroid1@chromium.org, Mar 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d1b9d119afca4a0384aba439b398188c0f6827b

commit 6d1b9d119afca4a0384aba439b398188c0f6827b
Author: jshin <jshin@chromium.org>
Date: Thu Mar 23 09:57:16 2017

Prepare Chromium and Blink for ICU 59

ICU 59 uses char16_t as UChar instead of {wchar_t, uint16_t}.
As a result, char16_t is not compatible with char16 any more.
When constructing string16 from UnicodeString/UChar buffer, we need to
reinterpret_cast with a barrier (to avoid an anti-aliasing optimzation
by some compilers).

Add UnicodeStringToString16() to base/i18n that utilizes ICU 59-to-be's
helper for the casting regardless of anti-aliasing optimization.

And, refactor UnicodeString->string16->UTF8 string to UnicodeString->UTF8
in a few places.

For ICU C API "clients", UChar will be configured to be {wchar_t, uint16_t}
so that there's little to be changed.

This was tested with an ICU branch with char16_t as UChar.

http://source.icu-project.org/repos/icu/branches/markus/ucharptr2/

BUG= 693640 
TEST=trybots are all green.

Review-Url: https://codereview.chromium.org/2740673002
Cr-Commit-Position: refs/heads/master@{#459034}

[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/ash/common/system/date/date_view.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/BUILD.gn
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/message_formatter.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/number_formatting.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/string_compare.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/time_formatting.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/time_formatting_unittest.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/timezone.cc
[add] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/base/i18n/unicodestring.h
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/chrome/browser/chromeos/system/timezone_util.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/chrome/browser/ui/webui/md_downloads/downloads_list_tracker.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/chromeos/settings/timezone_settings.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/components/autofill/core/browser/autofill_profile_comparator.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/components/autofill/core/browser/credit_card.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/components/bookmarks/browser/titled_url_index.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/content/browser/android/date_time_chooser_android.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/content/renderer/android/email_detector.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/ios/chrome/browser/notification_promo_unittest.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/net/ftp/ftp_util.cc
[modify] https://crrev.com/6d1b9d119afca4a0384aba439b398188c0f6827b/third_party/WebKit/Source/core/html/forms/EmailInputType.cpp

This looks fixed, is it?
Cc: drott@chromium.org
Status: Fixed (was: Assigned)
Yes, fixed as of early March. I may have to fix up a few more places that may have 'regressed' since. We'll see once v8 issue is fixed so that I can send it to trybots across platforms.

Sign in to add a comment