New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 732026 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 911896
Owner: ----
Closed: Dec 10
Cc:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug



Sign in to add a comment

Consider switching to std::u16string and char16_t from base::string16 and 'wchar_t/uint16_t'.

Project Member Reported by js...@chromium.org, Jun 10 2017

Issue description

Now that all the compilers used by Chromium support std::u16string, char16_t and u"foobar", it's time to see how much work it is to switch to std::u16string and char16_t*.  

chromium-dev discussion thread: https://goo.gl/1s8cMl
 

Currently, base::string16 is defined to be 

1) std::wstring  where wchar_t is 16-bit (Windows)
2) basic_string<uint16_t, string16_trait..>  elsewhere

And, char16 is defined to be wchar_t (Win) and uint16_t elsewhere. 

A major pain point is interactions wtih Win32 API taking and returning LP(C)WSTR ((const) wchar_t* ) because C++ std. made 'char16_t' a *distinct* type from wchar_t (even when wchar_t* is UTF-16) and from 'uint16_t'. 

Another is JNI on Android. jchar is uint16_t. 

One might think that it's a matter of adding reinterprete_cast<>. However,
reinterpret_cast<> does not solve the issue, either. (quoting Richard Smith@ from an internal mail thread; cc'd him too)

<quote>
The biggest barrier to this will be C++'s aliasing rules. It's not sufficient to merely permit implicit conversions between these types, since accessing an element of a wchar_t array via a char16_t* or uint16_t* will result in undefined behavior, and not just in principle: implementations in practice will reorder loads and stores to wchar_t's and uint16_t's on the basis that they cannot alias: https://godbolt.org/g/bp7EDC
</quote>

One relief is that visual Studio team would not utilize this aliasing in VC++. So, we can use reinterpret_cast<> on Windows + VS.  For other compilers + platforms, anti-aliasing barrier has to be set up. 

ICU 59 made a switch to char16_t from wchar_t/uint16_t and introduced char16ptr class to take care of this by adding an aliasing-barrier before reinterpret_cast<>. [1] And, UnicodeString got a few overloading constructors to accept wchar_t* and uint16_t*. [2]
These two made upgrading to ICU 59 rather smooth with a "small" number of changes in Chrome. [3]

We can do something similar. 

Anyway, I made a test CL to see how bad it is. 

Linux was pretty easy except that I couldn't go all the way to "typedef std::u16string string16" because I'd not be able to overlod |std::ostream& operator <<(std::ostream&, const std::u16string&)|.

https://chromium-review.googlesource.com/c/530162/

I think most Android failures are due to |jchar*| in JNI. Mac compile failures are expected to be simple to fix. 

Windows is expected to be toughest. 


[1] https://cs.chromium.org/chromium/src/third_party/icu/source/common/unicode/char16ptr.h?q=char16ptr+package:%5Echromium$&l=1
And, C++ APIs that used to take (const) UChar* were changed to take ConstChar16Ptr or Char16Ptr instead. 

[2] https://cs.chromium.org/chromium/src/third_party/icu/source/common/unicode/unistr.h?rcl=dfa798fe694702b43a3debc3290761f22b1acaf8&l=3024  
[3] https://bugs.chromium.org/p/chromium/issues/detail?id=693640
https://bugs.chromium.org/p/v8/issues/detail?id=6062


 

Comment 1 by js...@chromium.org, Aug 23 2017

As for the optimization exploiting aliasing rules (pointers to distinct types can be assumed to be stored at different locations),  it turned out that Chrome currently disables it. -fno-strict-aliasing is explicitly specified for gcc/clang.   MSVC is known (at least so far) NOT to exploit the aliasing rule for the optimization.

See  bug 32204 .  So does v8.  (learned it from  bug v8:6487  )

OTOH, Chromium has base/bit_cast.h to strive to work correctly in the 'strict aliasing' world as well. 
Mergedinto: 911896
Status: Duplicate (was: Untriaged)

Sign in to add a comment