Now that all the compilers used by Chromium support std::u16string, char16_t and u"foobar", it's time to see how much work it is to switch to std::u16string and char16_t*.
chromium-dev discussion thread: https://goo.gl/1s8cMl
Currently, base::string16 is defined to be
1) std::wstring where wchar_t is 16-bit (Windows)
2) basic_string<uint16_t, string16_trait..> elsewhere
And, char16 is defined to be wchar_t (Win) and uint16_t elsewhere.
A major pain point is interactions wtih Win32 API taking and returning LP(C)WSTR ((const) wchar_t* ) because C++ std. made 'char16_t' a *distinct* type from wchar_t (even when wchar_t* is UTF-16) and from 'uint16_t'.
Another is JNI on Android. jchar is uint16_t.
One might think that it's a matter of adding reinterprete_cast<>. However,
reinterpret_cast<> does not solve the issue, either. (quoting Richard Smith@ from an internal mail thread; cc'd him too)
<quote>
The biggest barrier to this will be C++'s aliasing rules. It's not sufficient to merely permit implicit conversions between these types, since accessing an element of a wchar_t array via a char16_t* or uint16_t* will result in undefined behavior, and not just in principle: implementations in practice will reorder loads and stores to wchar_t's and uint16_t's on the basis that they cannot alias: https://godbolt.org/g/bp7EDC
</quote>
One relief is that visual Studio team would not utilize this aliasing in VC++. So, we can use reinterpret_cast<> on Windows + VS. For other compilers + platforms, anti-aliasing barrier has to be set up.
ICU 59 made a switch to char16_t from wchar_t/uint16_t and introduced char16ptr class to take care of this by adding an aliasing-barrier before reinterpret_cast<>. [1] And, UnicodeString got a few overloading constructors to accept wchar_t* and uint16_t*. [2]
These two made upgrading to ICU 59 rather smooth with a "small" number of changes in Chrome. [3]
We can do something similar.
Anyway, I made a test CL to see how bad it is.
Linux was pretty easy except that I couldn't go all the way to "typedef std::u16string string16" because I'd not be able to overlod |std::ostream& operator <<(std::ostream&, const std::u16string&)|.
https://chromium-review.googlesource.com/c/530162/
I think most Android failures are due to |jchar*| in JNI. Mac compile failures are expected to be simple to fix.
Windows is expected to be toughest.
[1] https://cs.chromium.org/chromium/src/third_party/icu/source/common/unicode/char16ptr.h?q=char16ptr+package:%5Echromium$&l=1
And, C++ APIs that used to take (const) UChar* were changed to take ConstChar16Ptr or Char16Ptr instead.
[2] https://cs.chromium.org/chromium/src/third_party/icu/source/common/unicode/unistr.h?rcl=dfa798fe694702b43a3debc3290761f22b1acaf8&l=3024
[3] https://bugs.chromium.org/p/chromium/issues/detail?id=693640
https://bugs.chromium.org/p/v8/issues/detail?id=6062
Comment 1 by js...@chromium.org
, Aug 23 2017