Set the source charset to UTF-8 in Visual C++ |
||||||
Issue descriptionNow that Visual Studio 2015 is required to compile Chromium on Windows and VS 2015 finally supports the source code encoding in UTF-8 without BOM, it's time to use that option. https://msdn.microsoft.com/en-us/library/mt708821.aspx Let's use either '/utf-8' or '/source-charset:utf-8'. Perhaps, the latter is safer. (I don't know the implication of '/execution-charset:utf-8' ). There might be some non-UTF-8 and non-ASCII files (mostly in comments). Turn '/source-charset:utf-8' on and those files will be identified. Then, we can get rid of a warning page about compiling Chromium/Blink on CJK Windows (to be precise, on Widnows with the default code page set to one of CJK legacy code pages)>
,
Aug 12 2016
,
Aug 13 2016
The biggest source of problems seems to be the copyright symbol (U+00A9). This shows up in 443 files encoded as a single 0xA9 byte instead of the 0xC2 0xA9 sequence that would be correct for UTF-8. Other problematic characters include Unicode Character 'SET TRANSMIT STATE' (U+0093) (aka left smart quote). Typical warnings are: third_party\pdfium\third_party\lcms2-2.6\src\cmstypes.c(910): warning C4828: The file contains a character starting at offset 0x7716 that is illegal in the current source character set (codepage 65001). This warning is frustrating because I have to load the file into VS in binary mode in order to find out what the bad character is. Most of the problems are in comments, but the warning doesn't indicate this. So far these appear to all be in third_party repos so it is probably best to start with just link, then just first-party Chromium, and then everything. crrev.com/2243093002 enables this flag for blink. PTAL? I'm not sure if that is a sufficient start.
,
Aug 14 2016
Thanks for taking a look. Your plan sounds good to me. As for non-UTF-8 / non-ASCII files, what I did in bug 454858 (obviously, I didn't go all the way through. there are lots of them as you found out) was to run 'iconv -f UTF-8' to list them up. Then, I replaced 'usual suspects' (copyright symbol, smart quotes, etc in ISO-8859-1 or Windows-1252) with their ASCII-equivalents (now, they can be replaced by their UTF-8 equivalents) in batch (with a script). After that, the number of files to touch would be much smaller (I hope). And, we can iterate with newly found (not-yet-replaced) 'offending' byte sequences. :-)
,
Aug 26 2016
https://chromiumcodereview.appspot.com/10797029 converted all files to utf-8 back then. Maybe we should just do that again?
,
Aug 26 2016
Agreed - converting all files to utf-8 either before adding the compiler switch or as part of adding the change that adds the compiler switch would be perfect. I haven't been looking at this lately because I have some higher profile bugs but I will get to it eventually. If somebody wants to do it first then just check with me and then grab it.
,
Aug 30 2016
crrev.com/2243093002 is my first stab at this, but I haven't looked at it for a week. Adding a reference here to tie things together. There are some useful comments on the CL, such us a suggestion that this should be applied to all of chromium_code instead of just blink.
,
Nov 8 2016
I'll see if I can get your CL landed, with it moved to chromium_code instead. I plan to take the approach of simply re-encoding everything as utf-8 if it's problematic.
,
Nov 8 2016
Thanks for grabbing this Scott. Going with chromium_code only sounds appropriate. I think you should definitely re-encode everything as utf-8 as well. If nothing else I hate the idea of telling the compiler that we are feeding it utf-8 and then feeding it text files that are a Franken-monster-mash of utf-8 and extended ASCII.
,
Nov 19 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c460f91295c10d709c0f743ba6a1e93748a320d7 commit c460f91295c10d709c0f743ba6a1e93748a320d7 Author: scottmg <scottmg@chromium.org> Date: Sat Nov 19 01:50:26 2016 Roll openmax_dl 57d33bee7..7acede9c src\third_party\openmax_dl>git log --oneline 57d33bee7..7acede9c 7acede9 Make omxtypes.h utf-8 TBR=rtoy@chromium.org BUG= 637203 , 454858 Review-Url: https://codereview.chromium.org/2515033002 Cr-Commit-Position: refs/heads/master@{#433373} [modify] https://crrev.com/c460f91295c10d709c0f743ba6a1e93748a320d7/DEPS
,
Nov 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7fff8524e1dc70a10e330b659acddf5ed72eee5f commit 7fff8524e1dc70a10e330b659acddf5ed72eee5f Author: scottmg <scottmg@chromium.org> Date: Tue Nov 22 17:23:43 2016 Strip invalid utf-8 characters from mc.exe header output R=brettw@chromium.org BUG= 454858 , 637203 Review-Url: https://codereview.chromium.org/2523593002 Cr-Commit-Position: refs/heads/master@{#433898} [modify] https://crrev.com/7fff8524e1dc70a10e330b659acddf5ed72eee5f/build/win/message_compiler.py
,
Nov 23 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/63384a79de37f89a88cfc02ff61eb649e4abe535 commit 63384a79de37f89a88cfc02ff61eb649e4abe535 Author: scottmg <scottmg@chromium.org> Date: Wed Nov 23 18:51:41 2016 Enable /utf-8 on Windows Applies only to chromium_code for now. There's some characters in third_party that cl.exe claims aren't representable in code page 65001 (which is its way of saying UTF-8 without a BOM). R=brucedawson@chromium.org BUG= 454858 , 637203 Review-Url: https://codereview.chromium.org/2488853002 Cr-Commit-Position: refs/heads/master@{#434212} [modify] https://crrev.com/63384a79de37f89a88cfc02ff61eb649e4abe535/build/config/compiler/BUILD.gn
,
Nov 30 2016
CED cl here: https://github.com/google/compact_enc_det/pull/3
,
Nov 30 2016
And pdfium LCMS2 here https://codereview.chromium.org/2545553002/
,
Dec 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/eecd11d7c8f2abd01b60ff088d8f66bcb50308b7 commit eecd11d7c8f2abd01b60ff088d8f66bcb50308b7 Author: scottmg <scottmg@chromium.org> Date: Thu Dec 01 20:07:40 2016 Roll CED to e57cdc4 [e57cdc4...]c:\src\cr\src\third_party\ced\src>git log --oneline 9012c0a..e57cdc4 e57cdc4 Merge pull request #3 from sgraham/master 3882135 Encode compact_enc_det.cc as utf-8 TBR=jinsukkim@chromium.org BUG= 454858 , 637203 Review-Url: https://codereview.chromium.org/2547623002 Cr-Commit-Position: refs/heads/master@{#435685} [modify] https://crrev.com/eecd11d7c8f2abd01b60ff088d8f66bcb50308b7/DEPS
,
Dec 2 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2b63f678948527e5e6ab1fee62093ee09fe9111b commit 2b63f678948527e5e6ab1fee62093ee09fe9111b Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org> Date: Fri Dec 02 20:20:24 2016 Roll src/third_party/pdfium/ d7ecb5f27..0527ec571 (4 commits). https://pdfium.googlesource.com/pdfium.git/+log/d7ecb5f272de..0527ec571a88 $ git log d7ecb5f27..0527ec571 --date=short --no-merges --format='%ad %ae %s' 2016-12-02 tsepez Rename IFX_Stream to IFGAS_Stream. 2016-12-02 scottmg Encode lcms files as utf-8 2016-12-02 tsepez Tidy fx_stream.h 2016-12-02 tsepez Make concrete stream classes private to .cpp, part 4. BUG= 637203 , 454858 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls TBR=dsinclair@chromium.org Review-Url: https://codereview.chromium.org/2543073003 Cr-Commit-Position: refs/heads/master@{#436006} [modify] https://crrev.com/2b63f678948527e5e6ab1fee62093ee09fe9111b/DEPS
,
Dec 4 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a53f3c3734ad9ee4322f6a9a60eede7f6b04530f commit a53f3c3734ad9ee4322f6a9a60eede7f6b04530f Author: scottmg <scottmg@chromium.org> Date: Sun Dec 04 06:50:21 2016 win: /utf-8 for all code Previously enabled for chromium_code, can now be turned on everywhere. BUG= 454858 , 637203 Review-Url: https://codereview.chromium.org/2543743002 Cr-Commit-Position: refs/heads/master@{#436189} [modify] https://crrev.com/a53f3c3734ad9ee4322f6a9a60eede7f6b04530f/build/config/compiler/BUILD.gn
,
Dec 4 2016
,
Dec 4 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/373af458aa9ab8f6f8650da115209c74d14f5dea commit 373af458aa9ab8f6f8650da115209c74d14f5dea Author: aleksandar.stojiljkovic <aleksandar.stojiljkovic@intel.com> Date: Sun Dec 04 16:47:15 2016 Revert "win: /utf-8 for all code" This reverts commit a53f3c3734ad9ee4322f6a9a60eede7f6b04530f. Original CL: https://codereview.chromium.org/2543743002 Reason for revert: Compilation issue on master.tryserver.chromium.win:win_optional_gpu_tests_rel Failing [1] task on the bot: FAILED: obj/third_party/angle/src/tests/angle_deqp_libgles3/es3pBufferDataUploadTests.obj ninja -t msvc -e environment.x86 -- E:\b\c\cipd\goma/gomacc.exe "E:\b\depot_tools\win_toolchain\vs_files\d5dc33b15d1b2c086f2f6632e2fd15882f80dbd3\VC\bin\amd64_x86/cl.exe" /nologo /showIncludes /FC @obj/third_party/angle/src/tests/angle_deqp_libgles3/es3pBufferDataUploadTests.obj.rsp /c ../../third_party/deqp/src/modules/gles3/performance/es3pBufferDataUploadTests.cpp /Foobj/third_party/angle/src/tests/angle_deqp_libgles3/es3pBufferDataUploadTests.obj /Fd"obj/third_party/angle/src/tests/angle_deqp_libgles3_cc.pdb" e:\b\c\b\win\src\third_party\deqp\src\modules\gles3\performance\es3pbufferdatauploadtests.cpp(5280): error C2220: warning treated as error - no 'object' file generated e:\b\c\b\win\src\third_party\deqp\src\modules\gles3\performance\es3pbufferdatauploadtests.cpp(5280): warning C4828: The file contains a character starting at offset 0x3714f that is illegal in the current source character set (codepage 65001). e:\b\c\b\win\src\third_party\deqp\src\modules\gles3\performance\es3pbufferdatauploadtests.cpp(5280): note: to simplify migration, consider the temporary use of /Wv:18 flag with the version of the compiler with which you used to build without warnings [1] https://build.chromium.org/p/tryserver.chromium.win/builders/win_optional_gpu_tests_rel/builds/5729/steps/compile%20%28without%20patch%29/logs/stdio BUG= 454858 , 637203 , 671021 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel NOTRY=true NOTREECHECKS=true TBR=scottmg@chromium.org, dpranke@chromium.org Review-Url: https://codereview.chromium.org/2550973002 Cr-Commit-Position: refs/heads/master@{#436195} [modify] https://crrev.com/373af458aa9ab8f6f8650da115209c74d14f5dea/build/config/compiler/BUILD.gn
,
Dec 13 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/663994ebf90ac57f5cc56a9639a16e29aee1ff2b commit 663994ebf90ac57f5cc56a9639a16e29aee1ff2b Author: scottmg <scottmg@chromium.org> Date: Tue Dec 13 03:34:05 2016 win: /utf-8 for all code Previously enabled for chromium_code, can now be turned on everywhere. BUG= 454858 , 637203 , 671021 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel Committed: https://crrev.com/a53f3c3734ad9ee4322f6a9a60eede7f6b04530f Cr-Commit-Position: refs/heads/master@{#436189} Review-Url: https://codereview.chromium.org/2543743002 Cr-Commit-Position: refs/heads/master@{#438029} [modify] https://crrev.com/663994ebf90ac57f5cc56a9639a16e29aee1ff2b/build/config/compiler/BUILD.gn
,
Dec 13 2016
source-charset turned out to not be what we wanted, despite it's relatively nice sounding name. But /utf-8 is on now everywhere.
,
Dec 13 2016
/utf-8 is equivalent to /source-charset:utf-8 /execution-charset:utf-8 , so if source-charset isn't what we want (why?) we shouldn't use /utf-8 either (?)
,
Dec 13 2016
Sorry, I meant that /source-charset:utf-8 alone just causes a mess. We need to have both, and that's what we have now.
,
Dec 13 2016
https://codereview.chromium.org/2488853002/#msg18 for reference. I guess those flag names make sense to C++/compiler people. :)
,
Mar 21 2017
Issue 417744 has been merged into this issue. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by js...@chromium.org
, Aug 12 2016