New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 637203 link

Starred by 9 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 3
Type: Bug

Blocked on:
issue 671021

Blocking:
issue 454858



Sign in to add a comment

Set the source charset to UTF-8 in Visual C++

Project Member Reported by js...@chromium.org, Aug 12 2016

Issue description

Now that Visual Studio 2015 is required to compile Chromium on Windows and VS 2015 finally supports the source code encoding in UTF-8 without BOM, it's time to use that option.

https://msdn.microsoft.com/en-us/library/mt708821.aspx

Let's use either '/utf-8' or '/source-charset:utf-8'.  Perhaps, the latter is safer. (I don't know the implication of '/execution-charset:utf-8' ). 

There might be some non-UTF-8 and non-ASCII files (mostly in comments). Turn '/source-charset:utf-8' on and those files will be identified. 

Then, we can get rid of a warning page about compiling Chromium/Blink on CJK Windows (to be precise, on Widnows with the default code page set to one of CJK legacy code pages)> 
 

Comment 1 by js...@chromium.org, Aug 12 2016

Blocking: 454858
Owner: brucedaw...@chromium.org
Status: Started (was: Untriaged)
The biggest source of problems seems to be the copyright symbol (U+00A9). This shows up in 443 files encoded as a single 0xA9 byte instead of the 0xC2 0xA9 sequence that would be correct for UTF-8.

Other problematic characters include Unicode Character 'SET TRANSMIT STATE' (U+0093) (aka left smart quote).

Typical warnings are:
third_party\pdfium\third_party\lcms2-2.6\src\cmstypes.c(910): warning C4828: The file contains a character starting at offset 0x7716 that is illegal in the current source character set (codepage 65001).

This warning is frustrating because I have to load the file into VS in binary mode in order to find out what the bad character is. Most of the problems are in comments, but the warning doesn't indicate this.

So far these appear to all be in third_party repos so it is probably best to start with just link, then just first-party Chromium, and then everything.

crrev.com/2243093002 enables this flag for blink. PTAL? I'm not sure if that is a sufficient start.

Comment 4 by js...@chromium.org, Aug 14 2016

Thanks for taking a look. Your plan sounds good to me. 

As for non-UTF-8 / non-ASCII files, what I did in  bug 454858  (obviously, I didn't go all the way through. there are lots of them as you found out) was to run 'iconv -f UTF-8' to list them up. Then, I replaced 'usual suspects' (copyright symbol, smart quotes, etc in ISO-8859-1 or Windows-1252) with their ASCII-equivalents (now, they can be replaced by their UTF-8 equivalents) in batch (with a script).  

After that, the number of files to touch would be much smaller (I hope). And, we can iterate with newly found  (not-yet-replaced) 'offending' byte sequences. :-)





Comment 5 by thakis@chromium.org, Aug 26 2016

https://chromiumcodereview.appspot.com/10797029 converted all files to utf-8 back then. Maybe we should just do that again?
Agreed - converting all files to utf-8 either before adding the compiler switch or as part of adding the change that adds the compiler switch would be perfect. I haven't been looking at this lately because I have some higher profile bugs but I will get to it eventually. If somebody wants to do it first then just check with me and then grab it.
Cc: kulshin@chromium.org
crrev.com/2243093002 is my first stab at this, but I haven't looked at it for a week. Adding a reference here to tie things together. There are some useful comments on the CL, such us a suggestion that this should be applied to all of chromium_code instead of just blink.
Cc: brucedaw...@chromium.org
Owner: scottmg@chromium.org
I'll see if I can get your CL landed, with it moved to chromium_code instead.

I plan to take the approach of simply re-encoding everything as utf-8 if it's problematic.
Thanks for grabbing this Scott.

Going with chromium_code only sounds appropriate. I think you should definitely re-encode everything as utf-8 as well. If nothing else I hate the idea of telling the compiler that we are feeding it utf-8 and then feeding it text files that are a Franken-monster-mash of utf-8 and extended ASCII.
Project Member

Comment 10 by bugdroid1@chromium.org, Nov 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c460f91295c10d709c0f743ba6a1e93748a320d7

commit c460f91295c10d709c0f743ba6a1e93748a320d7
Author: scottmg <scottmg@chromium.org>
Date: Sat Nov 19 01:50:26 2016

Roll openmax_dl 57d33bee7..7acede9c

src\third_party\openmax_dl>git log --oneline 57d33bee7..7acede9c
7acede9 Make omxtypes.h utf-8

TBR=rtoy@chromium.org
BUG= 637203 , 454858 

Review-Url: https://codereview.chromium.org/2515033002
Cr-Commit-Position: refs/heads/master@{#433373}

[modify] https://crrev.com/c460f91295c10d709c0f743ba6a1e93748a320d7/DEPS

Project Member

Comment 11 by bugdroid1@chromium.org, Nov 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7fff8524e1dc70a10e330b659acddf5ed72eee5f

commit 7fff8524e1dc70a10e330b659acddf5ed72eee5f
Author: scottmg <scottmg@chromium.org>
Date: Tue Nov 22 17:23:43 2016

Strip invalid utf-8 characters from mc.exe header output

R=brettw@chromium.org
BUG= 454858 ,  637203 

Review-Url: https://codereview.chromium.org/2523593002
Cr-Commit-Position: refs/heads/master@{#433898}

[modify] https://crrev.com/7fff8524e1dc70a10e330b659acddf5ed72eee5f/build/win/message_compiler.py

Project Member

Comment 12 by bugdroid1@chromium.org, Nov 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/63384a79de37f89a88cfc02ff61eb649e4abe535

commit 63384a79de37f89a88cfc02ff61eb649e4abe535
Author: scottmg <scottmg@chromium.org>
Date: Wed Nov 23 18:51:41 2016

Enable /utf-8 on Windows

Applies only to chromium_code for now. There's some characters in
third_party that cl.exe claims aren't representable in code page 65001
(which is its way of saying UTF-8 without a BOM).

R=brucedawson@chromium.org
BUG= 454858 , 637203 

Review-Url: https://codereview.chromium.org/2488853002
Cr-Commit-Position: refs/heads/master@{#434212}

[modify] https://crrev.com/63384a79de37f89a88cfc02ff61eb649e4abe535/build/config/compiler/BUILD.gn

And pdfium LCMS2 here https://codereview.chromium.org/2545553002/
Project Member

Comment 15 by bugdroid1@chromium.org, Dec 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eecd11d7c8f2abd01b60ff088d8f66bcb50308b7

commit eecd11d7c8f2abd01b60ff088d8f66bcb50308b7
Author: scottmg <scottmg@chromium.org>
Date: Thu Dec 01 20:07:40 2016

Roll CED to e57cdc4

[e57cdc4...]c:\src\cr\src\third_party\ced\src>git log --oneline 9012c0a..e57cdc4
e57cdc4 Merge pull request #3 from sgraham/master
3882135 Encode compact_enc_det.cc as utf-8

TBR=jinsukkim@chromium.org
BUG= 454858 , 637203 

Review-Url: https://codereview.chromium.org/2547623002
Cr-Commit-Position: refs/heads/master@{#435685}

[modify] https://crrev.com/eecd11d7c8f2abd01b60ff088d8f66bcb50308b7/DEPS

Project Member

Comment 16 by bugdroid1@chromium.org, Dec 2 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2b63f678948527e5e6ab1fee62093ee09fe9111b

commit 2b63f678948527e5e6ab1fee62093ee09fe9111b
Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org>
Date: Fri Dec 02 20:20:24 2016

Roll src/third_party/pdfium/ d7ecb5f27..0527ec571 (4 commits).

https://pdfium.googlesource.com/pdfium.git/+log/d7ecb5f272de..0527ec571a88

$ git log d7ecb5f27..0527ec571 --date=short --no-merges --format='%ad %ae %s'
2016-12-02 tsepez Rename IFX_Stream to IFGAS_Stream.
2016-12-02 scottmg Encode lcms files as utf-8
2016-12-02 tsepez Tidy fx_stream.h
2016-12-02 tsepez Make concrete stream classes private to .cpp, part 4.

BUG= 637203 , 454858 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

TBR=dsinclair@chromium.org

Review-Url: https://codereview.chromium.org/2543073003
Cr-Commit-Position: refs/heads/master@{#436006}

[modify] https://crrev.com/2b63f678948527e5e6ab1fee62093ee09fe9111b/DEPS

Project Member

Comment 17 by bugdroid1@chromium.org, Dec 4 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a53f3c3734ad9ee4322f6a9a60eede7f6b04530f

commit a53f3c3734ad9ee4322f6a9a60eede7f6b04530f
Author: scottmg <scottmg@chromium.org>
Date: Sun Dec 04 06:50:21 2016

win: /utf-8 for all code

Previously enabled for chromium_code, can now be turned on everywhere.

BUG= 454858 , 637203 

Review-Url: https://codereview.chromium.org/2543743002
Cr-Commit-Position: refs/heads/master@{#436189}

[modify] https://crrev.com/a53f3c3734ad9ee4322f6a9a60eede7f6b04530f/build/config/compiler/BUILD.gn

Comment 18 by kbr@chromium.org, Dec 4 2016

Blockedon: 671021
Project Member

Comment 19 by bugdroid1@chromium.org, Dec 4 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/373af458aa9ab8f6f8650da115209c74d14f5dea

commit 373af458aa9ab8f6f8650da115209c74d14f5dea
Author: aleksandar.stojiljkovic <aleksandar.stojiljkovic@intel.com>
Date: Sun Dec 04 16:47:15 2016

Revert "win: /utf-8 for all code"

This reverts commit a53f3c3734ad9ee4322f6a9a60eede7f6b04530f.

Original CL: https://codereview.chromium.org/2543743002

Reason for revert: Compilation issue on master.tryserver.chromium.win:win_optional_gpu_tests_rel

Failing [1] task on the bot:

FAILED: obj/third_party/angle/src/tests/angle_deqp_libgles3/es3pBufferDataUploadTests.obj
ninja -t msvc -e environment.x86 -- E:\b\c\cipd\goma/gomacc.exe "E:\b\depot_tools\win_toolchain\vs_files\d5dc33b15d1b2c086f2f6632e2fd15882f80dbd3\VC\bin\amd64_x86/cl.exe" /nologo /showIncludes /FC @obj/third_party/angle/src/tests/angle_deqp_libgles3/es3pBufferDataUploadTests.obj.rsp /c ../../third_party/deqp/src/modules/gles3/performance/es3pBufferDataUploadTests.cpp /Foobj/third_party/angle/src/tests/angle_deqp_libgles3/es3pBufferDataUploadTests.obj /Fd"obj/third_party/angle/src/tests/angle_deqp_libgles3_cc.pdb"
e:\b\c\b\win\src\third_party\deqp\src\modules\gles3\performance\es3pbufferdatauploadtests.cpp(5280): error C2220: warning treated as error - no 'object' file generated
e:\b\c\b\win\src\third_party\deqp\src\modules\gles3\performance\es3pbufferdatauploadtests.cpp(5280): warning C4828: The file contains a character starting at offset 0x3714f that is illegal in the current source character set (codepage 65001).
e:\b\c\b\win\src\third_party\deqp\src\modules\gles3\performance\es3pbufferdatauploadtests.cpp(5280): note: to simplify migration, consider the temporary use of /Wv:18 flag with the version of the compiler with which you used to build without warnings

[1]
https://build.chromium.org/p/tryserver.chromium.win/builders/win_optional_gpu_tests_rel/builds/5729/steps/compile%20%28without%20patch%29/logs/stdio

BUG= 454858 ,  637203 ,  671021 

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel
NOTRY=true
NOTREECHECKS=true
TBR=scottmg@chromium.org, dpranke@chromium.org

Review-Url: https://codereview.chromium.org/2550973002
Cr-Commit-Position: refs/heads/master@{#436195}

[modify] https://crrev.com/373af458aa9ab8f6f8650da115209c74d14f5dea/build/config/compiler/BUILD.gn

Project Member

Comment 20 by bugdroid1@chromium.org, Dec 13 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/663994ebf90ac57f5cc56a9639a16e29aee1ff2b

commit 663994ebf90ac57f5cc56a9639a16e29aee1ff2b
Author: scottmg <scottmg@chromium.org>
Date: Tue Dec 13 03:34:05 2016

win: /utf-8 for all code

Previously enabled for chromium_code, can now be turned on everywhere.

BUG= 454858 , 637203 ,  671021 

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel
Committed: https://crrev.com/a53f3c3734ad9ee4322f6a9a60eede7f6b04530f
Cr-Commit-Position: refs/heads/master@{#436189}

Review-Url: https://codereview.chromium.org/2543743002
Cr-Commit-Position: refs/heads/master@{#438029}

[modify] https://crrev.com/663994ebf90ac57f5cc56a9639a16e29aee1ff2b/build/config/compiler/BUILD.gn

Status: Fixed (was: Started)
source-charset turned out to not be what we wanted, despite it's relatively nice sounding name. But /utf-8 is on now everywhere.
/utf-8 is equivalent to /source-charset:utf-8 /execution-charset:utf-8 , so if source-charset isn't what we want (why?) we shouldn't use /utf-8 either (?)
Sorry, I meant that /source-charset:utf-8 alone just causes a mess. We need to have both, and that's what we have now.
https://codereview.chromium.org/2488853002/#msg18 for reference. I guess those flag names make sense to C++/compiler people. :)
 Issue 417744  has been merged into this issue.

Sign in to add a comment