Windows-1258 codepage doesn't work correctly
Issue description

Chrome Version: 60.0.3112.113 (Official Build) (64-bit) (cohort: Stable)
OS: Windows 10

A colleague forwarded me an email written in Vietnamese that was mangled. When she (and I) received the email, it began like this (I'm intentionally not posting the full email, so as to omit any PII):

--------------
Dear chiò Xuân Em Hýõng kêì toaìn trung tâm Diòch
--------------

When I viewed the original source of the message, it looked like this:

--------------
--_000_616c597f49574d4a87f15b11ef0bff11mobifonevn_
Content-Type: text/plain; charset="windows-1258"
Content-Transfer-Encoding: quoted-printable

Dear chi=F2 Xu=E2n Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch
--------------

To find out whether this was really windows-1258, I copied this into a file, replacing each three-character =xx sequence with a ?. Then I opened the file in a hex editor and changed each ? to the byte given by the corresponding hex digits. For example, the three bytes "=FD" become the one byte 0xFD. After doing this, I opened the file in a text editor on my machine that allows me to manually choose the encoding. I chose windows-1258 and the file displayed:

--------------
Em Hương kế toán trung tâm Dịch
--------------

So, as far as I can tell, the client sent the message correctly encoded, and it is Chrome's fault for not displaying it properly. Is this a known issue?
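The manual check described above (substituting bytes in a hex editor, then choosing the encoding in a text editor) can also be sketched in a few lines of Python. This is only an illustration, assuming Python's standard `quopri` module and its `cp1258` codec (Python's name for windows-1258); the input is the line quoted from the message source. Because windows-1258 stores some tone marks as separate combining characters, the sketch also normalizes the result to NFC.

```python
import quopri
import unicodedata

# The quoted-printable line copied from the message source above.
raw_qp = b"Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch"

# Step 1: undo quoted-printable, so "=FD" becomes the single byte 0xFD.
raw_bytes = quopri.decodestring(raw_qp)

# Step 2: decode the bytes using the charset the MIME header declares.
text = raw_bytes.decode("cp1258")

# windows-1258 encodes some tone marks as combining characters;
# NFC composes them with their base letters.
text = unicodedata.normalize("NFC", text)

print(text)
```

If the message really is windows-1258, this should print the same text the text editor displayed.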
Sep 19 2017
I'm not super familiar with these dev tools; can you walk me through how to do this?

What strikes me as odd -- and the reason I filed the bug -- is that:

a) If I View Original, I get text that looks like this:

Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch

which, when manually decoded using the charset that the email header declares, is correct.

b) If I send the correctly decoded text to myself using Chrome + Gmail, I get an email that contains text that looks like this:

Em Hương kế toán trung tâm Dịch

And if I then Show Original on that, I see text that looks like this:

Em H=3DFD=3DF5ng k=3DEA=3DEC toa=3DECn

In the first case, we have valid Windows-1258, which is declared as Windows-1258 in the email header, and is displayed improperly. In the second case, we have valid UTF-8, which is declared as UTF-8 in the email header, and is displayed properly.

It seems this does occur on other browsers though, so perhaps the problem is in Gmail?
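One detail of case (b) can be checked mechanically: in quoted-printable, `=3D` is the escape for a literal `=` character. A minimal sketch, assuming Python's standard `quopri` module, with the sample line taken from the comment above:

```python
import quopri

# The line seen under Show Original in case (b).
sample = b"Em H=3DFD=3DF5ng k=3DEA=3DEC toa=3DECn"

# Decoding one quoted-printable layer turns each "=3D" into a literal "=".
once = quopri.decodestring(sample)

# Only a second decoding pass would turn "=FD" into the byte 0xFD.
twice = quopri.decodestring(once)

print(once)
print(twice)
```

After one pass the result still contains the ASCII characters `=FD` rather than the byte 0xFD, which suggests that this particular line carried the quoted-printable text as literal characters. That is only an observation about the sample, not a conclusion about where the display problem lies.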
Sep 20 2017
Are we sure this should be tagged iOS?
Sep 20 2017
#3: I'm not entirely sure, but I definitely confirmed the bug exists on iOS.
Sep 21 2017
So far as I can tell, Gmail never sends Windows-1258 encoded content to the browser. It's transcoding to UTF-8 on the server, whether for display or "show original". I found some old mail in Gmail with Content-Type: text/plain; charset="windows-1252", and the server sends it as UTF-8.

If you load the attached win1258.html file - which declares the encoding via <meta charset=windows-1258> - you'll see it decodes correctly and document.characterSet reports "windows-1258".

So yes, this looks like a Gmail issue (or at least Gmail and the mail server are disagreeing).
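To illustrate what server-side transcoding involves, here is a minimal sketch in Python. It is not Gmail's implementation, which isn't visible from the browser: `transcode_to_utf8` is a hypothetical helper, and the codec names `cp1258` and `cp1252` are Python's names for windows-1258 and windows-1252. The sample bytes come from the line quoted in the original report.

```python
import quopri

def transcode_to_utf8(body: bytes, declared_charset: str) -> bytes:
    """Hypothetical helper: decode with the given charset, re-encode as UTF-8."""
    return body.decode(declared_charset).encode("utf-8")

# Raw message bytes, recovered from the quoted-printable source line.
body = quopri.decodestring(b"Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch")

# Transcoding with the charset the header declares.
correct = transcode_to_utf8(body, "cp1258").decode("utf-8")

# Transcoding the same bytes as if they were windows-1252.
mangled = transcode_to_utf8(body, "cp1252").decode("utf-8")

print(correct)
print(mangled)
```

Treating the bytes as windows-1252 yields "Em Hýõng kêì toaìn trung tâm Diòch", the same mangled text as in the report. That is consistent with the wrong charset being applied somewhere before the text reaches the browser, though it doesn't show where.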
Comment 1 by jsb...@chromium.org, Sep 19 2017