
Issue 766813

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android, Windows, iOS
Pri: 3
Type: Bug




Windows-1258 codepage doesn't work correctly

Project Member Reported by zturner@chromium.org, Sep 19 2017

Issue description

Chrome Version: 60.0.3112.113 (Official Build) (64-bit) (cohort: Stable)
OS: Windows 10

A colleague forwarded me an email written in Vietnamese that was mangled. When she (and I) received the email, it began like this (I'm intentionally not posting the full email, so as to omit any PII):

--------------
Dear chiò Xuân

Em Hýõng kêì toaìn trung tâm Diòch
--------------

When I viewed the original source of the message, it looked like this:

--------------
--_000_616c597f49574d4a87f15b11ef0bff11mobifonevn_
Content-Type: text/plain; charset="windows-1258"
Content-Transfer-Encoding: quoted-printable

Dear chi=F2 Xu=E2n

Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch

--------------

To find out whether this was really windows-1258, I copied this into a file, replacing each three-character =xx sequence with a ?. Then I opened the file in a hex editor and changed each ? to the byte the sequence specified. For example, the three characters "=FD" become the single byte 0xFD.

After doing this, I opened the file in a text editor on my machine that allows me to manually choose the encoding.  I chose windows-1258 and the file displayed:

--------------
Em Hương kế toán trung tâm Dịch
--------------
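The manual hex-editor check above can also be sketched in a few lines of Python. This is only an illustration, assuming Python's built-in `cp1258` and `cp1252` codecs are acceptable stand-ins for windows-1258 and windows-1252; the helper name `decode_qp` is made up for this example. Note that the windows-1258 bytes decode to base letters plus combining tone marks, so the sketch applies NFC normalization to get precomposed characters.

```python
import quopri
import unicodedata

# Body line copied from the "original source" view quoted above.
QP_LINE = b"Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch"


def decode_qp(qp_bytes: bytes, charset: str) -> str:
    """Undo quoted-printable, then decode the raw bytes with `charset`."""
    raw = quopri.decodestring(qp_bytes)  # "=FD" becomes the single byte 0xFD
    return raw.decode(charset)


# Decode with the charset the header declares, then compose the
# combining marks (e.g. "ê" + U+0301) into single code points.
as_1258 = unicodedata.normalize("NFC", decode_qp(QP_LINE, "cp1258"))

# Decode the same bytes as windows-1252 for comparison.
as_1252 = decode_qp(QP_LINE, "cp1252")

print(as_1258)
print(as_1252)
```

Decoding the same bytes as windows-1252 reproduces the mangled line exactly as it was displayed, which suggests that something between the sender and the screen treated the bytes as windows-1252. The sketch does not show which component did that.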

So, as far as I can tell, the client sent the message correctly encoded, and it is Chrome's fault for not displaying it properly.

Is this a known issue?
 

Comment 1 by jsb...@chromium.org, Sep 19 2017

Given that Chrome is not an email client, there's a whole system in between (presumably one of those fancy web-mail systems that are popular these days) that's processing the email before turning it into HTML which it gives to Chrome - likely a mix of server-side and client-side code.

For example, it appears that when I'm using gmail individual messages are sent as part of JSON data fetched with UTF-8 encoding. The text has been transcoded on the server already. The pages making up the web app have characterSet "UTF-8", and the encoding isn't changed. So far as I know, gmail doesn't have dependencies on the character encoding support of the client.

Do you have any indication that non-UTF-8 data is being sent to Chrome to decode? (You could look for XHR fetches on the Network tab in DevTools)

What is the behavior in other browsers?

I'm not super familiar with these dev tools, can you walk me through how to do this?

What strikes me as odd -- and the reason I filed the bug -- is that:

a) If I View Original, I get text that looks like this:

Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch

which, when manually decoded using the charset that the email header claims to be encoded in, is correct.

b) If I send the correct decoded text to myself using chrome+gmail, I get an email that contains text that looks like this:

Em Hương kế toán trung tâm Dịch

And if I then Show Original on that, I see text that looks like this:

Em H=3DFD=3DF5ng k=3DEA=3DEC toa=3DECn


In the first case, we have valid Windows-1258, which claims to be in Windows-1258 in the email header, and is displayed improperly.

In the second case, we have valid UTF-8, which claims to be in UTF-8 in the email header, and is displayed properly.
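A side note on the `=3D` sequences in the second Show Original output: in quoted-printable, `=3D` is the escape for a literal `=` character. The sketch below shows only that encoding rule, using Python's standard `quopri` module. It is an assumption, not something established in this thread, that this is how the `=3DFD` text came about.

```python
import quopri

# Literal ASCII text containing "=" characters (not raw 0xFD/0xF5 bytes).
literal = b"Em H=FD=F5ng"

# Quoted-printable must escape "=" itself, which it writes as "=3D".
encoded = quopri.encodestring(literal).strip()

# Decoding gives back the literal text, "=" signs included.
round_trip = quopri.decodestring(encoded)

print(encoded)
print(round_trip)
```

If that reading is right, the message body in the second case carried the characters `=`, `F`, `D` as plain text rather than the byte 0xFD.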

It seems this does occur on other browsers though, so perhaps the problem is in gmail? 
Cc: linds...@chromium.org
Should this for sure have iOS tagged?
#3: I'm not entirely sure, but I definitely confirmed the bug exists on iOS.

Comment 5 by jsb...@chromium.org, Sep 21 2017

Status: WontFix (was: Untriaged)
So far as I can tell, gmail never sends Windows-1258 encoded content to the browser. It's transcoding to UTF-8 on the server, whether for display or "show original". I found some old mail in gmail with Content-Type: text/plain; charset="windows-1252" and it's sent as UTF-8 when sent by the server.

If you load the attached win1258.html file - which declares the encoding via <meta charset=windows-1258> - you'll see it decodes correctly and document.characterSet reports "windows-1258"

So yes, this looks like a gmail issue (or at least gmail and the mail server are disagreeing).
win1258.html (79 bytes)
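For anyone who wants to repeat the browser-side check without the attachment, here is a rough sketch of how a similar page could be built. The contents of the attached win1258.html are not shown in this thread, so the markup, the helper name `build_test_page`, and the output file name below are all hypothetical. The only parts taken from the discussion are the `<meta charset=windows-1258>` declaration and the sample bytes from the original report.

```python
import quopri

# Sample line from the original report, still quoted-printable encoded.
QP_LINE = b"Em H=FD=F5ng k=EA=EC toa=ECn trung t=E2m Di=F2ch"


def build_test_page(qp_bytes: bytes) -> bytes:
    """Return a tiny HTML page whose body is raw windows-1258 bytes."""
    body = quopri.decodestring(qp_bytes)  # raw bytes, not transcoded
    return (
        b"<!DOCTYPE html>\n"
        b"<meta charset=windows-1258>\n"
        b"<pre>" + body + b"</pre>\n"
    )


page = build_test_page(QP_LINE)

# To try it in a browser, write the bytes out unchanged, e.g.:
#   from pathlib import Path
#   Path("win1258-sketch.html").write_bytes(page)
```

Opening the resulting file in the browser and checking `document.characterSet` in the console is the same test described in the comment above.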
