
Issue 770210 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Compat




Encoding problem on Windows (-)

Reported by julesroh...@googlemail.com, Sep 29 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

Example URL:

Steps to reproduce the problem:
This is a rendering issue

What is the expected behavior?
Ignore strange character

What went wrong?
We are rendering text from a PDF in the browser. When checking the data response from the server in the Network tab, the text displays a strange character (please see the attached screenshot). On OSX the character appears to be ignored, but on Windows two commas (,,) are rendered in its place. The character is also ignored in stdout logging in the OSX terminal. Firefox and IE on Windows both appear to ignore this character and do not render anything in its place. Please see the following resources:

http://www.i18nqa.com/debug/bug-iso8859-1-vs-windows-1252.html

http://www.i18nqa.com/debug/utf8-debug.html 

Does it occur on multiple sites: N/A

Is it a problem with a plugin? N/A 

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 61.0.3163.100  Channel: stable
OS Version: 7/10
Flash Version:
 
Screen Shot 2017-09-29 at 15.04.43.png
Copying and pasting the strange red ellipsis character into Google provides a clue.
The paste command displays nothing, but Google runs a search on „
Hovering over the red ellipsis in Chrome displays (\u84)
Cc: krajshree@chromium.org
Components: Blink>TextEncoding
Labels: Needs-Triage-M61 Needs-Feedback
julesrohanveling@ - Thanks for filing the issue...!!

Could you please provide a sample URL to test the issue from TE-end?
This will help us in triaging the issue further.

Thanks...!!
Hi, thanks for your response... 

Please go to http://www.sciencedirect.com/science/article/pii/S2214647416300393 and click on the download pdf link at the top of the page. 

Thanks,

Julian
Project Member

Comment 5 by sheriffbot@chromium.org, Oct 4 2017

Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "krajshree@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
For those trying to follow the steps:

1. Go to http://www.sciencedirect.com/science/article/pii/S2214647416300393
2. Click the "Download PDF" link at the top, then "Article"
3. This will open a new tab. Note that this is NOT downloading a PDF; as the OP notes it's rendering the PDF to HTML on the server.
4. Open DevTools, go to the Network tab
5. Reload the page
6. In the DevTools Network tab, click on the document entry (first one)
7. In the DevTools Network tab, click on the Response header. It's big; on my powerful machine it took several seconds to appear.

The red dot appears in the <title> as per the screenshot. Here's what I get when I copy/paste it on Windows: Cyclo(His-„Pro) 






Cc: e...@chromium.org
Components: -Blink>TextEncoding Blink>Fonts
And FYI the raw bytes by curling the URL piped through hexdump -C:

00000010  20 20 3c 68 74 6d 6c 20  6c 61 6e 67 3d 22 65 6e  |  <html lang="en|
00000020  22 3e 0a 20 20 3c 68 65  61 64 3e 0a 20 20 20 20  |">.  <head>.    |
00000030  3c 74 69 74 6c 65 3e 4d  65 74 61 62 6f 6c 69 63  |<title>Metabolic|
00000040  20 72 65 6c 61 74 69 6f  6e 73 68 69 70 20 62 65  | relationship be|
00000050  74 77 65 65 6e 20 64 69  61 62 65 74 65 73 20 61  |tween diabetes a|
00000060  6e 64 20 41 6c 7a 68 65  69 6d 65 72 26 61 70 6f  |nd Alzheimer&apo|
00000070  73 3b 73 20 44 69 73 65  61 73 65 20 61 66 66 65  |s;s Disease affe|
00000080  63 74 65 64 20 62 79 20  43 79 63 6c 6f 28 48 69  |cted by Cyclo(Hi|
00000090  73 2d c2 84 50 72 6f 29  20 70 6c 75 73 20 7a 69  |s-..Pro) plus zi|
000000a0  6e 63 20 74 72 65 61 74  6d 65 6e 74 3c 2f 74 69  |nc treatment</ti|
000000b0  74 6c 65 3e 0a 20 20 20  20 3c 6d 65 74 61 20 63  |tle>.    <meta c|
000000c0  68 61 72 73 65 74 3d 22  55 54 46 2d 38 22 3e 0a  |harset="UTF-8">.|

document.characterSet is "UTF-8"

The encoded bytes in question are c2 84:

new TextDecoder("UTF-8").decode(new Uint8Array([0xc2, 0x84])).charCodeAt(0).toString(16)

>> "84"

So what's there is U+0084 (as noted in comment #2)
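
The same check can be run over the whole affected part of the title rather than just the two bytes. This is only a minimal sketch: the bytes are copied by hand from the hexdump above, and `findNonAscii` is a hypothetical helper name, not anything Chrome or DevTools provides. It decodes the bytes as UTF-8 and lists every code point above U+007F along with its position in the decoded string.

```javascript
// Bytes for "Cyclo(His-<c2 84>Pro)", copied by hand from the hexdump above.
const titleBytes = new Uint8Array([
  0x43, 0x79, 0x63, 0x6c, 0x6f, 0x28, 0x48, 0x69,
  0x73, 0x2d, 0xc2, 0x84, 0x50, 0x72, 0x6f, 0x29,
]);

// Hypothetical helper: decode as UTF-8 and report each code point above U+007F.
function findNonAscii(bytes) {
  // fatal: true makes the decoder throw on malformed UTF-8 instead of
  // silently substituting U+FFFD.
  const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  const hits = [];
  let index = 0; // position in UTF-16 code units
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex });
    }
    index += ch.length;
  }
  return hits;
}

const nonAsciiHits = findNonAscii(titleBytes);
console.log(nonAsciiHits);
```

For these bytes the only entry reported is U+0084 at index 10, directly after "Cyclo(His-".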

U+0084 is a Unicode control character. Different fonts render that differently; on Windows I get the double-comma. On Linux I get a wide underscore.

So this doesn't look like an encoding issue. We're decoding the incoming UTF-8 exactly as expected.

Note that Chrome doesn't filter control characters out per  issue 530342 . In non-web content areas (window tabs, devtools) and clipboard we're likely just going to rely on what the system font provides.
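
Since Chrome does not drop the character, a page author who wants it ignored (the reporter's expected behavior) would have to remove it before the text reaches the page. A minimal sketch of one way to do that, assuming it is acceptable to drop every C1 control character (U+0080 to U+009F); `stripC1Controls` is a hypothetical helper, not a Chrome or Blink API.

```javascript
// Hypothetical helper: remove C1 control characters (U+0080..U+009F).
// Assumption: none of these characters carry meaning in the text being cleaned.
function stripC1Controls(text) {
  return text.replace(/[\u0080-\u009f]/g, "");
}

const rawTitle = "Cyclo(His-\u0084Pro)";
const cleanTitle = stripC1Controls(rawTitle);
console.log(cleanTitle);
```

Note that this range also contains U+0085 (NEXT LINE), so text that relies on it as a line separator would need that character handled separately.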

Moving to Blink > Fonts, but I think this is WAI.


Comment 8 by e...@chromium.org, Oct 4 2017

Status: WontFix (was: Unconfirmed)
Yes, this was an intentional change agreed to by the CSS WG to match the Unicode specification. The other browsers are either already doing the same or are in the process of doing so.

Comment 9 Deleted
