
Issue 772714

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug




Chromium must require UTF-8 for every page

Reported by bwsta...@gmail.com, Oct 8 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3235.0 Safari/537.36

Steps to reproduce the problem:
1. Access https://euckr.herokuapp.com

What is the expected behavior?
Chromium must force UTF-8 for the page regardless of meta charset

What went wrong?
Chromium accepted <meta charset="euc-kr" />

Did this work before? N/A 

Does this work in other browsers? N/A

Chrome version: 63.0.3235.0  Channel: canary
OS Version: 10.0
Flash Version: 

https://html.spec.whatwg.org/multipage/semantics.html#charset

>Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8. [ENCODING]
 
Labels: Needs-Triage-M63

Comment 2 by bwsta...@gmail.com, Oct 9 2017

Ah, never mind, the parser spec didn't change and still requires non-UTF-8 charset support. https://html.spec.whatwg.org/multipage/parsing.html#character-encodings
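The parsing section's algorithm for determining the character encoding checks its sources in a fixed order. Below is a simplified, hypothetical Python sketch of that order: BOM first, then the Content-Type charset, then a prescan of the first 1024 bytes for `<meta>`, then a default. The function name, the regex-based prescan, and the `windows-1252` fallback are assumptions. The real algorithm also handles user overrides, label normalization, comments and attribute parsing during the prescan, parent-frame encodings, and locale-dependent defaults.

```python
import codecs
import re
from typing import Optional

# Hypothetical, simplified stand-ins for the spec's Content-Type parsing and
# "prescan a byte stream to determine its encoding" algorithm.
_CT_CHARSET = re.compile(r'charset\s*=\s*["\']?([^;"\'\s]+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_\-:.]+)', re.IGNORECASE
)


def sniff_encoding(body: bytes,
                   content_type: Optional[str] = None,
                   default: str = "windows-1252") -> str:
    """Return an encoding label using a simplified precedence order."""
    # 1. A byte order mark overrides every other source.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    if body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    # 2. Transport layer: the charset parameter of the Content-Type header.
    if content_type:
        m = _CT_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # 3. Prescan the first 1024 bytes for <meta charset> or http-equiv.
    m = _META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. Fall back to a default (locale-dependent in real browsers).
    return default
```

With no BOM and no Content-Type charset, `sniff_encoding(b'<meta charset="euc-kr" />')` returns `"euc-kr"`, which is why a conforming parser still has to support that encoding.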
Cc: sc00335...@techmahindra.com
Components: Blink>TextEncoding
Labels: Triaged-ET Needs-Feedback
Unable to reproduce this issue on reported version 63.0.3235.0 using Windows 10 with steps mentioned below.

1. Navigated to https://euckr.herokuapp.com
2. Opened DevTools, searched for "UTF", and found <meta charset="utf-8">. Attaching a screenshot for reference.

@Reporter: Could you please confirm whether these steps are correct?

Thanks in advance!
Attachment: Issue 772714.png (64.7 KB)
Status: WontFix (was: Unconfirmed)
That fragment of the HTML standard is a requirement on authors and authoring tools, not a normative description of how user agents determine a page's encoding.

Unfortunately, dropping support for legacy encodings is not web-compatible. That is why the HTML and Encoding standards spend so much effort describing how to determine the encoding of a document.
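A minimal Python sketch of why forcing UTF-8 would break legacy pages: bytes served in EUC-KR decode correctly only when the declared encoding is honored. The sample string and variable names are illustrative. Python's `euc_kr` codec is narrower than the Encoding Standard's EUC-KR decoder, but it covers these common Hangul syllables.

```python
# Korean text as a legacy page declaring <meta charset="euc-kr"> would serve it.
original = "안녕하세요"
legacy_bytes = original.encode("euc_kr")

# Honoring the declared encoding round-trips the text.
honored = legacy_bytes.decode("euc_kr")

# Forcing UTF-8, as the report requests: byte sequences that are not valid
# UTF-8 are replaced with U+FFFD, so the page renders as mojibake.
forced = legacy_bytes.decode("utf-8", errors="replace")

print(honored == original)  # True
print("\ufffd" in forced)   # True
```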


To be clear, the phrase "Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8." must be read in light of the later context:

"If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the encoding must be specified using a meta element with a charset attribute or a meta element with an http-equiv attribute in the Encoding declaration state."

And again, as an authoring requirement, UTF-8 is mandated. Implementations still need to handle legacy encodings, though.
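The authoring rules quoted above are the kind a conformance checker enforces, not a browser. A minimal, hypothetical Python sketch of such a lint follows; `authoring_problems` and its regexes are illustrative assumptions that only approximate the spec's parsing rules. A browser would still decode a page that declares EUC-KR; the lint merely reports it.

```python
import codecs
import re
from typing import List, Optional

# Simplified, hypothetical approximations of Content-Type and <meta> parsing.
_CT_CHARSET = re.compile(r'charset\s*=\s*["\']?([^;"\'\s]+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_\-:.]+)', re.IGNORECASE
)


def authoring_problems(body: bytes,
                       content_type: Optional[str] = None,
                       is_srcdoc: bool = False) -> List[str]:
    """Report violations of the HTML authoring rules on character encoding."""
    if body.startswith(codecs.BOM_UTF8):
        declared: Optional[str] = "utf-8"
    elif body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        declared = "utf-16"
    else:
        ct = _CT_CHARSET.search(content_type or "")
        # Authors must place a <meta> declaration within the first 1024 bytes.
        meta = _META_CHARSET.search(body[:1024])
        if ct:
            declared = ct.group(1).lower()
        elif meta:
            declared = meta.group(1).decode("ascii").lower()
        elif is_srcdoc:
            declared = None  # iframe srcdoc documents need no declaration
        else:
            return ["encoding is not declared by BOM, Content-Type, or <meta>"]

    problems: List[str] = []
    # Authoring requirement: the document must be encoded as UTF-8.
    if declared is not None and declared not in ("utf-8", "utf8"):
        problems.append(f"declared encoding {declared!r} is not UTF-8")
    return problems
```

For example, `authoring_problems(b'<meta charset="euc-kr" />')` returns a single complaint about EUC-KR even though the page remains decodable, mirroring the split between authoring and implementation requirements described in this comment.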
