Chromium must require UTF-8 for every page
Reported by
bwsta...@gmail.com,
Oct 8 2017
Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3235.0 Safari/537.36

Steps to reproduce the problem:
1. Access https://euckr.herokuapp.com

What is the expected behavior?
Chromium must force UTF-8 for the page regardless of meta charset

What went wrong?
Chromium accepted <meta charset="euc-kr" />

Did this work before? N/A
Does this work in other browsers? N/A

Chrome version: 63.0.3235.0 Channel: canary
OS Version: 10.0
Flash Version:

https://html.spec.whatwg.org/multipage/semantics.html#charset
>Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8. [ENCODING]
Oct 9 2017
Ah, never mind, the parser spec didn't change and still requires non-UTF-8 charset support. https://html.spec.whatwg.org/multipage/parsing.html#character-encodings
Oct 9 2017
Unable to reproduce this issue on reported version 63.0.3235.0 using Windows 10 with the steps below.
1. Navigated to https://euckr.herokuapp.com
2. Opened DevTools, searched for "UTF", and found <meta charset="utf-8">.
Attaching screenshot for reference. @Reporter: Could you please confirm whether the steps mentioned are correct? Thanks in advance!
Oct 9 2017
That fragment of the HTML standard is a requirement on authors and authoring tools, not a normative description of how a page's encoding is determined. Unfortunately, dropping support for legacy encodings is not web-compatible, which is why the HTML and Encoding standards spend so much time describing how to determine the encoding of documents.
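To illustrate the compatibility problem, here is a minimal Python sketch of what happens when bytes from a page authored in EUC-KR are decoded as UTF-8 anyway. The sample string is an assumption chosen for illustration; it is not taken from the reported page.

```python
# Hypothetical sample text; a real legacy page would contain far more.
text = "한글"

# Bytes as a legacy EUC-KR page would serve them.
legacy_bytes = text.encode("euc-kr")

# Decoding with the declared encoding round-trips cleanly.
as_declared = legacy_bytes.decode("euc-kr")

# Forcing UTF-8 does not: invalid sequences become U+FFFD replacement
# characters, and some byte pairs decode to unrelated characters.
forced_utf8 = legacy_bytes.decode("utf-8", errors="replace")

print(as_declared == text)   # True
print(forced_utf8 == text)   # False
```

A browser that ignored the declared encoding would render existing legacy pages as this kind of garbled text.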
Oct 9 2017
To be clear, the phrase "Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8." must be read within the later context: "If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the encoding must be specified using a meta element with a charset attribute or a meta element with an http-equiv attribute in the Encoding declaration state." And again, as an authoring requirement, UTF-8 is mandated. Implementations still need to handle legacy encodings, though.
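The precedence implied by the quoted passage (BOM, then Content-Type metadata, then a meta declaration) can be sketched roughly as follows. This is a heavily simplified illustration in Python, not the algorithm from the HTML standard and not Chromium's implementation: the function name, the regex-based meta scan, the 1024-byte scan window, and the windows-1252 fallback are all assumptions made for the example, and iframe srcdoc documents are ignored.

```python
import re
from typing import Optional

# Byte order marks are checked first; a BOM wins over everything else.
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)

# charset parameter in a Content-Type header value (simplified).
_CT_CHARSET = re.compile(r"charset\s*=\s*[\"']?([^\s;\"']+)", re.IGNORECASE)

# charset inside a <meta> tag; covers both <meta charset=...> and the
# http-equiv form, but is far looser than a real HTML prescan.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def determine_encoding(
    body: bytes,
    content_type: Optional[str] = None,
    fallback: str = "windows-1252",  # assumed default; real defaults vary
) -> str:
    """Return a lowercase encoding label for an HTML byte stream."""
    # 1. BOM at the very start of the document.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. Encoding given explicitly by Content-Type metadata.
    if content_type:
        match = _CT_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 3. A meta declaration near the start of the document
    #    (scan window of 1024 bytes is an assumption of this sketch).
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: fall back to the assumed default.
    return fallback
```

Under these assumptions, `determine_encoding(b'<meta charset="euc-kr" />')` yields `"euc-kr"`, which matches the behavior reported in this issue: with no BOM and no Content-Type charset, the meta declaration decides the encoding.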
Comment 1 by nyerramilli@chromium.org
, Oct 9 2017