Don't set fallback charset (US-ASCII) for Data URIs
Reported by l446240525@gmail.com, Sep 20 2016
Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2864.0 Safari/537.36

Example URL: data:text/html,你好

Steps to reproduce the problem:
See issue 597488: manual encoding selection is gone, so I can't change the encoding to UTF-8 manually. If you remove 'charset=US-ASCII' in the Content-Type header, the encoding auto-detector will do that for me.

What is the expected behavior?

What went wrong?
.

Does it occur on multiple sites: No
Is it a problem with a plugin? No
Did this work before? N/A
Does this work in other browsers? Yes

Chrome version: 55.0.2864.0 Channel: n/a
OS Version: OS X 10.10.4
Flash Version: Shockwave Flash 23.0 r0
,
Sep 20 2016
https://cs.chromium.org/chromium/src/net/base/data_url.cc?sq=package:chromium&l=83 Arguably, the spec doesn't say what to do when the mediatype is present but the charset is omitted.
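To make the ambiguity concrete, here is a minimal Python sketch of the RFC 2397 defaulting rule. It is not Chromium's data_url.cc logic; the function name `parse_data_url_header` is hypothetical, and returning `None` for "charset unspecified" is an assumption made for illustration. RFC 2397 only defines the default for an omitted mediatype (text/plain;charset=US-ASCII), so the sketch leaves the charset unset when a mediatype is given without one.

```python
def parse_data_url_header(url):
    """Return (mime_type, charset) for a data: URL.

    charset is None when the URL names a mediatype but no charset,
    which is the case RFC 2397 does not spell out.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data: URL")
    header, sep, _payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URL is missing the ',' separator")

    parts = [p.strip() for p in header.split(";")]
    mime_type = parts[0].lower()
    charset = None
    for param in parts[1:]:
        if param.lower().startswith("charset="):
            charset = param[len("charset="):]

    if not mime_type:
        # RFC 2397: an omitted mediatype defaults to
        # text/plain;charset=US-ASCII. "text/plain" may also be
        # omitted while the charset parameter is still supplied.
        mime_type = "text/plain"
        if charset is None:
            charset = "US-ASCII"
    return mime_type, charset
```

With this reading, `data:,hello` resolves to US-ASCII, while `data:text/html,你好` has no declared charset and the decoder is free to pick a fallback.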
,
Sep 20 2016
Note that CED encoding autodetector is currently *not* running on these data URLs. For example, data:text/html,ပိတောက်စာအုပ် also appears as latin1 mojibake on 55.0.2865.0, although this string is known to be correctly detectable as UTF-8 (it displays correctly if loaded as a text file with file:// URL). Personally I think it is definitely correct not to run the autodetector on such data URLs, since they are frequently used for implementation details instead of user-visible text, and running the detector on them might be extremely wasteful of CPU time. The remaining question is whether we should change the default data URL encoding to UTF-8 instead of latin1. If the spec allows us the flexibility to do so and that causes no regressions, I might support that in the spirit of "UTF-8 is king of encodings, moar UTF-8 is better" :).
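The mojibake described here can be reproduced outside the browser. The following Python sketch assumes the page bytes are the UTF-8 encoding of the example text and uses Python's `latin-1` codec as a stand-in for the browser's latin1 fallback (browsers typically map that label to windows-1252, so the exact glyphs may differ). The helper name `decode_as` is hypothetical.

```python
def decode_as(raw, charset):
    """Decode raw bytes with the given charset label."""
    return raw.decode(charset)

text = "你好"
raw = text.encode("utf-8")          # bytes as they would arrive in the URL payload

mojibake = decode_as(raw, "latin-1")  # what a latin1 fallback produces
correct = decode_as(raw, "utf-8")     # what a UTF-8 default would produce
```

Because latin-1 maps every byte to exactly one character, the mis-decoded string has one character per UTF-8 byte and can be round-tripped back to the original bytes.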
,
Sep 20 2016
On further thought, I'm not sure the data coming in a data: URL is correctly sanitized such that we could treat it as UTF-8, and I'm worried about the potential fallout of introducing such sanitization, so I don't think we should start down that path unless we have a strong reason to do so. Let's stick to the latin1 status quo.
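One reading of the sanitization concern is that a data: URL payload can percent-decode to arbitrary bytes, which need not form valid UTF-8, whereas a latin1 decode never fails. The Python sketch below illustrates that difference only; the helper names `payload_bytes` and `is_valid_utf8` are hypothetical and this is not how Chromium handles the payload.

```python
from urllib.parse import unquote_to_bytes

def payload_bytes(url):
    """Percent-decode everything after the first ',' of a data: URL."""
    _header, _sep, payload = url.partition(",")
    return unquote_to_bytes(payload)

def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

good = payload_bytes("data:text/html,%E4%BD%A0%E5%A5%BD")  # UTF-8 bytes of 你好
bad = payload_bytes("data:text/html,%FF%FE")               # not valid UTF-8
```

Here `bad` still decodes under latin-1, but a UTF-8 default would have to decide how to handle the invalid sequence, for example by substituting replacement characters.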
Comment 1 by elawrence@chromium.org, Sep 20 2016
Labels: -OS-Mac
Status: Untriaged (was: Unconfirmed)