
Issue 648556

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Sep 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Compat




Don't set fallback charset (US-ASCII) for Data URIs

Reported by l446240525@gmail.com, Sep 20 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2864.0 Safari/537.36

Example URL:
data:text/html,你好

Steps to reproduce the problem:
See issue 597488: manual encoding selection is gone, so I can't change the encoding to UTF-8 manually. If you remove 'charset=US-ASCII' from the Content-Type header, the encoding auto-detector will do that for me.

What is the expected behavior?

What went wrong?
.

Does it occur on multiple sites: No

Is it a problem with a plugin? No 

Did this work before? N/A 

Does this work in other browsers? Yes 

Chrome version: 55.0.2864.0  Channel: n/a
OS Version: OS X 10.10.4
Flash Version: Shockwave Flash 23.0 r0
 
Components: Internals>Network
Labels: -OS-Mac
Status: Untriaged (was: Unconfirmed)
I believe this is Working as Intended

https://tools.ietf.org/html/rfc2397
"If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII."

https://cs.chromium.org/chromium/src/net/base/data_url.cc?sq=package:chromium&l=83

Arguably, the spec doesn't say what to do when the mediatype is present but the charset is omitted. 
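
To make the gap concrete, here is a minimal Python sketch of the RFC 2397 defaulting rule. The function name `parse_data_url` and its `fallback_charset` parameter are hypothetical, chosen for illustration; this is not Chromium's `data_url.cc` logic. The `fallback_charset` argument models the case the spec leaves open (mediatype present, charset omitted). Its default of US-ASCII mirrors the behavior reported in this issue.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_url(url, fallback_charset="US-ASCII"):
    """Return (mime_type, charset, payload_bytes) for a data: URL.

    Hypothetical helper for illustration only.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data: URL")
    header, comma, data = url[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data: URL has no comma separator")

    params = header.split(";")
    mime_type = params[0].strip().lower()
    is_base64 = any(p.strip().lower() == "base64" for p in params[1:])

    charset = None
    for param in params[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            charset = value.strip()

    if not mime_type:
        # RFC 2397: an omitted mediatype defaults to
        # text/plain;charset=US-ASCII (a lone charset may still be given).
        mime_type = "text/plain"
        charset = charset or "US-ASCII"
    elif charset is None:
        # Mediatype present, charset omitted: the RFC does not say.
        # Assumption: use whatever the caller passes as the fallback.
        charset = fallback_charset

    payload = unquote_to_bytes(data)
    if is_base64:
        payload = base64.b64decode(payload)
    return mime_type, charset, payload
```

With these assumptions, `parse_data_url("data:text/html,你好")` reports `US-ASCII` as the charset even though the payload bytes are UTF-8, which is the mismatch this issue is about. Passing `fallback_charset=None` models the reporter's request: leave the charset unset so that a later stage can decide.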

Comment 3 by aelias@chromium.org, Sep 20 2016

Cc: aelias@chromium.org jinsuk...@chromium.org elawrence@chromium.org
Components: Blink>TextEncoding
Labels: OS-All
Status: Available (was: Untriaged)
Note that the CED encoding autodetector is currently *not* running on these data URLs.  For example, data:text/html,ပိတောက်စာအုပ် also appears as latin1 mojibake on 55.0.2865.0, although this string is known to be correctly detectable as UTF-8 (it displays correctly if loaded as a text file with a file:// URL).
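
The mojibake can be reproduced outside the browser. The short Python sketch below uses the reporter's example string and assumes that "latin1" here behaves like Python's `latin-1` codec for these bytes. It decodes UTF-8 bytes with the wrong charset, then shows that the damage is reversible because latin-1 maps every byte to exactly one character.

```python
text = "你好"                      # the payload from the example URL
raw = text.encode("utf-8")        # 6 bytes: e4 bd a0 e5 a5 bd

# Decoding UTF-8 bytes as latin-1 yields one character per byte.
mojibake = raw.decode("latin-1")
print(repr(mojibake))

# latin-1 is a lossless byte <-> character mapping, so the original
# text can be recovered by reversing the wrong decode.
restored = mojibake.encode("latin-1").decode("utf-8")
print(restored == text)
```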

Personally I think it is definitely correct not to run the autodetector on such data URLs, since they are frequently used for implementation details instead of user-visible text, and running the detector on them might be extremely wasteful of CPU time.

The remaining question is whether we should change the default data URL encoding to UTF-8 instead of latin1.  If the spec allows us the flexibility to do so and that causes no regressions, I might support that in the spirit of "UTF-8 is king of encodings, moar UTF-8 is better" :).

Comment 4 by aelias@chromium.org, Sep 20 2016

Status: WontFix (was: Available)
On further thought, I'm not sure the data coming in the data: URL is correctly sanitized such that we could treat it as UTF-8, and I'm worried about the potential fallout of introducing such sanitization, so I don't think we should start down that path unless we have a strong reason to do so.  Let's stick to the latin1 status quo.
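
One possible reading of the sanitization concern, sketched in Python: every byte sequence is valid latin1, but not every byte sequence is valid UTF-8. A UTF-8 default would therefore need a policy for malformed input. The byte values below are arbitrary illustrations, not taken from a real data URL.

```python
# Bytes that are not well-formed UTF-8 (illustrative values).
arbitrary = bytes([0xFF, 0xFE, 0x80])

# latin-1 never fails: each byte maps to one code point.
as_latin1 = arbitrary.decode("latin-1")

# Strict UTF-8 decoding rejects the same bytes.
try:
    arbitrary.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# A lenient decoder must substitute U+FFFD, which loses the original bytes.
lossy = arbitrary.decode("utf-8", errors="replace")

print(len(as_latin1), utf8_ok, "\ufffd" in lossy)
```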
