
Issue 688502

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 586521
Closed: Oct 2017
OS: All
Pri: 3
Type: Bug




Fetch adds spurious charset=US-ASCII to data URL fetches

Project Member Reported by rbyers@chromium.org, Feb 3 2017

Issue description

Chrome Version: 58.0.3000.0
OS: Mac (but probably all).

What steps will reproduce the problem?
(1) Fetch a data URL for an image/png file (repro: http://jsbin.com/vomirey/edit?html,js,output)
(2) Read the Content-Type of the result header

What is the expected result?
Content-Type: image/png

What happens instead?
Content-Type: image/png;charset=US-ASCII

This is causing a number of fetch WPT failures, e.g. http://w3c-test.org/fetch/api/basic/scheme-data.html

Works correctly on Firefox 50.1.0
Works in Safari tech preview Release 22 (though latest official Safari doesn't yet support fetch).
Edge 14 seems not to support fetch from data URLs at all.
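To make the expected result concrete: per RFC 2397 a data URL has the form data:[<mediatype>][;base64],<data>. The Content-Type a caller expects is therefore just the <mediatype> part as written. The C++17 below is a minimal sketch of pulling that part out. ParseDataUrlHeader and DataUrlHeader are hypothetical names, not Chromium's net::DataURL::Parse. The prefix and suffix checks are case-sensitive for brevity, even though URL schemes are not.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not Chromium code): splits the metadata part of an
// RFC 2397 URL of the form data:[<mediatype>][;base64],<data>.
struct DataUrlHeader {
  std::string media_type;  // e.g. "image/png", exactly as written in the URL
  bool is_base64 = false;  // true if the metadata ends in ";base64"
  bool valid = false;      // false if the "data:" prefix or the comma is missing
};

DataUrlHeader ParseDataUrlHeader(std::string_view url) {
  DataUrlHeader result;
  constexpr std::string_view kScheme = "data:";  // case-sensitive for brevity
  if (url.substr(0, kScheme.size()) != kScheme) return result;
  url.remove_prefix(kScheme.size());

  // Everything before the first comma is metadata; the rest is the payload.
  const std::size_t comma = url.find(',');
  if (comma == std::string_view::npos) return result;
  std::string_view meta = url.substr(0, comma);

  // Strip a trailing ";base64" marker; it is an encoding flag, not a parameter.
  constexpr std::string_view kBase64 = ";base64";
  if (meta.size() >= kBase64.size() &&
      meta.substr(meta.size() - kBase64.size()) == kBase64) {
    result.is_base64 = true;
    meta.remove_suffix(kBase64.size());
  }
  result.media_type = std::string(meta);
  result.valid = true;
  return result;
}
```

For a PNG data URL such as data:image/png;base64,iVBORw0KGgo=, this yields media_type "image/png" with is_base64 set. Nothing in the URL mentions a charset.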


 
Owner: yhirano@chromium.org
Components: Internals>Network

The charset value is added in net::DataURL::Parse. There is a comment in net::URLRequestDataJob::BuildResponse saying:

 // "charset" in the Content-Type header is specified explicitly to follow
 // the "token" ABNF in the HTTP spec. When DataURL::Parse() call is
 // successful, it's guaranteed that the string in |charset| follows the
 // "token" ABNF.
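The following is a rough model of how the extra parameter appears, going by the comment quoted above. The parser reports a charset for every data URL, falling back to US-ASCII when none is declared, and the response builder appends whatever it receives. This is a hypothetical reconstruction for illustration only; ParsedDataUrl, ParseWithDefaultCharset and BuildContentTypeAlwaysWithCharset are invented names, not the actual Chromium functions.

```cpp
#include <string>

// Hypothetical reconstruction of the reported behavior (not Chromium code).
struct ParsedDataUrl {
  std::string mime_type;
  std::string charset;
};

ParsedDataUrl ParseWithDefaultCharset(const std::string& mime_type,
                                      const std::string& declared_charset) {
  ParsedDataUrl parsed{mime_type, declared_charset};
  // The fallback ignores the media type, so image/png gets a charset too.
  if (parsed.charset.empty()) parsed.charset = "US-ASCII";
  return parsed;
}

// The builder appends the charset unconditionally, producing the spurious
// "image/png;charset=US-ASCII" seen in the repro.
std::string BuildContentTypeAlwaysWithCharset(const ParsedDataUrl& parsed) {
  return parsed.mime_type + ";charset=" + parsed.charset;
}
```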

On the other hand, RFC 2616 says in section 3.7.1, "Canonicalization and Text Defaults":

   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.

So, to conform to RFC 2616, I think we should not add the default charset for anything other than "text/" media types.
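Under the rule proposed here, the builder would fill in a default charset only for text/* types. The following is a minimal C++ sketch. It assumes the MIME type arrives without parameters and that the caller passes in the default charset to use. IsTextMediaType and BuildContentType are hypothetical helpers. The "text/" check is case-insensitive because media type names are.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case-insensitive test for the "text/" top-level type.
bool IsTextMediaType(const std::string& mime_type) {
  const std::string prefix = "text/";
  if (mime_type.size() < prefix.size()) return false;
  return std::equal(prefix.begin(), prefix.end(), mime_type.begin(),
                    [](char p, char c) {
                      return p == std::tolower(static_cast<unsigned char>(c));
                    });
}

// A declared charset is always kept; a default is added only for text/*.
std::string BuildContentType(const std::string& mime_type,
                             const std::string& declared_charset,
                             const std::string& default_charset) {
  if (!declared_charset.empty())
    return mime_type + ";charset=" + declared_charset;
  if (IsTextMediaType(mime_type))
    return mime_type + ";charset=" + default_charset;
  return mime_type;  // e.g. image/png stays exactly as declared
}
```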

By the way, this default was changed in RFC 7231:

   Appendix B.  Changes from RFC 2616

   ...
   The default charset of ISO-8859-1 for text media types has been
   removed; the default is now whatever the media type definition says.
   Likewise, special treatment of ISO-8859-1 has been removed from the
   Accept-Charset header field.  (Section 3.1.1.3 and Section 5.3.3)

and I don't know for which media types, if any, we should fill in the charset at all.

cc-ing net/ people. Do you agree with my observation above? Please correct me if I'm wrong.

Thanks!

yhirano, your reasoning looks right to me (though I wouldn't consider myself an expert here).

Note that for "text/plain" we still want US-ASCII per RFC 6657: https://tools.ietf.org/html/rfc6657#section-4
   The default "charset" parameter value for "text/plain" is unchanged
   from [RFC2046] and remains as "US-ASCII".


That is, text/* types in general no longer get a default charset, but text/plain still does.
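Folding in RFC 6657 narrows the default further. text/plain keeps US-ASCII, while other types get whatever their own registration defines (RFC 7231), which this sketch does not attempt to look up. DefaultCharsetFor is a hypothetical helper. It assumes a bare type/subtype with no parameters and needs C++17 for std::optional.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Lower-cases ASCII letters; media type names compare case-insensitively.
std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// RFC 6657 section 4: text/plain keeps "US-ASCII" as its default charset.
// Any other type's default comes from its own registration, which this
// sketch does not model, so no default is returned for it.
std::optional<std::string> DefaultCharsetFor(const std::string& mime_type) {
  if (ToLowerAscii(mime_type) == "text/plain") return "US-ASCII";
  return std::nullopt;
}
```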
Status: Assigned (was: Untriaged)
Cc: mmenke@chromium.org csharrison@chromium.org
Issue 694661 has been merged into this issue.
http://w3c-test.org/XMLHttpRequest/data-uri.htm (from dupe) has more tests.
Note that Chrome doesn't support US-ASCII as an encoding. Furthermore, just adding it to a MIME type is not really what default means there. And on top of that, I doubt Chrome actually consistently uses US-ASCII (or windows-1252 which it maps to) to decode text/plain resources. I'm pretty sure it doesn't. So all those arguments are rather spurious.
>#7
So you think we should not add charset at all, right?

Comment 9 by mmenke@chromium.org, Feb 22 2017

Sniffed MIME types haven't historically been exposed directly to the web platform, have they?  So adding it to the MIME type or not was historically an implementation detail, not something that mattered, but exposing it via the fetch API has changed that, right?  Seems like more of a fetch issue than a net stack issue.
Oh, oops - was thinking this was at the mime sniffing layer, not the headers layer.  Should we be creating bogus headers, either way?  Seems like the right thing to do may just be to expose this as if it were a sniffed mime type (Though that's done at a higher layer than URL parsing, so plumbing that may get hinky)
It's reproducible with XHR so it's not fetch API specific.
Yeah, I don't think charset should be included as it's not actually what we end up using. At some point someone needs to write a new data URL standard.
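Under the position argued here (no synthesized charset), the header logic just reports what the URL declared. The following is a minimal sketch. ContentTypeForDataUrl is a hypothetical helper that takes the <mediatype> part with any ;base64 suffix already stripped. The only charset it ever adds is RFC 2397's own default for a URL that omits <mediatype> entirely. This thread does not decide whether to keep even that.

```cpp
#include <string>

// Hypothetical helper: |declared| is the <mediatype> part of a data URL,
// with any ";base64" suffix already removed.
std::string ContentTypeForDataUrl(const std::string& declared) {
  // RFC 2397: an omitted <mediatype> defaults to text/plain;charset=US-ASCII.
  if (declared.empty()) return "text/plain;charset=US-ASCII";
  // RFC 2397 shorthand: "text/plain" may be omitted while a parameter such
  // as charset is still supplied, e.g. data:;charset=utf-8,...
  if (declared.front() == ';') return "text/plain" + declared;
  // Otherwise report exactly what the URL declared; no charset is invented.
  return declared;
}
```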
Cc: kouhei@chromium.org
Mergedinto: 586521
Status: Duplicate (was: Assigned)
This issue is fixed by shoon.kim@lge.com.
