New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 892337 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Align MIME parsing with WHATWG spec at https://mimesniff.spec.whatwg.org/

Project Member Reported by mmenke@chromium.org, Oct 4

Issue description

Ways in which we deviate from the doc:

1) We don't Trim whitespace within quotes.
2) We ignore the first "(" in a charset, and everything after it.
3) We don't allow quoted empty strings as values (Unquoted ones we correctly disallow)
4) We take last value in a semi-colon separated list, instead of the first.
5) When we have multiple Content-Type lines, we take the last instead of the first.
6) We don't restrict the characters we allow in Content-Type fields.
7) We trim whitespace before the first quote in a Content-Type quoted string.

I plan to fix 1) through 4) in a single CL, and 5) in a followup.  I'll put off 6) until the changes go to stable and seem to be working.  I'm not entirely sure 7) is worth the potential compatibility issues.
 
Status: Assigned (was: Untriaged)
Cc: mmenke@chromium.org b...@chromium.org yhirano@chromium.org
 Issue 789089  has been merged into this issue.
 Issue 788491  has been merged into this issue.
Cc: rb...@igalia.com domfarolino@gmail.com domenic@chromium.org
 Issue 792880  has been merged into this issue.
Summary: Align MIME parsing with WHATWG spec at https://mimesniff.spec.whatwg.org/ (was: Align MIME parsing with W3C spec at https://mimesniff.spec.whatwg.org/)
Oops, sorry about that.  Pretty big mistake there.
So Content-Type parsing is discussed in https://github.com/whatwg/mimesniff/issues/30 and is effectively a superset of MIME type parsing, but we haven't quite reached a conclusion there yet, right? If 5-7 make Chrome closer to other browsers that seems good though. Just trying to make sure I'm not missing anything here.
That is the set of changes that should be sufficient to pass all WPT mimesniff/mime-types/charset-parameter.window-expected.html tests. 5 isn't needed for that, actually, though it may be tested elsewhere.  There may be other MIME sniffing differences, but I believe that seems a bit tangential to parsing Content-Types.  Also may be other Content-Type parsing differences that don't have tests, though looking through the spec, I don't see any.
To be clear, there are also parsing tests in mimesniff/mime-types/parsing.window.html . Not sure how those fit into the above.
You mean parsing.any?  Those look to be asserting on the "type" field of blob objects.  No idea where that code is - someone who knows blobs would need to look at that - I think whatever massaging we're asserting on there is done by renderer code, rather than network code.
Hmm yeah I guess that's all that's tested there. Hopefully they reuse the same parser though? Probably a separate bug to work on that.
I assume they use the same parser, but it looks like the code is massaging the data in some way - net's parser just outputs content-type and boundary values, while the test seems to be looking at the entire content-type header field, which does not come from net's parser.
Erm....sorry, charset and boundary values, rather.
@comment 12, mmenke blink/the renderer process has its own MIME type parsers, there is a bug for that. Maybe the test is going through a code path in blink you are not expecting?
That's entirely possible.  I'm not at all familiar with the renderer side of things.
Perhaps I merged the bug assigned to you to soon.  I assumed it was for the same code I was planning on working on, but if it's for the blink-side logic, perhaps we should re-open, and you can work on that?
Sure I can take a look at the renderer side of things.
Cc: asanka@chromium.org
Hrm...Actually, I guess that using the last Content-Type header is actually correct, per the spec "Set supplied-type to the value of the last Content-Type header associated with the resource."  So we should use the first charset in the last Content-Type header?
What test is testing multiple Content-Type headers/values? https://github.com/whatwg/mimesniff/issues/30#issuecomment-382721437 is the latest I have on Content-Type parsing (tests at https://github.com/web-platform-tests/wpt/pull/10525) and that isn't finalized nor integrated into any standard yet (I was thinking of maybe putting it in Fetch).
(And we likely need two Content-Type header parsers, with one simply using the MIME type parser on the combined value for added strictness, or failing on commas or some such. In particular this is needed for request Content-Type headers due to CORS.)
I was just reading the spec, and hadn't realize that the relevant changes hadn't been finalized yet.  My list of differences above was based on my understanding of the spec, rather than actually reading the spec or WHATWG 
tests.
I did land a CL to do 1) through 4), over a week back, but forgot to link this bug on it (https://chromium-review.googlesource.com/c/chromium/src/+/1257443).

Sign in to add a comment