Align MIME parsing with WHATWG spec at https://mimesniff.spec.whatwg.org/ |
|||||
Issue description
Ways in which we deviate from the doc:
1) We don't Trim whitespace within quotes.
2) We ignore the first "(" in a charset, and everything after it.
3) We don't allow quoted empty strings as values (Unquoted ones we correctly disallow)
4) We take last value in a semi-colon separated list, instead of the first.
5) When we have multiple Content-Type lines, we take the last instead of the first.
6) We don't restrict the characters we allow in Content-Type fields.
7) We trim whitespace before the first quote in a Content-Type quoted string.
I plan to fix 1) through 4) in a single CL, and 5) in a followup. I'll put off 6) until the changes go to stable and seem to be working. I'm not entirely sure 7) is worth the potential compatibility issues.
,
Oct 4
Issue 789089 has been merged into this issue.
,
Oct 4
Issue 788491 has been merged into this issue.
,
Oct 4
Issue 792880 has been merged into this issue.
,
Oct 4
,
Oct 4
Oops, sorry about that. Pretty big mistake there.
,
Oct 5
So Content-Type parsing is discussed in https://github.com/whatwg/mimesniff/issues/30 and is effectively a superset of MIME type parsing, but we haven't quite reached a conclusion there yet, right? If 5-7 make Chrome closer to other browsers that seems good though. Just trying to make sure I'm not missing anything here.
,
Oct 5
That is the set of changes that should be sufficient to pass all WPT mimesniff/mime-types/charset-parameter.window-expected.html tests. 5 isn't needed for that, actually, though it may be tested elsewhere. There may be other MIME sniffing differences, but I believe that seems a bit tangential to parsing Content-Types. Also may be other Content-Type parsing differences that don't have tests, though looking through the spec, I don't see any.
,
Oct 5
To be clear, there are also parsing tests in mimesniff/mime-types/parsing.window.html . Not sure how those fit into the above.
,
Oct 5
You mean parsing.any? Those look to be asserting on the "type" field of blob objects. No idea where that code is - someone who knows blobs would need to look at that - I think whatever massaging we're asserting on there is done by renderer code, rather than network code.
,
Oct 5
Hmm yeah I guess that's all that's tested there. Hopefully they reuse the same parser though? Probably a separate bug to work on that.
,
Oct 5
I assume they use the same parser, but it looks like the code is massaging the data in some way - net's parser just outputs content-type and boundary values, while the test seems to be looking at the entire content-type header field, which does not come from net's parser.
,
Oct 5
Erm....sorry, charset and boundary values, rather.
,
Oct 5
@comment 12, mmenke blink/the renderer process has its own MIME type parsers, there is a bug for that. Maybe the test is going through a code path in blink you are not expecting?
,
Oct 5
That's entirely possible. I'm not at all familiar with the renderer side of things.
,
Oct 5
Perhaps I merged the bug assigned to you to soon. I assumed it was for the same code I was planning on working on, but if it's for the blink-side logic, perhaps we should re-open, and you can work on that?
,
Oct 5
Sure I can take a look at the renderer side of things.
,
Oct 8
Hrm...Actually, I guess that using the last Content-Type header is actually correct, per the spec "Set supplied-type to the value of the last Content-Type header associated with the resource." So we should use the first charset in the last Content-Type header?
,
Oct 10
What test is testing multiple Content-Type headers/values? https://github.com/whatwg/mimesniff/issues/30#issuecomment-382721437 is the latest I have on Content-Type parsing (tests at https://github.com/web-platform-tests/wpt/pull/10525) and that isn't finalized nor integrated into any standard yet (I was thinking of maybe putting it in Fetch).
,
Oct 10
(And we likely need two Content-Type header parsers, with one simply using the MIME type parser on the combined value for added strictness, or failing on commas or some such. In particular this is needed for request Content-Type headers due to CORS.)
,
Oct 10
I was just reading the spec, and hadn't realize that the relevant changes hadn't been finalized yet. My list of differences above was based on my understanding of the spec, rather than actually reading the spec or WHATWG tests.
,
Oct 15
I did land a CL to do 1) through 4), over a week back, but forgot to link this bug on it (https://chromium-review.googlesource.com/c/chromium/src/+/1257443). |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by mmenke@chromium.org
, Oct 4