Add Accept-Encoding: identity to Range requests |
||||||||
Issue descriptionSpec change: https://github.com/whatwg/fetch/pull/751 Tests: https://github.com/web-platform-tests/wpt/pull/11291 Turns out a significant number of servers ignore Range headers unless "Accept-Encoding: identity" is set. This spec change applies that header to all requests with a Range header. We already kinda do this for media requests, although the header is needlessly verbose, and is applied before the service worker, when it should be afterwards.
,
Jun 6 2018
Tentatively assigning yhirano as he is probably the best person for the job. yhirano: please reassign or unassign if appropriate.
,
Jun 6 2018
davidben@, should this be implemented at URLRequestHttpJob::AddExtraHeaders?
,
Jun 6 2018
That's definitely a mmenke question, but this spec change honestly makes no sense. The place we typically insert Range headers is deep in the HTTP stack, once we've decided to reassemble stuff at the cache. But it would not be at all okay to add such a header at that layer because the resource may vary on that header. In reassembly, it is invalid to touch headers that may change the resource you get. You *must* have committed to all headers *before* you find out you need ranges. (Fetch doesn't seem to say much on this subject apart from "Some implementations might support caching of partial content, as per HTTP Range Requests. [HTTP-RANGE] However, this is not widely supported by browser caches." That's just fine because this is an implementation detail of the HTTP stack and out of scope for any Fetch header-munging.) What you could mean is that if someone calls the fetch() API or equivalent web platform entry-point with a Range header, fetch() will magically add another header. This sounds like it would make the platform less predictable, so I would not recommend it, but if you want to do that, you can. Moreover, the PR suggests the motivation was media being treated differently, and I believe media's range requests in Chrome happen via the low-level HTTP cache... where this spec change would not work. So, I would recommend just reverting that change and solving your problem a different way. That change is not consistent with reality.
,
Jun 6 2018
+1 to this making no sense. Just to give a more detailed example, suppose we send a request with: Accept-Encoding: gzip, identity Then we get an a response with the identity encoding, and "Vary: Accept-Encoding" (Maybe the server likes brotli?), but the response fails after sending us some of the response body. If the cache layer wants to resume it, we'd have to send "Accept-encoding: identity", per spec, which seems weird and problematic, particularly if the consumer explicitly set a different accept-encoding header from the default one in the first place. Note that Chrome's media code actually sets Accept-Encoding: Identity already itself, so this would only affect other requests (Or media requests made via xhr / fetch instead of via media elements).
,
Jun 6 2018
To that end, Matt's example also applies to doing this transformation at XHR/Fetch entry points, so my "This sounds like it would make the platform less predictable, so I would not recommend it, but if you want to do that, you can." was not strong enough of a NACK. While it would be internally consistent for the browser, it would not be consistent for the caller. Are you even allowed to set Range requests in XHR/Fetch? That can't really work. Range applies before Content-Encoding, rather than after. That is, you are getting gzip(foo)[5:10], not gzip(foo[5:10]). This is so the cache can reassemble a gzipped response. But the web platform generally undoes Content-Encoding before sending it to the caller, and you can't uncompress gzip(foo)[5:10].
,
Jun 6 2018
Range request with a non-identity content-encoding is generally broken, and I doubt any browser even tries to support it.
,
Jun 6 2018
> Range request with a non-identity content-encoding is generally broken, and I doubt any browser even tries to support it. Right, so only making range requests with `Accept-Encoding: identity` makes sense? Or is there a case where Chrome can make a request with `Accept-Encoding: gzip, identity` followed by a range request for the same resource that has the same value for that `Accept-Encoding` header? The layering might be slightly different in implementations, but if the outcome is the same that seems fine...
,
Jun 6 2018
If we send a request with "Accept-Encoding: gzip, identity", and the request fails partway through, we may try to resume it with a range request (And will use the same Accept-Encoding when we do). Adding the cache label, as that's where all Chrome's range request logic lives.
,
Jun 6 2018
This works fine because all the range and HTTP cache logic happens before undoing content encoding. This matches how HTTP works. As far as HTTP is concerned, the resource *is* the gzipped body. The HTTP cache lives at this layer and may internally send ranged requests to the server in order to service an unranged caller request. As far as the Fetch and most URL request APIs are concerned, the body is the unencoded one, so at the public API layer, we undo the encoding. This happens after the cache has done any reassembly it wants. Of course, nothing stops someone from passing a Range header to our public APIs (at the net stack level; I don't know what's web-exposed) but, yes, in that case they had better tell the server not to gzip or it won't work very well.
,
Jun 6 2018
re: comment #9/#10: we actually stopped doing that because at least one server impl was getting it wrong. See https://chromium-review.googlesource.com/c/chromium/src/+/962346 and bug 820862
,
Jun 6 2018
Oh hah. Lovely. :-) > and I believe media's range requests in Chrome happen via the low-level HTTP cache... where this spec change would not work Sorry, this was totally false. Let me try this again: I believe the media code makes ranged requests, which go to the cache. If the cache can service those out of its (potentially partial), it does. Otherwise, it fetches what it can't service and updates the partial cache entry as needed. So the ranges made ultimately are a combination of the two. I suppose adding it to URLRequest-level range requests, but not HTTP cache ones (those being treated as out of scope of Fetch, Service Worker, and everything else), then isn't totally absurd. But it means that JS code which does: A: fetch(<no range>) => read body, but it gets truncated to 10 bytes due to network interruption B: fetch(bytes=10-) => concatenate it with what I got from A will result in different Accept-Encoding headers for A and B, so you may be piecing together gunzip(gzip(body1))[:10] with body2[10:]. But HTTP doesn't mandate that body1 and body2 are the same. For instance, body1 might be "Hello world with gzip encoding" and body2 might be "Hello world with no encoding". That's quite plausible for a diagnostic site for instance. Usually you'd conditionalize your range requests on some strong validator. Rather, if your application is going to do this, you want A to *also* get that A-E header. Though the media code apparently sends bytes=0- so I guess it's fine here. "If you are going to do range stuff at the Fetch layer, please send the no-op bytes=0- logic so we turn off gzip" is an answer, if kinda questionable for platform predictability... The other worry with different Accept-Encodings floating about is I doubt servers are reliably sending Vary: Accept-Encoding or splitting ETags as they are supposed to. (We can get metrics... if we ever see Content-Encoding without that Vary, we know the server messed up. Other cases we don't know. Maybe the server only does identity + something we don't support.) Given morlovich's comment that we now give up reassembling encoded responses, this perhaps doesn't matter as we'd never concatenate things together, merely return body2 when we should have fetched body1. Otherwise we might have to just assume every resource is Vary: Accept-Encoding to work around it. The media code uses a different HTTP cache (does it still do this?), so the media-only version of this header did not have this problem. Separately, the URLRequest filter (Content-Encoding) logic, we should probably check for a Content-Range header and if we see Content-Encoding + Content-Range, just reject the response rather than try decode something garbage. Especially if A-E: identity is jammed in there, this would always be a server bug anyway.
,
Jun 6 2018
Let me make my comment clearer: we only stopped doing gzip + range for interrupted transfers; things which get range explicitly from client still can try that, and will still get the piece-wise assembly, too. That might also not be a good idea in practice, bug 824697 is sort of like that, too, except the bug there of serving gzip to things w/o that accept-encoding was likely masking buggyness of gzip + range.
,
Jun 7 2018
I'm getting a bit lost in the back-and-forth. Can someone summarise if this spec change is still considered nonsense, and why? > Are you even allowed to set Range requests in XHR/Fetch? Yes, you can set request headers in both.
,
Jun 7 2018
Fwiw:
fetch(url, {
headers: {Range: 'bytes=1-80'}
});
…will fetch the resource and decode the content. This means the fetch will fail if the response is encoded (assuming the server is returning a range of the encoded resource), as the decoding fails.
,
Jun 7 2018
When resuming downloads, it seems like we send `Accept-Encoding: identity`.
,
Jun 13 2018
*nudge* Is this spec change still considered nonsense? It looks like maybe minds were changed through discussion, but I'm not sure.
,
Jun 13 2018
I don't think so - if we send accept-encoding gzip, and get an ungzipped response, I still don't think we should be sending accept-encoding identity to resume the response (As, per my earlier comment, perhaps the server supports brotli, and thus should be sending vary: accept encoding).
,
Jun 13 2018
We currently send accept-encoding: identity when we resume downloads, and when we fetch media. To clarify: you disagree with Chrome's current behaviour here? Is there an API that currently sends range requests, and correctly processes a partial, encoded response?
,
Jun 13 2018
No, there is not - but if we're resuming a partial response that was send unencoded, it's a spec violation for the server to send us the rest of the response with an encoding, no?
,
Jun 13 2018
It would be a spec violation for a server to send an encoded response if the request has A-E: indentity… but I don't think that's what you mean?
,
Jun 13 2018
Correct...However, it woudl be reasonable for a server that supports range requetss (using identity encoding) and brotli to response to an Accept-Encoding: gzip request with an identity encoding and a Vary: Accept-Encoding response. I am, admittedly, not up to date on the ins and outs of caching and strong validators and the like, but trying to resume in that case and sending a different Accept-Encoding seems problematic.
,
Jun 13 2018
Ahh ok, I think I get it now. Fwiw I don't think we should support resuming unless the resource has strong validators, but we don't do a good job here right now https://bugs.chromium.org/p/chromium/issues/detail?id=737617.
,
Jun 14 2018
Fwiw, all the range-capable APIs we currently have send A-E: identity for all requests.
The spec change is building on what Chrome already does. However, it also adds it for things like fetch() and XHR. This seems sensible, because if:
fetch(url, {
headers: { Range: 'bytes=1-' }
});
…returns a gzipped/brotli response, the fetch will *always* fail.
,
Oct 31
,
Nov 27
So I think that, as a first pass at this, we can just have URLRequestHttpJob send "Accept-Ending: identity" whenever the caller explicitly requests a range, and doesn't provide an Accept-Encoding header. [+aarontag]: Suggest this as a starter bug. Should be a simple CL, and it will need a test.
,
Nov 27
,
Nov 27
,
Nov 30
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6c652aa360559e54d32a97fdf44f46b31027b644 commit 6c652aa360559e54d32a97fdf44f46b31027b644 Author: Aaron Tagliaboschi <aarontag@chromium.org> Date: Fri Nov 30 22:13:41 2018 Fix Accepted-Encoding to "identity" whenever there's a range request Unit test for encoding advertisement on range requests. Simple logic to change accepted encoding to identity in a range request Bug: 849952 Change-Id: I436ac5e4db118faa4f362a0b8c1943b5fe7020ea Reviewed-on: https://chromium-review.googlesource.com/c/1352672 Reviewed-by: Matt Menke <mmenke@chromium.org> Commit-Queue: Aaron Tagliaboschi <aarontag@chromium.org> Cr-Commit-Position: refs/heads/master@{#612816} [modify] https://crrev.com/6c652aa360559e54d32a97fdf44f46b31027b644/net/url_request/url_request_http_job.cc [modify] https://crrev.com/6c652aa360559e54d32a97fdf44f46b31027b644/net/url_request/url_request_http_job_unittest.cc [delete] https://crrev.com/89035f09a029139da8d81f18c93989dfdc9ac701/third_party/blink/web_tests/external/wpt/fetch/range/general.any-expected.txt [delete] https://crrev.com/89035f09a029139da8d81f18c93989dfdc9ac701/third_party/blink/web_tests/external/wpt/fetch/range/general.any.worker-expected.txt |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by domfarolino@gmail.com
, Jun 6 2018