New issue
Advanced search Search tips

Issue 849952 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Compat



Sign in to add a comment

Add Accept-Encoding: identity to Range requests

Project Member Reported by jakearchibald@chromium.org, Jun 6 2018

Issue description

Spec change: https://github.com/whatwg/fetch/pull/751
Tests: https://github.com/web-platform-tests/wpt/pull/11291

Turns out a significant number of servers ignore Range headers unless "Accept-Encoding: identity" is set. This spec change applies that header to all requests with a Range header.

We already kinda do this for media requests, although the header is needlessly verbose, and is applied before the service worker, when it should be afterwards.
 
Cc: domfarolino@gmail.com

Comment 2 by ricea@chromium.org, Jun 6 2018

Labels: -Type-Bug Type-Compat
Owner: yhirano@chromium.org
Status: Assigned (was: Untriaged)
Tentatively assigning yhirano as he is probably the best person for the job.

yhirano: please reassign or unassign if appropriate.
Cc: mmenke@chromium.org davidben@chromium.org
davidben@, should this be implemented at URLRequestHttpJob::AddExtraHeaders?
That's definitely a mmenke question, but this spec change honestly makes no sense.

The place we typically insert Range headers is deep in the HTTP stack, once we've decided to reassemble stuff at the cache. But it would not be at all okay to add such a header at that layer because the resource may vary on that header. In reassembly, it is invalid to touch headers that may change the resource you get. You *must* have committed to all headers *before* you find out you need ranges.

(Fetch doesn't seem to say much on this subject apart from "Some implementations might support caching of partial content, as per HTTP Range Requests. [HTTP-RANGE] However, this is not widely supported by browser caches." That's just fine because this is an implementation detail of the HTTP stack and out of scope for any Fetch header-munging.)

What you could mean is that if someone calls the fetch() API or equivalent web platform entry-point with a Range header, fetch() will magically add another header. This sounds like it would make the platform less predictable, so I would not recommend it, but if you want to do that, you can. Moreover, the PR suggests the motivation was media being treated differently, and I believe media's range requests in Chrome happen via the low-level HTTP cache... where this spec change would not work.

So, I would recommend just reverting that change and solving your problem a different way. That change is not consistent with reality.
+1 to this making no sense.  Just to give a more detailed example, suppose we send a request with:

Accept-Encoding: gzip, identity

Then we get an a response with the identity encoding, and "Vary: Accept-Encoding" (Maybe the server likes brotli?), but the response fails after sending us some of the response body.

If the cache layer wants to resume it, we'd have to send "Accept-encoding: identity", per spec, which seems weird and problematic, particularly if the consumer explicitly set a different accept-encoding header from the default one in the first place.

Note that Chrome's media code actually sets Accept-Encoding: Identity already itself, so this would only affect other requests (Or media requests made via xhr / fetch instead of via media elements).
To that end, Matt's example also applies to doing this transformation at XHR/Fetch entry points, so my "This sounds like it would make the platform less predictable, so I would not recommend it, but if you want to do that, you can." was not strong enough of a NACK. While it would be internally consistent for the browser, it would not be consistent for the caller.

Are you even allowed to set Range requests in XHR/Fetch? That can't really work. Range applies before Content-Encoding, rather than after. That is, you are getting gzip(foo)[5:10], not gzip(foo[5:10]). This is so the cache can reassemble a gzipped response. But the web platform generally undoes Content-Encoding before sending it to the caller, and you can't uncompress gzip(foo)[5:10].
Range request with a non-identity content-encoding is generally broken, and I doubt any browser even tries to support it.
> Range request with a non-identity content-encoding is generally broken, and I doubt any browser even tries to support it.

Right, so only making range requests with `Accept-Encoding: identity` makes sense? Or is there a case where Chrome can make a request with `Accept-Encoding: gzip, identity` followed by a range request for the same resource that has the same value for that `Accept-Encoding` header?

The layering might be slightly different in implementations, but if the outcome is the same that seems fine...
Components: Internals>Network>Cache
If we send a request with "Accept-Encoding: gzip, identity", and the request fails partway through, we may try to resume it with a range request (And will use the same Accept-Encoding when we do).

Adding the cache label, as that's where all Chrome's range request logic lives.
This works fine because all the range and HTTP cache logic happens before undoing content encoding. This matches how HTTP works. As far as HTTP is concerned, the resource *is* the gzipped body. The HTTP cache lives at this layer and may internally send ranged requests to the server in order to service an unranged caller request.

As far as the Fetch and most URL request APIs are concerned, the body is the unencoded one, so at the public API layer, we undo the encoding. This happens after the cache has done any reassembly it wants.

Of course, nothing stops someone from passing a Range header to our public APIs (at the net stack level; I don't know what's web-exposed) but, yes, in that case they had better tell the server not to gzip or it won't work very well.
re: comment #9/#10: we actually stopped doing that because at least one server impl was getting it wrong.

See https://chromium-review.googlesource.com/c/chromium/src/+/962346 and  bug 820862 
Oh hah. Lovely. :-)

> and I believe media's range requests in Chrome happen via the low-level HTTP cache... where this spec change would not work

Sorry, this was totally false. Let me try this again: I believe the media code makes ranged requests, which go to the cache. If the cache can service those out of its (potentially partial), it does. Otherwise, it fetches what it can't service and updates the partial cache entry as needed. So the ranges made ultimately are a combination of the two.

I suppose adding it to URLRequest-level range requests, but not HTTP cache ones (those being treated as out of scope of Fetch, Service Worker, and everything else), then isn't totally absurd. But it means that JS code which does:

A: fetch(<no range>) => read body, but it gets truncated to 10 bytes due to network interruption
B: fetch(bytes=10-) => concatenate it with what I got from A

will result in different Accept-Encoding headers for A and B, so you may be piecing together gunzip(gzip(body1))[:10] with body2[10:]. But HTTP doesn't mandate that body1 and body2 are the same. For instance, body1 might be "Hello world with gzip encoding" and body2 might be "Hello world with no encoding". That's quite plausible for a diagnostic site for instance. Usually you'd conditionalize your range requests on some strong validator.

Rather, if your application is going to do this, you want A to *also* get that A-E header. Though the media code apparently sends bytes=0- so I guess it's fine here. "If you are going to do range stuff at the Fetch layer, please send the no-op bytes=0- logic so we turn off gzip" is an answer, if kinda questionable for platform predictability...

The other worry with different Accept-Encodings floating about is I doubt servers are reliably sending Vary: Accept-Encoding or splitting ETags as they are supposed to. (We can get metrics... if we ever see Content-Encoding without that Vary, we know the server messed up. Other cases we don't know. Maybe the server only does identity + something we don't support.)

Given morlovich's comment that we now give up reassembling encoded responses, this perhaps doesn't matter as we'd never concatenate things together, merely return body2 when we should have fetched body1. Otherwise we might have to just assume every resource is Vary: Accept-Encoding to work around it. The media code uses a different HTTP cache (does it still do this?), so the media-only version of this header did not have this problem.

Separately, the URLRequest filter (Content-Encoding) logic, we should probably check for a Content-Range header and if we see Content-Encoding + Content-Range, just reject the response rather than try decode something garbage. Especially if A-E: identity is jammed in there, this would always be a server bug anyway.
Let me make my comment clearer: we only stopped doing gzip + range for interrupted transfers; things which get range explicitly from client still can try that, and will still get the piece-wise assembly, too.

That might also not be a good idea in practice, bug 824697 is sort of like that, too, except the bug there of serving gzip to things w/o that accept-encoding was likely masking buggyness of gzip + range.

I'm getting a bit lost in the back-and-forth. Can someone summarise if this spec change is still considered nonsense, and why?

> Are you even allowed to set Range requests in XHR/Fetch?

Yes, you can set request headers in both.
Fwiw:

fetch(url, {
  headers: {Range: 'bytes=1-80'}
});

…will fetch the resource and decode the content. This means the fetch will fail if the response is encoded (assuming the server is returning a range of the encoded resource), as the decoding fails.
When resuming downloads, it seems like we send `Accept-Encoding: identity`.
*nudge* Is this spec change still considered nonsense? It looks like maybe minds were changed through discussion, but I'm not sure.
I don't think so - if we send accept-encoding gzip, and get an ungzipped response, I still don't think we should be sending accept-encoding identity to resume the response (As, per my earlier comment, perhaps the server supports brotli, and thus should be sending vary: accept encoding).
We currently send accept-encoding: identity when we resume downloads, and when we fetch media. To clarify: you disagree with Chrome's current behaviour here?

Is there an API that currently sends range requests, and correctly processes a partial, encoded response?
No, there is not - but if we're resuming a partial response that was send unencoded, it's a spec violation for the server to send us the rest of the response with an encoding, no?
It would be a spec violation for a server to send an encoded response if the request has A-E: indentity… but I don't think that's what you mean?
Correct...However, it woudl be reasonable for a server that supports range requetss (using identity encoding) and brotli to response to an Accept-Encoding: gzip request with an identity encoding and a Vary: Accept-Encoding response.  I am, admittedly, not up to date on the ins and outs of caching and strong validators and the like, but trying to resume in that case and sending a different Accept-Encoding seems problematic.
Ahh ok, I think I get it now. Fwiw I don't think we should support resuming unless the resource has strong validators, but we don't do a good job here right now https://bugs.chromium.org/p/chromium/issues/detail?id=737617.
Fwiw, all the range-capable APIs we currently have send A-E: identity for all requests. 

The spec change is building on what Chrome already does. However, it also adds it for things like fetch() and XHR. This seems sensible, because if:

fetch(url, {
  headers: { Range: 'bytes=1-' }
});

…returns a gzipped/brotli response, the fetch will *always* fail.
Owner: ----
Status: Available (was: Assigned)
Cc: aarontag@chromium.org
So I think that, as a first pass at this, we can just have URLRequestHttpJob send "Accept-Ending: identity" whenever the caller explicitly requests a range, and doesn't provide an Accept-Encoding header.

[+aarontag]:  Suggest this as a starter bug.  Should be a simple CL, and it will need a test.
Owner: aarontag@chromium.org
Status: Assigned (was: Available)
Project Member

Comment 29 by bugdroid1@chromium.org, Nov 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6c652aa360559e54d32a97fdf44f46b31027b644

commit 6c652aa360559e54d32a97fdf44f46b31027b644
Author: Aaron Tagliaboschi <aarontag@chromium.org>
Date: Fri Nov 30 22:13:41 2018

Fix Accepted-Encoding to "identity" whenever there's a range request

Unit test for encoding advertisement on range requests.


Simple logic to change accepted encoding to identity in a range request

Bug: 849952
Change-Id: I436ac5e4db118faa4f362a0b8c1943b5fe7020ea
Reviewed-on: https://chromium-review.googlesource.com/c/1352672
Reviewed-by: Matt Menke <mmenke@chromium.org>
Commit-Queue: Aaron Tagliaboschi <aarontag@chromium.org>
Cr-Commit-Position: refs/heads/master@{#612816}
[modify] https://crrev.com/6c652aa360559e54d32a97fdf44f46b31027b644/net/url_request/url_request_http_job.cc
[modify] https://crrev.com/6c652aa360559e54d32a97fdf44f46b31027b644/net/url_request/url_request_http_job_unittest.cc
[delete] https://crrev.com/89035f09a029139da8d81f18c93989dfdc9ac701/third_party/blink/web_tests/external/wpt/fetch/range/general.any-expected.txt
[delete] https://crrev.com/89035f09a029139da8d81f18c93989dfdc9ac701/third_party/blink/web_tests/external/wpt/fetch/range/general.any.worker-expected.txt

Sign in to add a comment