New issue
Advanced search Search tips

Issue 900223 link

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Blink and Chrome handle parsing HTTP headers differently

Project Member Reported by mmenke@google.com, Oct 30

Issue description

Blink treats '\x0B', '\x0C', and '\x0D' as spaces (See StringImpl::StripWhiteSpace), while Chrome's network stack does not.  This results in inconsistent HTTP header handling, particularly for headers handled in both blink and net/.

For instance:  Blink will only allow text/plain responses to be treated as Javascript if there's no nosniff header.  So the network stack may sniff a response with an "X-Content-Type-Options: \x0Cnosniff\x0D" as text/plain, but then Blink's Javascript loader will decide that's a valid nosniff header, and refuse to treat the response as Javascript.  But for types where blink doesn't check the nosniff header (Like images?), the network stack's handling wins out.

Blink also has its own handling of caching directives, so its cache header handling likely does not match the network stack's, either.

third_party/blink/renderer/platform/network/http_parsers.cc probably doesn't cover all cases where this happens, but it does include at least some of them.

Fixing this is probably going to be a long term effort, since we have to be careful about breakage (Caching doesn't matter so much, since it doesn't break sites, and the on-disk cache is the authoritative cache.  nosniff is more concerning, however).
 
As discussed OOB I agree that fixing this would involve gracefully removing the additional characters from being treated as LWS. In particular it's troubling that \x0D (CR) is treated as part of a header rather than part of the boundary.
It skips '\x0A' as well.  I suspect CRs and LFs are already removed by that point, anyways.

Ideally, we'd just use a single parser, but given that blink uses its own string types everywhere, that's not terribly feasible, without some even more invasive changes.
(No CR or LF should be present, indeed. However, it does seem good to be clear about what can actually be removed, so wrote https://github.com/whatwg/fetch/pull/818/commits/3515d9bb72516098557e04a5d4131aad83f03c77 inspired by Asanka's comment.)
Cc: yhirano@chromium.org
Cc: kinuko@chromium.org
Talked with kinuko@. It is probably good to just convert strings and call net/ functions, though we are not completely sure about performance characteristics.

[yhirano, kinuko]:  One problem with that approach is that a lot of net/ header extraction methods work on the entire net::HttpResponseHeaders structure, as opposed to individual headers (Which breaks up individual headers around commas, and preserves the original separate header lines as well), while blink merges headers into a single line per header name.

I'd actually argue that *all* net methods should operate on an HttpResponseHeaders structure, to guarantee consistent behavior in the case there are multiple headers (Some places, for instance, only look at the first Content-Disposition header, while other locations look at all of them).

So that would imply we either need net::HttpResponseHeaders accessible in blink, or we need to preserve the entire unmodified response headers in blink, and create a new HttpResponseHeaders object as needed (Which is more expensive than just converting one line to an std::string, though we might do it rarely enough that that's fine).  I suppose another option would be to pre-extract everything blink cares about from the HttpResponseHeaders the renderer gets from net/.
Wouldn't it be better if everything operated the way Blink reportedly does? I.e., header parsers always operate on the combined value? (That way intermediaries that combine will only have a minimal impact (basically whitespace).)
That would mean every single header parser would have to invoke the split on comma logic redundantly, which I don't think scales.

Also, everything using HttpResponseHeaders would actually let us convert all consumers to have different behavior more easily (By modifying the parser HttpResponseHeaders users), instead of at a gazillion callsites.
So I've looked into some of the consumer code of response header handling in blink (i.e. the code that's accessing blink::ResourceResponse::http_header_fields_), and it didn't feel too bad imagining that we just use net::HttpResponseHeaders in blink. We'd need another peruse to turn it a real project but it could possibly be a plan I think.
Status: Available (was: Untriaged)

Sign in to add a comment