HTTP/2 resource sequencing causes HOL blocking for images and async scripts
Issue description
I didn't pay too much attention when the H2 stream dependencies were added, but it looks like Chrome is setting the exclusive bit on every request and building dependency chains that reduce the effectiveness of prioritization.
Specifically, with the exclusive bit set, only one resource can download at a time. If you get the sequencing right, this makes sense for CSS and blocking JS but it can be bad for things like images and async scripts. It also doesn't behave well when higher priority resources are added after lower priority resources (causing the high priority resources to block until some of the queued low priority/exclusive requests have completed).
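To illustrate the mechanics, here is a minimal sketch of how an exclusive dependency reshapes the priority tree per RFC 7540 section 5.3.1 (the Python classes below are purely illustrative, not Chrome or server code):

# Minimal sketch of RFC 7540 exclusive dependencies (section 5.3.1).
class Node:
    def __init__(self, stream_id, weight=16):
        self.stream_id = stream_id
        self.weight = weight
        self.parent = None
        self.children = []

class PriorityTree:
    def __init__(self):
        self.root = Node(0)
        self.nodes = {0: self.root}

    def add(self, stream_id, parent_id, weight, exclusive):
        node = Node(stream_id, weight)
        parent = self.nodes.get(parent_id, self.root)
        if exclusive:
            # The new stream adopts all of the parent's current children, so
            # anything that could have shared bandwidth now sits below it.
            node.children = parent.children
            for child in node.children:
                child.parent = node
            parent.children = []
        parent.children.append(node)
        node.parent = parent
        self.nodes[stream_id] = node
        return node

# Chaining every request exclusively off the previous one yields a single
# vertical chain: 0 -> 3 -> 5 -> 7 -> 9.  A server that honors the tree gives
# all bandwidth to stream 3 until it completes, then 5, and so on, so
# same-priority responses never interleave.
tree = PriorityTree()
last = 0
for stream_id in (3, 5, 7, 9):          # e.g. four image requests
    tree.add(stream_id, parent_id=last, weight=147, exclusive=True)
    last = stream_id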
I set up a test page that basically has the following structure:
<head>
external css (a)
async js (b)
deferred js (c)
blocking script (d) that document.writes another script (e)
blocking script (f)
</head>
<body>
11 progressive JPEGs
blocking script (g)
</body>
HTTP/1.1
Page: http://test.patrickmeenan.com/priorities/index.html
WebPageTest: https://www.webpagetest.org/result/160929_Z7_1b6765c67b8534819e4b23438be34f77/
HTTP/2
Page: https://www.webpagetest.org/test/priorities/index.html
WebPageTest: https://www.webpagetest.org/result/160929_ZJ_d5b8777020444f1ab27c7bfeacc12e93/
WebPageTest Filmstrip comparison: https://www.webpagetest.org/video/compare.php?tests=160929_Z7_1b6765c67b8534819e4b23438be34f77,160929_ZJ_d5b8777020444f1ab27c7bfeacc12e93
The HTTP/2 version is served from Fastly using h2o.
In the case of the HTTP/1.1 server, Chrome holds the images back until it reaches the body but it also issues (and receives) the written script (e) early. The JPEG images are all downloaded concurrently and you get the benefit of the progressive render of the images.
In the case of the HTTP/2 server, most of the requests are issued immediately because the preload scanner discovered the resources, and the document.written script (e) is requested as the last resource.
All resources are requested with the exclusive bit.
The resources are requested in the order they are discovered, with each one depending on the most recent resource of equal or higher priority:
The css file (a) is requested first (stream 3, VeryHigh priority, depends on 0, weight 256)
The async js file (b) depends on the css file (a) (stream 5, depends on 3, Low priority, weight 147)
The defer js file (c) depends on the async js file (b) (stream 7, depends on 5, Low priority, weight 147)
The blocking js file that has the docwrite code (d) depends on the css file (a) (stream 9, depends on 3, High priority, weight 220)
The blocking js file in the head (f) depends on the script right before it (d) (stream 11, depends on 9, High priority, weight 220)
The 11 images are requested next, the first one depends on the defer js (c) and each subsequent one depends on the one right before it (all are low priority, weight 147).
The end of body js (g) depends on the script at the end of the head (f) (stream 35, depends on 11, Medium priority, weight 183)
The script written by document.write (e) is requested last, after (d) completes and everything after it is in-flight. It doesn't depend on anything since all higher priority requests have already completed (stream 37, High priority, weight 220).
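To make the shape of the resulting tree explicit, the chain above can be reconstructed as follows (a sketch; only stream ids 3-11, 35 and 37 appear in the capture, so the image stream ids 13-33 are assumed):

# Reconstruction of the dependency chain described above.  Image stream ids
# (13, 15, ..., 33) are assumed; only 3-11, 35 and 37 appear in the capture.
parents = {3: 0, 5: 3, 7: 5, 9: 3, 11: 9, 35: 11, 37: 0}
image_streams = list(range(13, 35, 2))       # the 11 progressive JPEGs
parents[image_streams[0]] = 7                # first image under the defer js (c)
for prev, cur in zip(image_streams, image_streams[1:]):
    parents[cur] = prev                      # each image under the previous one

def path_to_root(stream_id):
    chain = [stream_id]
    while chain[-1] != 0:
        chain.append(parents[chain[-1]])
    return chain

# The last image has to wait for ten other images plus streams 7, 5 and 3:
print(path_to_root(33))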
It looks like 6 of the images are delivered sequentially (probably already in-flight) before the scripts can be injected, which causes a 2 second delay in getting the document.written script (while the parser is blocked in the head).
I think that part is working as intended; the main pain comes from the fact that Chrome doesn't delay requesting the low priority resources while it is still in the head in the H2 case like it does for H1.
What isn't intended is that the images all download serially and we lose the ability to progressively render images concurrently.
To handle this case, I think what we actually need is phantom parent streams like Firefox uses as a container/parent for each different priority grouping. They can be marked as exclusive of each other and requests with VeryHigh and High priorities can still be chained after each other (with exclusive bits set) but Medium, Low and VeryLow resources should be allowed to run in parallel with each other (not dependency chained and no exclusive bit).
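As a rough sketch of that grouping (the phantom stream ids and the VeryLow weight below are placeholders, not a concrete proposal):

# Hypothetical phantom parents, one per Chromium priority bucket.  These
# streams would be declared with PRIORITY frames but never opened with HEADERS.
phantom_groups = {
    "VeryHigh": {"stream_id": 1, "weight": 256},
    "High":     {"stream_id": 3, "weight": 220},
    "Medium":   {"stream_id": 5, "weight": 183},
    "Low":      {"stream_id": 7, "weight": 147},
    "VeryLow":  {"stream_id": 9, "weight": 110},   # weight is a placeholder
}
# Proposed rule: VeryHigh/High requests keep today's exclusive chaining inside
# their group; Medium/Low/VeryLow requests simply depend (non-exclusively) on
# their group's phantom stream, so they are siblings that share bandwidth by
# weight instead of being serialized.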
To fix the HOL blocking on the wire from low-priority resources (as well as help with cross-connection prioritization) we should apply the resource scheduler hold-back logic even in the case of H2 connections.
Sep 30 2016
Actually, +csharrison, just FYI.
Sep 30 2016
Some information about Firefox's priority model, at least the early implementation: http://bitsup.blogspot.com/2015/01/http2-dependency-priorities-in-firefox.html. I agree with everything in #1.
Oct 3 2016
Yeah, sorry - as I was writing up the issue and going through the cause it became clear there were two distinct issues. I'll change the title of this one to be clear that the resource sequencing causes HOL issues for images and async JS and open a separate issue for the "delaying low priority requests while in the head" issue. FIFO makes sense for things that have an explicit dependency/blocking behavior (css, blocking scripts, defer scripts) but it's less clear for other resources (not just progressive JPEGs). You don't necessarily want one large image on the page to cause everything else to be serialized behind it. Lots of small images loading concurrently with the large one (say, navigation icons, etc) would be a great case for using weights and bandwidth splitting. I'm a little less sure about async scripts. In theory, having scripts of different sizes loading concurrently would give small scripts the chance to complete earlier and execute without blocking on the larger script, but serializing all scripts might make more sense regardless, since it will let you spread the parse/execution of earlier scripts across the download of later ones instead of bunching all the parses together after they complete (as happens when they share bandwidth).
Oct 3 2016
Resource scheduling issue opened: https://crbug.com/652321
Oct 3 2016
In Chromium we use buckets for priorities, and HTTP/2 uses dependencies. Dependencies allow requests in the lowest priority bucket to be unordered, but not in any of the higher priority buckets, because each stream can only depend on at most one other stream. This is the fundamental problem, as already described in #1. My impression is that images are all in the incorrectly named LOWEST priority bucket, but there is one other, lower bucket: IDLE. In this framework, there are two ways to make images be at the same level (resulting in parallel loading) instead of being chained up as they are right now (resulting in undesirable serial loading):
(1) Ignore the difference between the IDLE and LOWEST buckets at the HTTP/2 level: make all IDLE and LOWEST priority requests be at the same level, all dependent on the last LOW priority request.
(2) Make images IDLE, make all IDLE priority requests be at the same level, all dependent on the last LOWEST priority request, and leave LOWEST priority streams chained up as they are right now.
I do not know whether either of these would be feasible, or whether either change would adversely affect performance. Also, the question of whether this special-casing belongs in the network stack, as already brought up in #1, still remains.
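For concreteness, a sketch of what option (1) might look like (bucket names follow Chromium's net::RequestPriority; the request sequence is made up, and the "last LOW-or-higher" rule is a simplification):

# Sketch of option (1): at the HTTP/2 layer, treat IDLE and LOWEST as one
# flat level of non-exclusive siblings; higher buckets stay chained as today.
last_low_or_higher = 0
tree_edges = []  # (stream_id, parent_id, exclusive)

def request(stream_id, bucket):
    global last_low_or_higher
    if bucket in ("HIGHEST", "MEDIUM", "LOW"):
        tree_edges.append((stream_id, last_low_or_higher, True))   # chained, as today
        last_low_or_higher = stream_id
    else:  # LOWEST or IDLE: flat siblings under the last LOW-or-higher stream
        tree_edges.append((stream_id, last_low_or_higher, False))

for sid, bucket in [(3, "HIGHEST"), (5, "LOW"), (7, "LOWEST"),
                    (9, "LOWEST"), (11, "IDLE")]:
    request(sid, bucket)
print(tree_edges)   # streams 7, 9 and 11 all depend on 5 and can interleave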
Oct 4 2016
Re c#4: Flesh this paragraph out a little bit? I'm finding myself still disagreeing with you at this level :-}:
> You don't necessarily want one large image on the page to cause everything else to be serialized behind it. Lots of small images loading concurrently with the large one (say, navigation icons, etc) would be a great case for using weights and bandwidth splitting.
Assuming non-progressive, why is weights and bandwidth splitting better than loading the small images in sequence first? I'm making the assumption that, if non-progressive, no images are useful until they're fully loaded, in which case I'd rather get *something* loaded than nothing.
Oct 4 2016
I'm thinking of a case where you have something like a hero image (200KB, for example) and a bunch of UI elements that are also graphics (say, CSS backgrounds on buttons at 1KB each). CSS backgrounds wouldn't be discovered until layout happens, which would put them behind in-page images in the queue. If they load in parallel, the navigation icons would be visible and useful MUCH earlier than if they had to wait for the hero image to finish loading. I expect it wouldn't be hard to fabricate both cases and find lots of real-world examples. FWIW, moving to sequenced loading is an explicit change from how H1 behaves (at least beyond 6 concurrent downloads on a single host). The explicit sequencing itself feels like a fair bit of special casing in the networking stack. Long-term it really feels like we need to be able to plumb this kind of logic through the external interface and let the renderer (or other clients) specify the sequencing/concurrency bits.
Oct 4 2016
So my response to that case is "But then we should just (sic) have the small images jump the queue to become higher priority than the large image". Of course, there are two obvious problems with that: a) we don't currently have a way of specifying that for two different requests that are in the same RequestPriority bucket (though I've had fantasies of overloading SetPriority(<request's current priority>) to mean "Please jump this to the head of the FIFO queue"), and b) once a request has been dispatched, we have very little ability to tell the server to reprioritize it (we're BDP limited, basically). I'll point out that we have problem (b) in the "same bucket with weights" model too. I'm not sure whether I consider my suggested solution for problem (a) a reasonable short-term one or not, though it wouldn't be too hard to add a new interface to do the same thing if we didn't like the semantics.
Oct 4 2016
Just because I think it's relevant to this discussion, a quick summary of the Firefox approach (IIUC): It uses a combination of dependencies and weights to deal with the issue bnc@ calls out in c#6. The phantom streams have a specific dependency relationship (Root -> {Leaders -> Followers, Unblocked, Background -> Speculative}), but since that phantom relationship will still result in effectively peer relationships between elements of, e.g., the real children of the Leaders and Followers phantom streams, the real children are given default weights, s.t. the large majority of the resources will go to the Leaders while any Leaders exist, and only to the Followers after that.
It's an interesting model; specifically, it's an interesting way to address the shortcomings (IMO) of the HTTP/2 spec called out by bnc@ in c#6. I'd still rather explore some more absolute stack rank (as I'm driving towards in my comments above) but it might be a way to handle those cases where a stack rank isn't appropriate but we still want to prioritize groups of requests.
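If a server ends up treating the real children as peers of each other (because the phantom parents were never opened, or their state was dropped), the weights decide the split. A back-of-the-envelope illustration with made-up weights, not Firefox's actual values:

# With effectively peer streams, bandwidth divides by weight among whatever
# can currently make progress.
def bandwidth_share(weights):
    total = sum(weights.values())
    return {name: round(w / total, 2) for name, w in weights.items()}

# One "leader" script with a high weight vs. three "follower" images with low
# weights -- illustrative numbers only:
print(bandwidth_share({"leader_js": 200, "img1": 12, "img2": 12, "img3": 12}))
# ~85% of the link goes to the leader while it is in flight; once it finishes,
# the three images split the remaining bandwidth roughly evenly.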
Oct 4 2016
For reference, since I just looked up this code yesterday for unrelated reasons, here's the Firefox code implementing the scheme that Randy mentioned (but with "Unblocked" renamed "Other"): https://dxr.mozilla.org/mozilla-central/rev/9baec74b3db1bf005c66ae2f50bafbdb02c3be38/netwerk/protocol/http/Http2Session.cpp#915 https://dxr.mozilla.org/mozilla-central/rev/9baec74b3db1bf005c66ae2f50bafbdb02c3be38/netwerk/protocol/http/Http2Stream.cpp#1173
Oct 6 2016
Hello, thank you for looking into this. I wanted to bring another aspect to your attention. As Kazuho Oku pointed out here: https://lists.w3.org/Archives/Public/ietf-http-wg/2016JulSep/0597.html, because the requirements about keeping track of closed streams in the priority tree are vague, it's possible that stream dependencies reference streams that the server has ceased tracking. Those streams end up depending on 0, resulting in widely different views of the priority tree by the client and server. Frederik
Oct 6 2016
fdeweerdt@, your concern is valid but it's a separate issue. That race is also pointed out explicitly by the H2 spec: http://httpwg.org/specs/rfc7540.html#priority-gc The spec says: "To avoid these problems, an endpoint SHOULD retain stream prioritization state for a period after streams become closed." I don't think you can solve this issue without a spec change. That said, I've been doing some experiments and found that a simple server-side heuristic of keeping the last 5-10 closed nodes in the tree seems to make this problem disappear.
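A sketch of what that kind of server-side retention heuristic could look like (not h2o or nginx code; the structure and retention count are illustrative):

from collections import OrderedDict

# Keep priority state for the most recently closed streams so that later
# PRIORITY/HEADERS frames can still reference them as parents.
class ClosedStreamCache:
    def __init__(self, capacity=10):         # "last 5-10" from the comment above
        self.capacity = capacity
        self.entries = OrderedDict()          # stream_id -> (parent_id, weight)

    def remember(self, stream_id, parent_id, weight):
        self.entries[stream_id] = (parent_id, weight)
        self.entries.move_to_end(stream_id)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest closed stream

    def lookup(self, stream_id):
        # Returns None if the parent has been forgotten, in which case the
        # new stream falls back to depending on stream 0.
        return self.entries.get(stream_id)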
Oct 6 2016
tombergan@ thanks, that makes sense. I do agree that a spec change would address the concern. I'm raising this point because in the context of the existing spec, relying on a phantom priority tree is more robust than chaining dependencies.
Oct 6 2016
> I'm raising this point because in the context of the existing spec, relying on a phantom priority tree is more robust than chaining dependencies.
I don't think that's true. If I understand the Firefox implementation correctly (see the links above), Firefox's Leader/Background/etc. priority nodes are phantom streams that are never actually opened. Firefox is making an assumption that servers will track priorities for phantom streams. A closed stream is essentially a phantom stream, so Firefox has the same problem -- there is no guarantee that the server will track those phantom streams indefinitely. It may be that some servers have custom logic for handling Firefox's phantom streams (and if you know of any such servers, I'd be interested in seeing links to code), but in that case, they could just as easily add custom logic for handling Chrome's chained dependencies. As mentioned above, a simple server-side heuristic of keeping the last 5-10 closed streams seems to work well for Chrome clients.
Oct 7 2016
> It may be that some servers have custom logic for handling Firefox's phantom streams (and if you know of any such servers, I'd be interested in seeing links to code)
h2o creates a stream structure and adds it to its internal scheduling tree when receiving a PRIORITY frame: https://github.com/h2o/h2o/blob/master/lib/http2/connection.c#L624
I'm less familiar with nginx, but it appears to do the same: https://github.com/nginx/nginx/blob/f8a9d528df92c7634088e575e5c3d63a1d4ab8ea/src/http/v2/ngx_http_v2.c#L1826
Oct 8 2016
I found an interesting page that shows both good and bad behavior with sequential image downloading.

I've been running experiments to compare the performance of Spdy-style priorities and H2-style priorities. Turns out I can do this since Chrome sends the old Spdy-style priority as an H2 weight -- this means I can write one scheduler that looks at the weight only (Spdy-style) and another scheduler that looks at dependencies only (H2-style). The Spdy-style scheduler interleaves responses at the same priority level in a random way. My test pages are loaded into a replay server, then served with both the Spdy and H2 schedulers over 3GFast on WPT.

Here's the interesting page: http://m.interia.pl/
WPT video with Spdy-style scheduling: https://www.webpagetest.org/video/compare.php?tests=161007_GY_f5995e463e585d2944ba90c86cee5852-r%3A1-c%3A0&thumbSize=200&ival=100&end=visual
WPT video with H2-style scheduling: https://www.webpagetest.org/video/compare.php?tests=161007_4A_083e9768d48791143fa9f78628791d56-r%3A1-c%3A0&thumbSize=200&ival=100&end=visual

You can instantly notice that images are serialized with H2 and randomized/interleaved with Spdy.

How H2 is better: the main image in the page (linked below, visible under the video play arrow icon) loads more quickly with H2 than Spdy. With H2, this image is fully loaded by the first paint at 6.1s (the image actually loads in under 4s), while with Spdy, the image is only partially loaded at 6.1s and not fully loaded until 14s. http://e5.pudelek.pl/728c029587b900f7bf07fd98664b13fac3faaed3_375x220

How Spdy is better: the banner image at the top of the page (linked below) is built from sprites. In both cases, this sprite is requested about 5s into the page load. However, with Spdy, the sprite has finished downloading and is displayed by 13.3s, while with H2, the sprite is not downloaded until 21s into the page load because it's blocked by previously-requested images. http://www.pudelek.pl/img/min/mobile/sprites-s9204825150.png

I think this is a nice demonstration that deterministic, sequential ordering works great when you know the right order. If you don't know the right order, then the deterministic ordering can be deterministically bad, while the randomized, interleaved ordering performs better on average.
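A toy simulation of the two schedulers, just to make the trade-off concrete (the sizes and scheduling quantum are arbitrary, and weights are ignored for brevity):

# Toy comparison: "H2-style" serves responses strictly in dependency order;
# "Spdy-style" round-robins equal quanta among everything still in flight.
def h2_style(responses):
    # Finish each response before starting the next.
    t, finish = 0, {}
    for name, size in responses:
        t += size
        finish[name] = t
    return finish

def spdy_style(responses, quantum=10):
    remaining = dict(responses)
    t, finish = 0, {}
    while remaining:
        for name in list(remaining):
            served = min(quantum, remaining[name])
            remaining[name] -= served
            t += served
            if remaining[name] == 0:
                finish[name] = t
                del remaining[name]
    return finish

resources = [("hero_image", 200), ("sprite", 10), ("icon", 5)]
print(h2_style(resources))    # hero finishes first; sprite and icon wait behind it
print(spdy_style(resources))  # sprite and icon finish early; hero finishes last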
Nov 16 2016
I strongly agree with Pat that strict in-order sending of each resource in its entirety is not the right way to go for resources that are progressively rendered. That includes not only progressive images, but all images, as well as any HTML resources. I also agree with Pat that these should not be special-cased in the network stack, but that the "progressiveness" of the resource should be piped through from the renderer. Also worth noting is that the introduction of link preload breaks the (slightly too optimistic, IMO) assumption that order of discovery is an indication of order of importance. Authors can now add late-discovered resources to the document's head (e.g. an important bg image or font), and rely on proper priorities to make sure those resources are not sent before more important ones (e.g. blocking CSS and JS).