Issue 651538

Starred by 12 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




HTTP/2 resource sequencing causes HOL blocking for images and async scripts

Project Member Reported by pmeenan@chromium.org, Sep 29 2016

Issue description

I didn't pay too much attention when the H2 stream dependencies were added, but it looks like Chrome is setting the exclusive bit on every request and building dependency chains that reduce the effectiveness of prioritization.

Specifically, with the exclusive bit set, only one resource can download at a time.  If you get the sequencing right, this makes sense for CSS and blocking JS but it can be bad for things like images and async scripts.  It also doesn't behave well when higher priority resources are added after lower priority resources (causing the high priority resources to block until some of the queued low priority/exclusive requests have completed).

I set up a test page that basically has the following structure:

<head>
    external css (a)
    async js (b)
    deferred js (c)
    blocking script (d) that document.writes another script (e)
    blocking script (f)
</head>
<body>
    11 progressive JPEGs
    blocking script (g)
</body>

HTTP/1.1
Page: http://test.patrickmeenan.com/priorities/index.html
WebPageTest: https://www.webpagetest.org/result/160929_Z7_1b6765c67b8534819e4b23438be34f77/


HTTP/2
Page: https://www.webpagetest.org/test/priorities/index.html
WebPageTest: https://www.webpagetest.org/result/160929_ZJ_d5b8777020444f1ab27c7bfeacc12e93/


WebPageTest Filmstrip comparison: https://www.webpagetest.org/video/compare.php?tests=160929_Z7_1b6765c67b8534819e4b23438be34f77,160929_ZJ_d5b8777020444f1ab27c7bfeacc12e93

The HTTP/2 version is served from Fastly using h2o.

In the case of the HTTP/1.1 server, Chrome holds the images back until it reaches the body but it also issues (and receives) the written script (e) early.  The JPEG images are all downloaded concurrently and you get the benefit of the progressive render of the images.

In the case of the HTTP/2 server, most of the requests are issued immediately because the preload scanner discovers the resources, and the document.written script (e) is requested as the last resource.

All resources are requested with the exclusive bit.

The resources are requested in the order they are discovered, each depending on the most recent equal- or higher-priority resource:

The css file (a) is requested first (stream 3, VeryHigh priority, depends on 0, weight 256)
The async js file (b) depends on the css file (a) (stream 5, depends on 3, Low priority, weight 147)
The defer js file (c) depends on the async js file (b) (stream 7, depends on 5, Low priority, weight 147)
The blocking js file that has the document.write code (d) depends on the css file (a) (stream 9, depends on 3, High priority, weight 220)
The blocking js file in the head (f) depends on the script right before it (d) (stream 11, depends on 9, High priority, weight 220)

The 11 images are requested next, the first one depends on the defer js (c) and each subsequent one depends on the one right before it (all are low priority, weight 147).

The end of body js (g) depends on the script at the end of the head (f) (stream 35, depends on 11, Medium priority, weight 183)

The script written by document.write (e) is requested last after (d) completes and everything after it is in-flight. It doesn't depend on anything since all higher priority requests already completed (stream 37, High priority, weight 220).
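For reference, here is the chain above written out as data (a sketch only; the image stream IDs 13 through 33 are inferred from the surrounding stream numbers, not taken from a trace):

# Sketch of the dependency chain described above.
# stream_id -> (parent_stream_id, weight, exclusive)
deps = {
    3:  (0,  256, True),   # (a) external css, VeryHigh
    5:  (3,  147, True),   # (b) async js, Low
    7:  (5,  147, True),   # (c) deferred js, Low
    9:  (3,  220, True),   # (d) blocking js with the document.write, High
    11: (9,  220, True),   # (f) blocking js in head, High
    35: (11, 183, True),   # (g) end-of-body js, Medium
    37: (0,  220, True),   # (e) document.written js, High (parent already completed)
}
# The 11 images: the first depends on the deferred js (7), each subsequent one
# on the image right before it; all Low priority, weight 147, exclusive.
prev = 7
for sid in range(13, 34, 2):   # inferred stream IDs 13, 15, ..., 33
    deps[sid] = (prev, 147, True)
    prev = sid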

It looks like 6 of the images are delivered sequentially (probably in-flight) before the scripts can be injected, which causes a 2-second delay in getting the document.written script (while it is blocked in the head).

I think that is all working as intended, and the main pain comes from the fact that Chrome doesn't delay requesting the low-priority resources in the H2 case, like it does for H1, while it is still in the head.

What isn't intended is that the images all download serially and we lose the ability to progressively render images concurrently.

To handle this case, I think what we actually need is phantom parent streams, like Firefox uses, as a container/parent for each different priority grouping.  They can be marked as exclusive of each other, and requests with VeryHigh and High priorities can still be chained after each other (with exclusive bits set), but Medium, Low and VeryLow resources should be allowed to run in parallel with each other (not dependency chained and no exclusive bit).
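A minimal sketch of how that dependency choice could work, assuming hypothetical phantom group stream IDs and a made-up helper name:

# Hypothetical sketch of the proposal above: keep exclusive chaining for
# VeryHigh/High, but hang Medium/Low/VeryLow requests off shared phantom
# group parents so peers split bandwidth instead of serializing.

PHANTOM_GROUP = {      # phantom stream IDs are made up for illustration
    "Medium": 101,
    "Low": 103,
    "VeryLow": 105,
}

def choose_dependency(priority, last_equal_or_higher_stream):
    """Return (parent_stream_id, exclusive) for a new request."""
    if priority in ("VeryHigh", "High"):
        # Existing behavior: chain behind the most recently requested
        # equal-or-higher priority stream, exclusive bit set.
        return last_equal_or_higher_stream, True
    # Lower priorities: depend (non-exclusively) on the group's phantom
    # parent, so siblings in the group are served concurrently by weight.
    return PHANTOM_GROUP[priority], False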

To fix the HOL blocking on the wire from low-priority resources (as well as help with cross-connection prioritization) we should apply the resource scheduler hold-back logic even in the case of H2 connections.
 
Cc: b...@chromium.org
[+bnc]

A couple of points:
* Nit: IIUC, the exclusive bit is just syntactic sugar; the issue here is that we're (intentionally on my part) creating a degenerate linear tree in which each request has a single ancestor and a single descendant.

* I think in most cases within a priority band, FIFO order is actually a good thing, because for most resources, they aren't really usable until they arrive, so having one resource arrive fast and another slow is better than having them both arrive slow (what would happen if they're requested in parallel).  I think progressive images are an exception to this rule because they can be used when partially loaded.  

* Can you point me to a document describing Firefox's phantom parent stream model?  I know H2 allows phantom streams, but I don't think that can produce the tree you're describing, because there's no way to have a node in the tree dependent on a group of other nodes, just a single other node.

* Given the above and the rest of your description, I would rephrase this bug as being that there are two problems: 
** If you send out low priority requests and then later a high priority request comes in, it won't be able to jump the queue because the bytes for the low priority requests are already in flight (though this depends on BDP and latency; once an H2 server finds out about the high priority request it can totally switch over to it).  On HTTP/1.1 connections this is mitigated by throttling, but we don't do throttling on HTTP/2 connections.
** Progressive images should be loaded in some fashion that allows all in-viewport images to have low-resolution versions painted before the high resolution portions of the image are downloaded.

Would you agree with that rephrasing?  I wonder if we should split the bug, since those two things seem fairly different to me.

* I'm reluctant to solve the progressive image issue in a way that makes more special case implementations in the network stack; maybe it's worth it, but it seems like very much a special case.  (Of course, the general way to solve it is to expand out the information the priority API expresses, which seems like it's very much the end of a long road which I'm not sure anyone currently has the energy to start out on, so maybe we should just take the special case hit.)  

* The inability to give higher priority to a highpri request that arrives after a lowpri request is a real issue, and one I've struggled with in the abstract for a while.  I have no objection to implementing throttling for H2 connections; I think it's also important for inter-H2 prioritization (and in fact the CLs I'm currently working on will do that, though I'm not certain it will be in a way that solves this issue).  But the "right" answer for this, IMO, is to throttle the number of outstanding connections based on network quality (BDP, available bandwidth) to make it more likely that, when it matters, the high pri request will actually arrive before the low pri request was dispatched.

[Administrative note: I'm not simply taking ownership of this, because I consider it a large can of worms which we don't yet have consensus on how to deal with, and I could imagine parts of it falling into other people's laps.  Having said that, yeah, my change caused these problems, and I acknowledge responsibility on that basis.]

Cc: csharrison@chromium.org
Actually, +csharrison, just FYI.

Comment 3 by b...@chromium.org, Sep 30 2016

Some information about Firefox's priority model, at least the early implementation: http://bitsup.blogspot.com/2015/01/http2-dependency-priorities-in-firefox.html.

I agree with everything in #1.
Summary: HTTP/2 resource sequencing causes HOL blocking for images and async scripts (was: HTTP/2 exclusive flag cause HOL blocking for images)
Yeah, sorry - as I was writing up the issue and going through the cause it became clear there were two distinct issues.

I'll change the title of this one to be clear that the resource sequencing causes HOL issues for images and async JS and open a separate issue for the "delaying low priority requests while in the head" issue.

FIFO makes sense for things that have an explicit dependency/blocking behavior (css, blocking scripts, defer scripts), but it's less clear for other resources (not just progressive JPEGs).

You don't necessarily want one large image on the page to cause everything else to be serialized behind it.  Lots of small images loading concurrently with the large one (say, navigation icons, etc) would be a great case for using weights and bandwidth splitting.

I'm a little less sure about async scripts.  In theory, having scripts of different sizes loading concurrently would give small scripts the chance to complete earlier and execute without blocking on the larger script, but serializing all scripts might make more sense regardless, since it lets you spread the parse/execution of earlier scripts across the download of later ones instead of bunching all the parses together after they all complete from sharing bandwidth.
Resource scheduling issue opened: https://crbug.com/652321

Comment 6 by b...@chromium.org, Oct 3 2016

In Chromium we use buckets for priorities, and HTTP/2 uses dependencies.  Dependencies allow requests in the lowest priority bucket to be unordered, but not in any of the higher priority buckets, because each stream can only depend on at most one other stream.  This is the fundamental problem, as already described in #1.

My impression is that images are all in the incorrectly named LOWEST priority bucket, but there is one other lower bucket: IDLE.

In this framework, there are two ways to make images be at the same level (resulting in parallel loading) instead of being chained up as they are right now (resulting in undesirable serial loading):
(1) Ignore the difference between IDLE and LOWEST buckets at the HTTP/2 level, make all IDLE and LOWEST priority requests be at the same level, all dependent on the last LOW priority request; or
(2) Make images IDLE, make all IDLE priority requests be at the same level, all dependent on the last LOWEST priority request; leave LOWEST priority streams chained up as they are right now.

I do not know whether either of these would be feasible, or whether either change would adversely affect performance.  Also, the question of whether this special-casing belongs in the network stack, as already brought up in #1, still remains.
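For what it's worth, a minimal sketch of what option (1) could look like (hypothetical names, not Chromium code):

# Hypothetical sketch of option (1): collapse IDLE and LOWEST into one flat
# level.  Every such stream depends non-exclusively on the last LOW priority
# stream, so they become siblings and can load in parallel.

def dependency_for(priority, last_low_stream, chained_parent):
    """chained_parent is whatever the existing chaining logic would pick."""
    if priority in ("IDLE", "LOWEST"):
        # All IDLE/LOWEST requests become siblings under the last LOW stream.
        return last_low_stream, False     # no exclusive bit
    # Everything else keeps the existing exclusive chaining.
    return chained_parent, True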
Re c#4: Flesh this paragraph out a little bit?  I'm finding myself still disagreeing with you at this level :-}:

> You don't necessarily want one large image on the page to cause everything else to be serialized behind it.  Lots of small images loading concurrently with the large one (say, navigation icons, etc) would be a great case for using weights and bandwidth splitting.

Assuming non-progressive images, why are weights and bandwidth splitting better than loading the small images in sequence first?

I'm making the assumption that, if non-progressive, no images are useful until they're fully loaded, in which case I'd rather get *something* loaded than nothing.
I'm thinking of a case where you have something like a hero image (200KB, for example) and a bunch of UI elements that are also graphics (CSS backgrounds on buttons, say 1KB each).

CSS backgrounds wouldn't be discovered until layout happens, which would put them behind in-page images in the queue.

If they load in parallel, the navigation icons would be visible and useful MUCH earlier than if they had to wait for the hero image to finish loading.
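Rough illustrative numbers for that case (the ~1 Mbps link speed is an assumption; the sizes are the ones above):

# Illustrative arithmetic only: 200 KB hero image plus 20 x 1 KB icons on an
# assumed ~1 Mbps (~125 KB/s) link.
link_kb_per_s = 125.0
hero_kb, icon_kb, n_icons = 200, 1, 20

# Serialized behind the hero: the icons only finish after the hero does.
icons_done_serial_s = (hero_kb + n_icons * icon_kb) / link_kb_per_s     # ~1.8 s

# Bandwidth splitting, with (say) half the link going to the icons as a group:
icons_done_split_s = (n_icons * icon_kb) / (link_kb_per_s / 2)          # ~0.3 s

print(icons_done_serial_s, icons_done_split_s)

Under splitting the hero finishes only slightly later (the 220 KB total takes the same time either way), while the icons become usable several times sooner.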

I expect it wouldn't be hard to fabricate both cases and find lots of real-world examples.

FWIW, moving to sequenced loading is an explicit change from how H1 behaves (at least beyond 6 concurrent downloads on a single host).

The explicit sequencing itself feels like a fair bit of special casing in the networking stack.  Long-term it really feels like we need to be able to plumb this kind of logic through the external interface and let the renderer (or other clients) specify the sequencing/concurrency bits.
So my response to that case is "But then we should just (sic) have the small images jump the queue to become higher priority than the large image".  

Of course, there are two obvious problems with that: a) We don't currently have a way of specifying that for two different requests that are in the same RequestPriority bucket (though I've had fantasies of overloading SetPriority(<request's current priority>) to mean "Please jump this to the head of the FIFO queue"), and b) once a request has been dispatched, we have very little ability to tell the server to reprioritize it (we're BDP limited, basically).

I'll point out that we have problem (b) in the "same bucket with weights" model too.  I'm not sure whether I consider my suggested solution for problem (a) a reasonable short-term one or not, though it wouldn't be too hard to add a new interface to do the same thing if we didn't like the semantics.

Just because I think it's relevant to this discussion, a quick summary of the Firefox approach (IIUC): It uses a combination of dependencies and weights to deal with the issue bnc@ calls out in c#6.  The phantom streams have a specific dependency relationship (Root -> {Leaders -> Followers, Unblocked, Background -> Speculative}), but since that phantom relationship still results in effectively peer relationships between, e.g., the real children of the Leaders and Followers phantom streams, the real children are given default weights such that the large majority of the resources will go to the Leaders while any Leaders exist, and only to the Followers after that.

It's an interesting model; specifically, it's an interesting way to address the shortcomings (IMO) of the HTTP/2 spec called out by bnc@ in c#6.  I'd still rather explore some more absolute stack rank (as I'm driving towards in my comments above) but it might be a way to handle those cases where a stack rank isn't appropriate but we still want to prioritize groups of requests.

For reference, since I just looked up this code yesterday for unrelated reasons, here's the Firefox code implementing the scheme that Randy mentioned (but with "Unblocked" renamed "Other"):
https://dxr.mozilla.org/mozilla-central/rev/9baec74b3db1bf005c66ae2f50bafbdb02c3be38/netwerk/protocol/http/Http2Session.cpp#915
https://dxr.mozilla.org/mozilla-central/rev/9baec74b3db1bf005c66ae2f50bafbdb02c3be38/netwerk/protocol/http/Http2Stream.cpp#1173
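For concreteness, the grouping described above can be sketched as data along these lines (the stream IDs and weights are illustrative stand-ins; the real values are in the Firefox sources linked above):

# Hypothetical sketch of the Firefox-style phantom group tree:
#   Root -> { Leaders -> Followers, Unblocked ("Other"), Background -> Speculative }
# The group nodes are announced via PRIORITY frames but never opened as real
# streams; each real request is then made a dependent of its group node.
PHANTOM_GROUPS = {
    # name:         (stream_id, parent_id, weight)  -- all values illustrative
    "Leaders":      (3,  0, 200),
    "Unblocked":    (5,  0, 100),
    "Background":   (7,  0,   1),
    "Followers":    (9,  3,   1),   # child of Leaders
    "Speculative":  (11, 7,   1),   # child of Background
}

def parent_stream_for(group_name):
    """Phantom parent a real request in this group would depend on."""
    return PHANTOM_GROUPS[group_name][0]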

Hello,

Thank you for looking into this. I wanted to bring another
aspect to your attention. As Kazuho Oku pointed out here:
https://lists.w3.org/Archives/Public/ietf-http-wg/2016JulSep/0597.html
because the requirements about keeping track of closed streams in
the priority tree are vague, it's possible that stream dependencies
reference streams that the server has ceased tracking.

Those streams end up depending on 0, resulting in widely different
views of the priority tree by the client and server.

Frederik
fdeweerdt@, your concern is valid but it's a separate issue. That race is also pointed out explicitly by the H2 spec:
http://httpwg.org/specs/rfc7540.html#priority-gc

The spec says: "To avoid these problems, an endpoint SHOULD retain stream prioritization state for a period after streams become closed." I don't think you can solve this issue without a spec change. That said, I've been doing some experiments and found that a simple server-side heuristic of keeping the last 5-10 closed nodes in the tree seems to make this problem disappear.
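A minimal sketch of that kind of retention heuristic (purely hypothetical structure, not taken from any particular server):

from collections import OrderedDict

# Hypothetical sketch: retain the N most recently closed streams in the
# priority tree so that later frames referencing them as parents still
# resolve correctly instead of silently falling back to stream 0.
MAX_RETAINED_CLOSED = 10

class PriorityTree:
    def __init__(self):
        self.parent = {0: None}        # stream_id -> parent stream_id
        self.closed = OrderedDict()    # recently closed streams, oldest first

    def resolve_parent(self, parent_id):
        # Unknown (already evicted) parents degrade to the root, which is
        # exactly the client/server divergence described above.
        return parent_id if parent_id in self.parent else 0

    def open_stream(self, stream_id, parent_id):
        self.parent[stream_id] = self.resolve_parent(parent_id)

    def close_stream(self, stream_id):
        self.closed[stream_id] = True
        # Keep the last few closed nodes; evict only the oldest ones.
        while len(self.closed) > MAX_RETAINED_CLOSED:
            evicted, _ = self.closed.popitem(last=False)
            self.parent.pop(evicted, None)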
tombergan@ thanks, that makes sense. I do agree that a spec change would address the concern. I'm raising this point because in the context of the existing spec, relying on a phantom priority tree is more robust than chaining dependencies.
> I'm raising this point because in the context of the existing spec, relying on a phantom priority tree is more robust than chaining dependencies.

I don't think that's true. If I understand the Firefox implementation correctly (see the links above), Firefox's Leader/Background/etc. priority nodes are phantom streams that are never actually opened. Firefox is making an assumption that servers will track priorities for phantom streams. A closed stream is essentially a phantom stream, so Firefox has the same problem -- there is no guarantee that the server will track those phantom streams indefinitely.

It may be that some servers have custom logic for handling Firefox's phantom streams (and if you know of any such servers, I'd be interested in seeing links to code), but in that case, they could just as easily add custom logic for handling Chrome's chained dependencies. As mentioned above, a simple server-side heuristic of keeping the last 5-10 closed streams seems to work well for Chrome clients.
> It may be that some servers have custom logic for handling Firefox's phantom streams (and if you know of any such servers, I'd be interested in seeing links to code)

h2o creates a stream structure and adds it to its internal scheduling tree when receiving a PRIORITY frame: https://github.com/h2o/h2o/blob/master/lib/http2/connection.c#L624

I'm less familiar with nginx, but it appears to do the same: https://github.com/nginx/nginx/blob/f8a9d528df92c7634088e575e5c3d63a1d4ab8ea/src/http/v2/ngx_http_v2.c#L1826


Cc: tombergan@chromium.org
I found an interesting page that shows both good and bad behavior with sequential image downloading. I've been running experiments to compare the performance of Spdy-style priorities and H2-style priorities. Turns out I can do this since Chrome sends the old Spdy-style priority as an H2 weight -- this means I can write one scheduler that looks at the weight only (Spdy-style) and another scheduler that looks at dependencies only (H2-style). The Spdy-style scheduler interleaves responses at the same priority level in a random way. My test pages are loaded into a replay server, then served with both the Spdy and H2 schedulers over 3GFast on WPT.
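To make the comparison concrete, the difference between the two schedulers is roughly this (a hypothetical sketch, not the actual replay-server code):

import random

# Each ready response is (stream_id, weight, depth_in_dependency_chain).

def spdy_style_pick(ready):
    """Ignore dependencies; pick among peers with probability ~ weight.

    This is what produces the randomized interleaving of same-priority
    responses."""
    total = sum(weight for _, weight, _ in ready)
    r = random.uniform(0, total)
    for stream_id, weight, _ in ready:
        r -= weight
        if r <= 0:
            return stream_id
    return ready[-1][0]

def h2_style_pick(ready):
    """Ignore weights; honor the dependency chain.

    With Chrome's linear exclusive chain this means serving the earliest
    requested stream to completion before touching the next one."""
    return min(ready, key=lambda item: item[2])[0]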

Here's the interesting page:
http://m.interia.pl/

WPT video with spdy-style scheduling:
https://www.webpagetest.org/video/compare.php?tests=161007_GY_f5995e463e585d2944ba90c86cee5852-r%3A1-c%3A0&thumbSize=200&ival=100&end=visual

WPT video with H2-style scheduling:
https://www.webpagetest.org/video/compare.php?tests=161007_4A_083e9768d48791143fa9f78628791d56-r%3A1-c%3A0&thumbSize=200&ival=100&end=visual

You can instantly notice that images are serialized with H2 and randomized/interleaved with Spdy.

How H2 is better: the main image in the page (linked below, visible under the video play arrow icon) loads more quickly with H2 than Spdy. With H2, this image is fully loaded by the first paint at 6.1s (the image actually loads in under 4s), while with Spdy, the image is only partially loaded at 6.1s and not fully-loaded until 14s.
http://e5.pudelek.pl/728c029587b900f7bf07fd98664b13fac3faaed3_375x220

How Spdy is better: the banner image at the top of the page (linked below) is built from sprites. In both cases, this sprite is requested about 5s into the page load. However, with Spdy, the sprite has finished downloading and is displayed by 13.3s, while with H2, the sprite is not downloaded until 21s into the pageload because it's blocked by previously-requested images.
http://www.pudelek.pl/img/min/mobile/sprites-s9204825150.png

I think this is a nice demonstration that deterministic, sequential ordering works great when you know the right order. If you don't know the right order, then the deterministic ordering can be deterministically bad, while the randomized, interleaved ordering performs better on average.

Comment 18 by pasko@chromium.org, Oct 25 2016

Cc: pasko@chromium.org

Comment 19 by y...@yoav.ws, Nov 16 2016

Cc: y...@yoav.ws
I strongly agree with Pat that strict in-order sending of each resource in its entirety is not the right way to go for resources that are progressively rendered.  That includes not only progressive images but all images, as well as any HTML resources.  I also agree with Pat that these should not be special-cased in the network stack; rather, the "progressiveness" of the resource should be piped through from the renderer.

Also worth noting is that the introduction of link preload breaks the (slightly too optimistic, IMO) assumption that order of discovery is an indication of order of importance.  Authors can now add late-discovered resources to the document's head (e.g. an important bg image or font) and rely on proper priorities to make sure those resources are not sent before more important ones (e.g. blocking CSS and JS).


Labels: -Performance Performance-Loading Performance-Browser
Cc: -rdsmith@chromium.org
