btw Kouhei mentioned today that Takashi's work on Loading Dispatcher (see bug 729953) might mean we don't need all the logic in content/browser/loader anymore. I'm not very familiar with it, so CCing a bunch of folks. Thanks!
Merging this code into Loading Dispatcher was one of the goals under the GRC, but as far as I know, Josh and Tarun have recently been running field studies here to get better resource scheduling that uses network-layer information, such as the underlying protocol version (H1 vs. H2 or QUIC).
It would be great if we could implement everything on the renderer side as part of Loading Dispatcher, but I'd like to hear Josh's and Tarun's opinions and plans.
The field study I'm running doesn't use any more information than ResourceScheduler already had (e.g., whether a request is H1 vs. H2 vs. QUIC). ResourceScheduler has always skipped throttling for H2 & QUIC requests; my study disabled that skipping.
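For concreteness, here's a rough sketch of what that skip looks like conceptually. This is not the actual ResourceScheduler code; the types and names below are illustrative only.

```cpp
// Illustrative only -- not the real content/browser/loader code.
enum class Priority { kLowest, kLow, kMedium, kHigh };

struct RequestInfo {
  Priority priority;
  bool supports_multiplexing;  // True for requests on H2 / QUIC connections.
};

bool ShouldThrottle(const RequestInfo& request, bool h2_quic_bypass_enabled) {
  // Only low-priority ("delayable") requests are candidates for throttling.
  if (request.priority >= Priority::kMedium)
    return false;
  // The long-standing behavior: low-priority requests on multiplexed (H2/QUIC)
  // connections are exempt from throttling. The field study disables this
  // exemption, so those requests get throttled like everything else.
  if (h2_quic_bypass_enabled && request.supports_multiplexing)
    return false;
  return true;
}
```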
@jkarlin: I'm not sure I understand your reply. Are you saying you're OK with removing ResourceScheduler in favor of the renderer-side Loading Dispatcher? FTR I'm also in favor of that approach, as it would simplify the browser and fit with the network service effort.
I'm responding to the first part of #4 by saying that my experiments aren't using any more network-layer information than ResourceScheduler already had. So if you want to move it to GRC you'll still need that information. I'm not sure what network information Tarun's work needs.
I'm not sure that I've formed an opinion on where ResourceScheduler should live.
My understanding is that not everything ResourceScheduler does can be replaced with a renderer-side thing, and there are things that are better moved into the network layer. (So ideally we should have a network-layer thing that handles requests with global knowledge and a renderer-side thing that does what needs to be done with web-side context.)
+rdsmith@
My experiments definitely need the network-layer information. Also, part of the resource scheduler has to live outside the renderer, where global information (e.g., information about all the in-flight and pending requests) is available.
I think my design @ https://docs.google.com/document/d/1p-XGz-BFFehs5j2olbkDkjrXzxzT3n45PVcLSZzOV3c/edit#heading=h.5scdxk3ekn4e is still relevant here, with appropriate abstraction. IMO, there are a couple of issues to engage with:
* ResourceScheduler, as currently written, is a hack; it's a collection of ad-hoc rules that have evolved over time that seem to work reasonably well. This both means that those rules aren't sacred if we feel a need to change them, and that there's a reasonably good chance that performance will regress if we change them blindly. (+pmeenan for commentary on this point.)
* If we're just porting the functionality of ResourceScheduler as it currently is, the major blocker is that several aspects of ResourceScheduler are per-tab, and OOPIF means we're moving away from having per-tab information pretty much anywhere. I could imagine architectures that would allow for per-tab information, but I've gotten pushback in the past on those architectures from both jam@ and the OOPIF folks, and (pace a holistic attack on priorities that includes use cases like deprioritizing background tabs) I don't really see a need.
* I do think that long-term we should try to move throttling mostly down into the network stack, as it has the most information as to what network and other resource usage conditions are. But I think that's separable from servicification.
* My belief is that we're planning to make network condition information available to the renderer, and that there aren't any *global* throttles in ResourceScheduler, so I'd think that Tarun's experiments could be pulled back into the renderer.
Given all this, my recommendation would be that someone take on the task of tweaking the current ResourceScheduler so it does well enough (i.e. no performance regressions) without per-tab information, then pull it all into the renderer (presuming the information Tarun needs can be made available there).
And I'll continue to hope that at some point in the future we pick back up the global priority management problem :-}.
There are a couple of different pieces to what the scheduler does that make sense to split between the renderer and net stack.
1 - Holding back low-priority requests until render-blocking resources have loaded. This part of the logic really belongs in the renderer and would clean up a lot of the state tracking that the net stack shouldn't have to care about.
2 - Throttling requests to no more than 10 concurrent low-priority requests at a time. This part belongs deep in the network stack and the higher levels shouldn't care. The net stack should be responsible for determining the available BDP (bandwidth-delay product) and making decisions on how best to fill the pipe with pending requests.
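To make item 2 concrete, here's a minimal sketch of a global "at most N delayable requests in flight" throttle. This is a standalone illustration, not existing //net code; the limit of 10 just mirrors the current per-client value.

```cpp
#include <cstddef>
#include <deque>
#include <functional>

// Sketch of a global throttle on concurrent low-priority ("delayable")
// requests. Everything above the limit waits in FIFO order.
class DelayableRequestThrottle {
 public:
  explicit DelayableRequestThrottle(size_t max_in_flight = 10)
      : max_in_flight_(max_in_flight) {}

  // Called when a delayable request is ready to start. Starts it immediately
  // if we're under the limit, otherwise queues it.
  void Schedule(std::function<void()> start_request) {
    if (in_flight_ < max_in_flight_) {
      ++in_flight_;
      start_request();
    } else {
      pending_.push_back(std::move(start_request));
    }
  }

  // Called when a delayable request completes; releases the next queued one.
  void OnRequestComplete() {
    --in_flight_;
    if (!pending_.empty()) {
      auto next = std::move(pending_.front());
      pending_.pop_front();
      ++in_flight_;
      next();
    }
  }

 private:
  const size_t max_in_flight_;
  size_t in_flight_ = 0;
  std::deque<std::function<void()>> pending_;
};
```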
There are some side effects of throttling right now that happen to help downstream, but those should probably be scheduled explicitly. Specifically, the resource scheduler's throttling of requests helps keep low- and high-priority work from contending in the cache, and it reduces the amount of work the renderer has to do processing cached low-priority resources early in the load.
We also need to figure out a plan for the H2 bypass we have in place right now (which is what @jkarlin's experiment tried turning off). The resource holdback logic in #1 should be the same for H1 and H2, and if it isn't, it needs to be tuned.
Optimally we'd also be able to do conditional loads from the net stack: something like "load this if available in the cache, otherwise pause it before sending it to the network," with the ability to resume them explicitly. Then the renderer holdback logic could load all of the cached resources while in holdback (and let service worker not be throttled) and only apply the actual holdback before hitting the wire. There will be regressions unless we teach the rest of the renderer how to prioritize processing of responses, though, so that cached low-priority JS parsing doesn't contend with critical resources.
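One possible shape for that conditional-load mode, purely as a sketch; none of these names exist in the codebase today.

```cpp
// Hypothetical interface for "cache-only, otherwise pause" loads.
enum class ConditionalLoadMode {
  kNormal,        // Load from cache or network as usual.
  kCacheOrPause,  // Serve from cache if present; otherwise hold the request
                  // just before it would hit the network.
};

class PausableLoader {
 public:
  virtual ~PausableLoader() = default;

  // Starts the load with the given mode. If the mode is kCacheOrPause and the
  // resource is not cached, the loader parks the request until Resume().
  virtual void Start(ConditionalLoadMode mode) = 0;

  // Explicitly releases a request that was paused before going to the
  // network, e.g. once the renderer leaves its holdback phase.
  virtual void Resume() = 0;
};
```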
Since the perf numbers will be changing a lot anyway with the new architecture, I feel we probably need to start with something different from what we currently have, which doesn't really fit super well with the new architecture (e.g. for the per-tab things Randy mentioned). Also there are several interesting ideas, like conditional loading, that we want to experiment with, but in the first phase I'd like to focus on architectural issues (i.e. which logic should live where).
Utilizing what Patrick summarized:
1. The first one, i.e. holding back low-priority requests while render-blocking, could and probably should be implemented on the renderer side. I think it's a good candidate for what the renderer-side ResourceLoadScheduler can do. (The implementation needs more than what we know at the platform layer, but that's more of an impl detail...)
2. It looks like the second one could probably be implemented on the Network Service side, maybe below url_loader_impl.cc. I'd like to make this a relatively simple throttling mechanism that doesn't do any context-dependent things (those should be done on the renderer side).
I think these two items can be worked on in parallel, and the first one can possibly be enabled without waiting for the full launch of Network Service - wdyt?
(I assume this isn't really blocking things yet)
c#12 SGTM as well. I'll note that there are a lot of options as to what layer #2 could be implemented at, and the lower it is in the network stack, the less it depends on progress on servicification.
I have some possibly relevant work in this space that I'm in the middle of reverting to fix a low frequency crasher, so whoever works on it should ping me and I'll give them a dump about it.
Another +1 for c#12
For #1 I initially thought we might have to do something special with frames and have them get notified of the parent document state, but since they won't exist before a body exists, the only possibility for render-blocking contention is with CSS that is still pending when the frame is created, and that CSS would already be in-flight, so I'm fairly certain it is safe to ignore this and just treat frames and documents the same.
There is already logic in the renderer that tracks the state of pending render-blocking resources (used to avoid blocking painting on in-body CSS), so the throttling logic just needs to attach to that.
I'd love to see that be the first step because that would let us remove the notifications to net about the body being inserted and all of the logic to track layout-blocking resources.
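A very small sketch of what the renderer-side holdback in #1 could look like, assuming a counter of pending render-blocking resources like the one described above. This is illustrative only and is not the actual Blink ResourceLoadScheduler.

```cpp
#include <functional>
#include <vector>

// Sketch: queue low-priority loads while render-blocking resources are
// outstanding, and release them all once the last one finishes.
class RendererLoadHoldback {
 public:
  // Called for each low-priority request the renderer wants to issue.
  void RequestLoad(std::function<void()> start_load) {
    if (render_blocking_resources_pending_ > 0)
      held_.push_back(std::move(start_load));  // Hold until blockers finish.
    else
      start_load();
  }

  void OnRenderBlockingResourceStarted() {
    ++render_blocking_resources_pending_;
  }

  void OnRenderBlockingResourceFinished() {
    if (--render_blocking_resources_pending_ == 0) {
      // All render-blocking work is done; release everything we held back.
      for (auto& start_load : held_)
        start_load();
      held_.clear();
    }
  }

 private:
  int render_blocking_resources_pending_ = 0;
  std::vector<std::function<void()>> held_;
};
```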
For #2, the only thing that really needs to be implemented somewhere in net is the limit on the number of in-flight delayable requests to allow at one time across all connections. It was always kind of a hack that that was tracked on a per-tab basis, and the scheduling of those could (and should) be smarter and global.
One issue that came up in the work I did is hanging GETs. If the delayable request limit is per-tab, it's much less likely that a lot of hanging GETs will fill it up, whereas with a global limit it's a worry. The way I dealt with that was to not throttle requests that had been alive for 5x the median lifetime of the population (obviously with some time-based decay).
TL;DR: There's some trickiness beyond the obvious to doing this in the net stack.
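For illustration, here's one way the "exempt requests alive for 5x the typical lifetime" idea could be expressed. The decayed median is approximated with an exponentially weighted average here, and everything below is a sketch rather than the code referenced above.

```cpp
// Sketch of starvation avoidance for hanging GETs: requests that have been
// alive much longer than the typical lifetime stop counting against the
// global delayable limit. The 5x factor comes from the comment above;
// the rest is an illustrative assumption.
class HangingGetDetector {
 public:
  // Record a completed request's lifetime, keeping a decayed estimate of the
  // typical lifetime (an exponentially weighted average standing in for a
  // true decayed median).
  void RecordCompletedLifetime(double lifetime_seconds) {
    constexpr double kDecay = 0.95;  // How strongly history is down-weighted.
    if (typical_lifetime_seconds_ == 0.0)
      typical_lifetime_seconds_ = lifetime_seconds;  // Seed on first sample.
    else
      typical_lifetime_seconds_ = typical_lifetime_seconds_ * kDecay +
                                  lifetime_seconds * (1.0 - kDecay);
  }

  // A request alive for more than 5x the typical lifetime is treated as a
  // hanging GET and no longer occupies a throttling slot.
  bool IsHanging(double age_seconds) const {
    return typical_lifetime_seconds_ > 0.0 &&
           age_seconds > 5.0 * typical_lifetime_seconds_;
  }

 private:
  double typical_lifetime_seconds_ = 0.0;
};
```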
#15 "I'd love to see that be the first step because that would let us remove the notifications to net about the body being inserted"
Yeah removing that part is one of the things I'm expecting to see too.
#16: "One issue that came up in the work I did is hanging gets."
Is that, when there are a lot of GETs, some of them (almost) never hit the network? Yeah, it sounds like we would need some starvation-avoidance logic like you had before.
By the way, Tokyo folks are interested in taking on the renderer-side work if no one else wants it. I'm also willing to help flesh out the design for the latter one, i.e. global throttling, though it should also get a lot of help from rdsmith@ / pmeenan@.
c#17: I'm not sure I understand the question, but it's not that a lot of GETs result in some GETs never hitting the network; it's that sometimes pages use GETs for bidirectional communication (the client opens a connection, the server sends information when it feels the urge), and such connections are long-lived but don't use a lot of bandwidth. Including them in a global outstanding-request limit could fill up all the slots for that limit without allowing any bandwidth to actually be used.
I am curious how the logic for changing the priority of pending requests fits into this new architecture.
Also, a related question: what are the benefits of holding back requests in two different places, the renderer and then later in net? Is it possible to let net do all the throttling work while the renderer dynamically changes the priority of the requests based on its own knowledge (body tag found, first paint happened, etc.)?
I can see one benefit of holding back requests only in //net: if the device has a super fast network and a lot of available memory/CPU, then there is probably no benefit in throttling low-priority requests even if the renderer is in the layout-blocking phase.
Re #16: I think the net stack is probably in a better place to decide if a request is hanging or not, based on its estimates of RTT, bandwidth, and congestion, so I am not too worried about that.
In //net, it is also possible to throttle requests based on sophisticated congestion signals such as packet loss or RTT, instead of using a request-count-based congestion model. This would also help us get around the problem of hanging GETs.
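As a rough sketch of that idea, the delayable budget could be derived from bandwidth/RTT estimates (a BDP calculation) instead of a fixed count. All numbers and names below are illustrative assumptions, not existing //net code.

```cpp
#include <algorithm>

// Returns how many low-priority requests to allow in flight given current
// network estimates and an assumed average response size.
int ComputeDelayableBudget(double estimated_bandwidth_bytes_per_sec,
                           double estimated_rtt_sec,
                           double assumed_avg_response_bytes) {
  // Bandwidth-delay product: how many bytes can be "in the pipe" at once.
  const double bdp_bytes =
      estimated_bandwidth_bytes_per_sec * estimated_rtt_sec;
  // Translate that into a request budget, with a floor of 1 so progress is
  // always possible and a cap to avoid unbounded fan-out on fast links.
  const int budget = static_cast<int>(
      bdp_bytes / std::max(assumed_avg_response_bytes, 1.0));
  return std::min(std::max(budget, 1), 64);
}
```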
I don't see a reason why this plan would affect reprioritization; can you share the issue you see in that space?
WRT your other point, throttling in //net requires clear communication of prioritization from all higher levels to //net. Finding a way to cleanly represent that prioritization over the space of all possible usage models is ... I guess I'll just go with "hard"; at least, I failed to manage it with quite a bit of trying. So my conclusion was that it made more sense to solve small, well defined subproblems in the priority space, with clear success criteria and see if it was possible to build up from those subproblems to the larger goal. In the context of network servicification, solving the general priority representation problem also counts as a little bit of a yak shave :-}. So I'm personally inclined to go with the limited approach sketched out above.
Having said all that, I want to be clear that I agree with all the points you raise and I'd love to see work done to push all throttling into //net. I just don't think that's a simple task. Also, any request that makes it to //net takes up resources that compete with other network activity, which need to be managed so that one renderer doesn't (accidentally or on purpose) interfere inappropriately with other network activity. Currently the amount of resources taken is sadly fairly high.
> I don't see a reason why this plan would affect reprioritization; can you share the issue you see in that space?
Ooh, I guess I was just not sure whether it would be affected or not. If it isn't, then that's good.
> Currently the amount of resources taken is sadly fairly high.
I can totally see how throttling in the renderer helps with reducing the resource usage (memory etc.). For that reason, throttling first in the renderer and then later in the //net makes sense even though it sort of duplicates the logic at two different places.
Update: We've found a problem: issue 847890. I fixed it on ToT and Beta but failed to merge the fix to Stable. Since the impact is expected to be small, the feature is still enabled.