
Issue 836552

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 9
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Chrome
Pri: 2
Type: Bug




Experiment with delaying low priority requests to H2 servers

Project Member Reported by tbansal@chromium.org, Apr 25 2018

Issue description

Experiment with delaying low priority requests to H2 servers similar to how low priority requests to non-H2 proxies are throttled.

The experiment should be guarded behind a Finch trial, and should be limited to Slow2G, 2G, and 3G connections. This is ~22% of page loads (http://shortn/_xUycFJmKqb). Initially, the experiment should be limited to Android only.
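
For illustration, a minimal sketch of the kind of Finch/ECT gate described above (the feature and helper names here are hypothetical, not the actual Chromium code):

  // Hypothetical sketch; not the real services/network code.
  #include "base/feature_list.h"
  #include "net/nqe/effective_connection_type.h"

  // Finch-controlled feature guarding the experiment (illustrative name).
  const base::Feature kDelayLowPriorityH2Requests{
      "DelayLowPriorityH2Requests", base::FEATURE_DISABLED_BY_DEFAULT};

  // The experiment is in scope only on Slow2G, 2G and 3G connections.
  bool ShouldDelayLowPriorityH2Requests(net::EffectiveConnectionType ect) {
    if (!base::FeatureList::IsEnabled(kDelayLowPriorityH2Requests))
      return false;
    return ect == net::EFFECTIVE_CONNECTION_TYPE_SLOW_2G ||
           ect == net::EFFECTIVE_CONNECTION_TYPE_2G ||
           ect == net::EFFECTIVE_CONNECTION_TYPE_3G;
  }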
 

Comment 1 Deleted

Comment 2 Deleted

Note that Issue 655585 tracks the work related to throttling requests at the cache layer/IO thread. The goal here, on the other hand, is to reduce network contention. Fixing this issue is not likely to completely mitigate the cache/IO thread contention problem that Issue 655585 refers to, especially on faster connections.

The previous work to throttle HTTP 1.1 requests (https://groups.google.com/a/chromium.org/d/topic/loading-dev/OuOX94uraN8/discussion) shows that there is some benefit in network-adaptive scheduling. So, it seems reasonable to experiment with extending the same treatment to H2 requests.

The plan here is to experiment with throttling H2 requests on slow connections (effective connection type of 3G or slower, which corresponds to ~22% of page loads). We plan to keep the code around, and re-run the experiment once Issue 655585 is fixed.
Cc: bengr@chromium.org jkarlin@chromium.org lassey@chromium.org
Labels: -Pri-3 M-68 OS-Chrome OS-Linux OS-Windows Pri-2
Project Member

Comment 5 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4256c9caa969bb88eda65ce3df074d11d3b60e62

commit 4256c9caa969bb88eda65ce3df074d11d3b60e62
Author: Tarun Bansal <tbansal@chromium.org>
Date: Tue May 01 20:59:11 2018

Minor cleanup in Resource Scheduler

This CL does not introduce any functionality change, and
simply cleans up the exit points of an existing function.

Bug:  836552 
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_mojo
Change-Id: I586dd475351ddc2334d460cea8a2c18ddb590866
Reviewed-on: https://chromium-review.googlesource.com/1034453
Reviewed-by: Matt Menke <mmenke@chromium.org>
Reviewed-by: Ryan Sturm <ryansturm@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/master@{#555180}
[modify] https://crrev.com/4256c9caa969bb88eda65ce3df074d11d3b60e62/services/network/resource_scheduler.cc

Project Member

Comment 6 by bugdroid1@chromium.org, May 21 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/45c07e544d80baa561f994edb068343bb9b95e26

commit 45c07e544d80baa561f994edb068343bb9b95e26
Author: Tarun Bansal <tbansal@chromium.org>
Date: Mon May 21 22:36:32 2018

Throttle requests to H2 servers on slow connections only

This CL adds mechanism to throttle requests to H2 servers
similar to how requests to HTTP 1.1 are throttled. Note
that this CL does not add 6/host limit to H2 servers even when
throttling is in place.

Cq-Include-Trybots: master.tryserver.chromium.linux:linux_mojo
Change-Id: Ia9f86335b97031a459a1374695727f4efc8bcb3a
Bug:  836552 
Reviewed-on: https://chromium-review.googlesource.com/912732
Reviewed-by: Matt Menke <mmenke@chromium.org>
Reviewed-by: Ryan Sturm <ryansturm@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/master@{#560382}
[modify] https://crrev.com/45c07e544d80baa561f994edb068343bb9b95e26/services/network/public/cpp/features.cc
[modify] https://crrev.com/45c07e544d80baa561f994edb068343bb9b95e26/services/network/public/cpp/features.h
[modify] https://crrev.com/45c07e544d80baa561f994edb068343bb9b95e26/services/network/resource_scheduler.cc
[modify] https://crrev.com/45c07e544d80baa561f994edb068343bb9b95e26/services/network/resource_scheduler_params_manager.cc
[modify] https://crrev.com/45c07e544d80baa561f994edb068343bb9b95e26/services/network/resource_scheduler_params_manager.h
[modify] https://crrev.com/45c07e544d80baa561f994edb068343bb9b95e26/services/network/resource_scheduler_params_manager_unittest.cc
[modify] https://crrev.com/45c07e544d80baa561f994edb068343bb9b95e26/services/network/resource_scheduler_unittest.cc
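
As a rough sketch of the behavior this CL describes (simplified and hypothetical, not the actual resource_scheduler.cc code): on slow connections, low priority requests to H2 servers are held against the overall delayable limit, but no 6-per-host limit is applied to them.

  // Hypothetical simplification of the throttling gate; illustrative only.
  #include <cstddef>

  struct RequestInfo {
    bool is_low_priority = false;
    bool uses_multiplexed_connection = false;  // H2 (or QUIC).
  };

  bool ShouldStartRequest(const RequestInfo& request,
                          bool throttle_h2,  // Finch trial + slow-connection (ECT) check.
                          size_t in_flight_delayable_count,
                          size_t max_delayable_requests) {
    if (!request.is_low_priority)
      return true;  // Only low priority requests are ever delayed here.
    if (request.uses_multiplexed_connection && !throttle_h2)
      return true;  // Without the experiment, H2 requests are not throttled.
    // When throttled, H2 requests count against the overall delayable limit,
    // but no per-host (6/host) connection limit is applied to them.
    return in_flight_delayable_count < max_delayable_requests;
  }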

Project Member

Comment 7 by bugdroid1@chromium.org, May 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0240990479a1bc3e0fd1ae64c346441579bf09be

commit 0240990479a1bc3e0fd1ae64c346441579bf09be
Author: Tarun Bansal <tbansal@chromium.org>
Date: Fri May 25 17:53:44 2018

Add DelayRequestsOnMultiplexedConnections to the field trial testing config

Bug:  836552 
Change-Id: I1a2fa3efe8b4e5818f8dcb54b49004039e52053d
Reviewed-on: https://chromium-review.googlesource.com/1069705
Reviewed-by: Doug Arnett <dougarnett@chromium.org>
Reviewed-by: Robert Kaplow <rkaplow@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/master@{#561937}
[modify] https://crrev.com/0240990479a1bc3e0fd1ae64c346441579bf09be/testing/variations/fieldtrial_testing_config.json

tbansal, Can you update the status of this bug?
Currently running at 50% on M-68+ Canary, Dev, and Beta channels. Waiting for M-68 to hit stable to collect some data.
Status: Started (was: Assigned)
Cc: pmeenan@chromium.org
Is this (or a flavor of this) running against stable for Chrome 69, independently of connection type?

Don't get me wrong, I think it's necessary given the sad state of support for priorities but I was surprised to start seeing split waterfalls in WPT testing this morning.

Just checking to make sure that it was intentional and not a result of moving to the network service or something like that.
Yes, it is currently enabled for 50%, but only on slow connections (ECT of 3G or slower). The slow connection decision is made at the start of the navigation. I am somewhat surprised that the effect is observable on WPT since WPT network throttling is not always detected *before* the first page load by the network quality estimator.

What network throttling were you using? Is it possible to get the netlog?

> not a result of moving to the network service or something like that.

I am not aware of any network service experiments running right now that may affect this.
Also, this experiment has been running for 50% M-69+ stable for a month or so. I am writing up the analysis doc. Would share soon!
I'm seeing it in 100% of my tests including those with no traffic-shaping (1 Gbps Ethernet on the same network as the test server).  Similar tests with Chrome 68 from a few days ago did not show the tell-tale 2-stage waterfall.  M-69 just went stable a few days ago so I assume the existing stable test data is from 68, right?  Otherwise there's no way you have a month of data for stable.

Here is a set of tests for Chrome 69 with Cable throttling: https://www.webpagetest.org/result/180906_5Q_185cdd8a59a2c66511d60a4fe718c163/

Links to the netlog are to the left of the waterfall for each test.
> Similar tests with Chrome 68 from a few days ago did not show the tell-tale 2-stage waterfall.  M-69 just went stable a few days ago so I assume the existing stable test data is from 68, right?  Otherwise there's no way you have a month of data for stable.

Aah, you are right. I meant it has been enabled at 50% on M-68+ stable for 1+ month.
Pat, can you say a bit more on what I should be looking at in the netlog? Is it that some of the H2 requests are dispatched later? 

Also, whatever is happening is definitely not because of this experiment since the network is fast (NQE detects network quality as 4G). Also, it seems that the field trials are disabled. The "Active Field Trial Groups" box is empty in the netlog.
Yes.  The waterfall has 2 distinct phases.  The critical resources are requested and then after they finish loading all of the images load.

In 68 a couple of days ago all of the requests are sent out immediately: https://www.webpagetest.org/result/180904_Q1_75c40b9be7dfb6cdc154e0a6b8ad8321/

Chrome is "delaying low priority requests to H2 servers" by default in 69, just trying to figure out if it was intentional or not.
> Chrome is "delaying low priority requests to H2 servers" by default in 69, just trying to figure out if it was intentional or not.


If that's what's happening, then it's not intentional. Looking into it.
I tried with the --force-effective-connection-type=4G flag, and still see the 2-phase waterfall. I also verified locally that the 2-phase waterfall happens when this experiment is disabled.

It's still not clear why this is happening.
Cc: kinuko@chromium.org yhirano@chromium.org
In the netlog, I see only a few milliseconds of delay between REQUEST_ALIVE and HTTP_TRANSACTION_HTTP2_SEND_REQUEST_HEADERS. So, throttling is happening before the request hits the resource scheduler in the browser process.

I looked into it a bit more. It seems this is because of RendererSideResourceScheduler, which was enabled by default in M-69.

WPT with default params for RendererSideResourceScheduler:
https://www.webpagetest.org/result/180906_6T_d065c176f109d3f8c678e5917ebc66a8/

WPT with tight_limit param overridden to 200:
https://www.webpagetest.org/result/180906_CJ_49192ed2431e04a4075cb6db53675781/

The former has the 2-phase pattern, the latter does not.

Pat, if this negatively affects the performance of the pageload, please file a bug. +yhirano, +kinuko as FYI.
I don't know that it negatively affects performance (I'm actually a fan of the 2-phase loading in all cases) but I don't have access to UMA anymore so I can't really say.

What that does mean, though, is that this experiment is pretty much moot since the renderer-side scheduling applies the delays regardless of HTTP/1.1 or HTTP/2.
> What that does mean, though, is that this experiment is pretty much moot since the renderer-side scheduling applies the delays regardless of HTTP/1.1 or HTTP/2.

Not really, UMA still shows statistically significant performance improvements for the H2 experiment even when RendererSideResourceScheduler is enabled.
In that case I guess the question is "how".  Is the experiment doing more than just enabling the delaying of requests for HTTP/2 connections?  If so that should already be handled.

Maybe the max-concurrent-request limits across all connections?
yhirano@ can talk more about RendererSideResourceScheduler, but looking at the code it seems that tight_limit in RendererSideResourceScheduler kicks in only during the layout-blocking phase (i.e., until the frame sees a <body> element). The browser-side resource scheduler can do throttling beyond the layout-blocking phase (e.g., between seeing the <body> and first paint). There could be some other differences too. Here is the analysis doc for the renderer-side scheduler: https://docs.google.com/document/d/1oreWB98RmEhqGIM9vQgOs9s7llBLxHsxubhNyN8Cmrg/edit#

> Maybe the max-concurrent-request limits across all connections?

I am not sure what you mean. Assuming I understand the question correctly, the max-concurrent-request limit in the browser scheduler does not apply to H2 by default. When determining if an H1 request needs to be throttled, in-flight H2 requests are counted against the limit of 10 (or whatever the limit is).
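
To make the counting concrete, here is a hypothetical sketch of the check described above (illustrative names only, not the real scheduler code):

  // In-flight H2 requests are not throttled by this limit by default, but
  // they still count toward it when deciding whether a delayable H1 request
  // may start.
  #include <cstddef>

  struct InFlightCounts {
    size_t delayable_h1 = 0;
    size_t delayable_h2 = 0;
  };

  bool MayStartDelayableH1Request(const InFlightCounts& in_flight,
                                  size_t max_delayable /* e.g. 10 */) {
    return in_flight.delayable_h1 + in_flight.delayable_h2 < max_delayable;
  }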
Just looked at the UMA data for this experiment (delay low priority H2 requests on slow connections):

On cellular 2G like connections across all Android users:
* FCP sample count increased by 0.87%. This increase in FCP count is accompanied by a statistically significant decrease in the abort metrics.
* FCP reduced by 0.99% and 1.17% at 95th and 99th percentile, respectively.
* FMP reduced by 1.10% at both 95th and 99th percentile.
* No statistical change in user-initiated page reload/refresh rates.

On cellular 2G and 3G like connections across all Android users:
* FCP Count increased by 0.16%. This increase in FCP count is accompanied by a statistically significant decrease in the abort metrics.
* FCP reduced by 0.20% and 0.43% at 95th and 99th percentile, respectively.
* No statistical change in user-initiated page reload/refresh rates.

My point is, if the scheduling is done in the renderer for "delaying low priority requests to H2 servers" then what exactly is the experiment doing on top of that?  The scheduler in the browser process shouldn't even run if the scheduling is being done in the renderer, right?  Even if it is running, it shouldn't be doing anything that the logic in the renderer didn't already do.

The scheduler in the renderer is not tuned for performance optimization. Its goal is not to reduce network contention. kinuko's doc https://docs.google.com/document/d/1-IRpbQF4KrathwhO0V7KgpDnnGNDr3v-HY4u7kAi5pQ/edit#heading=h.fw7h3iiq6ktu talks a bit about how the two schedulers would work together. At a high level:

The renderer scheduler looks at renderer-level signals: tab in background/foreground, and what phase the page load is in (has it seen the <body> tag).

The browser scheduler tries to optimize loading based on network quality/utilization and other net-layer constraints (max connections per origin, whether the server is an H2 server). In the future, it can also do global scheduling (across renderers).

Without global scheduling, we could theoretically move parts of the browser scheduling to the renderer. In their current form, the two schedulers still work at different layers and use different input signals for making decisions.
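
As a rough illustration of that split (hypothetical types; the real interfaces live in Blink and services/network respectively):

  // Illustrative only: the two schedulers consume different input signals.
  #include <cstddef>
  #include "net/nqe/effective_connection_type.h"

  struct RendererSignals {
    bool tab_in_foreground = true;  // Background tabs are deprioritized.
    bool body_tag_seen = false;     // Is the layout-blocking phase over?
  };

  struct BrowserSignals {
    net::EffectiveConnectionType ect;  // Network quality estimate.
    size_t in_flight_request_count;    // Network utilization.
    bool server_supports_h2;           // Net-layer constraint.
    size_t connections_to_origin;      // Max-connections-per-origin limit.
  };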
Right, but the specific logic around delaying low priority requests to H2 servers is all around the "has seen <body> tag" logic.  If you are still seeing differences then it's because that part of the logic isn't behaving the same in the renderer implementation.

The browser implementation waits for both the body to be inserted and all pending stylesheets to have finished loading.  Maybe the renderer implementation doesn't also track the stylesheets.

The rest of the scheduling logic on the browser side is disabled for HTTP/2 connections AFAIK, so your experiment should only be controlling the scheduling that is now also done in the renderer (and isn't controlled by the experiment).
> Right, but the specific logic around delaying low priority requests to H2 servers is all around the "has seen <body> tag" logic. 

Sorry for the confusion. That's not correct. On slow connections, this experiment could affect the scheduling of low priority H2 requests even after the <body> tag is seen. E.g., between the <body> tag being seen and FCP, the renderer scheduler would not do throttling, but the browser scheduler might.
Ah, gotcha.  I didn't realize that FCP was plumbed through to the browser scheduler as a signal since I last worked on it.  Thanks.
>  I didn't realize that FCP was plumbed through to the browser scheduler as a signal since I last worked on it.

No, it is not. I was just giving an example. The resource scheduler in the browser can throttle at all times (and it does). It is NOT restricted to the "has seen <body> tag" logic.
Cc: domfarolino@gmail.com
TL;DR from results doc:

On cellular 2G like connections across all Android users:

* FCP sample count increased by 0.87%. This increase in FCP count is accompanied by a statistically significant decrease in the abort metrics.

* FCP reduced by 0.99% and 1.17% at 95th and 99th percentile, respectively. No statistically significant change in the median value. Due to an increase in FCP count, the actual reduction in FCP may be more than the value shown by UMA dashboard.

* FMP reduced by 1.10% at both 95th and 99th percentile. 

* No statistical change in user-initiated page reload/refresh rates.


On cellular 2G and 3G like connections across all Android users:

* FCP Count increased by 0.16%. This increase in FCP count is accompanied by a statistically significant decrease in the abort metrics.

* FCP reduced by 0.20% and 0.43% at 95th and 99th percentile, respectively. No statistically significant change in the median value. Due to an increase in FCP count, and reduction in Previews shown, the actual reduction in FCP may be more than the value shown by UMA dashboard.

* No statistical change in user-initiated page reload/refresh rates.

* Count of Previews shown reduced: Lite Pages (2.24%), LoFi (4.43%), NoScript (3.90%), Offline (1.06%) (dashboard)
Project Member

Comment 35 by bugdroid1@chromium.org, Oct 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f283e3915631c032dee6caae7a55108adfdb8cfd

commit f283e3915631c032dee6caae7a55108adfdb8cfd
Author: Tarun Bansal <tbansal@chromium.org>
Date: Wed Oct 03 06:03:38 2018

Enable the experiment to delay low priority requests on multiplexed connections

Enable the experiment to delay low priority requests on multiplexed connections.
net-dev thread: https://groups.google.com/a/chromium.org/forum/#!topic/net-dev/abRbGlGlNd0
loading-dev thread: https://groups.google.com/a/chromium.org/forum/#!topic/loading-dev/s22tVFiDru0
predictability-advice: http://shortn/_WJaP3jsRbk

Cq-Include-Trybots: luci.chromium.try:linux_mojo
Change-Id: I6141733ec7bfbd45c6b3c3d8ed98857104dee442
Bug:  836552 
Reviewed-on: https://chromium-review.googlesource.com/c/1257677
Reviewed-by: Matt Menke <mmenke@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/master@{#596129}
[modify] https://crrev.com/f283e3915631c032dee6caae7a55108adfdb8cfd/services/network/public/cpp/features.cc
[modify] https://crrev.com/f283e3915631c032dee6caae7a55108adfdb8cfd/services/network/resource_scheduler_params_manager.cc
[modify] https://crrev.com/f283e3915631c032dee6caae7a55108adfdb8cfd/services/network/resource_scheduler_params_manager_unittest.cc

Project Member

Comment 36 by bugdroid1@chromium.org, Oct 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/35137ea8d00a5b32588b1d10e444829f73736099

commit 35137ea8d00a5b32588b1d10e444829f73736099
Author: Tarun Bansal <tbansal@chromium.org>
Date: Fri Oct 05 16:32:22 2018

Remove DelayRequestsOnMultiplexedConnections from field trial testing

Remove DelayRequestsOnMultiplexedConnections from field trial testing
config since the experiment is now default enabled in Chromium.

Bug:  836552 
Change-Id: I9c0e0e233a47f9a92ea0562b77f6dcadea727e5b
Reviewed-on: https://chromium-review.googlesource.com/c/1259963
Reviewed-by: Robert Kaplow (sloooow) <rkaplow@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/master@{#597163}
[modify] https://crrev.com/35137ea8d00a5b32588b1d10e444829f73736099/testing/variations/fieldtrial_testing_config.json

Status: Fixed (was: Started)
This is now enabled for 100% of the M-68+ population.
Cc: y...@yoav.ws
