Issue 727544

Starred by 5 users

Issue metadata

Status: WontFix
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug




Defer the preconnect tasks and start a service worker speculatively instead.

Project Member Reported by horo@chromium.org, May 30 2017

Issue description

Currently Chrome's predictor starts preconnecting to servers in PredictorTabHelper::DidStartNavigation().
If Chrome starts the QUIC and HTTPS handshakes very early, the IO thread is blocked by the handshake tasks.
These handshakes appear to delay service worker startup and cause slow page loads, especially on low-end devices.

I created a demo page and a CL that adds TRACE_EVENTs related to QUIC and preconnect.
 demo page: https://horo-t.github.io/serviceworker/demo/tmp/20170530/preconnectdemo.html
 CL: https://codereview.chromium.org/2907263002/
I also recorded some chrome://tracing traces (categories: navigation, net, netlog, ServiceWorker, toplevel) on a Nexus 4.

The attached trace image was recorded on a Nexus 4 using the test page.
In this trace, the service worker startup is delayed by the QUIC and HTTPS handshakes.

I think we should start the service worker at PredictorTabHelper::DidStartNavigation() and defer the preconnect tasks until after the service worker has started.


 
predictor preconnect bad async case.png (111 KB)
trace_predictor_preconnect_bad_async_case.json (946 KB)

Comment 1 by y...@yoav.ws, May 30 2017

Cc: y...@yoav.ws
Labels: M-61

Comment 3 by lizeb@chromium.org, May 30 2017

Cc: lizeb@chromium.org

Comment 4 by lizeb@chromium.org, May 30 2017

It seems that we have several things here:

1. Preconnection does non-trivial synchronous work on the IO thread in the case of QUIC "preconnects" because we don't need to send any packets
2. Service Workers could start earlier

About (1), I would rather fix the "head of line" blocking on the IO thread caused by preconnection than delay preconnection. Assuming that we predicted the right domains, we should start connecting as soon as possible.
However it is clear that 100+ms in a single task on the IO thread is way too much. This seems fixable though, perhaps by posting tasks for each preconnection request, where each task posts the next one upon completion. This way, when the service worker startup tasks come in, they can skip ahead of preconnect.
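
The chained-task idea above can be illustrated with a toy model. This is only a sketch in Python, not Chromium code: `simulate`, `chain`, the task names, and the 25 ms / 5 ms / 10 ms figures are all hypothetical, chosen so that four preconnects add up to roughly the 100 ms mentioned in the thread. It also ignores any per-task posting overhead.

```python
from collections import deque


def simulate(initial_tasks, arrivals):
    """Run a toy single-threaded FIFO loop; return {task name: start time in ms}.

    initial_tasks: task tuples (name, cost_ms, follow_up) queued at t=0.
                   follow_up is another task tuple posted on completion, or None.
    arrivals:      (arrival_ms, task) pairs posted from elsewhere while the loop runs.
    """
    queue = deque(initial_tasks)
    pending = sorted(arrivals, key=lambda a: a[0])
    now = 0
    started = {}
    while queue or pending:
        if not queue:
            # Idle loop: jump forward to the next arrival.
            now = max(now, pending[0][0])
        while pending and pending[0][0] <= now:
            queue.append(pending.pop(0)[1])
        name, cost_ms, follow_up = queue.popleft()
        started[name] = now
        now += cost_ms
        # Tasks that arrived while this one ran are queued first...
        while pending and pending[0][0] <= now:
            queue.append(pending.pop(0)[1])
        # ...and only then does the finished task post its follow-up.
        if follow_up is not None:
            queue.append(follow_up)
    return started


def chain(names, cost_ms):
    """Build one task per name, where each task posts the next on completion."""
    task = None
    for name in reversed(names):
        task = (name, cost_ms, task)
    return task


# Hypothetical numbers: the service worker start task arrives 10 ms into navigation.
sw_arrival = [(10, ("sw_start", 5, None))]

# One 100 ms task doing all preconnects vs. four chained 25 ms tasks.
batched = simulate([("preconnect_all", 100, None)], sw_arrival)
hosts = ["preconnect_a", "preconnect_b", "preconnect_c", "preconnect_d"]
chained = simulate([chain(hosts, 25)], sw_arrival)

print(batched["sw_start"])  # 100
print(chained["sw_start"])  # 25
```

In this model the service worker task starts after the first 25 ms preconnect instead of after the whole 100 ms batch, at the price of the last preconnect starting 5 ms later.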

What do you think?

Cc: rsleevi@chromium.org
It is important to remember that the 100ms figure scales with the CPU and depends on what is actually running. I think we should get a CPU profile of what is happening in those 100ms before we begin significantly rearchitecting things.

In general, //net has been designed to ensure that it does not block on external events (e.g. IO events, user input), but CPU is a finite resource, and the overheads involved with PostTask can easily swamp any optimizations - and result in worse user experiences.

A prime example of this tradeoff has been in the buffer sizes used for reading and writing data from HTTP/2 streams. For mobile devices, which may be more CPU bound on cryptographic operations, you want smaller buffers so that you can yield CPU cycles quicker. However, that same optimization absolutely destroys performance for desktop users - 4x to 10x worse - due to the additional overheads involved with yielding.
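
The shape of this tradeoff can be sketched with a simple cost model. This is an illustration only: `chunked_cost` is a hypothetical helper, and the 100 ms of work and 50 µs per-yield overhead are made-up figures, not measurements from //net. The model does not attempt to reproduce the 4x to 10x slowdown quoted above.

```python
def chunked_cost(total_work_us, chunk_work_us, yield_overhead_us):
    """Return (total_time_us, longest_uninterrupted_run_us).

    The work is split into chunks of chunk_work_us, and every chunk pays a
    fixed yield_overhead_us (a stand-in for the cost of posting a task).
    """
    chunks = (total_work_us + chunk_work_us - 1) // chunk_work_us  # ceiling division
    total = total_work_us + chunks * yield_overhead_us
    longest = min(chunk_work_us, total_work_us)
    return total, longest


# Hypothetical: 100 ms of CPU work, 50 us overhead per yield.
for chunk_us in (100_000, 10_000, 1_000):
    total, longest = chunked_cost(100_000, chunk_us, 50)
    print(f"chunk={chunk_us}us total={total}us longest_block={longest}us")
```

Smaller chunks shorten the longest stretch during which other tasks are blocked, but the total time grows with the number of yields. Which side matters more depends on the device, which is the point of the paragraph above.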

Our strategy has been:
- Optimize for low-hanging CPU fruit (e.g. unnecessary copies, non-optimized instructions)
- Optimize for the most representative user experience (e.g. which may mean leaving a long-tail of users behind)
- Optimize for 'acceptable, even if not ideal' behaviours - as hard as it is to accept a 'less than perfect' solution, at times reduced complexity helps maintain a faster trajectory and greater improvement of the overall system, even if it costs some user experience in the short term
- Only after all of these have been exhausted do we begin micro-optimizing tasks.

I realize that this is an important issue, but these are lessons learned from five years of tuning the network stack, and, just as with JS engines in general, microbenchmarking can easily cause holistic reductions in speed or systemic complexity growth. Finding the right balance among what is possible is difficult, especially if it means leaving 'speed on the table', but at times that is the right call.

That's not to say we shouldn't continue to investigate and understand this, but to suggest we should keep in mind these general principles and priorities when doing so. Given the wide diversity of network speeds (e.g. 2G vs Fiber) and CPU profiles (mobile with LTE vs fast laptop with spotty wifi) and memory usage (high-end mobile phones vs low-end notebooks), we have a lot of dimensions to optimize against and to balance.

Comment 6 by rch@chromium.org, May 30 2017

Cc: rch@chromium.org
> 1. Preconnection does non-trivial synchronous work on the IO thread in the case of QUIC "preconnects" because we don't need to send any packets

Just to clarify, a QUIC preconnect *does* send the handshake packet to start the connection.

Comment 7 by horo@chromium.org, May 31 2017

Thank you for the comments.

We are planning to introduce new options for the behavior of the predictor, like this:

Network predictor options for service worker controlled pages.
- PreconnectOnly: Execute preconnect on navigation. This is the current behavior.
- StartServiceWorkerAndPreconnect: Start service worker and execute preconnect in parallel on navigation.
- StartServiceWorkerAndDeferPreconnect: Start service worker on navigation, and defer preconnection until the service worker has started.
- StartServiceWorkerOnly: Start service worker on navigation and do not execute preconnection.
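
As a rough sketch, the four options above could be modelled as an enum plus a function that says what runs at navigation start and what waits for the service worker. The option names come from the list above. `PredictorOption`, `plan_navigation`, and the action strings are hypothetical and do not correspond to actual Chromium code.

```python
from enum import Enum


class PredictorOption(Enum):
    PRECONNECT_ONLY = "PreconnectOnly"
    START_SERVICE_WORKER_AND_PRECONNECT = "StartServiceWorkerAndPreconnect"
    START_SERVICE_WORKER_AND_DEFER_PRECONNECT = "StartServiceWorkerAndDeferPreconnect"
    START_SERVICE_WORKER_ONLY = "StartServiceWorkerOnly"


def plan_navigation(option):
    """Return (actions at navigation start, actions once the service worker has started).

    Actions listed together at navigation start are meant to run in parallel;
    the list order carries no meaning.
    """
    if option is PredictorOption.PRECONNECT_ONLY:
        return ["preconnect"], []
    if option is PredictorOption.START_SERVICE_WORKER_AND_PRECONNECT:
        return ["start_service_worker", "preconnect"], []
    if option is PredictorOption.START_SERVICE_WORKER_AND_DEFER_PRECONNECT:
        return ["start_service_worker"], ["preconnect"]
    if option is PredictorOption.START_SERVICE_WORKER_ONLY:
        return ["start_service_worker"], []
    raise ValueError(f"unknown option: {option!r}")


for option in PredictorOption:
    on_navigation, after_sw_start = plan_navigation(option)
    print(option.value, on_navigation, after_sw_start)
```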

We will then run a real-world experiment using the Chrome Variations framework and measure UMA metrics such as NavigationToParseStart and NavigationToFirstMeaningfulPaint.
(I just added FirstMeaningfulPaint UMAs for service worker:  issue 727599 )
Then we will choose the default behavior.

Comment 9 by horo@chromium.org, Jul 6 2017

Status: WontFix (was: Started)
As I commented in https://codereview.chromium.org/2916533002/#msg34, if PlzNavigate is enabled, these optimizations don't have a big impact.

So I'd like to set the status to WontFix.
