Issue 795766

Issue metadata

Status: WontFix
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression




1.2%-115.6% regression in loading.desktop at 524194:524348

Reported by chiniforooshan@chromium.org (Project Member), Dec 18 2017

Issue description

See the link to graphs below.
 
Comment 1 by 42576172...@developer.gserviceaccount.com (Project Member), Dec 18 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=795766

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=e4496b7f8427b22b5ca217abd0de2695eceb7872e465423dc1d6677cc25976c0


Bot(s) for this bug's original alert(s):

chromium-rel-mac11-air
chromium-rel-win10
chromium-rel-win7-dual
chromium-rel-win7-gpu-ati
chromium-rel-win7-gpu-nvidia
chromium-rel-win7-x64-dual
chromium-rel-win8-dual
win-high-dpi

Comment 2 by 42576172...@developer.gserviceaccount.com (Project Member), Dec 18 2017

๐Ÿ“ Pinpoint job started.
https://pinpoint-dot-chromeperf.appspot.com/job/129583a6040000

Comment 3 by 42576172...@developer.gserviceaccount.com (Project Member), Dec 18 2017

Cc: mmenke@chromium.org csharrison@chromium.org
Owner: csharrison@chromium.org
Status: Assigned (was: Untriaged)
๐Ÿ“ Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/129583a6040000

Reland: Predictor: Dispatch all predictions async from the resource throttle
By csharrison@chromium.org · Fri Dec 15 03:36:35 2017
chromium @ ded5a15654abc9dc42bb155663345d1efc0cae41

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
I looked at the traces for

ChromiumPerf/chromium-rel-win8-dual/loading.desktop / timeToFirstContentfulPaint_avg / warm / Elmundo

The regression for that case is 135ms. The navigation has a redirect, and it looks like the redirect is taking 137ms longer with my patch, because there is no idle socket to use (I think).

My main confusion is: Why does this have anything to do with preconnect? We load the URL, then load it again. Shouldn't the redirected host be available as an idle socket for the second load? There is definitely less than 10s of idle time between the runs.

I tried reproing this locally using the same site (and clearing browsing data before doing the experiment), and I couldn't repro. In both cases it seems like we do the right thing wrt predicting the redirect, but I can land a diagnostic trace which will show up on subsequent perf bot runs.
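
(For context, a minimal standalone model of the reuse rule being assumed here; the names and the 10s timeout are illustrative assumptions, not Chromium's net/ implementation.)

// Toy model of idle-socket reuse, NOT Chromium's net/ code. A socket
// returned to the pool can be reused only while its idle time stays
// under the timeout; with <10s between the two loads, reuse should work.
#include <chrono>
#include <iostream>

struct IdleSocket {
  std::chrono::steady_clock::time_point returned_at;  // when it went idle
};

bool IsReusable(const IdleSocket& socket,
                std::chrono::steady_clock::time_point now,
                std::chrono::seconds idle_timeout) {
  return now - socket.returned_at < idle_timeout;
}

int main() {
  using namespace std::chrono;
  const seconds idle_timeout(10);  // assumed idle timeout for the model
  IdleSocket socket{steady_clock::now()};
  // The warm benchmark re-loads the page a few seconds later.
  auto second_load = socket.returned_at + seconds(3);
  std::cout << std::boolalpha
            << IsReusable(socket, second_load, idle_timeout) << "\n";  // true
}
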
Do you need to actually land the diagnostic CL or is it possible to just try it on *_perf_bisect bots (e.g. win_8_perf_bisect)?

Comment 6 by mmenke@chromium.org, Dec 18 2017

The redirect is to the same host?  We normally preconnect two sockets, don't we?  Maybe the first response is "Connection: Close"?

The redirect is to the host with a "www" prefix. I feel like landing the diagnostic CL is a good idea regardless, since (imo) it makes reading traces with preconnect a piece of cake.

I can try to run the bots from the CL directly too, but it tends to be a lot flakier (in my experience).

Comment 8 by bugdroid1@chromium.org (Project Member), Dec 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7114dab11caa20f2941bb70dd04ce909701cceff

commit 7114dab11caa20f2941bb70dd04ce909701cceff
Author: Charles Harrison <csharrison@chromium.org>
Date: Mon Dec 18 23:47:35 2017

Preconnect: add trace events

These will be useful in debugging regressions on the perf bots.

Bug:  795766 
Change-Id: I20d4465fb7d60b7bb48685fd2e24a1b7f48503da
Reviewed-on: https://chromium-review.googlesource.com/822770
Reviewed-by: Matt Menke <mmenke@chromium.org>
Commit-Queue: Charlie Harrison <csharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#524862}
[modify] https://crrev.com/7114dab11caa20f2941bb70dd04ce909701cceff/content/browser/loader/resource_hints_impl.cc
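
(The CL body isn't quoted here. As a rough, hypothetical illustration of this kind of instrumentation, here is a standalone sketch that emits trace-viewer-style "complete" events around a preconnect call. The function names and fields are assumptions; the real change presumably uses Chromium's TRACE_EVENT macros inside resource_hints_impl.cc.)

// Hypothetical sketch, not the actual CL: wrap a preconnect in a
// trace-viewer "complete" event (ph:"X") so it shows up on a timeline.
// Chromium would use TRACE_EVENT macros; printf JSON is a portable stand-in.
#include <chrono>
#include <cstdio>

// Stand-in for the real preconnect logic (hypothetical).
void PreconnectUrl(const char* url, int num_sockets) {}

void TracedPreconnect(const char* url, int num_sockets) {
  using namespace std::chrono;
  const long long start_us = duration_cast<microseconds>(
      steady_clock::now().time_since_epoch()).count();
  PreconnectUrl(url, num_sockets);
  const long long dur_us = duration_cast<microseconds>(
      steady_clock::now().time_since_epoch()).count() - start_us;
  // One event in trace-viewer's JSON format, loadable in chrome://tracing.
  std::printf(
      "{\"name\":\"PreconnectUrl\",\"ph\":\"X\",\"ts\":%lld,\"dur\":%lld,"
      "\"pid\":1,\"tid\":1,\"args\":{\"url\":\"%s\",\"sockets\":%d}}\n",
      start_us, dur_us, url, num_sockets);
}

int main() {
  TracedPreconnect("https://www.example.com/", 2);
}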

Comment 9 by 42576172...@developer.gserviceaccount.com (Project Member), Dec 19 2017

๐Ÿ“ Pinpoint job started.
https://pinpoint-dot-chromeperf.appspot.com/job/12d9c92e040000

I'm having trouble reproducing this, and I think to really understand it we'll have to get a netlog on the bots.

A lot of these graphs look like this is reverting back to a bimodality, which could be the result of some subtle timing changes caused by this patch. I sent out a bisect for one of the "wins" earlier in November, where ttfcp before the change was similar to where it is now.

Comment 11 by 42576172...@developer.gserviceaccount.com (Project Member), Dec 20 2017

Cc: tedc...@chromium.org alexilin@chromium.org jwd@chromium.org xunji...@chromium.org
Owner: alexilin@chromium.org
๐Ÿ“ Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/12d9c92e040000

Revert "Enable SpeculativePreconnect Field Trial Testing"
By alexilin@chromium.org · Thu Nov 16 21:07:36 2017
chromium @ 698e803c4335c3e9429f580851a580b18e7daef0

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions

Cc: -xunji...@chromium.org -tedc...@chromium.org -jwd@chromium.org -alexilin@chromium.org
Owner: csharrison@chromium.org
Apologies for the noise; I didn't realize Pinpoint would do this. Removing the added people. It is very interesting that the other preconnect experiment had similar performance characteristics, although I'm not sure the underlying problem is the same, since this patch should just move the timing of preconnect dispatch a bit.

I just ran some of the benchmarks on my Windows machine and couldn't reproduce either. In all cases the redirect successfully reuses the existing socket.

If we can grab a Pinpoint run with the netlog from a recent run on the perf bots, it would help things.

Components: Speed>Bisection
I saw something similar while investigating https://crbug.com/784206.
I have two traces illustrating the problem, captured with the netlog category enabled.
Since it's a warm benchmark, we load the page twice, and sockets to the main-frame host should be open and idle at the time of the second load.

In the first case everything works as expected and the second load completes in 100ms.
https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/Donga_2017-11-21_02-44-52_58332.html

In the second case, the main-frame socket is closed with a SOCKET_CLOSED event before the second load happens, and it takes 350ms to load the page because we have to open a new connection.
https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/Donga_2017-11-21_04-32-16_32153.html

The difference between the two cases is that in the first case we don't preconnect to the main-frame host (the one after the redirect), and in the second case we do.

I'm wondering: how do we determine the order in which we close sockets once we've reached the limit of open connections? Can this order be affected by a change like this? I don't see any other reason to close sockets so quickly.

Cc: alexilin@chromium.org
The limit of total connections with no proxy is 256.  With a proxy it's 32.  I had assumed the server was hanging up on us, but if the test server is using a proxy, that could also be it.  We don't do a good job of logging why a socket was closed.
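
(To make the eviction question concrete, here is a toy model of a capped pool that closes the oldest idle socket to make room. Oldest-idle-first is an assumption about the order, not net/'s actual ClientSocketPool behavior, and the class and names are made up for illustration.)

// Toy capped socket pool, NOT Chromium's ClientSocketPool: when a new
// connection is needed at the limit, the oldest idle socket is closed.
// Under this (assumed) policy, a burst of preconnects can evict exactly
// the idle socket a later request wanted to reuse.
#include <cstddef>
#include <deque>
#include <iostream>
#include <string>

class IdlePool {
 public:
  explicit IdlePool(std::size_t limit) : limit_(limit) {}

  // Reuse an idle socket to |host| if one exists; otherwise open a new
  // one, evicting the oldest idle socket when at the limit.
  void Connect(const std::string& host) {
    for (auto it = idle_.begin(); it != idle_.end(); ++it) {
      if (*it == host) {
        idle_.erase(it);
        idle_.push_back(host);  // reused: now the most recently idle
        std::cout << "reused socket to " << host << "\n";
        return;
      }
    }
    if (idle_.size() == limit_) {
      std::cout << "evicting oldest idle socket (" << idle_.front() << ")\n";
      idle_.pop_front();
    }
    idle_.push_back(host);
    std::cout << "opened new socket to " << host << "\n";
  }

 private:
  std::size_t limit_;
  std::deque<std::string> idle_;  // front = oldest idle
};

int main() {
  IdlePool pool(2);       // tiny limit to show the effect (32 with a proxy)
  pool.Connect("b.com");  // first load leaves an idle socket to b.com
  pool.Connect("c.com");  // pool is now at its limit
  pool.Connect("a.com");  // evicts b.com, the oldest idle socket
}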

The test server is indeed using a SOCKS5 proxy, I'm fairly sure.

The main thing this patch does is move us from a world where we:
1. Preconnect main frame once
2. Preconnect predicted things (or the main frame twice if db is empty)
3. Initiate the navigation / redirect

to one where we:
1. Initiate the navigation / redirect
2. Preconnect predicted things (or the main frame twice if db is empty)

Here is one way things could go wrong with a 32-connection limit. Setup: navigate to a.com, which redirects to b.com, and use up all 32 connections, including one connection to a.com and one connection to b.com. Then, upon starting the next navigation to a.com:

BEFORE:
1. Preconnect the main frame once; this is a no-op.
2. Predict connections to b.com; these close some other idle sockets.
3. Initiate the navigation to a.com, closing some other idle sockets.

AFTER:
1. Initiate the navigation to a.com, which closes the connection to b.com (CAUSE OF REGRESSION).
2. Predict connections to b.com; these must use new sockets and close some other ones.

I'm not sure how plausible this scenario is, though; the sketch below replays both orderings.
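
(A toy replay of the two orderings above, under the same assumed oldest-idle-first eviction policy as the earlier sketch; again a model for illustration, not net/ code. It shows the reorder alone costing b.com its idle socket.)

// Toy replay of the BEFORE/AFTER orderings from this comment. The pool is
// full when the second navigation starts; at the limit, a new connection
// evicts the oldest idle host (an assumed policy, not net/'s actual one).
#include <cstddef>
#include <deque>
#include <iostream>
#include <string>

// Returns true if an idle socket to |host| was reused, false if a new
// connection had to be opened (evicting the oldest idle one at the limit).
bool Connect(std::deque<std::string>& idle, std::size_t limit,
             const std::string& host) {
  for (auto it = idle.begin(); it != idle.end(); ++it) {
    if (*it == host) {
      idle.erase(it);
      idle.push_back(host);  // refreshed: now the most recently idle
      return true;
    }
  }
  if (idle.size() == limit) idle.pop_front();  // evict the oldest idle
  idle.push_back(host);
  return false;
}

int main() {
  const std::size_t kLimit = 4;  // small stand-in for the 32 limit
  // State after the first load, oldest idle first: the redirect target
  // b.com happens to hold the oldest idle socket.
  std::deque<std::string> before{"b.com", "x.com", "y.com", "z.com"};
  std::deque<std::string> after = before;

  // BEFORE: the b.com preconnect runs first and refreshes its idle slot,
  // so the a.com navigation evicts x.com instead.
  Connect(before, kLimit, "b.com");
  Connect(before, kLimit, "a.com");
  std::cout << std::boolalpha << "BEFORE: redirect reuses b.com socket: "
            << Connect(before, kLimit, "b.com") << "\n";  // true

  // AFTER: the a.com navigation runs first and evicts b.com, so the
  // redirect has to open a fresh connection (the observed ~137ms cost).
  Connect(after, kLimit, "a.com");
  std::cout << "AFTER:  redirect reuses b.com socket: "
            << Connect(after, kLimit, "b.com") << "\n";   // false
}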

I will say that if the guess in #18 is correct, it is probably WontFix, since it's exercising an edge case of an edge case.

Status: WontFix (was: Assigned)
Just chatted with mmenke about this now that we have a netlog repro for Elmundo. It seems like what's described in #18 is happening.

Basically, since the patch also subtly reorders subframe preconnects, we just end up in a situation where the redirected URL's socket is evicted from the 32-connection pool first, before we re-navigate in the story. This is also likely what happens for amazon.co.jp, the other fairly significant regression here.

Marking as WontFix.
