1.2%-115.6% regression in loading.desktop at 524194:524348
Issue description: See the link to graphs below.
Dec 18 2017
Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/129583a6040000
Dec 18 2017
Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/129583a6040000
Reland: Predictor: Dispatch all predictions async from the resource throttle
By csharrison@chromium.org · Fri Dec 15 03:36:35 2017
chromium @ ded5a15654abc9dc42bb155663345d1efc0cae41
Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Dec 18 2017
I looked at the traces for ChromiumPerf/chromium-rel-win8-dual/loading.desktop / timeToFirstContentfulPaint_avg / warm / Elmundo. The regression for that case is 135ms. The navigation has a redirect, and it looks like the redirect is taking 137ms longer with my patch, because there is no idle socket to use (I think).

My main confusion is: why does this have anything to do with preconnect? We load the URL, then load it again. Shouldn't the redirected host be available as an idle socket for the second load? There is definitely less than 10s of idle time between the runs.

I tried reproducing this locally using the same site (and clearing browsing data before doing the experiment), and I couldn't repro. In both cases it seems like we do the right thing with respect to predicting the redirect, but I can land a diagnostic trace which will show up on subsequent perf bot runs.
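For illustration only, here is a minimal sketch of what such a diagnostic trace event could look like, assuming Chromium's TRACE_EVENT macros. The function name and arguments are placeholders for this sketch, not the code from the CL that landed later in this thread.

  // Hypothetical sketch: wrap the preconnect dispatch in a trace event so
  // each preconnect (and its target URL) shows up in perf-bot traces.
  #include "base/trace_event/trace_event.h"
  #include "url/gurl.h"

  void PreconnectUrlForDiagnostics(const GURL& url, int num_sockets) {
    TRACE_EVENT2("loading", "PreconnectUrl",
                 "url", url.possibly_invalid_spec(),
                 "num_sockets", num_sockets);
    // ... existing preconnect logic would run here ...
  }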
Dec 18 2017
Do you need to actually land the diagnostic CL or is it possible to just try it on *_perf_bisect bots (e.g. win_8_perf_bisect)?
Dec 18 2017
The redirect is to the same host? We normally preconnect two sockets, don't we? Maybe the first response is "Connection: Close"?
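To make that hypothesis concrete, here is a tiny self-contained sketch of the keep-alive decision (standard HTTP/1.1 behavior, not Chromium's actual socket code; the function name is made up): if the first (redirect) response carried "Connection: close", its socket could not go back to the idle pool, which would explain the missing idle socket on the second load.

  #include <algorithm>
  #include <cctype>
  #include <iostream>
  #include <string>

  // HTTP/1.1 defaults to persistent connections unless the server opts out;
  // header values are case-insensitive, so "Close" counts too.
  bool CanReturnSocketToIdlePool(std::string connection_header) {
    std::transform(connection_header.begin(), connection_header.end(),
                   connection_header.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return connection_header != "close";
  }

  int main() {
    std::cout << CanReturnSocketToIdlePool("keep-alive") << "\n";  // 1: back to idle pool
    std::cout << CanReturnSocketToIdlePool("Close") << "\n";       // 0: socket torn down
    return 0;
  }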
Dec 18 2017
The redirect is to the host with a "www" prefix. I feel like landing the diagnostic CL is a good idea regardless, since (imo) it makes reading traces with preconnect a piece of cake. I can try to run the bots from the CL directly too, but it tends to be a lot flakier (in my experience).
Dec 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7114dab11caa20f2941bb70dd04ce909701cceff

commit 7114dab11caa20f2941bb70dd04ce909701cceff
Author: Charles Harrison <csharrison@chromium.org>
Date: Mon Dec 18 23:47:35 2017

Preconnect: add trace events

These will be useful in debugging regressions on the perf bots.

Bug: 795766
Change-Id: I20d4465fb7d60b7bb48685fd2e24a1b7f48503da
Reviewed-on: https://chromium-review.googlesource.com/822770
Reviewed-by: Matt Menke <mmenke@chromium.org>
Commit-Queue: Charlie Harrison <csharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#524862}

[modify] https://crrev.com/7114dab11caa20f2941bb70dd04ce909701cceff/content/browser/loader/resource_hints_impl.cc
Dec 19 2017
Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/12d9c92e040000
Dec 19 2017
I'm having trouble reproducing this, and I think to really understand it we'll have to get a netlog on the bots. A lot of these graphs look like this is reverting back to a bimodality, which could be the result of some subtle timing changes caused by this patch. I sent out a bisect for one of the "wins" from earlier in November, where before that change we see ttfcp similar to where it is now.
Dec 20 2017
Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/12d9c92e040000
Revert "Enable SpeculativePreconnect Field Trial Testing"
By alexilin@chromium.org · Thu Nov 16 21:07:36 2017
chromium @ 698e803c4335c3e9429f580851a580b18e7daef0
Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Dec 20 2017
Apologies for the noise, I didn't realize pinpoint would do this. Removing added people. It is very interesting that the other preconnect experiment had similar performance characteristics, although I'm not sure if the underlying problem is the same, since this patch should just be moving the timing of preconnect dispatch a bit.
Dec 20 2017
I just ran some of the benchmarks on my Windows machine and couldn't reproduce either. In all cases the redirect successfully reuses the existing socket. If we can grab a Pinpoint run with the netlog from a recent run on the perf bots, that would help.
Dec 20 2017
Dec 20 2017
I saw something similar while investigating https://crbug.com/784206. I have two traces illustrating the problem, captured with the netlog category enabled. Since it's a warm benchmark, we load the page two times, and sockets to the main frame host should be open and idle at the moment of the second load.

In the first case everything works as expected and the second load completes in 100ms. https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/Donga_2017-11-21_02-44-52_58332.html

In the second case the main frame socket is closed with a SOCKET_CLOSED event before the second load happens, and it takes 350ms to load the page because we have to open a new connection. https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/Donga_2017-11-21_04-32-16_32153.html

The difference between these two cases is that in the first case we don't preconnect to the main frame host (the one after a redirect) and in the second case we do. I'm wondering: how do we determine the order in which we close sockets once we've reached the limit of open connections? Can this order be affected by a change like this? I don't see other reasons to close sockets so quickly.
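To make the eviction-order question concrete, here is a small self-contained sketch of one plausible policy (close the socket that has been idle the longest). The policy, function, and host names are assumptions for illustration, not what net's socket pools actually do.

  #include <iostream>
  #include <map>
  #include <string>

  // host -> seconds the socket has been idle.
  using IdlePool = std::map<std::string, int>;

  // One plausible rule when the pool is at its connection limit:
  // close the longest-idle socket to make room for the new one.
  void CloseOneIdleSocket(IdlePool& pool) {
    auto victim = pool.begin();
    for (auto it = pool.begin(); it != pool.end(); ++it)
      if (it->second > victim->second) victim = it;
    std::cout << "closing idle socket to " << victim->first << "\n";
    pool.erase(victim);
  }

  int main() {
    // If the main-frame host from the first (cold) load has been idle the
    // longest, it is the first to go when a new connection is needed.
    IdlePool pool = {{"main-frame.example", 9},
                     {"cdn-a.example", 3},
                     {"cdn-b.example", 1}};
    CloseOneIdleSocket(pool);
    return 0;
  }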
Dec 20 2017
Dec 20 2017
The limit of total connections with no proxy is 256. With a proxy it's 32. I had assumed the server was hanging up on us, but if the test server is using a proxy, that could also be it. We don't do a good job of logging why a socket was closed.
Dec 20 2017
The test server is indeed using a SOCKS5 proxy, I'm fairly sure. The main thing this patch does is move from a world where we:

1. Preconnect the main frame once
2. Preconnect predicted things (or the main frame twice if the db is empty)
3. Initiate the navigation / redirect

to one where we:

1. Initiate the navigation / redirect
2. Preconnect predicted things (or the main frame twice if the db is empty)

Here is one way things could go wrong with a 32-connection limit. Setup: navigate to a.com, which redirects to b.com, and use up all 32 connections, including one connection to a.com and one connection to b.com. Then, upon starting the navigation to a.com again:

BEFORE:
1. Preconnect the main frame once, which is a no-op
2. Predict connections to b.com; these close some other idle sockets
3. Initiate the navigation to a.com, closing some other idle sockets

AFTER:
1. Initiate the navigation to a.com, which closes the connection to b.com (CAUSE OF REGRESSION)
2. Predict connections to b.com, which must use new sockets and close some other ones

I'm not sure how plausible this scenario is, though.
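As a purely illustrative toy, assuming longest-idle-first eviction, a pool that is already full, and made-up helper and host names (only a.com and b.com come from the scenario above; this is not the real net stack), the ordering difference can be sketched like this:

  #include <deque>
  #include <iostream>
  #include <string>

  // Idle sockets, front == idle the longest. The 32-socket limit is
  // abstracted away by assuming the pool is already full.
  struct ToyPool {
    std::deque<std::string> idle;

    // A navigation that needs `n` brand-new connections while the pool is
    // full: the n longest-idle sockets get closed.
    void NeedFreshSockets(int n) {
      for (int i = 0; i < n && !idle.empty(); ++i) {
        std::cout << "  closed idle socket to " << idle.front() << "\n";
        idle.pop_front();
      }
    }

    // Preconnecting to a host we already have an idle socket for just
    // refreshes it (moves it to the most-recently-used end).
    void Preconnect(const std::string& host) {
      for (auto it = idle.begin(); it != idle.end(); ++it) {
        if (*it == host) {
          idle.erase(it);
          idle.push_back(host);
          return;
        }
      }
    }

    bool HasIdle(const std::string& host) const {
      for (const auto& h : idle)
        if (h == host) return true;
      return false;
    }
  };

  int main() {
    // Leftovers from the cold run; b.com (the redirect target) is oldest.
    const std::deque<std::string> warm = {"b.com", "cdn-x.example",
                                          "cdn-y.example", "a.com"};

    ToyPool before{warm};
    std::cout << "BEFORE (preconnect predicted hosts, then navigate):\n";
    before.Preconnect("b.com");   // predictor refreshes the redirect target
    before.NeedFreshSockets(1);   // navigation evicts some other idle socket
    std::cout << "  b.com idle for the redirect: "
              << (before.HasIdle("b.com") ? "yes" : "no") << "\n";

    ToyPool after{warm};
    std::cout << "AFTER (navigate, then preconnect predicted hosts):\n";
    after.NeedFreshSockets(1);    // navigation evicts the longest-idle: b.com
    after.Preconnect("b.com");    // a no-op here, standing in for a fresh
                                  // connect that isn't ready for the redirect
    std::cout << "  b.com idle for the redirect: "
              << (after.HasIdle("b.com") ? "yes" : "no") << "\n";
    return 0;
  }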
Dec 20 2017
I will say that if the guess in #18 is correct, it is probably WontFix, since it's exercising an edge case of an edge case.
Dec 20 2017
Just chatted with mmenke about this now that we have a netlog repro for Elmundo. It seems like what's described in #18 is happening. Basically, since the patch also subtly re-orders subframe preconnects, we end up in a situation where the redirected URL is evicted from the 32-connection pool first, before we re-navigate in the story. This is also likely for amazon.co.jp, the other fairly significant regression here. Marking as WontFix.