Google services and internet connection periodically drop out.
Reported by john.sum...@jpress.co.uk, Jun 21 2016
Issue description
Chrome's connection to Google Apps, and then to all websites, drops out and stays down for 5 to 10 minutes at a time. Opening an Incognito tab during this downtime successfully reconnects to resources such as Google Apps (Gmail, etc.), but refreshing the original tab fails. The system proxy is set by Group Policy as an autoconfiguration (PAC) file. The issue is affecting many users within the enterprise.
Chrome Version: 51.0.2704.103 m (64-bit)
URLs (if applicable):
Other browsers tested:
Add OK or FAIL, along with the version, after other browsers where you have tested this issue:
Firefox 47: FAIL
IE 11: OK
What steps will reproduce the problem?
(1) Intermittent; no way to reproduce reliably.
What is the expected result?
What happens instead?
Please provide any additional information below. Attach a screenshot if possible.
Jun 21 2016
In many events, the proxy appears to return "HTTP/1.1 502 Connection timed out" after a very long delay. Event 2409240 also suggests that fetching the PAC script is timing out. +eroman, do you have thoughts on what to look for here? I rather suspect this proxy is just falling over.
Jun 21 2016
It looks like fetching the PAC script (http://internetspc.jpress.co.uk/jpdefault.pac) is intermittently flaky, and this is causing things to get wedged.
@john.summers: The following is my interpretation of how the log was captured and what was happening at the time. Can you confirm whether this roughly matches what happened (so I know I am looking at the right part of the log)?
(1) Requests started hanging (no error, but pages just spinning).
(2) About 4 minutes after things got wedged, you opened chrome://net-internals and started capturing.
(3) About 2 minutes after you started capturing events, things returned to normal.
(4) You continued capturing events for another 2-3 minutes before saving the log file.
What happened is that fetching the PAC script (http://internetspc.jpress.co.uk/jpdefault.pac) took 5 minutes. While Chrome was waiting for this to complete, many new requests got blocked. Two things are wrong here:
(a) This timeout is 5 minutes in Chrome (issue 251682), which is something we need to fix.
(b) The fetch timed out on your network in the first place, which is a problem for you to investigate on your network.
The NetLog doesn't differentiate incognito events. But I expect the reason this works in incognito mode is that it initiates a separate fetch of http://internetspc.jpress.co.uk/jpdefault.pac, and this time the fetch completes quickly rather than timing out.
Fundamentally, solving (b) (not having the PAC script fetch time out) is critical, since even if Chrome had a lower timeout, things would still be broken here. At various other points in the log a DIRECT connection is used after failing to fetch the PAC script, and that just results in a socket reset. Making PAC script fetching optional rather than mandatory (as per issue 558754) would also help in general, although maybe not directly in this case.
So some options for you are:
* Do some testing of the serving infrastructure for http://internetspc.jpress.co.uk/jpdefault.pac: try repeatedly fetching this file and see if you can reproduce it hanging, then figure out why and resolve that.
* Instead of setting a PAC script via GPO, can you set an explicit proxy server? (This would avoid needing to fetch the file at all.)
* Or, instead of setting the PAC script as an http:// URL, set it as a data: URL (this also avoids a network fetch, but probably won't work in Chrome applications).
* Or push the proxy settings via an extension, and use a data: URL in that extension.
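As a rough illustration of the data: URL option, the sketch below base64-encodes PAC script text into a data: URL that could then be supplied in place of the http:// PAC URL. The PAC body and the proxy.example.internal host are placeholders, not the contents of jpdefault.pac, and whether a given GPO/Chrome combination accepts a data: URL of this size is something to verify before relying on it.

```python
import base64

# Placeholder PAC script for illustration only; the real jpdefault.pac
# contents (and its custom exceptions) are not shown in this thread.
PAC_SOURCE = """function FindProxyForURL(url, host) {
  if (isPlainHostName(host)) {
    return "DIRECT";
  }
  return "PROXY proxy.example.internal:8080; DIRECT";
}
"""


def pac_to_data_url(pac_text: str) -> str:
    """Return the PAC script text encoded as a base64 data: URL."""
    encoded = base64.b64encode(pac_text.encode("utf-8")).decode("ascii")
    # application/x-ns-proxy-autoconfig is the conventional MIME type for PAC files.
    return "data:application/x-ns-proxy-autoconfig;base64," + encoded


if __name__ == "__main__":
    # In practice you would read a local copy, e.g. open("jpdefault.pac").read().
    print(pac_to_data_url(PAC_SOURCE))
```

Base64 grows the payload by about a third, so a 17k script becomes a URL of roughly 23k characters.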
Jun 22 2016
@eroman, your interpretation sounds just about spot on. We are seeing multiple users with this issue daily. If a user uses an extension like Proxy Switchy (pointing at the PAC address), the issue is resolved. We also don't see the issue with users browsing with IE, which suggests there is no infrastructure problem? We see slight outages with Firefox, but its timeouts must be much shorter. We have used this PAC script for many years without issues. The only recent change is that it's now hosted on a Windows Server 2008 R2 server, upgraded from 2003, and it was fine for the first few months after that. Setting an explicit proxy is not an option, as we use custom exceptions in the PAC script for special cases. I will run some tests using the Chrome command-line switches in different configurations, such as using a flat data file. I expect these may show no issues, as with the Proxy Switchy extension.
Jun 22 2016
It's just occurred again and I have a new dump. This time I opened an Incognito tab during the outage, and the log should show that it connected fine. I also had no issues downloading the PAC file using Notepad++ during the outage; only Chrome was affected.
Jun 22 2016
Thanks John. Certainly there are changes that need to be made on the Chrome side to improve this situation (such as reducing that timeout), along with the other improvements linked to. Unfortunately those changes are not going to happen right away, so your best bet is figuring out why the PAC script fetch is hanging. The fact that it happens in Firefox as well suggests there is an infrastructure problem (Chrome polls the PAC script for changes at various times, which can increase its chance, relative to Firefox, of getting stuck on a bad fetch). You may want to start by running a prober that repeatedly fetches http://internetspc.jpress.co.uk/jpdefault.pac every couple of seconds or so (*WITHOUT* going over any proxy) until you reproduce the hangs (which might be periodic, in response to server load, or triggered by some other network event). Then you can dig deeper with Wireshark/server logs to see what contributed to that.
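A minimal prober along the lines suggested above might look like the following sketch. It fetches the PAC URL directly (an empty urllib ProxyHandler mapping disables any system or environment proxy), logs the elapsed time of each attempt, and flags slow or failed fetches. The 2-second interval, 30-second timeout and 5-second "slow" threshold are arbitrary assumptions to tune.

```python
from __future__ import annotations

import time
import urllib.request

# PAC URL taken from this thread; substitute your own.
PAC_URL = "http://internetspc.jpress.co.uk/jpdefault.pac"


def make_direct_opener() -> urllib.request.OpenerDirector:
    """Build an opener that bypasses any system or environment proxy settings."""
    # An empty mapping tells ProxyHandler to use no proxy at all.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_once(opener: urllib.request.OpenerDirector, url: str,
               timeout: float) -> tuple[bool, float, str]:
    """Fetch url once and return (succeeded, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        # Note: urllib's timeout applies to individual socket operations,
        # not to the fetch as a whole.
        with opener.open(url, timeout=timeout) as resp:
            body = resp.read()
            status = resp.status
        return True, time.monotonic() - start, f"HTTP {status}, {len(body)} bytes"
    except Exception as exc:  # timeouts, resets, DNS and HTTP errors
        return False, time.monotonic() - start, repr(exc)


def run_prober(url: str = PAC_URL, interval: float = 2.0, timeout: float = 30.0,
               slow_threshold: float = 5.0, iterations: int | None = None) -> None:
    """Probe url every `interval` seconds, flagging slow or failed fetches."""
    opener = make_direct_opener()
    done = 0
    while iterations is None or done < iterations:
        ok, elapsed, detail = probe_once(opener, url, timeout)
        flag = "" if ok and elapsed < slow_threshold else "  <-- SLOW/FAILED"
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {elapsed:7.2f}s {detail}{flag}")
        done += 1
        time.sleep(interval)


if __name__ == "__main__":
    run_prober()
```

Running one copy per client subnet, with timestamps, makes it easier to line up a hang with server logs or a Wireshark capture.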
Jun 23 2016
If the proxy is not mandatory, IE could be timing out the PAC request earlier, or racing connections. Or it could be repeatedly retrying the request for your PAC file. IE having additional magic that hides a network issue that Firefox and Chrome both see isn't really a Chrome bug. It may be worth investigating if someone is trying to improve PAC performance at the 99th percentile, but this does sound like a case that falls into the long tail.
Jun 23 2016
> If the proxy is not mandatory,
There isn't a notion of non-mandatory PAC. It is mandatory... but not really, because in practice browsers fall back to DIRECT upon failure... except in Chrome's case when a script is marked as mandatory via an extension.
> IE could be timing out the PAC request earlier, or racing connections.
IE is an entirely different beast. For starters, it has a per-host cache for proxy resolutions in front of everything, so it doesn't actually hit PAC evaluation as often. It also caches scripts for a very long time, and can update them outside of the application with better network change integration.
Chrome is more aggressive in checking PAC scripts for updates, and re-checking whenever it thinks your network has "changed". The latter uses imperfect signals like the IP address changing, and has historically caused a ton of problems. Not the least of which is that trying to fetch the PAC script near network change edges hits a number of other spurious errors for which we have other workarounds. Sigh.
Because Chrome fetches more frequently (including polling for changes on a roughly exponential period), and because it has a long timeout, it will be easier to hit these problems in Chrome.
PAC in general is flawed, and there is tension between its pseudo-mandatory status and being responsive. If we were to just give up on treating PAC as mandatory (i.e. issue 558754), then all of these problems would go away... and instead be replaced by a different set of problems (like network failures and security/privacy issues due to fetching a resource *without* using the right proxy because you decided not to wait for the PAC script).
Sure, there is even more complexity we can throw at Chrome's PAC fetching implementation: racing multiple connections for the PAC script, improving network transition detection, making background polls not block resolution, caching PAC scripts across Chrome invocations (which needs some affinity to the network they were fetched from), etc.
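To make "background polls not block resolution" concrete, here is a hypothetical sketch, not Chrome's implementation: resolution always uses the last successfully fetched script, while a refresh runs on a separate thread and a failed refresh simply keeps the old script. The class name, the fetch callable and the threading model are all assumptions for illustration.

```python
from __future__ import annotations

import threading
from typing import Callable, Optional


class LastKnownGoodPac:
    """Hypothetical holder that keeps serving the last successfully fetched
    PAC script while a refresh runs on a background thread."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        self._fetch = fetch            # assumed: returns script text or raises
        self._lock = threading.Lock()
        self._script: Optional[str] = None
        self._refreshing = False

    def current(self) -> Optional[str]:
        """Return the script to evaluate now; never waits on a fetch."""
        with self._lock:
            return self._script

    def refresh_in_background(self) -> Optional[threading.Thread]:
        """Start a refresh unless one is already running."""
        with self._lock:
            if self._refreshing:
                return None
            self._refreshing = True
        worker = threading.Thread(target=self._do_refresh, daemon=True)
        worker.start()
        return worker

    def _do_refresh(self) -> None:
        try:
            new_script: Optional[str] = self._fetch()
        except Exception:
            new_script = None          # failed fetch: keep the old script
        with self._lock:
            if new_script is not None:
                self._script = new_script
            self._refreshing = False
```

Note that `current()` still returns None before the first successful fetch, and a stale script may route traffic incorrectly if the PAC file really changed. That is the same mandatory-versus-responsive tension described in the previous comment.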
Jun 24 2016
Many thanks for the pointers on this. I have run performance tests, using Fiddler, against the IIS server hosting the PAC script, and have identified some performance issues under heavy stress: some sessions took 10 to 80 seconds to retrieve the file when 500+ requests were made simultaneously. Chrome is used extensively in our organisation, and the way Chrome polls for the PAC file (too frequently in my opinion, as this file rarely changes) is putting unnecessary stress on the IIS servers. To improve the situation we have enabled compression on the PAC text file, reducing it from 17k down to 3k. This appears to have greatly improved delivery to clients. I will report back as soon as I know whether this has stabilised our Chrome sessions; it is too early to say at present. Ultimately, I agree Chrome should not fail so catastrophically when it cannot refresh a PAC file, which is generally quite static.
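Before changing server settings, a quick local check can estimate how much HTTP compression would shrink a PAC file (the thread reports roughly 17k down to 3k). The sketch below gzips a local copy of the file. The script name and file path are assumptions, and the ratio IIS actually achieves depends on its own compression settings. Compression also only helps clients that request it via Accept-Encoding.

```python
from __future__ import annotations

import gzip
import sys


def gzip_sizes(data: bytes, level: int = 9) -> tuple[int, int]:
    """Return (original_size, gzip_size) in bytes for the given content."""
    return len(data), len(gzip.compress(data, compresslevel=level))


if __name__ == "__main__":
    # Assumed usage: python pac_gzip.py jpdefault.pac  (a local copy of the file)
    with open(sys.argv[1], "rb") as f:
        original, compressed = gzip_sizes(f.read())
    pct = 100 * compressed / max(original, 1)
    print(f"{original} bytes -> {compressed} bytes gzipped ({pct:.0f}% of original)")
```

PAC files full of similar shExpMatch/dnsDomainIs rules tend to compress well because of the repetition, which is consistent with the reduction reported here.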
Jun 24 2016
Thanks for the update John! We want to make improvements to Chrome's PAC fetching in the future (but unfortunately don't have the bandwidth for it right now). I am glad you have found workaround solutions.
Jul 20 2016
Hi all, I just thought I'd post a quick update to this thread to say that our enterprise Chrome deployment has stabilized and no further connection dropouts have been reported. Applying compression to the PAC file within IIS cured the performance issue.
Attachment (Comment 1 by john.sum...@jpress.co.uk, Jun 21 2016): 12.8 MB