Chrome hangs after waking up from sleep/switching network when using proxy auto-detect and have virtual adapters enabled

Reported by zacdbla...@gmail.com, Sep 29 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

Steps to reproduce the problem:
1. Open Chrome on one Wi-Fi network and load a few pages
2. Put the Windows laptop to sleep/hibernate
3. Move to a location with a new Wi-Fi network
4. Wake up the laptop; it automatically connects to the new Wi-Fi network
5. Try to open a new tab or try to refresh an existing one

What is the expected behavior?
If the Wi-Fi is connected properly, it should just reload the page or open a tab as normal.

What went wrong?
The new tab page has a title of "Loading..." which just sits there indefinitely.

The same thing occurs on a page refresh (or when trying to navigate anywhere else). I just get a loading spinner on the tab indefinitely.

Did this work before? Yes (last working version unknown)

Chrome version: 61.0.3163.100  Channel: stable
OS Version: 10.0
Flash Version: 

I fix it by closing the entire Chrome process and then restarting the browser.

When I "close" Chrome, I just click the "X" at the top right. That should initiate a normal shutdown; however, when I reopen Chrome I get the "Chrome did not shutdown properly. Click restore to open your tabs" message.
 

Comment 1 by mef@chromium.org, Sep 29 2017

Components: Internals>Network>Connectivity
Labels: Needs-Feedback
Hi, could you provide the network details by following instructions here: https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details
Hi,

So I did some more testing of my own. I don't necessarily have to change WiFi networks to get the undesirable behavior. Simply putting the computer to sleep and then waking it up results in tabs being unable to load. The entire chrome process seems sluggish. I've attached the log file below.

First I made a few requests while chrome ran properly. Then I put the computer to sleep for a minute, woke it up, then tried to open new tabs and navigate to google/facebook etc.
chrome-net-export-log.json
14.0 MB

Comment 3 by sheriffbot@chromium.org, Oct 1 2017

Cc: mef@chromium.org
Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "mef@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: Needs-Feedback
The log contains many successful requests, so it's hard to debug what's going wrong. Can you elaborate on which requests paused (i.e. the URL and time initiated)? Or could you try navigating to some simple test URLs, e.g. http://google.com/gen_204
zacdblanco@, were you able to get the info requested in comment #4?

Comment 6 by mge...@chromium.org, Oct 12 2017

Components: Blink>ServiceWorker Internals>Network
Labels: -Needs-Feedback
Status: Untriaged (was: Unconfirmed)
This doesn't need to be Needs-Feedback, you can find the slow requests if you sort by duration. Unfortunately they're not very clear about what's going wrong. Here's what I see:

- Event 12955, google.com, hangs for a minute between URL_REQUEST_START_JOB and SERVICE_WORKER_START_REQUEST
- Event 12988, inbox.google.com, same thing
- Event 13117, google.com, same thing
- Event 12641, facebook.com, which is slow but not as egregiously slow, and looks like it might just be the network

It looks like the problem might be related to service workers, but I'm not sure about that.
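The "sort by duration" triage described in this comment can be sketched in C++ as follows. It assumes the requests have already been extracted from the NetLog into a hypothetical RequestSpan record (event id, URL, start and end in milliseconds). Parsing the actual NetLog JSON is not shown, and nothing here is a Chrome API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of one logged request. The real NetLog JSON
// format is much richer; extracting these fields from it is not shown here.
struct RequestSpan {
  int event_id;
  std::string url;
  std::int64_t start_ms;
  std::int64_t end_ms;

  std::int64_t duration_ms() const { return end_ms - start_ms; }
};

// Returns up to `n` requests, ordered from longest to shortest duration.
std::vector<RequestSpan> SlowestRequests(std::vector<RequestSpan> spans, std::size_t n) {
  std::sort(spans.begin(), spans.end(),
            [](const RequestSpan& a, const RequestSpan& b) {
              return a.duration_ms() > b.duration_ms();
            });
  if (spans.size() > n) spans.resize(n);  // keep only the slowest n
  return spans;
}
```

Sorting is only a starting point. Each slow entry still has to be inspected to see which phase (for example, the gap between URL_REQUEST_START_JOB and SERVICE_WORKER_START_REQUEST) accounts for the time.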

Comment 7 by mge...@chromium.org, Oct 12 2017

Issue 768998 has been merged into this issue.

Comment 8 by mge...@chromium.org, Oct 12 2017

Note to people looking at this issue: the bug I just merged in has more useful logs and information.

Comment 9 by horo@chromium.org, Oct 26 2017

Labels: Needs-Feedback
What happens if you disable all extensions?
According to the log, the requests were blocked by "extension Privacy Badger".

Comment 10 by amdr...@gmail.com, Oct 27 2017

I tried running Chrome with the "--disable-extensions" switch. The problem still occurred. Some of the logs I provided were made with this switch active.
Labels: -Needs-Feedback
Dropping Needs-Feedback since it was provided; this probably needs someone on the service-worker end to take a look?

Comment 12 by shimazu@google.com, Nov 10 2017

Owner: horo@chromium.org
Status: Assigned (was: Untriaged)
(bug triaging) horo@: could you triage this? Does this seem to be a service worker (SW) bug?

Comment 13 by horo@chromium.org, Nov 28 2017

Cc: horo@chromium.org
Components: -Blink>ServiceWorker
Owner: mge...@chromium.org
Sorry for slow response.

According to the site list in the comment https://crbug.com/768998#c19, this hang happens even without service worker.
So I think this issue is not related to service worker. 

mgersh@
You merged the issue 768998 to this issue with this comment:
> This looks like the same symptom as  issue 770201 : requests hang for a long time between URL_REQUEST_START_JOB and SERVICE_WORKER_START_REQUEST.
But I couldn't find such events in the logs in the issue 768998.
Could you please let me know the log file name and the event id?

Comment 14 by horo@chromium.org, Nov 28 2017

Components: Internals>Network>Proxy
I think this is a long-standing issue which existed before we shipped service worker in Chrome 40.

 https://crbug.com/156038 
https://superuser.com/questions/691368/how-to-prevent-chrome-from-always-looking-for-a-proxy

It looks related to the proxy resolver.
Adding component:Internals>Network>Proxy.

Cc: mmenke@chromium.org
Components: -Internals>Network>Connectivity -Internals>Network>Proxy UI>Browser>Navigation
If you look at event 12955, you see 60 seconds blocked on service worker. No proxy involved there. Same with 12988....

However, if you look at internal requests, like 13039, we're blocked for the same time (Well, 50 seconds, but they overlap with those 60 seconds) on a chrome://extension URL.  12819 shows us blocked for 30 seconds on MojoAsyncResourceHandler, but those 30 seconds are just before when everything else is blocked (And then it's cancelled).

If you look at 11880, we establish a connection, but when the log was stopped, 155 seconds later, we still haven't used it.

Looking at other requests, no event seems to happen for the 40 seconds between 470000 and 510000, though we have a number of live requests.  Some requests seem to start hanging from 450000 to 470000, but plenty of other requests are made in that period.

If you look at 12656, we're blocked for 107 seconds on...I'm not sure what.  We're getting a compressed response, but it's not from the network (No stream is requested), nor does it look like we get anything from the cache or ServiceWorker.  The 100 seconds overlaps both the aforementioned 30 and 50 second periods.  It then completes successfully.



There are some requests that just hang, but none of them were cancelled, and none are requests for root documents.  So, in summary...I think all the slow/hung requests are actually red herrings, possibly related to entering/exiting suspend mode, and are largely unrelated to Chrome being in a completely broken state, scary as they look.

The real issue is that we're unable to navigate.   zacdblanco:  If you can still reproduce this, do you see anything in the loading status bar at the lower left?
Owner: ----
Status: Untriaged (was: Assigned)

Comment 17 by m...@codycook.us, Nov 28 2017

When I experience this issue, it says "waiting for cache..." in the status area.
me:  When did your issue appear?  We've had recent reports of hangs related to entering/exiting suspend mode, but this report seems a bit different (Permanent hang instead of hanging for a couple minutes, and zacdblanco was able to capture a net-internals log, which I don't believe is possible in the other reports), and I think most of the other reports are more recent than when this issue was reported, though I could be wrong about that.
me:  Also, could you try and capture an about:net-export log of this happening.  Start logging, keeping that tab open, enter/exit suspend mode, and then navigate another tab around to reproduce the issue, then stop logging (if you can), and upload the result.

Comment 20 by m...@codycook.us, Nov 28 2017

It started a few months ago. I've been lightly tracking this issue since. My roommate and I both experience Chrome unable to browse after awakening Windows from suspend. I tried to get the net-export before but when I come back from suspend, that window seems to be unresponsive and is stuck waiting for cache.  
I've been utilizing 'taskkill /IM chrome.exe /F' as a workaround to completely close Chrome and reopen it and immediately after Chrome reopens, I can continue browsing. 

I will reproduce again shortly and attach the content. 
Unfortunately the behavior on my end seems to have disappeared. I'm not getting the same behavior as before. Chrome works how I would expect after suspend/resume. I didn't change anything w.r.t. extensions or Chrome version when the behavior disappeared.
Cc: mmanchala@chromium.org
 Issue 787477  has been merged into this issue.
 Issue 774528  has been merged into this issue.
I've noticed that if I shut off my internet connection (by just pulling out the cable modem cord), things reset and Blink-based browsers start working again with no problems.
Recently started happening to me too. (Windows 10, latest Chrome)

After wake from sleep, Chrome loses network connectivity and just hangs trying to load things.

Using chrome:restart does fix it, but causes Chrome to report that it did not shut down correctly.

Most similar reports have said that it's related to having virtual network adaptors enabled. In my case I've got Docker installed, which does include its own virtual network adaptors.
I'm not sure if it is going to reoccur, but following some advice in:

https://www.reddit.com/r/chrome/comments/6iy32a/chrome_freezes_when_waking_from_sleep/

I disabled 'automatically detect proxy settings' within Windows 10 settings, and it seems to have stopped Chrome from hanging.

Comment 27 by mvogt...@gmail.com, Dec 30 2017

Same for me... Disabling "auto detect proxy settings" fixed the problem. Is there a way to debug this?

Comment 28 by amdr...@gmail.com, Dec 30 2017

Recently it's been happening to me more and more often. It's getting pretty ridiculous now.
Shutting off "auto detect proxy settings" seems to have fixed the situation for me.
Components: -Internals>Network Internals>Network>Connectivity Internals>Network>Proxy
Are you running McAfee ScriptScan?

Comment 32 by amdr...@gmail.com, Jan 3 2018

I don't. I use Norton Security.
I do experience this bug a lot more often now.
I'm using only Windows' built-in security.
Disable proxy auto-detect.

Is anyone on this thread not fixed by simply disabling auto-detect?

For those where disabling auto-detect helps, a few questions for data:

  * Are you using anti-virus software? If so, does disabling it resolve the problem?

  * Are you using virtual network adapters? If so what, and how did you configure them.

  * If you change your network settings to uncheck auto-detect, and instead set an explicit PAC URL of "http://wpad/wpad.dat", does it still hang? (This basically disables DHCP-based wpad, but keeps the DNS based one).

  * Provide a net-log dump per comment #1, that captures events before, during, and after the wake from sleep.
Also, please include the results of running in a cmd.exe window:
wmic nic get AdapterType, Name, Installed, Speed, PowerManagementSupported

If you have special adapter types, does disabling them speed things up? Does unchecking auto-detect, and instead specifying "http://wpad/wpad.dat" speed things up? What happens when you use Edge and navigate to a new URL immediately following the network transition?
I also observe this. XPS 15 9560

C:\Users\bohdan>wmic nic get AdapterType, Name, Installed, Speed, PowerManagementSupported
AdapterType     Installed  Name                                                  PowerManagementSupported  Speed
                TRUE       Microsoft Kernel Debug Network Adapter                FALSE
Ethernet 802.3  TRUE       Killer Wireless-n/a/ac 1535 Wireless Network Adapter  FALSE                     866700000
                TRUE       Bluetooth Device (RFCOMM Protocol TDI)                FALSE
                TRUE       Microsoft Wi-Fi Direct Virtual Adapter                FALSE
Ethernet 802.3  TRUE       Bluetooth Device (Personal Area Network)              FALSE                     3000000
                TRUE       WAN Miniport (SSTP)                                   FALSE
                TRUE       WAN Miniport (IKEv2)                                  FALSE
                TRUE       WAN Miniport (L2TP)                                   FALSE
                TRUE       WAN Miniport (PPTP)                                   FALSE
                TRUE       WAN Miniport (PPPOE)                                  FALSE
Ethernet 802.3  TRUE       WAN Miniport (IP)                                     FALSE
Ethernet 802.3  TRUE       WAN Miniport (IPv6)                                   FALSE
Ethernet 802.3  TRUE       WAN Miniport (Network Monitor)                        FALSE
                TRUE       Microsoft Teredo Tunneling Adapter                    FALSE
Ethernet 802.3  TRUE       TAP-Windows Adapter V9                                FALSE                     100000000
Ethernet 802.3  TRUE       Hyper-V Virtual Switch Extension Adapter              FALSE
Ethernet 802.3  TRUE       Hyper-V Virtual Ethernet Adapter                      FALSE                     10000000000
Ethernet 802.3  TRUE       Hyper-V Virtual Switch Extension Adapter              FALSE
Ethernet 802.3  TRUE       Hyper-V Virtual Ethernet Adapter #2                   FALSE                     10000000000
Ethernet 802.3  TRUE       Microsoft Wi-Fi Direct Virtual Adapter #2             FALSE                     9223372036854775807

Thank you, on Windows, turning off automatic detection of proxy settings helped. FWIW, I saw this issue after using the Wi-Fi at another person's house over the course of several days and then coming back home. When the machine got back on my own home Wi-Fi, Chrome stopped loading pages.
Cc: vamshi.k...@techmahindra.com
 Issue 797502  has been merged into this issue.
Cc: marchuk@google.com asanka@chromium.org blumberg@chromium.org ligim...@chromium.org eroman@chromium.org cbentzel@chromium.org marcore@chromium.org evep@google.com kavvaru@chromium.org kotah@chromium.org cvintila@chromium.org feiling@chromium.org yihongg@chromium.org hdodda@chromium.org
 Issue 755537  has been merged into this issue.
Owner: eroman@chromium.org
Status: Assigned (was: Untriaged)
 Issue 795674  has been merged into this issue.
This bug has still not been diagnosed. Following is a summary of what is known so far.

The issue appears to be limited to these conditions:
 * Running Chrome on Windows
 * Have proxy auto-detect enabled in system settings
 * Have some virtual network adapter(s) active [1]
 * Wake up from hibernate/sleep, or network changes while awake

[1] Reports name the following: OpenVPN, GlobalProtect, VMware, Hyper-V, npcap (though it is not confirmed that all of these contribute)

In many of the reports Chrome appears to be frozen - no tabs can be loaded, including non-network dependent ones like chrome://tracing, which points to a browser thread hang rather than just slow async networking tasks.

To date we have been unable to reproduce internally, despite trying a number of different machine configurations, installed adapters, and installed software. Without being able to reproduce, progress has been very slow. If anyone has detailed instructions on how to get a repro on a clean Windows 10 machine, that would be great.

In terms of other data, we have been able to gather:

* NetLogs from several customers
* chrome://tracing dump
* minidump
* log from instrumented binary
* pcap file of network capture

The NetLog files have not shown any single cause of slowness - slow tasks show up in a variety of areas including service worker, disk cache, web proxy auto-discovery, certificate verification, and resource scheduler. And in other logs there weren’t any slow requests corresponding to the slowness users reported.

These logs have been challenging to interpret due to timings being skewed by the hibernation. On Windows our timer continues to tick during sleep, and the suspend itself is not labeled in the log, which led to some misunderstandings when initially reading the logs (things that appeared extremely slow actually weren’t).
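One way to discount the suspend when reading such logs is to subtract the overlap between an event's span and the time the machine was asleep. Below is a minimal C++ sketch. It assumes the suspend/resume timestamps are known from some other source (they are not labeled in the log) and are on the same millisecond clock as the log. Interval, OverlapMs and AwakeDurationMs are hypothetical helpers, not part of any Chrome tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A half-open time interval [begin_ms, end_ms) on the log's clock.
struct Interval {
  std::int64_t begin_ms;
  std::int64_t end_ms;
};

// Length of the overlap between two intervals (0 if they are disjoint).
std::int64_t OverlapMs(const Interval& a, const Interval& b) {
  std::int64_t lo = std::max(a.begin_ms, b.begin_ms);
  std::int64_t hi = std::min(a.end_ms, b.end_ms);
  return std::max<std::int64_t>(0, hi - lo);
}

// Duration of a logged task with the time spent suspended subtracted.
// `suspended` is assumed to come from outside the log, since the log does
// not record the suspend itself.
std::int64_t AwakeDurationMs(const Interval& task, const Interval& suspended) {
  assert(task.end_ms >= task.begin_ms);
  return (task.end_ms - task.begin_ms) - OverlapMs(task, suspended);
}
```

For example, a task logged from 0 to 70000 ms that straddles a suspend from 5000 to 65000 ms spans only 10000 ms of awake time, so a "70 second" task in such a log may not be slow at all.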

Only a minority of the NetLogs showed slow WPAD activity in the log (but all the logs indicated that auto-detect had been enabled, so that correlation remains strong). We know experimentally that proxy auto-detect is a necessary precondition, as in all known cases disabling proxy auto-detect resolved the user slowness.

We had one customer helpfully provide additional data (the tracing dump, minidump, pcap, and log from instrumented binary are all from a single user). The trace file showed a very long task starting a URLRequest (Chrome_IOThread), and also tasks for DNSConfig and DHCP-based WPAD. At first this seemed interesting; however, the times also include the duration asleep, and once that is accounted for they don’t explain the problem. I couldn't get anything else of use out of this trace.

The pcap file also doesn’t appear to show anything unusual. Looking for the artifacts of WPAD (both the DHCP and DNS-based mechanisms) showed a normal looking request pattern.

Another experiment we ran was to extract Chrome’s proxy resolution logic to a separate binary (proxy_tool.exe) and add a ton of logging to it. Frustratingly, when the willing user ran this everything worked just fine and there was no big slowness coming from proxy auto-detect. So if the proxy-auto-detect code is the one causing problems, running it outside of the Chrome binary doesn’t seem to reproduce this issue (assuming that experiment wasn’t in some other way flawed).

The minidump file we received did have a few interesting things. Most notably, it showed 8 threads running dhcpsrv!DhcpRequestParams (which we issue from GetPacURLFromDhcp to query the adapter). Having 8 threads running this code is unexpected, as we expect only 1 concurrent task for each adapter queried. It could be a consequence of having abandoned tasks running which did not yet complete. At any rate, this threadpool should increase to 12 threads, so having 8 threads in use shouldn’t be giving a stall.

Another unusual thing in the minidump is the presence of wininet!AutoProxyResolver running in our process. We certainly don't call that code, as we are running our own proxy resolver. I initially thought this might have to do with McAfee ScriptScan, which was also loaded in the process; however, the presence of ScriptScan appears to be limited to just this user, so it is probably not the cause of this larger issue.

Given that in most reports Chrome is completely frozen, and that our NetLogs don’t show anything clear, my guess is the underlying problem is a hang on the UI thread, or something with how tasks are getting scheduled. It is unclear, though, why this would then be related to proxy auto-detect, since it runs the known slow code on worker threads.

Somewhat separately, I changed the timeout for fetching PAC scripts from 5 minutes to 30 seconds. However, none of the data indicates that this is related to the reports we are seeing here (none of the NetLogs shows us even fetching a PAC script for auto-detect with any kind of slowness).

The best way forward to resolving this is if we are able to reproduce the issue in-house. 

Second best could be if a user that can reproduce is willing to run more experiments to capture data. Working off the theory of a hung thread, an ETW trace is probably the next thing to try.

Without being able to diagnose the cause, the workarounds for users are:
 * Disable proxy auto-detect in system settings (this is a good thing to do regardless of this bug)
 * Keep auto-detect enabled, but run Chrome with --winhttp-proxy-resolver

A Chrome-level change could be to disable DHCP-based WPAD in Chrome, however without understanding the problem this is not a great solution.

I will tentatively leave this assigned to me, however if someone else wants to jump on this bug by all means please do, as I may have missed something.

Cheers.
Can anyone reproducing the problem capture an ETW trace of the problem and send it to me?

To do so, follow steps 1-3 listed under "Recording ETW traces" at https://randomascii.wordpress.com/2015/09/01/xperf-basics-recording-a-trace-the-ultimate-easy-way/

Except in step (2) use the "Tracing to file" option instead of "Circular buffer tracing".

Comment 44 by amdr...@gmail.com, Jan 17 2018

I'm not sure if this is related but I have a similar/identical issue on Chrome on my Android Phone. 
It's not Chrome itself but the WebView part of the OS. At times the Internet freezes for no reason. On Facebook Lite, images are not shown, new posts are not loaded, and links don't work. On Messenger Lite, messages are not sent (they stay in "sending" status). In Chrome, webpages don't load at all.

After a couple of minutes the issue subsides and the phone works fine. Usually turning Wi-Fi off and on again fixes the issue, but not always. I've noticed this behavior on mobile data at work too, but it happens a lot more often on Wi-Fi. I've tried installing different OS versions (KK, L, M, N); it didn't fix it. I'm mentioning this because the symptoms are identical to the ones I'm getting on Windows.

About reproducing this issue. Have you tried a different router?
I'm using TP-Link C2600 with OpenWRT/LEDE with encrypted DNS from dnscrypt-proxy package. Maybe other users can post their router's model and OS version?


RE comment #44: This bug is specific to Chrome on Windows.
Please file a separate bug report for your Chrome on Android issue.

Comment 46 by amdr...@gmail.com, Jan 17 2018

I will do that. I'm just mentioning this because of this:

"The issue appears to be limited to these conditions:
 * Running Chrome on Windows"

It might not be limited to Windows after all. 

Comment 47 by roy...@google.com, Jan 26 2018

Labels: -Pri-2 Pri-1
 Issue 805413  has been merged into this issue.
Summary: Chrome hangs after waking up from sleep/switching network when using proxy auto-detect and have virtual adapters enabled (was: After Switching WiFi networks, chrome doesn't load new tabs and cannot connect to the internet)
 Issue 795543  has been merged into this issue.
Re: comment 42: I do not believe I have any " * Have some virtual network adapter(s) active [1]", which is listed as a constraining condition.

Am happy to capture logs or try out experiments as this is 100% reproducible on this personal Win10 Lenovo laptop.
@tprachar: That would be great! I will be in contact with you via email.
The ETW traces are tricky. I don't have an answer, but I have some stuff to share.

Here are a few snippets that I'll share in the hope that they spur further progress. Two relevant processes/services are:

Microsoft Windows Profiler
Line #	Process	Display Name
55	svchost.exe (2372)	DHCP Client
64	svchost.exe (2988)	WinHTTP Web Proxy Auto-Discovery Service

In particular, svchost.exe (2988) has two threads that sit idle for exactly 60 s and then wake up about 7-8 s before Chrome becomes heavily active. It looks like the proxy auto detection finishes and then lets Chrome start loading the page and then it takes 7-8 s for things to really start moving (this means I was looking at the wrong part of the trace since I was looking at the place where activity resumes, which appears to be too late).

The rough sequence of events in svchost.exe (2988) is that thread 868 is woken up after a 60.1 s nap at 160.998397265 into the trace by thread 14668. Meanwhile thread 14668 had been napping (technical term that) for 60.0 s. Thread 14668 was woken up by thread 14048 (still in the same process) which had been napping on-and-off, waking up occasionally.

The partial stack from which 14048 wakes up 14668 is this:
  winhttp.dll!AutoProxyResolver::BackgroundWpadDetection
  winhttp.dll!AutoProxyResolver::DoBackgroundDetection
  winhttp.dll!ForegroundWpadDetection
  winhttp.dll!InternalDetectAutoProxyUrl

The weird thing is that I can't find any connection from svchost.exe (2988) to the chrome browser process (10544). 2988 does wake up 10544 around that time but it looks like it is just a timer expiring, so that is meaningless.

Also, all this analysis isn't really telling us anything that we didn't know already. The fact that svchost.exe is apparently doing BackgroundWpadDetection is interesting - maybe.

It sounds like maybe I can't find a connection with the svchost.exe processes because there is none. I'll look a bit more.

I also looked for context switches where Chrome was woken up on a call to winhttp.dll!WinHttpGetIEProxyConfigForCurrentUser. It is always called from chrome.dll!net::ProxyConfigServiceWin::GetCurrentProxyConfig. However these waits are always completed within a few seconds of resuming from sleep. There must be some follow-on work which is then delayed for a minute and I still can't find it.
Hi all,

I am keenly interested in solving the issue. 
I have been experiencing it for almost a year now, and I am quite fed up, so I want to tackle the issue head-on.
I can reproduce the problem, and try my best to capture all logs. 

I am a fresh CS grad and looking forward to contributing actively to open source.
I'm new to open source, so please excuse my mistakes.


As mentioned in Comment-42
"Second best could be if a user that can reproduce is willing to run more experiments to capture data".

Could someone tell me all the required experiments and tools?
(So that maybe a few things I can analyze myself too.)

Disabling proxy auto-detect worked fine for me.

I tried several Wireshark captures and found some WPAD requests, but mostly couldn't interpret what exactly is causing the issue.

I did a chrome://net-export a few days back, but without knowing how to analyze that .json I was quite helpless.
Instructions to get and build Chromium are at https://www.chromium.org/developers/how-tos/get-the-code.

I assume the hang is either in the WinHttpGetIEProxyConfigForCurrentUser system call (https://cs.chromium.org/chromium/src/net/proxy_resolution/proxy_config_service_win.cc?sq=package:chromium&l=133) or in fetching the PAC file (https://cs.chromium.org/chromium/src/net/proxy_resolution/pac_file_fetcher_impl.cc?sq=package:chromium). I assume dhcp_pac_file_fetcher_win.cc in the same directory doesn't come into play here, but I'm not the proxy configuration expert. If it does, it's also a possibility.

chrome://net-internals can load the output of chrome://net-export (It can also capture the same data live.  Eventually we plan to remove it in favor of an app that does the same thing, to reduce Chrome's size and to make working on it a bit lighter weight)
 Issue 816532  has been merged into this issue.

Comment 58 by ras...@mindplay.dk, Feb 27 2018

Counter to what I reported in #816532, disabling VMWare seemed to give only temporary relief from this issue last night - it returned this morning.

Having disabled every network interface except the LAN and Wi-Fi, if anything, just made the issue worse.

I managed to grab a screen capture of this - as you can see, there's something odd going on with the network interface status, it seems to be toggling itself on and off, meanwhile the spinner in the Chrome tab is jerky, and the "downloading proxy script" prompt (not visible in this capture) keeps flashing at the bottom of the browser window.

network-bug.gif
288 KB
Hello,

I contacted support, and after a decent amount of chatting he suggested I look here and report back my findings, so that's what I'm going to do.

I'm currently experiencing something like this myself and have a bit of information.

I have tested this on Google Chrome, Canary and Chrome Beta. Google Chrome and Chrome Beta behave the same, whereas Canary behaves as expected when opening a webpage and is not included in the description below.

OS: Windows 10 (64bit)
Version: Google Chrome: 64.0.3282.186; I don't have the other versions on hand right now. They were downloaded at the same time last night; if you need their versions too, let me know and I will provide them.

Steps to reproduce the problem:
1. Leave Chrome open or closed, no difference from my findings.
2. Put PC to sleep.
4. Wake the PC up.
5. Try to open a new tab or try to refresh an existing one
6a. If you were quick, you can now browse for 2-4 seconds before step 7 happens.
6b. If you were a bit slow, you have reached step 7.
7. Issue reproduced

What is the expected behavior?
Reloading or opening a tab should load the page.

What went wrong?
About 2-4 seconds after the Windows desktop appears, pages start loading indefinitely; before that, everything works as it should.

about 10-20 sec after step 7. everything works again, no restarts, no crashes, no error messages.
I want to add on this that no pages work, even not chrome://net-internals.

This was reproduced on a PC using Ethernet and a PC using Wi-Fi. One is running Docker (I saw you were talking about VMs), the other is not. One is running antivirus software, the other isn't.

So in my case, only Google Chrome and Chrome Beta are affected, and no other applications (no other browser showed this behavior either).

I hope this information is useful.
I see I didn't make myself clear on this part
"about 10-20 sec after step 7. everything works again, no restarts, no crashes, no error messages.
I want to add on this that no pages work, even not chrome://net-internals."

What I meant is that while the issue is occurring, no pages work, not even chrome://net-internals.

After the issue everything is fine and every page loads.

Sorry for an extra notification :P
Thanks everyone for the feedback and patience!

I think I understand enough of what is happening to work on a fix. I will update with more details shortly.

As a reminder, until we have a fix you can mitigate the problem by any of the following strategies:

 (a) Disable proxy auto-detect (recommended)
 (b) Instead of proxy auto-detect, configure your settings to the explicit PAC script "http://wpad". This accomplishes the same thing as proxy auto detect in most environments (it only differs if the environment only supports DHCP-based WPAD)
 (c) Run Chrome with the command line flag --winhttp-proxy-resolver

RE comment #58: Thanks for the info! Yes, the changing interface status is relevant.
Cc: gab@chromium.org fdoray@chromium.org
Here is a lengthy technical write-up for those that are curious (no need to read this if you are just following the status of the bug fix).

Here is what I believe went wrong:

The way Chrome schedules tasks has been changing since circa Chrome 57, with the move towards a global task scheduler.

The code that handles DHCP-based WPAD was originally written to use a base::SequencedWorkerPool with 12 threads. As such it could run 12 tasks in parallel, used when probing multiple interfaces for WPAD. The migration to a unified task scheduler refactored the implementation of SequencedWorkerPool such that it would redirect these tasks to the global task scheduler, with a task priority of USER_VISIBLE.

Although the maximum number of threads was seemingly kept at 12 in this transition, in reality the underlying scheduling bucket for USER_VISIBLE is capped to a smaller number (namely 8).

Even worse, the thread pool for USER_VISIBLE is shared by a variety of other important tasks and not just exclusively used by our DHCP WPAD code. Notably, DNS tasks are scheduled to this pool, as are various dependencies for loading a tab. Hence when this shared pool becomes saturated (by DHCP tasks, or otherwise), a variety of things simply stop working, with the result of the Chrome browser becoming hung. Relatedly, there have been reports correlating slow DNS to Chrome browser hangs, which sounds like exactly this same issue.
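To make the starvation mechanism concrete, here is a minimal C++ sketch that models the shared pool as a fixed number of workers draining a single FIFO queue. This is a deliberate simplification, assumed for illustration only: the real task scheduler is not a plain FIFO, and the names SimTask and SimulateFifoPool are hypothetical. Only the caps of 8 and 12 come from the discussion above; the task durations are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// A simulated task: when it is posted and how long it holds a worker (ms).
struct SimTask {
  int64_t post_time_ms;
  int64_t duration_ms;
};

// Returns the start time of each task when tasks run in FIFO order on a pool
// of `max_threads` workers. `tasks` must be sorted by post_time_ms.
std::vector<int64_t> SimulateFifoPool(const std::vector<SimTask>& tasks,
                                      int max_threads) {
  // Min-heap holding the time at which each worker next becomes free.
  std::priority_queue<int64_t, std::vector<int64_t>, std::greater<int64_t>>
      free_at;
  for (int i = 0; i < max_threads; ++i) free_at.push(0);

  std::vector<int64_t> start_times;
  for (const SimTask& t : tasks) {
    // The next queued task takes whichever worker frees up first.
    int64_t worker_free = free_at.top();
    free_at.pop();
    int64_t start = std::max(t.post_time_ms, worker_free);
    start_times.push_back(start);
    free_at.push(start + t.duration_ms);
  }
  return start_times;
}
```

For example, posting eight DHCP probes that each take two minutes, followed 10 ms later by a 5 ms task standing in for a DNS lookup, means that with a cap of 8 the lookup cannot start until the first probe finishes two minutes later, whereas with 12 workers it starts immediately.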

As an example, loading chrome://version/ (which doesn't have a direct network dependency) will hang when the USER_VISIBLE pool is stalled.

While I haven't been able to reproduce the preconditions for stalling USER_VISIBLE via DHCP tasks myself, I can definitely reproduce the symptoms of this bug by artificially saturating the pool with slow-to-complete tasks.

For users reproducing this, the saturation of the USER_VISIBLE pool is a consequence of repeated network interface changes triggering WPAD, and the per-interface DHCP probes taking a fantastically long time. Although the code abandons these tasks based on a timeout, they still clog up the pool as they are not cancellable.
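The point about timeouts can be sketched the same way. In this hypothetical model, assumed purely for illustration, every network change triggers a WPAD round that posts one blocking probe per adapter, and BusyWorkersAt() counts how many workers are still held at a given time. The only difference between the two modes is whether abandoning the wait at the timeout actually releases the worker; all names and parameter values are invented, not measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scenario parameters (not taken from Chrome).
struct WpadScenario {
  int adapters_per_round;      // One DHCP probe per adapter per WPAD round.
  int64_t change_interval_ms;  // A network change every N ms (must be > 0).
  int64_t probe_duration_ms;   // How long the blocking DHCP call really takes.
  int64_t timeout_ms;          // When the fetcher stops waiting on a probe.
};

// Workers held by probes at time `t_ms`, assuming each probe starts as soon
// as its round is triggered (i.e. before the pool is saturated).
int BusyWorkersAt(const WpadScenario& s, int64_t t_ms, bool cancellable) {
  assert(s.change_interval_ms > 0);
  // A cancellable probe would free its worker at the timeout; a
  // non-cancellable one keeps it until the blocking call finally returns.
  const int64_t hold_ms = cancellable
                              ? std::min(s.timeout_ms, s.probe_duration_ms)
                              : s.probe_duration_ms;
  int busy = 0;
  for (int64_t round_start = 0; round_start <= t_ms;
       round_start += s.change_interval_ms) {
    if (t_ms < round_start + hold_ms) busy += s.adapters_per_round;
  }
  return busy;
}
```

With two adapters, a network change every 10 seconds and probes that take two minutes, four changes are enough to pin 8 workers even though the fetcher stopped waiting on each probe after 2 seconds; if the probes could be cancelled at the timeout, only the 2 from the latest round would still be held.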

More concretely, I can confirm this hypothesis in the minidump received in Issue 755537. Here we see 8 threads running dhcpsvc!DhcpRequestParams. Unfortunately, when first analyzing this I didn't recognize that the thread pool was being capped at 8 threads rather than 12, so I missed that the USER_VISIBLE pool was blocked at the time.

Also frustratingly, it appears that chrome://tracing/ fails to display slow tasks if they have not completed by the time the capture ended. We received a chrome://tracing log from a user, but it did not show this scheduling fail. Presumably the capture was ended before reaching a quiescent state.

The remaining aspect of this bug that I don’t yet understand is why dhcpsvc!DhcpRequestParams is taking on the order of minutes for these particular users.

We know that the repro scenario has to do with specific timings (wake-up from sleep, network interface changes), as well as specific interface types (correlation to particular virtual adapters). However when I had users run proxy_tool.exe to measure the performance of DhcpRequestParams it did not show such latencies. Also, we received a packet capture from a user, which showed that the DHCP request completed in under 4ms once it hit the network.

Understanding why dhcpsvc!DhcpRequestParams is misbehaving would be nice since there is surely a mitigation to be had here that involves something to the effect of only issuing the request at a certain time, or skipping particular interfaces.

But barring that, we can still address this bug by working around the obnoxiously large delays in DHCP by tackling the task starvation problem.

While it seems that the task scheduler changes pushed this over the edge, by reducing parallelism from 12 to 8 or fewer and by turning this situation into a browser hang rather than just a failure to load network resources, the previous thread pool approach was still problematic (it just had a higher threshold before falling over).
My recent CL https://chromium-review.googlesource.com/c/chromium/src/+/881273 changes the following:
1. When a task is blocked on GetAdapterAddresses [1] or DHCPRequestParams [2] for at least 10ms [3], the capacity of the thread pool is incremented by 1 until the task is unblocked.
2. The number of concurrent DhcpProxyScriptFetcherWin tasks is limited to 12.
Does it partially mitigate this bug?

[1] https://chromium-review.googlesource.com/c/chromium/src/+/881273/20/net/proxy_resolution/dhcp_pac_file_fetcher_win.cc#408
[2] https://chromium-review.googlesource.com/c/chromium/src/+/881273/20/net/proxy_resolution/dhcp_pac_file_adapter_fetcher_win.cc#255
[3] Could be 0ms if we used WILL_BLOCK instead of MAY_BLOCK.
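A rough, single-threaded model of the two changes in that CL, for readers who want to see the arithmetic. All names here (BlockingType, EffectiveCapacity, ConcurrencyLimiter) are hypothetical and only mirror the behaviour described in the comment; this is not how Chromium's task scheduler or the DHCP fetcher are actually implemented. Point 1 is modelled as "base capacity plus one worker per call that has been blocked for at least the threshold", using the 10 ms (MAY_BLOCK) and 0 ms (WILL_BLOCK) thresholds from the footnotes; point 2 as a counter that admits at most 12 fetches and queues the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical mirror of the MAY_BLOCK / WILL_BLOCK distinction.
enum class BlockingType { kMayBlock, kWillBlock };

// A call that is currently blocked (e.g. an adapter query or DHCP request).
struct BlockedCall {
  BlockingType type;
  int64_t blocked_for_ms;  // How long the call has been blocked so far.
};

// Point 1: threshold after which a blocked call earns an extra worker.
int64_t ThresholdMs(BlockingType type) {
  return type == BlockingType::kWillBlock ? 0 : 10;
}

// `blocked` lists only calls that are still blocked; once a call returns it
// is dropped from the list, and the extra worker it earned goes away.
int EffectiveCapacity(int base_capacity,
                      const std::vector<BlockedCall>& blocked) {
  int extra = 0;
  for (const BlockedCall& call : blocked) {
    if (call.blocked_for_ms >= ThresholdMs(call.type)) ++extra;
  }
  return base_capacity + extra;
}

// Point 2: admit at most `max_concurrent` fetches and queue the rest FIFO.
// `start` is invoked when a fetch is admitted; the caller reports that the
// fetch has finished by calling OnFetchDone().
class ConcurrencyLimiter {
 public:
  explicit ConcurrencyLimiter(int max_concurrent) : max_(max_concurrent) {}

  void Post(std::function<void()> start) {
    if (running_ < max_) {
      ++running_;
      start();
    } else {
      pending_.push_back(std::move(start));
    }
  }

  void OnFetchDone() {
    --running_;
    if (!pending_.empty()) {
      std::function<void()> next = std::move(pending_.front());
      pending_.pop_front();
      ++running_;
      next();
    }
  }

  int running() const { return running_; }
  std::size_t pending() const { return pending_.size(); }

 private:
  const int max_;
  int running_ = 0;
  std::deque<std::function<void()>> pending_;
};
```

Presumably the two changes complement each other: growing the pool keeps other USER_VISIBLE work such as DNS flowing while DHCP calls are stuck, and the cap of 12 stops the DHCP code from growing the pool without bound.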
Cc: morlovich@chromium.org
cc Maks, who has also been looking into task starvation issues recently

Comment 65 by bugdroid1@chromium.org, Mar 3 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa

commit 1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa
Author: Eric Roman <eroman@chromium.org>
Date: Sat Mar 03 00:11:15 2018

Add NetLog details for DHCP-based WPAD.

Adds NetLog instrumentation for how the network adapter list was read on Windows:
 * The enumerated network adapters
 * How long it took to schedule the task to worker thread
 * How long it took to schedule the reply task to origin thread
 * How long it took to query the adapters
 * Which "adapter fetcher" won the overall race for WPAD

Subsequent CLs will add more logging for the individual "adapter fetchers".

TBR=stevenjb@chromium.org

Bug:  770201 
Change-Id: I9f704614e4a991e9a56879fc81637d41782dce1f
Reviewed-on: https://chromium-review.googlesource.com/876921
Commit-Queue: Eric Roman <eroman@chromium.org>
Reviewed-by: Eric Roman <eroman@chromium.org>
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>
Reviewed-by: Matt Menke <mmenke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540693}
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/chromeos/network/dhcp_pac_file_fetcher_chromeos.cc
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/chromeos/network/dhcp_pac_file_fetcher_chromeos.h
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/log/net_log_event_type_list.h
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/proxy_resolution/dhcp_pac_file_fetcher.cc
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/proxy_resolution/dhcp_pac_file_fetcher.h
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/proxy_resolution/dhcp_pac_file_fetcher_win.cc
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/proxy_resolution/dhcp_pac_file_fetcher_win.h
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/proxy_resolution/dhcp_pac_file_fetcher_win_unittest.cc
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/proxy_resolution/pac_file_decider.cc
[modify] https://crrev.com/1810d4384f2f5b79854e5ab54bbdf1f69be8d7fa/net/proxy_resolution/pac_file_decider_unittest.cc
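The value of splitting out these timings is that they separate time spent waiting for a worker from time spent in the adapter query itself. A minimal sketch of that breakdown, using hypothetical field names and millisecond timestamps rather than the actual NetLog event names or parameters:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timestamps (ms) captured around one "read adapter list" job.
struct AdapterQueryTimings {
  int64_t posted_to_worker;   // Origin thread posts the task.
  int64_t started_on_worker;  // A worker thread begins running it.
  int64_t query_finished;     // Adapter enumeration returns.
  int64_t reply_started;      // Reply task runs back on the origin thread.
};

struct Breakdown {
  int64_t schedule_to_worker_ms;  // Time waiting for a free worker.
  int64_t query_ms;               // Time inside the adapter query.
  int64_t schedule_reply_ms;      // Time waiting to run the reply.
};

Breakdown Summarize(const AdapterQueryTimings& t) {
  return {t.started_on_worker - t.posted_to_worker,
          t.query_finished - t.started_on_worker,
          t.reply_started - t.query_finished};
}
```

A starved pool would show up as a large schedule_to_worker_ms with a small query_ms, whereas a slow OS call would show the opposite.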


Comment 66 by bugdroid1@chromium.org, Mar 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/12ace03c91bee8ec27cfffff2d80e86704ed754d

commit 12ace03c91bee8ec27cfffff2d80e86704ed754d
Author: Eric Roman <eroman@chromium.org>
Date: Mon Mar 05 21:37:35 2018

Skip network adapters that are not in state IfOperStatusUp when probing for WPAD via DHCP.

This is a speculative fix for calls to dhcpsvc!DhcpRequestParams being very slow following network changes.

Bug:  770201 
Change-Id: I506f44d51ee4d14625bb4d64974f6e4dd618c103
Reviewed-on: https://chromium-review.googlesource.com/946870
Commit-Queue: Eric Roman <eroman@chromium.org>
Reviewed-by: Matt Menke <mmenke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540947}
[modify] https://crrev.com/12ace03c91bee8ec27cfffff2d80e86704ed754d/net/proxy_resolution/dhcp_pac_file_fetcher_win.cc
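A portable sketch of the filtering described in that commit. On Windows the status comes from the OperStatus field of IP_ADAPTER_ADDRESSES (of type IF_OPER_STATUS); here the struct, the enum (whose values mirror IF_OPER_STATUS) and AdaptersToProbe() are hypothetical stand-ins so the example compiles anywhere, and the adapter names are invented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local stand-in whose values mirror Windows' IF_OPER_STATUS
// (IfOperStatusUp = 1 through IfOperStatusLowerLayerDown = 7).
enum class OperStatus {
  kUp = 1,
  kDown,
  kTesting,
  kUnknown,
  kDormant,
  kNotPresent,
  kLowerLayerDown
};

// Hypothetical subset of the per-adapter information.
struct Adapter {
  std::string name;
  OperStatus status;
};

// Returns the adapters worth probing for WPAD via DHCP: only those whose
// operational status is up; every other adapter is skipped.
std::vector<std::string> AdaptersToProbe(const std::vector<Adapter>& adapters) {
  std::vector<std::string> result;
  for (const Adapter& a : adapters) {
    if (a.status == OperStatus::kUp) result.push_back(a.name);
  }
  return result;
}
```

Skipping adapters that are down, dormant or not present means no DHCP request is issued (and no worker is tied up) for interfaces that are mid-transition after a network change.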

Discovered a work-around, which may be helpful in diagnosing the issue.

I disable the Wi-Fi adapter. (I'm on a laptop with both Wi-Fi and LAN.)

That's it - the "proxy script" prompt immediately disappears, as Chrome connects via the LAN instead.

Once it's connected, I can re-enable the Wi-Fi and everything is back to normal.

Cc: csharrison@chromium.org
 Issue 818068  has been merged into this issue.
Can those reproducing this try the very latest version of Chrome (Canary) and confirm whether this is resolved?

   https://www.google.com/chrome/browser/canary.html

Thanks
@fdoray: RE comment #63 -- yes, that absolutely does help. Thanks!
Seems like there are no issues for me on Canary, though I've only tested twice.
Status: Fixed (was: Assigned)
Thanks!

I also heard back from another user that Canary resolves the issue, and was able to confirm with local testing.

The primary fix is in Chrome 66 (comment #63), which is due to be released April 17.

Comment 73 by gab@chromium.org, Mar 21 2018

Thanks Francois (re. #63). Apologies for breaking this when doing the SequencedWorkerPool redirection: the cap of 8 threads came from mapping BlockingPool (3) and CachePool (5). I had totally missed this pool (and hence the fact that one pool's individual cap was critical), and only realized it existed, in a corner of //net, when we tried to delete sequenced_worker_pool.h... Sorry once again. I think we're in a better state than ever now with #63 though, yay! Thanks to everyone who put time and effort into diagnosing this intricate regression.
 Issue 820797  has been merged into this issue.
Blocking: 644030
