New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 755537 link

Starred by 14 users

Issue metadata

Status: Duplicate
Merged: issue 770201
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug



Sign in to add a comment

very slow loading file wpad.dat after changing network (cable to wifi) or recovering from hibernate

Project Member Reported by marcore@chromium.org, Aug 15 2017

Issue description

Chrome Version: Google Chrome 60.0.3112.90 (official 6ee4392c6cef820c64ff50b5fd5eeb8596cd0394-refs/branch-heads/3112@{#702}) 
OS: Windows NT: 6.1.7601 SP1 (x86_64)

What steps will reproduce the problem?
(1) configure WPAD using DNS
(2) hibernate laptop with chrome tabs open
(3) wake up laptop

What is the expected result?
chrome responsive
What happens instead?
chrome unresponsive for about 2 minutes or more

Chrome is working as expected on version 56 and 57.
The issue persists on version 58 ,59, and 60.

customer net export:
https://drive.google.com/open?id=0B01ZVp8vDQocSjg5VEp2TXJ2RUE

id 16245: PROXY_SCRIPT_DECIDER Start Time: 2017-08-14 18:00:20.582
id 16501: HOST_RESOLVER_IMPL_JOB wpad Start Time: 2017-08-14 18:02:14.760
id 16512: SOCKET Start Time: 2017-08-14 18:02:14.767 finished downloading wpad.dat

customer chrome logs:
https://drive.google.com/open?id=0B01ZVp8vDQocSDY4QmVNb2k0WHc
customer HAR file:
https://drive.google.com/open?id=0B01ZVp8vDQocMUZJNXpkZXFWNk0

 
Owner: bustamante@chromium.org
Hi richard, could you please help triage this issue ?

Summary: very slow loading file wpad.dat after changing network (cable to wifi) or recovering from hibernate (was: very slow loading file wpad.dat after hibernate)
customer info: 
https://drive.google.com/open?id=1gBOiiNYIfOxRJzTYkLELr7Xkagsqp7CJOtOwHkMtu6c

Cc: blumberg@chromium.org pastarmovj@chromium.org ligim...@chromium.org
Labels: Needs-Triage-M60

Comment 4 by mmenke@chromium.org, Aug 16 2017

Looks like we're stuck in net/proxy/dhcp_proxy_script_fetcher_win.cc, trying to get a PAC script from DHCP.  It takes two minutes for it to fail with "ERR_PAC_NOT_IN_DHCP".  Once that happens, we fall back to getting a PAC script via DNS instead, and that works basically instantly.

I don't know the code, but skimming through the history, I have no idea why the behavior here would have changed in M58.

https://chromium.googlesource.com/chromium/src/+log/master/net/proxy/ and https://chromium.googlesource.com/chromium/src/+log/master/components/proxy_config are the logs of the most obviously relevant directories.  Chrome 58 would mean changes made somewhere on the order of 5-8 months ago.
Didn't have a proxy setup on our end to verify this issue, hence adding "TE-NeedsTriageFromMTV" for further triage.
Labels: TE-NeedsTriageFromMTV

Comment 7 by roy...@google.com, Aug 23 2017

Cc: dskaram@chromium.org
Owner: krishna...@chromium.org
Krishna: Do we have setup for this and can we help setup this infrastructure if it doesn't exist ?

Comment 8 by roy...@google.com, Aug 23 2017

Labels: -Pri-2 Pri-1
Components: Enterprise
Cc: asanka@chromium.org
Labels: -TE-NeedsTriageFromMTV
Unfortunately chrome desktop team do not have the setup for reproducing the issue.

As mmenke@ mentioned in #4, there is no change in code base during M58 time frame. The latest one during M60 and the previous during M54. 

Lopping to asanka@ for more inputs.

We will track this issue for M60
Labels: -M-58 -M-59

Comment 12 by roy...@google.com, Aug 24 2017

Cc: cbentzel@chromium.org evep@google.com
Labels: Restrict-View-Google
Owner: asanka@chromium.org
Assigning to @asanka and CCing @cbentzel as FYI. 

Comment 13 by roy...@google.com, Aug 24 2017

Labels: -Restrict-View-Google
Owner: eroman@chromium.org
Status: Assigned (was: Untriaged)
There seems be a few instances of this:

t=182065 [st=     0] +PROXY_SCRIPT_DECIDER  [dt=114511]
t=182065 [st=     0]    PROXY_SCRIPT_DECIDER_WAIT  [dt=2000]
t=184065 [st=  2000]   +PROXY_SCRIPT_DECIDER_FETCH_PAC_SCRIPT  [dt=112178]
                        --> source = "WPAD DHCP"
t=296243 [st=114178]   -PROXY_SCRIPT_DECIDER_FETCH_PAC_SCRIPT
                        --> net_error = -348 (ERR_PAC_NOT_IN_DHCP)
t=296243 [st=114178]    PROXY_SCRIPT_DECIDER_FALLING_BACK_TO_NEXT_PAC_SOURCE
t=296243 [st=114178]   +HOST_RESOLVER_IMPL_REQUEST  [dt=5]
                        --> address_family = 0
                        --> allow_cached_response = true
                        --> host = "wpad:80"
                        --> is_speculative = false
t=296243 [st=114178]      HOST_RESOLVER_IMPL_IPV6_REACHABILITY_CHECK
                          --> cached = true
                          --> ipv6_available = false
t=296243 [st=114178]      HOST_RESOLVER_IMPL_JOB_ATTACH
                          --> source_dependency = 16501 (HOST_RESOLVER_IMPL_JOB)
t=296248 [st=114183]   -HOST_RESOLVER_IMPL_REQUEST
t=296248 [st=114183]    PROXY_SCRIPT_DECIDER_FETCH_PAC_SCRIPT  [dt=328]
                        --> source = "WPAD DNS: http://wpad/wpad.dat"
t=296576 [st=114511] -PROXY_SCRIPT_DECIDER

And lots of network address changes.

The chrome_debug.log contains:
[9968:3548:0808/104252.236:VERBOSE1:dhcp_proxy_script_adapter_fetcher_win.cc(262)] Error fetching PAC URL from DHCP: 2

... where 2 == ERROR_FILE_NOT_FOUND
I believe is what we expect when there's no PAC URL configured via DHCP.

Looks like the dhcp_proxy_script_fetcher is taking a long time. It's not timing out either.

Are we picking the wrong interfaces to probe for PACs over DHCP? +eroman to look further.
OP: Is it possible to get some details about the network interfaces that this user has? It looked like they had a GPRS radio as well as WiFi?
Cc: kavvaru@chromium.org hdodda@chromium.org
 Issue 750507  has been merged into this issue.
Cc: marchuk@google.com
@asanka
the customer has tried on two different laptop:
both don't have GPRS radio only wifi + Bluetooth
Latitude E5470 
* WiFi as specified on Dell's website: IntelĀ® Dual-Band Wireless-AC 8260 Wi-Fi + BT 4.1 Wireless Card (2x2) 
Latitude 5580 
* WiFi as specified on Dell's website: Qualcomm QCA61x4A 802.11ac Dual Band (2x2) Wireless Adapter+ Bluetooth 4.1 

Flag enable net export https://drive.google.com/open?id=0B01ZVp8vDQocR29janhrTGYzY2s

NO Flag enable net export https://drive.google.com/open?id=0B01ZVp8vDQocTl9qeTFrRW5FR28

on  Chrome 61.0.3163.79
they don't see any delay while launching Chrome with --winhttp-proxy-resolver switch
but they can still see the delay while opening chrome without the flag added (--winhttp-proxy-resolver)

Thanks for attaching all the logs!

As a workaround for now I would recommend the user change their proxy settings (either system wide, or just for Chrome) from "Automatically detect settings" to "Use automatic configuration script" with URL:  http://wpad/wpad.dat

This will effectively disable the use of DHCP based discovery, which seems to be the problem area.

Would you be able to carry out some extra experiments to capture some data for us? (one would be to capture a chrome://tracing/ log while the problem happens, and the other experiments would entail running a custom binary on the affected system).

From the logs, my best guess would be that it is actually enumerating the network adapters that is taking a ridiculously long time, however that would be unexpected (but easily tested with a test binary).

There are also some unusual timings in the logs which suggest the IO thread is either blocked, or is not scheduling timer tasks as expected. For instance there are some 31+ seconds spent in PROXY_SCRIPT_DECIDER_WAIT, however that should never be more than 2 seconds set by a timer. The log also shows a number of timezone changes logged by Chrome around the network change events, which makes me a bit suspicious if that could affect scheduling, but wouldn't fully explain the slowness seen elsewhere.

We can try some changes on the Canary channel too -- maybe the easiest would be a total timeout on DHCP probing to close the current gaps.

Thanks for your patience, I know this has been a frustrating issue for your customer!

Comment 20 Deleted

Re: https://bugs.chromium.org/p/chromium/issues/detail?id=750507

Could the fact that I have a VPN client (GlobalProtect) installed have anything to do with it? I use it sparringly, and only on-request (it's not trying to auto-connect on startup, that I know of).
Thanks for providing the trace file!

It does confirm that there is something blocking the Chrome IO thread (can see one task for instance take 40 seconds) however doesn't precisely identify it).

Would you be able to capture a mini-dump capturing the stalled thread?

This is slightly more involved then getting a trace, and needs to be timed right so the snapshot is taken while things are stalled:

We have also seen bug reports where lots of threads (thousands) are being created in the utility process (either by malware, or by other yet to be determined causes).

This could also be accounting for the large memory usage that you are seeing.

If you reproduce this again, could you capture a minidump of the utility process and upload it here?

Here are steps for doing that:

 (1) Download Process Explorer from Microsoft:
     https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer

 (2) Launch process explorer, and find the chrome browser process in the list. There will be a tree of chrome.exe processes, the "browser process" is the root of that tree. You can open the Chrome task manager to determine its PID for confirmation.
  
 (3) Reproduce the slowness in Chrome

 (4) In process explorer, right click on the chrome.exe and choose "Create Dump -> Create Minidump...". This file needs to be saved while things are stuck (so shortly after things become unresponsive).

 (5) Email or attach the resulting minidump to this bug. minidumps can contain some private information such as the path to your homedirectory. If you don't want to send us that, I can provide instructions on extracting the needed information (thread callstacks) from it manually.
Cc: cvintila@chromium.org
@eroman we have the minidump from the customer:
Minidump - working after 2 mins
https://drive.google.com/open?id=1v7DwF-mx2ZOta_4GGZHz70qRACUbMDVl

Minidump - unresponsive
https://drive.google.com/open?id=1JUSpsFflFTXx-etVkJ4hwoc_QvOYaLSP

Thanks for the minidumps.

To clarify - the first dump was taken after things started working again?

Unfortunately neither dump shows a hang on the Chrome_IO thread that I was looking for.

The second one does show 8 outstanding invocations of DhcpProxyScriptAdapterFetcher::GetProxyURLFromDhcp(), however doesn't offer an explanation for why the 2 second timeout didn't kick in.

I will have to prepare a test binary to run on the affected machines, since this back and forth to collect data is very slow.
Cc: -dskaram@chromium.org
Just checking here if there's any information you'll need for the investigation? Thanks!
Can you run proxy_tool.exe and attach the two log files that it writes out?

https://drive.google.com/file/d/1v1yQDUspSZWl-Gr9b9xJaalHiW-odDl5/view?usp=sharing

(1) Download and unzip
(2) Execute proxy_tool.exe
(3) Change network, then wait a few minutes
(4) Stop the program, and attach the .txt and .json file that it wrote.

Thanks
@eroman:
here the result of the program proxy_tool.exe
proxy-tool-log.txt https://drive.google.com/open?id=1B9wWHACSg4a_mJlJurRyDEYv5XLYzCla
proxy-tool-net-log.json https://drive.google.com/open?id=1dEdEyeb2h5kfQyjkRfuGGpof1q5WrOek
Amy updates on this please? Thanks.

Comment 32 by kotah@chromium.org, Dec 14 2017

Cc: kotah@chromium.org
cbentzel@,

eroman@ thinks it would be faster if we can repro and test in local environment. Can you point us to QA team who would be able to help?

Cc: yihongg@chromium.org
asanka, yihong: would this work in enterprise lab? 
Cc: -pastarmovj@chromium.org feiling@chromium.org
The scenario is a bit tricky on GCE with the hibernation step but it might be possible. Please feel free to contact feiling@ to give a try.

Comment 35 by jayhlee@google.com, Dec 19 2017

Customer has submitted a pcap capture of a device waking and hanging for the 1-2 minutes. Shared with google.com:

https://drive.google.com/file/d/1jQDMdX-yt4FcJSg6109QyQxtzmTilXlS/view?usp=sharing
Is there any context on that pcap file?

- What requests were made before/after the device waking up?
-What was happening during the hang. What load status did Chrome show in the bottom left? Was it a browser hang where UI was unresponsive, or a page slow to load?

At first glance I see both the DHCP and DNS wpad requests and they don't look slow.
It also appears that the system is running McAfee ScriptScan.

Please try disabling McAfee (ScriptScan in particular) and seeing if the issue still reproduces.
Mergedinto: 770201
Status: Duplicate (was: Assigned)

Sign in to add a comment