
Issue 716764

Starred by 8 users

Issue metadata

Status: WontFix
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug




Amazon.com cannot be accessed

Reported by willn2...@gmail.com, Apr 29 2017

Issue description

Chrome Version       : 58.0.3029.81
OS Version: Windows 10 Home 64-bit Build 1607
URLs (if applicable) : http://www.amazon.com
Other browsers tested:
  Add OK or FAIL after other browsers where you have tested this issue:
     Microsoft Edge: OK
       Firefox 53.0: OK
              IE 11: OK

What steps will reproduce the problem?
1. Access http://www.amazon.com
2.
3.

What is the expected result?
The webpage will load

What happens instead of that?
The website returns the error "ERR_SOCKET_NOT_CONNECTED".

Please provide any additional information below. Attach a screenshot if
possible.

I cannot access http://www.amazon.com because of the "ERR_SOCKET_NOT_CONNECTED" error. I have attached a net-export log file for you to review. Please keep this information confidential.

Thank you!

UserAgentString: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36



 
This is affecting Chrome across platforms (Windows, Mac, Chrome OS) and, while intermittent, seems to affect amazon.com and not other major sites.

Comment 2 by kochi@chromium.org, May 1 2017

Components: Internals>Network
Interesting; another report is at issue 716955.

Can anyone working on the network stack take a look at this for triage?
Anecdotally, this seems to affect a non-trivial number of users. There are several additional reports at:
https://www.facebook.com/piaw.na/posts/10158582819245564?match=cGlhdyBuYSxwaWF3
Network triage here, taking a look.
Issue 716955 has been merged into this issue.
abhinandan.das:  Are you getting ERR_SOCKET_NOT_CONNECTED on all platforms, or just periodically not able to connect to Amazon but getting different errors?
This happens intermittently (e.g. once every few page loads on Amazon), and it's the same error, ERR_SOCKET_NOT_CONNECTED. I've seen it on the Chrome OS and Mac devices I use, and this bug and the others at the link in comment #4 seem to say the same for Windows.

We're seeing a lot of reports from our users about being unable to load images from CloudFront. I believe another ticket merged into this one involves a different CDN with the same issue, so it looks like this is not isolated to Amazon.
Attachments: cferror1.png (430 KB), cferror2.png (71.0 KB)
Thanks! I only asked because getting that particular error instead of ERR_CONNECTION_CLOSED / ERR_EMPTY_RESPONSE is fairly weird and obscure, and I'm a bit surprised that, when we fail, we consistently generate it on both Windows and POSIX platforms.
I might have seen ERR_CONNECTION_CLOSED as well; I can't remember for sure. I tried to reproduce the issue today to check, but so far haven't run into it after loading a few dozen pages on Amazon. (ERR_SOCKET_NOT_CONNECTED was the most common error, though, IIRC.)
Related Amazon support case: https://forums.aws.amazon.com/thread.jspa?threadID=254701

An AWS rep I spoke with believes it's a library issue. It appears to be affecting all Chrome/Chromium browsers.
gabinante:  Any reports from it on HTTP URLs, or just seeing it on HTTPS ones?
Unfortunately I can't reproduce this issue, and am dealing with secondhand reports from our support team. It would probably be helpful for the chrome team if someone who is affected can get the following information:
--Full network packet capture (this shouldn't be too hard using a Chrome extension such as Network Sniffer)
--.HAR dump (Chrome -> developer console -> network -> check 'preserve log' -> reproduce issue -> right-click anywhere in the network pane and select "save .HAR dump")
--would also be interested to see what chrome://net-internals/#sockets says during the issue.
@mmenke all our URLs enforce https, so I'm not sure. I'll try to get more info ASAP.
Components: -Internals>Network Internals>Network>SSL
HAR dumps aren't generally too useful for us, but a Wireshark capture would be great.

I'm currently thinking this is due to some sort of SSL intolerance on the part of the server (it could be a middlebox as well), since, according to Chrome, SSL is negotiated successfully, but I could certainly be wrong.

Tentatively adding the SSL label.

[SSL folks]: Note that issue 716955 has a net-internals log.
Labels: -OS-Windows -Pri-3 OS-All Pri-1
The net-internals log on issue #716955 looks weird. If there were some sort of SSL intolerance, we'd usually fail earlier, and we'd also have gotten some sort of net error (or timeout), neither of which shows up in the log.

Looking at id:58487, it ultimately grabs socket id:58495. That ends with:

t=7632 [st=116]     -SSL_CONNECT
                     --> cipher_suite = 49199
                     --> is_resumed = false
                     --> next_proto = "http/1.1"
                     --> version = "TLS 1.2"
t=7632 [st=116]     +SOCKET_IN_USE  [dt=1]
                     --> source_dependency = 58491 (HTTP_STREAM_JOB)
t=7633 [st=117]        SOCKET_CLOSED
t=7633 [st=117]     -SOCKET_IN_USE
t=7633 [st=117]   -SOCKET_IN_USE
t=7633 [st=117] -SOCKET_ALIVE

That timestamp aligns with the URL_REQUEST:

t=7633 [st=119]     +HTTP_TRANSACTION_SEND_REQUEST  [dt=0]
t=7633 [st=119]        HTTP_TRANSACTION_SEND_REQUEST_HEADERS
                       --> GET / HTTP/1.1
                           Host: www.amazon.com
                           Connection: keep-alive
                           Cache-Control: max-age=0
                           Upgrade-Insecure-Requests: 1
                           User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
                           Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
                           Accept-Encoding: gzip, deflate, sdch, br
                           Accept-Language: en-US,en;q=0.8
                           Cookie: [1247 bytes were stripped]
t=7633 [st=119]     -HTTP_TRANSACTION_SEND_REQUEST
                     --> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)
t=7633 [st=119]   -URL_REQUEST_START_JOB
                   --> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)
t=7633 [st=119]    URL_REQUEST_DELEGATE  [dt=0]
t=7633 [st=119] -REQUEST_ALIVE
                 --> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)

This looks odd. The only place where SOCKET_CLOSED is logged (on all platforms, actually) is here:
https://cs.chromium.org/chromium/src/net/socket/tcp_socket_win.cc?q=case:yes+SOCKET_CLOSED+file:net/&l=656&dr=C

So, for some reason, we're calling Close() on a socket that's in use?

The fact that ERR_SOCKET_NOT_CONNECTED isn't logged outside the URL_REQUEST probably also tells us something, though I haven't traced through everything yet. There's SSLClientSocketImpl::ExportKeyingMaterial, but that's only called as part of some //remoting stuff.
Oh. I'm guessing it's failing here, which would explain why the SSL socket is silent:

https://cs.chromium.org/chromium/src/net/http/http_stream_parser.cc?rcl=0f27852ca4088b309083e746c3eaab29fb0b4415&l=237

We know it got to logging HTTP_TRANSACTION_SEND_REQUEST_HEADERS. Immediately afterwards, it calls GetPeerAddress which is ultimately implemented by:

https://cs.chromium.org/chromium/src/net/socket/tcp_socket_win.cc?rcl=0f27852ca4088b309083e746c3eaab29fb0b4415&l=589

And if the socket were disconnected, there's our error code. (If we had gotten far enough to actually call Write(), I would have expected more logging.)
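To make the hypothesis concrete, here is a minimal Python model of the send path described above: the headers are logged, the peer address is looked up, and only then would a write happen. `ModelSocket`, `send_request`, and the placeholder address are hypothetical names for illustration, not Chromium's `HttpStreamParser` or `TCPSocketWin` code; only the error value (-15) comes from the log.

```python
ERR_SOCKET_NOT_CONNECTED = -15  # value shown in the net-internals log above


class ModelSocket:
    """Hypothetical stand-in for the TCP/SSL socket under the stream parser."""

    def __init__(self, connected: bool):
        self.connected = connected
        self.events = []  # what the socket itself would log

    def get_peer_address(self):
        # Per the analysis above: fail with ERR_SOCKET_NOT_CONNECTED when the
        # socket is no longer connected. Nothing is logged on this path.
        if not self.connected:
            return ERR_SOCKET_NOT_CONNECTED, None
        return 0, ("192.0.2.1", 443)  # placeholder address

    def write(self, data: bytes) -> int:
        self.events.append(f"write {len(data)} bytes")
        return len(data)


def send_request(sock: ModelSocket, request_events: list) -> int:
    """Simplified send path: log headers, look up the peer, then write."""
    request_events.append("HTTP_TRANSACTION_SEND_REQUEST_HEADERS")
    rv, _peer = sock.get_peer_address()
    if rv < 0:
        return rv  # returned to the caller; the socket has logged nothing
    return sock.write(b"GET / HTTP/1.1\r\nHost: www.amazon.com\r\n\r\n")


sock, request_events = ModelSocket(connected=False), []
print(send_request(sock, request_events))  # -15
print(request_events, sock.events)         # ['HTTP_TRANSACTION_SEND_REQUEST_HEADERS'] []
```

In this model the request records the headers event and the error, while the socket's own event list stays empty, which matches the shape of the log: a silent SSL socket and an error that appears only on the URL_REQUEST.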
Cc: rdsmith@chromium.org juliatut...@chromium.org manisca...@chromium.org
Issue 714590 has been merged into this issue.
The other possible place where SOCKET_NOT_CONNECTED happens appears to be with TCP Fast-Open, but this is happening on platforms (like OSX) where that was never deployed/implemented.
Owner: juliatut...@chromium.org
Status: Assigned (was: Unconfirmed)
Also assigning to the current net triager as a hot potato; once we work out the root cause, we can reassign.
Cc: svaldez@chromium.org davidben@chromium.org
Owner: svaldez@chromium.org
We have a thread with Amazon engineering ongoing. Going to put Steven in here as "Owner" as he's on the thread & looked through net-internals.
re:  #18

I don't think we're calling Close() on the socket before the error.  If you look at the timestamps:

t=7466 [st=121]        SOCKET_CLOSED

t=7465 [st=84]     -HTTP_TRANSACTION_SEND_REQUEST
                    --> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)

So we call Close() only after we get the error.

You could be right about the call where HttpStreamParser first sees the failure, not sure, but the Close() call looks like a red herring.
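The timestamp comparison above can be checked mechanically. A minimal sketch, assuming the plain-text net-internals format quoted in this thread (`t=` as the absolute time in milliseconds, `st=` relative to the start of that event's source); the two lines are copied from the comment, and the regex and `first_time` helper are illustrative, not part of any Chromium tooling.

```python
import re

# Two lines from the second request's net-internals dump, as quoted above.
LOG_LINES = [
    "t=7466 [st=121]        SOCKET_CLOSED",
    "t=7465 [st=84]     -HTTP_TRANSACTION_SEND_REQUEST",
]

# Groups: absolute time, source-relative time, begin/end marker, event name.
LINE_RE = re.compile(r"t=(\d+)\s+\[st=(\d+)\]\s+([+-]?)(\w+)")


def first_time(lines, event):
    """Return the absolute timestamp (t=) of the first line logging `event`."""
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(4) == event:
            return int(m.group(1))
    raise KeyError(event)


closed = first_time(LOG_LINES, "SOCKET_CLOSED")
failed = first_time(LOG_LINES, "HTTP_TRANSACTION_SEND_REQUEST")
print(f"send failed at t={failed}, socket closed at t={closed}")
print("Close() came after the error" if closed > failed else "Close() came first")
```

Comparing `t=` rather than `st=` matters here because the two events belong to different sources (the socket and the URL request), whose `st=` clocks start at different times; the first log shows the same `t=7633` appearing as `st=117` on the socket and `st=119` on the request.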
Ah, yeah I should have looked at both requests. I agree that means it's probably not a stray Close(). That is more consistent with a recent server-side change then.

Supposing it's not possible to hit SSLClientSocket::Write without logging anything (this should be true, though it's certainly possible there's a bug there), that still points to the GetPeerAddress call. It looks like IsConnected() does a recv(MSG_PEEK), so if the server is shutting the connection down on us for some reason, that might do it.
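A minimal sketch of a peek-based liveness check like the recv(MSG_PEEK) one described above, written in Python rather than Chromium's C++. It assumes a non-blocking socket, so an idle connection makes the peek fail with "would block" instead of hanging, and it uses `socket.socketpair()` purely as a local stand-in for a server connection. `is_connected` and `wait_readable` are illustrative names, not Chromium functions.

```python
import select
import socket


def is_connected(sock: socket.socket) -> bool:
    """Peek-based liveness check; assumes `sock` is non-blocking."""
    try:
        data = sock.recv(1, socket.MSG_PEEK)  # look at the next byte without consuming it
    except BlockingIOError:
        return True   # nothing to read yet, but the connection is still open
    except OSError:
        return False  # e.g. the connection was reset by the peer
    return data != b""  # b"" means the peer has shut down its sending side


def wait_readable(sock: socket.socket, timeout: float = 1.0) -> None:
    # Give the peer's data or its close time to arrive before peeking.
    select.select([sock], [], [], timeout)


a, b = socket.socketpair()
a.setblocking(False)

print(is_connected(a))  # True: open and idle (the peek would block)

b.sendall(b"x")
wait_readable(a)
print(is_connected(a))  # True: a byte is waiting and the peek leaves it there
print(a.recv(1))        # b'x': the peeked byte is still readable

b.close()               # the peer hangs up
wait_readable(a)
print(is_connected(a))  # False: recv(MSG_PEEK) now returns b""
a.close()
```

The key property is that a peer's orderly shutdown turns the peek into a zero-length read, so a server that hangs up between the handshake and the first request would make a check like this report "not connected" before any bytes are written.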

Comment 25 by ap...@twilio.com, May 2 2017

We started getting reports about this issue on 4/28 and are able to reproduce with these steps:

1. Load a CloudFront file in Chrome (success)
2. Turn the VPN on (or off)
3. Try to reload the file (fail)
4. Try to load the file in Incognito mode (success)

Other browsers are not affected. I'm guessing Chrome is caching something (maybe DNS?) that's no longer valid when the VPN settings change.
Given that in the log, we negotiate SSL, and 1 millisecond later get the error, I don't think this could be an issue with DNS caching.
Status: WontFix (was: Assigned)
Amazon discovered the problem was on their end: they were not correctly falling back when they couldn't handle a session resumption attempt. Other browsers clear their session resumption info when the server simply hangs up, but Chrome does not. Amazon has fixed the underlying issue, and we've decided to keep Chrome's behavior as is, since we prefer to surface this sort of bug rather than paper over it.
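A toy model of the two client-side policies described above may make the difference concrete. Everything here is hypothetical: the `Client` and `Server` classes, the `forget_sessions()` trigger (standing in for whatever left the server unable to resume a session), and the simplification that cached sessions never expire. It is not Chrome's, Amazon's, or any TLS library's actual implementation.

```python
from dataclasses import dataclass, field


class Server:
    """Hypothetical server with the bug described above: when offered a session
    it cannot resume, it hangs up instead of doing a full handshake."""

    def __init__(self):
        self.known = set()  # session IDs this server can resume
        self.issued = 0

    def forget_sessions(self):
        # Stand-in for whatever made the server lose resumable state (assumption).
        self.known.clear()

    def handshake(self, offered):
        """Return (success, session_id)."""
        if offered is not None:
            if offered in self.known:
                return True, offered  # abbreviated (resumed) handshake
            return False, None        # bug: hang up rather than fall back
        self.issued += 1
        sid = f"s{self.issued}"
        self.known.add(sid)
        return True, sid              # full handshake, new session issued


@dataclass
class Client:
    """Toy client with a per-host session cache."""
    clear_on_hangup: bool
    cache: dict = field(default_factory=dict)

    def connect(self, server, host):
        ok, sid = server.handshake(self.cache.get(host))
        if ok:
            self.cache[host] = sid
        elif self.clear_on_hangup:
            self.cache.pop(host, None)  # next attempt will be a full handshake
        return ok


def run(clear_on_hangup):
    server, client = Server(), Client(clear_on_hangup)
    results = [client.connect(server, "www.amazon.com")]
    server.forget_sessions()
    results += [client.connect(server, "www.amazon.com") for _ in range(3)]
    return results


print(run(clear_on_hangup=True))   # [True, False, True, True]
print(run(clear_on_hangup=False))  # [True, False, False, False]
```

In this model, the clear-on-hang-up policy absorbs the server bug after a single failure, while the keep policy fails for as long as the stale entry stays cached, so the server-side bug stays visible instead of being papered over.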
Thanks for all of your hard work on this one! Much appreciated.
