Amazon.com cannot be accessed
Reported by
willn2...@gmail.com,
Apr 29 2017
Issue description
Chrome Version: 58.0.3029.81
OS Version: Windows 10 Home 64-bit Build 1607
URLs (if applicable): http://www.amazon.com
Other browsers tested:
Add OK or FAIL after other browsers where you have tested this issue:
Microsoft Edge: OK
Firefox 53.0: OK
IE 11: OK
What steps will reproduce the problem?
1. accessing http://www.amazon.com
What is the expected result?
The webpage will load
What happens instead of that?
The website provides error "ERR_SOCKET_NOT_CONNECTED"
Please provide any additional information below. Attach a screenshot if possible.
Cannot access http://www.amazon.com because of "ERR_SOCKET_NOT_CONNECTED" error... I have attached a net-export log file for you to review. Please keep this information confidential. Thank you!
UserAgentString: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
,
May 1 2017
Interesting, and another report is at issue 716955. Can anyone working on the network stack take a look at this for triage?
,
May 1 2017
Anecdotally, this seems to affect a non-trivial number of users. There are several additional reports at: https://www.facebook.com/piaw.na/posts/10158582819245564?match=cGlhdyBuYSxwaWF3
,
May 1 2017
Network triage here, taking a look.
,
May 1 2017
Issue 716955 has been merged into this issue.
,
May 1 2017
abhinandan.das: Are you getting ERR_SOCKET_NOT_CONNECTED on all platforms, or just periodically not able to connect to Amazon but getting different errors?
,
May 1 2017
This happens intermittently (e.g. once every few pages loaded on amazon) - and it's the same error ERR_SOCKET_NOT_CONNECTED (I've seen this on chromeos and mac devices I use, and this bug and others on the link in comment #4 seem to say the same for windows).
,
May 1 2017
Seeing a lot of reports from our users about being unable to load images from cloudfront. I believe another ticket merged in has another CDN which has this issue also - looks like not isolated to amazon.
,
May 1 2017
Thanks! Just asked because getting that particular error instead of ERR_CONNECTION_CLOSED / ERR_EMPTY_RESPONSE is fairly weird and obscure, and I'm a bit surprised that (when we fail) we're consistently generating it on both Windows and POSIX platforms.
,
May 1 2017
I might have seen ERR_CONNECTION_CLOSED as well - cannot remember for sure. I tried reproducing the issue today to check but so far didn't run into the issue after loading a few 10s of pages on amazon (ERR_SOCKET_NOT_CONNECTED was the most common error though iirc)
,
May 1 2017
related amazon support case: https://forums.aws.amazon.com/thread.jspa?threadID=254701 Speaking with an AWS rep, they believe it's a library issue. It appears to be affecting all chrome/chromium browsers.
,
May 1 2017
gabinante: Any reports from it on HTTP URLs, or just seeing it on HTTPS ones?
,
May 1 2017
Unfortunately I can't reproduce this issue, and am dealing with secondhand reports from our support team. It would probably be helpful for the chrome team if someone who is affected can get the following information:
--Full network packet capture (this shouldn't be too hard using a chrome extension such as Network Sniffer)
--.HAR dump (chrome -> developer console -> network -> check 'preserve log' -> reproduce issue -> right click anywhere in the network pane and select "save .HAR dump")
--would also be interested to see what chrome://net-internals/#sockets says during the issue.
,
May 1 2017
@mmenke all our URLs enforce https, so I'm not sure. I'll try to get more info ASAP.
,
May 1 2017
HAR dumps aren't generally too useful for us, but a wireshark capture would be great. I'm currently thinking this is due to some sort of SSL intolerance on the part of the server (Could be a middlebox as well), since according to Chrome, SSL is negotiated successfully, but could certainly be wrong. Tentatively adding the SSL label. [SSL folks]: Note that issue 716955 has a net-internals log.
,
May 1 2017
,
May 1 2017
The net-internals log on issue #716955 looks weird. If there's some sort of SSL intolerance, we'd usually fail earlier, and we'd also have gotten some sort of net error (or timeout), neither of which shows up in the log.
Looking at id:58487, it ultimately grabs socket id:58495. That ends with:
t=7632 [st=116] -SSL_CONNECT
--> cipher_suite = 49199
--> is_resumed = false
--> next_proto = "http/1.1"
--> version = "TLS 1.2"
t=7632 [st=116] +SOCKET_IN_USE [dt=1]
--> source_dependency = 58491 (HTTP_STREAM_JOB)
t=7633 [st=117] SOCKET_CLOSED
t=7633 [st=117] -SOCKET_IN_USE
t=7633 [st=117] -SOCKET_IN_USE
t=7633 [st=117] -SOCKET_ALIVE
That timestamp aligns with the URL_REQUEST:
t=7633 [st=119] +HTTP_TRANSACTION_SEND_REQUEST [dt=0]
t=7633 [st=119] HTTP_TRANSACTION_SEND_REQUEST_HEADERS
--> GET / HTTP/1.1
Host: www.amazon.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch, br
Accept-Language: en-US,en;q=0.8
Cookie: [1247 bytes were stripped]
t=7633 [st=119] -HTTP_TRANSACTION_SEND_REQUEST
--> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)
t=7633 [st=119] -URL_REQUEST_START_JOB
--> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)
t=7633 [st=119] URL_REQUEST_DELEGATE [dt=0]
t=7633 [st=119] -REQUEST_ALIVE
--> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)
This looks odd. The only place where SOCKET_CLOSED is logged (on all platforms, actually) is here:
https://cs.chromium.org/chromium/src/net/socket/tcp_socket_win.cc?q=case:yes+SOCKET_CLOSED+file:net/&l=656&dr=C
So, for some reason, we're calling Close() on a socket that's in use? That ERR_SOCKET_NOT_CONNECTED isn't getting logged outside the URL_REQUEST probably also tells us something, though I haven't traced through everything yet.
There's SSLClientSocketImpl::ExportKeyingMaterial, but that's only called as part of some //remoting stuff.
,
May 1 2017
Oh. I'm guessing it's failing here, which explains why the SSL socket is being silent:
https://cs.chromium.org/chromium/src/net/http/http_stream_parser.cc?rcl=0f27852ca4088b309083e746c3eaab29fb0b4415&l=237
We know it got to logging HTTP_TRANSACTION_SEND_REQUEST_HEADERS. Immediately afterwards, it calls GetPeerAddress, which is ultimately implemented by:
https://cs.chromium.org/chromium/src/net/socket/tcp_socket_win.cc?rcl=0f27852ca4088b309083e746c3eaab29fb0b4415&l=589
And if the socket were disconnected, there's our error code. (If we got far enough to actually call Write, I would have expected more logging.)
,
May 1 2017
Issue 714590 has been merged into this issue.
,
May 1 2017
The other possible place where SOCKET_NOT_CONNECTED happens appears to be with TCP Fast-Open, but this is happening on platforms (like OSX) where that was never deployed/implemented.
,
May 1 2017
Also assigning to current net triager as hot potato: Once we work out root cause can reassign
,
May 1 2017
We have a thread with Amazon engineering ongoing. Going to put Steven in here as "Owner" as he's on the thread & looked through net-internals.
,
May 1 2017
re: #18
I don't think we're calling Close() on the socket before the error. If you look at the timestamps:
t=7466 [st=121] SOCKET_CLOSED
t=7465 [st=84] -HTTP_TRANSACTION_SEND_REQUEST
--> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)
So we call Close() only after we get the error.
You could be right about the call where HttpStreamParser first sees the failure, not sure, but the Close() call looks like a red herring.
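The ordering argument above can be checked mechanically. This is a minimal sketch, assuming the log lines keep the `t=<ms> [st=<ms>] EVENT` shape quoted in this comment; `parse_events` and `happened_before` are hypothetical helper names, not part of any Chrome tooling.

```python
import re

# Matches lines such as "t=7466 [st=121] SOCKET_CLOSED". A leading "+" or "-"
# marks the beginning or end of an event in the net-internals text view.
EVENT_RE = re.compile(r"^t=\s*(\d+)\s+\[st=\s*(\d+)\]\s+([+-]?)(\w+)")

def parse_events(lines):
    """Return (t, phase, name) tuples for every line that looks like an event."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m:
            events.append((int(m.group(1)), m.group(3), m.group(4)))
    return events

def happened_before(events, first, second):
    """True if the earliest `first` event has a strictly smaller t than the earliest `second`."""
    t_first = min(t for t, _, name in events if name == first)
    t_second = min(t for t, _, name in events if name == second)
    return t_first < t_second

# The three lines quoted in the comment above; the "-->" parameter line is skipped.
log = [
    "t=7466 [st=121] SOCKET_CLOSED",
    "t=7465 [st=84] -HTTP_TRANSACTION_SEND_REQUEST",
    "--> net_error = -15 (ERR_SOCKET_NOT_CONNECTED)",
]
events = parse_events(log)
print(happened_before(events, "HTTP_TRANSACTION_SEND_REQUEST", "SOCKET_CLOSED"))
```

With the quoted lines, the failed send (t=7465) sorts ahead of SOCKET_CLOSED (t=7466), which is the point being made: the Close() follows the error rather than causing it.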
,
May 1 2017
Ah, yeah I should have looked at both requests. I agree that means it's probably not a stray Close(). That is more consistent with a recent server-side change then. Supposing it's not possible to hit SSLClientSocket::Write without logging anything (this should be true, though it's certainly possible there's a bug there), that does still suggest the GetPeerAddress one. Looks like IsConnected() does a recv(MSG_PEEK), so if the server is shutting things off on us for some reason that might do it.
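For readers unfamiliar with the recv(MSG_PEEK) trick mentioned above, here is a minimal sketch of the general technique in Python. It is an illustration only, not Chrome's code: `is_connected` is a hypothetical helper, and a local socket pair stands in for the real client/server connection.

```python
import socket

def is_connected(sock):
    """Best-effort liveness check: peek one byte without consuming it."""
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        data = sock.recv(1, socket.MSG_PEEK)
        # An empty read means the peer performed an orderly shutdown.
        return len(data) > 0
    except BlockingIOError:
        # Nothing to read yet, but no close has been observed either.
        return True
    except OSError:
        # Reset or otherwise broken connection.
        return False
    finally:
        sock.settimeout(previous_timeout)

a, b = socket.socketpair()
print(is_connected(a))  # peer still open
b.close()
print(is_connected(a))  # peer has hung up
a.close()
```

The relevant property is that a peer which closes right after the handshake is detected by the peek before any request bytes are written, so a check of this kind can fail without a write ever being attempted.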
,
May 2 2017
We started getting reports about this issue on 4/28 and are able to reproduce with these steps:
1. Load a cloudfront file on Chrome (success)
2. Turn on (or off) VPN
3. Try to reload the file (fail)
4. Try to load the file in incognito mode (success)
Other browsers are not affected. I'm guessing Chrome is caching something (maybe DNS?) that's no longer valid when VPN settings change.
,
May 2 2017
Given that in the log, we negotiate SSL, and 1 millisecond later get the error, I don't think this could be an issue with DNS caching.
,
May 2 2017
Amazon discovered it was a problem on their end: their servers did not correctly fall back to a full handshake when they couldn't handle a session resumption. Other browsers clear their session resumption info when the server just hangs up, but Chrome does not. Amazon has fixed the underlying issue, and we've decided to keep the behavior, Chrome-side, since we prefer to surface this sort of bug rather than paper over it.
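To make the difference in client behavior concrete, here is a toy model. It is not Chrome's or any other browser's actual code: `SessionCache`, `evict_on_hangup`, `fake_server`, and the host name are all hypothetical, and the server is reduced to a function that hangs up whenever it is offered a session it cannot resume.

```python
class SessionCache:
    """Toy per-host TLS session cache with a configurable eviction policy."""

    def __init__(self, evict_on_hangup):
        self.evict_on_hangup = evict_on_hangup
        self.sessions = {}

    def connect(self, host, server):
        offered = self.sessions.get(host)
        ok, new_session = server(offered)
        if ok:
            self.sessions[host] = new_session
        elif self.evict_on_hangup:
            # Forget the session so the retry performs a full handshake.
            self.sessions.pop(host, None)
        return ok

def fake_server(offered_session):
    """Buggy server: hangs up on a stale session instead of falling back."""
    if offered_session == "stale":
        return False, None
    return True, "fresh"

def attempts_until_success(cache, host, limit=3):
    """Return the 1-based attempt that succeeded, or None if all failed."""
    for attempt in range(1, limit + 1):
        if cache.connect(host, fake_server):
            return attempt
    return None

evicting = SessionCache(evict_on_hangup=True)
evicting.sessions["example.test"] = "stale"
keeping = SessionCache(evict_on_hangup=False)
keeping.sessions["example.test"] = "stale"

print(attempts_until_success(evicting, "example.test"))  # recovers on the retry
print(attempts_until_success(keeping, "example.test"))   # keeps failing
```

In this model the evicting client hides the server bug after one failed attempt, while the client that keeps the session continues to fail until the server is fixed, which is the trade-off described in the comment above.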
,
May 2 2017
Thanks for all of your hard work on this one! Much appreciated.
Comment 1 by abhinand...@gmail.com
, Apr 30 2017