Chrome-network: Long delay before RST message on stale sockets results in slow page loads.
Reported by liuway...@gmail.com, Jan 9 2015
Chrome Version : 39.0.2171.95 OS Version: 6.1 (Windows 7, Windows Server 2008 R2) URLs (if applicable) : Other browsers tested: Add OK or FAIL after other browsers where you have tested this issue: Safari 5: Firefox 4.x: IE 7/8/9: What steps will reproduce the problem? 1. this problem can not be reproduced regularly What is the expected result? - expect the response to be received or report a timeout error What happens instead of that? - the request stalled and pending for a long time Please provide any additional information below. Attach a screenshot if possible. UserAgentString: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 here's the description: Ajax request occasionally stalled for a long time in chrome. I finally managed to reproduced it and save all related data necessary to post here if anyone could help me out. The timeline from Chrome Dev Tool shows the request stalled for 42.62s as the screen capture 1.jpg shows. and within the chrome://net-internals/#events (for the events log please heads to the end) page i found the most time is cost by two events: +HTTP_TRANSACTION_READ_HEADERS [dt=21301] +HTTP_TRANSACTION_READ_HEADERS [dt=21304] both get ERR_CONNECTION_RESET. as the screen capture 2.jpg shows. I think the error is the reason why the request stalled for so long. Any one could explain the errors? FOLLOWING IS THE EVENTS LOG FOR THE REQUEST, I also export the full events as json you can get from here then restore within the Chrome chrome://net-internals/#events page. note the request url is internal so maybe cant access from public network: 193486: URL_REQUEST http://qa.tieba.baidu.com/release/getReleaseHistory?projectId=fum1.0.593 Start Time: 2015-01-02 17:51:05.323 t= 1 [st= 0] +REQUEST_ALIVE [dt=42741] t= 1 [st= 0] URL_REQUEST_DELEGATE [dt=0] t= 1 [st= 0] +URL_REQUEST_START_JOB [dt=42740] --> load_flags = 339804160 (BYPASS_DATA_REDUCTION_PROXY | MAYBE_USER_GESTURE | REPORT_RAW_HEADERS | VERIFY_EV_CERT) --> method = "GET" --> priority = "LOW" --> url = "http://qa.tieba.baidu.com/release/getReleaseHistory?projectId=fum1.0.593" t= 2 [st= 1] URL_REQUEST_DELEGATE [dt=0] t= 2 [st= 1] HTTP_CACHE_GET_BACKEND [dt=0] t= 2 [st= 1] HTTP_CACHE_OPEN_ENTRY [dt=0] t= 2 [st= 1] HTTP_CACHE_ADD_TO_ENTRY [dt=0] t= 2 [st= 1] HTTP_CACHE_READ_INFO [dt=0] t= 2 [st= 1] URL_REQUEST_DELEGATE [dt=0] t= 2 [st= 1] +HTTP_STREAM_REQUEST [dt=2] t= 4 [st= 3] HTTP_STREAM_REQUEST_BOUND_TO_JOB --> source_dependency = 193488 (HTTP_STREAM_JOB) t= 4 [st= 3] -HTTP_STREAM_REQUEST t= 4 [st= 3] +HTTP_TRANSACTION_SEND_REQUEST [dt=0] t= 4 [st= 3] HTTP_TRANSACTION_SEND_REQUEST_HEADERS --> GET /release/getReleaseHistory?projectId=fum1.0.593 HTTP/1.1 Host: qa.tieba.baidu.com Connection: keep-alive Accept: application/json, text/plain, */* User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 Referer: http://qa.tieba.baidu.com/project/ Accept-Encoding: gzip, deflate, sdch Accept-Language: en-US,en;q=0.8 Cookie: [268 bytes were stripped] t= 4 [st= 3] -HTTP_TRANSACTION_SEND_REQUEST t= 4 [st= 3] +HTTP_TRANSACTION_READ_HEADERS [dt=21301] t= 4 [st= 3] HTTP_STREAM_PARSER_READ_HEADERS [dt=21301] --> net_error = -101 (ERR_CONNECTION_RESET) t=21305 [st=21304] HTTP_TRANSACTION_RESTART_AFTER_ERROR --> net_error = -101 (ERR_CONNECTION_RESET) t=21305 [st=21304] -HTTP_TRANSACTION_READ_HEADERS t=21305 [st=21304] +HTTP_STREAM_REQUEST [dt=3] t=21307 [st=21306] HTTP_STREAM_REQUEST_BOUND_TO_JOB --> source_dependency = 193494 (HTTP_STREAM_JOB) t=21308 [st=21307] -HTTP_STREAM_REQUEST t=21308 [st=21307] +HTTP_TRANSACTION_SEND_REQUEST [dt=3] t=21308 [st=21307] HTTP_TRANSACTION_SEND_REQUEST_HEADERS --> GET /release/getReleaseHistory?projectId=fum1.0.593 HTTP/1.1 Host: qa.tieba.baidu.com Connection: keep-alive Accept: application/json, text/plain, */* User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 Referer: http://qa.tieba.baidu.com/project/ Accept-Encoding: gzip, deflate, sdch Accept-Language: en-US,en;q=0.8 Cookie: [268 bytes were stripped] t=21311 [st=21310] -HTTP_TRANSACTION_SEND_REQUEST t=21311 [st=21310] +HTTP_TRANSACTION_READ_HEADERS [dt=21304] t=21311 [st=21310] HTTP_STREAM_PARSER_READ_HEADERS [dt=21304] --> net_error = -101 (ERR_CONNECTION_RESET) t=42615 [st=42614] HTTP_TRANSACTION_RESTART_AFTER_ERROR --> net_error = -101 (ERR_CONNECTION_RESET) t=42615 [st=42614] -HTTP_TRANSACTION_READ_HEADERS t=42615 [st=42614] +HTTP_STREAM_REQUEST [dt=12] t=42627 [st=42626] HTTP_STREAM_REQUEST_BOUND_TO_JOB --> source_dependency = 193498 (HTTP_STREAM_JOB) t=42627 [st=42626] -HTTP_STREAM_REQUEST t=42627 [st=42626] +HTTP_TRANSACTION_SEND_REQUEST [dt=2] t=42627 [st=42626] HTTP_TRANSACTION_SEND_REQUEST_HEADERS --> GET /release/getReleaseHistory?projectId=fum1.0.593 HTTP/1.1 Host: qa.tieba.baidu.com Connection: keep-alive Accept: application/json, text/plain, */* User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 Referer: http://qa.tieba.baidu.com/project/ Accept-Encoding: gzip, deflate, sdch Accept-Language: en-US,en;q=0.8 Cookie: [268 bytes were stripped] t=42629 [st=42628] -HTTP_TRANSACTION_SEND_REQUEST t=42629 [st=42628] +HTTP_TRANSACTION_READ_HEADERS [dt=112] t=42629 [st=42628] HTTP_STREAM_PARSER_READ_HEADERS [dt=112] t=42741 [st=42740] HTTP_TRANSACTION_READ_RESPONSE_HEADERS --> HTTP/1.1 200 OK Date: Fri, 02 Jan 2015 09:51:48 GMT Content-Type: application/json; charset=UTF-8 Transfer-Encoding: chunked Connection: keep-alive Cache-Control: no-cache tracecode: 31079600320335034634010217 tracecode: 31079600320537995786010217 Server: Apache t=42741 [st=42740] -HTTP_TRANSACTION_READ_HEADERS t=42741 [st=42740] HTTP_CACHE_WRITE_INFO [dt=0] t=42741 [st=42740] HTTP_CACHE_WRITE_DATA [dt=0] t=42741 [st=42740] HTTP_CACHE_WRITE_INFO [dt=0] t=42741 [st=42740] URL_REQUEST_DELEGATE [dt=0] t=42741 [st=42740] -URL_REQUEST_START_JOB t=42741 [st=42740] URL_REQUEST_DELEGATE [dt=0] t=42741 [st=42740] HTTP_TRANSACTION_READ_BODY [dt=0] t=42741 [st=42740] HTTP_CACHE_WRITE_DATA [dt=0] t=42741 [st=42740] HTTP_TRANSACTION_READ_BODY [dt=0] t=42741 [st=42740] HTTP_CACHE_WRITE_DATA [dt=0] t=42742 [st=42741] -REQUEST_ALIVE
Jan 12 2015,
We'd need a full log dump to be sure, but what looks to be happening: 1) We have a bunch of connections to qa.tieba.baidu.com, all were used successfully, and are now idle. 2) They all silently die for some reason, without us ever being informed. My guess is your personal router times out the connection, but this could also be your ISP, the destination server, or ever a real network outage (A short one) that prevents us from getting the connection closed message. 3) There's a new request to qa.tieba.baidu.com. We try to reuse a socket. After 21 seconds, we get the server's RST message ("I don't have a connection to you."). Since it was a stale socket, we're aware this sometimes happens, so we just retry...And get the next idle socket in the list, which, after 21 seconds, gives us the same reset message. We try again, for the same reason. This time we don't have another stale socket to try, so we use a fresh one. The request succeeds. The real problem here is something is taking 21 seconds to send us the RST messages, despite the fact that a roundtrip to the server you're talking to only takes about 100 milliseconds.
Jan 12 2015,
Sufficiently confident of that diagnosis that I'm going to go ahead and move this into the "Available" bin. Unfortunately, dealing with bad middleboxes is more of an art than a science, not sure if/when we'll take this particular issue on.
Jan 13 2015,
- Regarding the `full log dump`, does the attached file `chrome-net-internal-log.json` met the demand? Or anything more to provide? - FYI. this happens very rarely. - Thanks your analyse, seems your diagnosis well explained the net log.
Jan 13 2015,
Oops - sorry, I saw you pasted in part of the log, and just assumed you didn't attach it as well. I'll take a look at it tomorrow, just in case (Don't expect to see anything else, but best to be sure).
Jan 13 2015,
Log is interesting. We make two requests at once for URLs on the same server (getTestInfo and getReleaseHistory). Both get stale sockets. The first one succeeds, while the second is hung on 21 seconds, then we get the ERR_CONNECTION_RESET. Then we retry using the socket we successfully got getTestInfo from 21 seconds earlier. That socket then also gives us a CONNECTION_RESET after 21 seconds. So whatever's killing our sockets doesn't hesitate to do it on sockets that were used to send and receive data just 21 seconds ago. That seems very short for something timing out a socket. Also means there wasn't one event that killed all sockets at once. Diagnosis is still basically the same, but details are a bit different than I expected.
Jan 14 2015,
So according what we see from the log, basically the browser doesn't know the socket it's going to reuse is invalid. And what causes the socket invalid is unknown for now, at least from the log we can't tell, right? If so , seems not a bug relate to Chrome. But seems not happened in other browsers like Firefox. I tested several times and failed to reproduce.
Jan 14 2015,
Exactly. A connection/socket dies for some reason, we're never told (And your PC doesn't know, either, in general). And then when we try to use the socket again, we're in trouble. If everything in between was working correctly, we should learn the socket is dead pretty quickly, but instead, in this case, something's waiting on a timeout before telling us about the problem. I suspect this is the TCP timeout (The OS's network code just gives up), but could theoretically be something else timing out the connection. Different browsers may have different rules for keeping around stale sockets, or creating new ones, which may by why you're not seeing it in other browsers.
Mar 16 2015,
I've faced exactly the same issue with my web application. - I have been able to produce this issue only with AJAX calls - Not all calls take a long time - probably one in 10 calls end up in this delay - The problem happens more often if the time gap between two requests to my server is between 1 minute to 5 minutes (not very sure of the time - I did not do more experiment to narrow the timeframe). - On analysing network tab of Chrome, I see that most of the time is taken in "stalled state". (sometimes around 20 seconds but mostly around 1 minute. I have observed upto 2.5 minutes in the stalled state). - Unfortunately, I do not have net-internal/#events from my tests. This problem unfortunately does not happen in my home/office internet connection. - This happens regularly with some of the internet connections. This has been reported by 3 of my customers at different locations. All of these customers use current version of chrome (40.0.2214.115 m) on Windows 7. - Most of my customers don't face this issue. This does not happen in my/other developer connections. - I had tried with my laptop in my customer's network and I’m able to reproduce the issue. - When I was able to consistently reproduce the problem in Chrome (when testing in my customer's network), I tried producing the problem in Firefox but could not. - When I was able to consistently reproduce the problem in Chrome (when testing in my customer's network), many other websites I tried seems to be working fine. - For now I have instructed my customers to use Firefox to access our application.
Mar 18 2015,
since you can reproduce it consistently I recommend you attach a net-internal/#events log file for the developer to investigate.
Mar 19 2015,
I will collect and post the same when I visit one of my customers who face this issue. Meanwhile, I'm trying to produce the problem in my home and office networks - not been able to till now.
May 27 2015,
I have the same issue, as aravinda has described. net-internal/#events log attached
May 27 2015,
Comment #11 looks like the same issue. However please include the actual net-internals dump (by going to export panel and clicking the button) rather than attaching the HTML
May 28 2015,
By 956776: URL_REQUEST you'll find 540 seconds waiting time.
Jul 31 2015,
I am observing the same problem when our clients using our service through VPN-tunnel. I am also see ERR_CONNECTION_RESET in logs after ~20 seconds. May somebody help with workaround? May be we can change server headers or server settings? P.S. Server response headers: Cache-Control:private, no-cache, no-cache=Set-Cookie, proxy-revalidate Connection:close Date:Fri, 31 Jul 2015 08:51:14 GMT Last-Modified:Mon, 28 Sep 1970 06:00:00 GMT Pragma:no-cache Server:nginx/1.7.7
Nov 21 2015,
[ Network Stack Bug Triage Fixit ] This seems like fundamentally something wrong in the network below, and not clear to me what a sensible strategy to work around it would be. Since there's been no activity, I'm going to archive, but feel free to re-open if you object.
Jun 20 2016,
I have noticed today, that as soon as I disable the "Use a prediction service to load pages more quickly" (in Advanced Settings/Privacy) option in Chrome's settings - then everything works ok and the Ajax request loads w/o any issues - every time. With this option ticked/enabled I am able to reproduce it basically "at will". So there seems to be some kind of correlation between these two. Tried it on Linux Chrome Version 51.0.2704.103 (64-bit).
Oct 19 2016,
I have observed the same problem. I think the bug should be re-opened. The net-internal log file is attached.
Jan 26 2017,
Any plans on fixing this issue?
Jan 30 2017,
Noticed this again on our customer's environment. Any plans on fixing this, please find the net-internal logs attached. Any plans on fixing this?
Mar 6 2017,
I am also having this issue with Chrome. It has been happening for months, on different network equipment, different devices, etc. It has gotten to the point that when a request is doing this, I just click refresh 10 times to flush the connections out (or so I assume that's what is happening). These timeouts are usually around 20 seconds per failed connection, which can happen numerous times on a single request. Without manually aborting and retrying a few times, the cumulative timeout can be as long as 2 minutes!
Mar 6 2017,
I've encountered a similar issue that only reproduces in Chrome and appears to be CORS-related. In my Angular app, the majority of my API requests are against the same host/origin as the app and all work fine. In a few places, however, I make 1-3 parallel calls to an API hosted elsewhere. These frequently (not always) are in a Pending state for 20-25 seconds before returning in Chrome, but consistently return in milliseconds in other browsers (and in Chrome when it doesn't hit the issue). If I open the query link in a new tab while the query is Pending, it returns quickly and also "unsticks" the original requests and they return immediately as well. I have also found that disabling "Use a prediction service to load pages more quickly" as suggested above resolves the issue, so it seems this may be a bug in that feature. That doesn't really help all my Chrome-using clients who have this enabled by default, though.
Mar 6 2017,
I should note that I personally have not tried disabling the prediction service. I will do so now and see if it affects the issue at all. Also, I have purchased an additional modem and router so I can try to remove those variables as well. I have experienced this issue on other environments in the past but don't have network logs from that time. So I'll try to get that this week. Hope this helps, Thanks
Mar 9 2017,
Disabling that feature made no difference for me. I am testing with different equipment now. I have not had recurrence after a few hours of testing. By now I would usually see it. Could be luck, I don't know. I'll keep an eye on it. It seems to me that if this is buggy behavior from residential/consumer gateways it would be ideal if Chrome could be aware of the occurrence and work around it somehow. Given that it does not happen in other browsers (at least with my testing), something must be different in the way this connection pooling or reuse is working. Just my two cents. :-)
Dec 11 2017,
Get the same error with an angular 2 app, the app stalls on 15% of the requests for 15.10 second https://stackoverflow.com/questions/47749806/chrome-stalls-request-for-15-seconds
I have a similar problem which happens each time, long delay but with different error.
Sign in to add a comment