way to disable preconnected/speculative sockets from server side
Reported by petrausk...@gmail.com, Jun 7 2011
Chrome Version: 11.0.696.77, 11.0.696.71, 12.0.742.91
OS Version: 5.0, 5.1, 6.1
URLs (if applicable):
Other browsers tested:
Safari 5: OK
Firefox 4.x: OK
IE 7/8/9: OK

What steps will reproduce the problem?
1. Client often visits an HTTPS site and looks at the same dynamic content
2. This triggers use of "preconnected/speculative sockets"
3. Server suffers from starvation of available connections

What is the expected result?
A way to disable "preconnected/speculative sockets" from the server side if server admins see the problem. A shorter idle time for a "preconnected/speculative socket".

What happens instead?
A server with a limit of 50 maximum clients and an average serving time of 80 ms per content item serves no more than 10 clients per second.

I am an administrator of an intranet web server. Intranet users use only IE 7/8/9. Recently we started to offer some internet services for authenticated users. The server, which has a lot of dynamically generated content, runs on a 16 (sixteen) year old Sun machine and we currently have no possibility to upgrade. Internet users use a variety of browsers and we started to see degradation of service. Apache server status shows a lot of sessions in the "Reading Request" state. Because Apache is configured with a "Timeout" of 300 seconds, these connections in the "Reading Request" state are terminated after 300 seconds.

I see these lines in access_log:
10.10.10.10 - - [06/Jun/2011:16:45:07 +0200] "-" 408 -
and these lines in error_log:
[Mon Jun 6 16:45:07 2011] [warn] [client 10.10.10.10] read request line timed out

I attached three Wireshark screenshots (I did not attach the dump file because of security concerns):
CaptureImmediateShutdown.gif - sometimes Chrome preconnects to the SSL server but immediately shuts down the connection
CaptureTimeout.gif - Chrome keeps an idle preconnected socket and the server shuts down the connection after 300 seconds
CaptureSuccessfulUsageOfPreconnectedSocket.gif - Chrome keeps a preconnected socket, the user makes a request after 52 seconds and Chrome uses this socket to communicate with the server. I named the picture "successful usage", but as admin of this server I don't want such behavior.

Some details about the Apache configuration:
Timeout 300
KeepAlive Off #Server always sends "Connection: close" header
MaxClients 50

Dynamic content is generated from a database, and the httpd child processes do not share database connections, so by increasing the MaxClients value I would exhaust database resources. I tried to search the web with keywords from the subject but could not find any suggestions for web site owners/administrators on how to deny this type of browser behavior from the server side.
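The `"-" 408` access_log entries in the report are what a connection that never sent a request line looks like once the server times it out. A minimal Python sketch for counting such entries per client, assuming the common log format shown above; the function name and the second and third sample lines are made up for illustration:

```python
import re
from collections import Counter

# Common-log-format lines whose request field is "-" and whose status is 408,
# i.e. connections that timed out before any request line was read.
IDLE_TIMEOUT = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "-" 408 ')

def count_idle_timeouts(lines):
    """Return a Counter mapping client address -> number of '"-" 408' entries."""
    counts = Counter()
    for line in lines:
        match = IDLE_TIMEOUT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# First line is from the report; the other two are hypothetical.
sample = [
    '10.10.10.10 - - [06/Jun/2011:16:45:07 +0200] "-" 408 -',
    '10.10.10.10 - - [06/Jun/2011:16:45:09 +0200] "GET / HTTP/1.1" 200 512',
    '10.10.10.11 - - [06/Jun/2011:16:50:07 +0200] "-" 408 -',
]
print(count_idle_timeouts(sample))
```

A count like this only shows that connections sat idle until the timeout; it does not by itself identify which browser opened them.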
Jun 8 2011,
Jun 9 2011,
+cc the networking-preconnect braintrust
Jun 9 2011,
I can't tell from these gifs what is really going on here, or if this is even a Chrome browser. Some observations:
a) The SSL handshake signature does not look like a recent Chrome client. Are you sure this is Chrome?
b) The SSL server certainly does seem to have some problem - the time between the client hello and server hello in the first diagram is 13s. Ouch.
c) The 3rd chart does not look to me like use of the socket after 50s of idle. Rather, it looks like there are both HTTP and HTTPS connections to this server from the same client. But I can't see the port # to confirm this.

Overall, I don't believe server side control of client preconnect behavior is the right answer here. I could be convinced, but my initial thought is that system admins won't know how to configure this properly, and it will become a "voodoo configuration". Instead, I propose more evidence be gathered. I understand your privacy concerns, but we need to see some traces, as well as the web pages and description of user behavior causing this pattern. I'm not at all convinced that this was preconnect causing this, or that it was even a Chrome client. Can you submit more data?
Jun 9 2011,
One source of data would be a trace from about:net-internals. Do the following:
a) load the about:net-internals tab
b) reproduce the problem
c) Click "dump" in about:net-internals, remove any data that is private, and then send to us here.

The about:net-internals dump doesn't contain any web content, but it does contain URLs. We already black out cookies, so those won't be sent. But if you are sensitive about other headers, you'd have to block those out as well.
Jun 9 2011,
The 300 seconds is probably for keep alive, and has nothing to do with speculative preconnects, which would typically disconnect in about 10 seconds (if never used). The 300 seconds should be a server side parameter. It can be set as high as 300 seconds when the server wants to improve user experience at the cost of server side resources. My first suggestion would be to reduce it. This will increase connect time, but will reduce load on your server (which you are asserting is the critical resource for customer performance). This bug is asking "what can the server do when it wants to use less resources, and is willing to reduce client performance." Perhaps it is also asking what can be done to disable preconnects, asserting that they are harming performance, but I'm not clear on the evidence that this is taking place. We recently changed the performance (client side) to avoid "learning" about preconnects if the historical connection did not happen within 10 seconds of the parent resource. As a result, I'd expect that unless the HTTPS is truly "needed" that we won't "learn" about it. If a subresource is truly needed, then (if we hesitate at all in response to a challenge for credentials), we wouldn't (wastefully) abandon the connection. If we can't "hesitate" then perhaps we need to monitor connections, and avoid pre-connection to sites that demand client credentials. I'm adding another developer that may be able to comment on the SSL performance when credentials are requested. I suspect that if this is a problem, the bug should be morphed to better understanding (client side) that it is wasteful to preconnect, so as to avoid this connection thrashing. It is possible that we should support this as a hint from server, but if we can understand the problem, it seems much better to solve it adaptively client side. This would solve it for all sites, without requiring diagnostics.
Jun 9 2011,
@ comment #3: The gifs are created from a network dump file. Because all requests use SSL, I entered the server's private RSA key in Wireshark and decrypted the SSL traffic to get more information. All three gifs are extracted from the network dump file with the Wireshark filter "tcp.stream==###", so every gif contains all packets from one TCP session.
a) This is a header from successful requests: "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.77 Safari/534.24"
b) Could be, because at capture time the server suffered from this problem.

I'm attaching an anonymized tcpdump file, and you will see that:
* Chrome keeps an idle connection longer than 5 seconds at times when the SSL handshake was without delays (tcp.stream 37 or 40), but the client itself initiates shutdown after 300 seconds
* Chrome closes the connection 9.6 seconds after a delayed SSL server hello (tcp.stream 22 or 26)
* Chrome keeps an idle connection longer than 300 seconds after a delayed SSL server hello (tcp.stream 10, 39, 46 or 47) and the server initiates shutdown after 300 seconds

I created a spreadsheet (https://spreadsheets.google.com/spreadsheet/ccc?key=0AswgSgD2-Y58dHI3TU5DeU1fV3BtZVdCT19LcDFBSXc&hl=en_US) with information from the Apache access_log, tcp dump stream numbers and some comments.

@ comment #4: We have currently closed services for the Chrome browser, so I will create a trace from the browser when I have the chance.

@ comment #5: Keep alive is turned off in the Apache configuration. Because Apache has only one parameter for both server-side dynamic content generation time and client connection activity time, I cannot reduce the "Timeout" setting (http://httpd.apache.org/docs/2.2/mod/core.html#timeout). I don't know the internals of Chrome preconnects, and maybe this combination of a slow SSL connection and a 5 or 10 second idle socket timer could create a situation where the idle socket timer finishes in some state in which the idle connection is never shut down from the client side.
Jun 29 2011,
jar: Related to your suggestion http://code.google.com/p/chromium/issues/detail?id=87121#c19 , and your remark in comment 5, would/should it be possible to tune preconnect aggressiveness down based on the presence/prevalence of explicit "Connection: Close" headers in HTTP/1.1 services? Given that "Connection: Close" semantics indicate that connections SHOULD NOT be considered persistent, and that HTTP/1.1 applications that don't support persistent connections MUST include it in every message, this (may) be a way for servers to reduce load. As you see in the reporter's original Apache configuration, they're already setting "KeepAlive Off". Admittedly, connections marked "Connection: Close" are perhaps the ones best suited to benefit from preconnect (since a primed connection may be waiting in the pool), but it may better match the server's expectation that the client should "go away" after this request. Also, should Issue 87121 be merged into this, based on willchan's findings in comment 18?
Jul 10 2011,
Jul 11 2011,
I think there are two issues here: (1) There is a problem with Chrome overpreconnecting. We should perhaps be more conservative. I defer to Jim here. (2) The server cannot handle the load. Let's work on fixing (1) so we improve the accuracy of our preconnect target. For (2), I advise the server admin to disable HTTP keep alives and lower the timeouts. If the server considers it unacceptable for clients to keep sockets open for so long, then close the sockets. The server doesn't need to wait 300s for the client to close its socket. Preconnect has been in Chrome since Chrome 7 or so. This is the first bug report I've seen where servers had begun complaining about it. If this is a problem for server admins, I'd like to see more server admins chime in here and ask Chrome to do something.
Jul 12 2011,
The problem with web servers is that they are different. The very popular Apache server has only one parameter, Timeout, that defines a lot of things (http://httpd.apache.org/docs/2.2/mod/core.html#timeout):
1. When reading data from the client, the length of time to wait for a TCP packet to arrive if the read buffer is empty.
2. When writing data to the client, the length of time to wait for an acknowledgement of a packet if the send buffer is full.
3. In mod_cgi, the length of time to wait for output from a CGI script.
4. In mod_ext_filter, the length of time to wait for output from a filtering process.

Chrome exhausts server resources because of #1, but #3 and #4 do not allow server admins to lower the Timeout value, because clients will not get results from dynamically generated pages when generation takes a long time. HTTP keep-alive has more configuration on the server side, and there Chrome plays by the rules set on the server. I could accept idle client connections for 10 seconds, but Chrome breaks something in this idle preconnected unused socket timeout when it deals with SSL and frames.

It is not easy to get a "public" hosting server with SSL, but I found one and created a simple dynamic application there, https://apex.oracle.com/pls/apex/f?p=27545:4:0 , in case you cannot repeat the bug in your environment. I set up Chrome to reopen old tabs when it starts, then closed Chrome with my sample application in one tab and about:net-internals in another tab. When I run Chrome it opens these two tabs on startup and hits the bug 9 times out of 10. I also logged one of these sessions to a file using the --log-net-log flag. The log file is attached; socket #8 hits the bug and is closed after 5 minutes.
Jul 20 2011,
I think it is very hard to diagnose this issue from the server side and to find that the problems are caused by Chrome. I found only one official issue here: http://www.directadmin.com/features.php?id=1138 To diagnose this issue, an Apache admin must have access to dump network traffic, must know how to examine traffic in Wireshark, and must have access to the server's private key to decipher traffic and identify Chrome as the cause.
Jul 20 2011,
@#10: If you need Apache to provide more configuration, please file a bug with Apache. Commenting here isn't going to change Apache's configurability. @#11: I think that for many cases when debugging a server, one would need access to dump network traffic. I don't think this is an extraordinary requirement.
Jul 21 2011,
@ comments in #12: I do not hope that commenting here will help with Apache. I'm only giving an example in response to comment #9: server admins do not always have the possibility to lower timeouts. From the suggestion in comment #9 about lowering the server timeout, it seems that no one recognizes that there is a bug in Chrome preconnecting SSL sockets and leaving them idle for more than 10 seconds if they were never used.

About network traffic dumps: I believe that I am a very good administrator, and I have been using network dumps in everyday administration for more than 10 years, but for some reason I had believed that there is no way to decrypt dumped SSL traffic even with access to the server's private key. Only when I had to cope with this problem did I discover this possibility. So I use common sense when I say that it is more difficult to debug this particular problem than problems without SSL. And in bigger companies, where there are separate positions for web server admin, operating system admin and security admin, it could be that the web server admin has no possibility to get access to the private key of the server certificate and identify Chrome as the reason for idle connections.
Jul 25 2011,
Given that there is a larger server cost to pre-connect SSL, we probably should be more conservative about that class of speculative pre-connection. Perhaps we can add a negative feedback loop to diminish our (future) speculation when we detect (as Will called it) over-pre-connection. In more general settings, we would like to better estimate the number of needed pre-connections, based on required connections, rather than based on resource count. That transition in our learning algorithms should significantly help to address this issue. It is also plausible that we could detect over-pre-connection on SSL links, and disconnect sooner than a 5-minute time point. We'll have already used some server resources to acquire the connection.... but perhaps we can help by reducing further resource utilization when we detect such a state.

All the above approaches really focus on just "being better" about our speculative estimates, so that we don't make (m)any mistakes, but we require no server assistance (hints/headers) to "Get this right." We'll need to think and look at some of these options over time.

I don't really see a way to totally control this from a server side perspective. It is mostly too late when we talk to a server... but perhaps we can update our speculative tables based on feedback from a server requesting "less speculation." The current speculative (learned) data structures are indexed by a referrer, and offer suggested connections to sub-resources. The question then comes as to whether it is the sub-resource host (header?) that would like to request less speculation, or the referrer host (header). It probably wouldn't be too hard to have the referrer host header state "don't speculate about my subresources," or "don't speculate about a specific sub-resource," or maybe "don't speculate about SSL sub-resources." More thought needs to go into this selection.
I'll assign this bug to myself, but I'll lower the priority to P3 since I'm not clear on what a good resolution would be.
Sep 15 2011,
We have this problem on the Apache used for our SSO (Single Sign On). Chrome users consistently create unused connections (state R "Reading" when viewed with Apache mod_status) that stay active until the timeout is reached. Example for one user (the 1st GET returns the login page, the next ones are access to applications through SSO + redirect):
[14/Sep/2011:11:43:09 +0200] "GET /cas/login?service=.. HTTP/1.1" 200 2109
[14/Sep/2011:11:44:00 +0200] "POST /cas/login?service=... HTTP/1.1" 302 215
[14/Sep/2011:11:48:09 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:48:09 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:48:09 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:50:35 +0200] "GET /cas/login?service=... HTTP/1.1" 302 257
[14/Sep/2011:11:55:35 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:55:35 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:55:35 +0200] "-" 408 - "-" "-"
[14/Sep/2011:12:02:47 +0200] "GET /cas/login?service=... HTTP/1.1" 302 261
[14/Sep/2011:12:07:47 +0200] "-" 408 - "-" "-"
[14/Sep/2011:12:07:47 +0200] "-" 408 - "-" "-"
We've decreased the Apache TimeOut to 60s to avoid exhausting Apache MaxClients too quickly under load, but this is annoying nonetheless.
Apr 19 2012,
I have been hitting this issue since speculative pre-connections were added to Chrome, and I suspect many others that haven't found this bug report have as well. This is becoming a serious issue for my particular environment, now that more users are running Chrome with pre-connections enabled. I'm using Apache 1.3 and cannot upgrade or migrate since my backend depends on Apache 1.3. The symptom is that Chrome makes pre-connections to my server, holds them open (I haven't explicitly timed it, but I do see connections held open for longer than 10 seconds), and thus uses up all my available server slots, of which I have only 10. The server is quite literally doing nothing but waiting for Chrome to close or use the pre-connected sockets, which essentially causes a DoS attack, since most clients (those without pre-connected sockets) time out while trying to connect. Again, I must state, I noticed this initially because I caused a DoS attack against my own site after upgrading Chrome last May (2011). I had hoped something would be done before actual users started using Chrome with pre-connect enabled. Please understand this is most assuredly causing issues for my environment now that it is in widespread use, and there is no easy way for me to workaround Chrome's behaviour. The only real option I have is to stand up another more capable web server in front of my Apache 1.3 servers, to proxy requests. Certainly this is a viable workaround, however it is a fair amount of work for me in terms of configuration, testing and deployment. May I suggest that Chrome permanently disable pre-connecting to sites that timeout? Chrome now has a (shared?) database of servers that are under load to prevent Chrome users from DDoSing sites with Chrome. Can this be leveraged somehow to also disable preconnecting to sites with environments such as mine, where the maximum number of concurrent requests is quite low (i.e. 10) and cannot easily be raised? 
I would greatly appreciate some more thought be put into solving this problem. Thanks, Dan Sterling
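A rough way to see why a few never-used sockets can fill a small server is Little's law: the average number of occupied slots equals the arrival rate of connections times how long each one is held. The sketch below is only that arithmetic in Python. The 10-slot and 50-slot limits and the 300 second timeout come from this thread, while the arrival rate is a made-up illustration and not a measurement.

```python
def slots_held(arrivals_per_second, hold_seconds):
    """Little's law: average connections in the system = arrival rate * hold time."""
    return arrivals_per_second * hold_seconds

def idle_rate_that_fills(slots, hold_seconds):
    """Arrival rate of never-used connections that alone keeps every slot busy."""
    return slots / hold_seconds

# Hypothetical load: one never-used preconnect every 50 s, each held for 300 s.
print(slots_held(1 / 50, 300))

# Rates (connections per second) that saturate the limits mentioned in this thread.
print(idle_rate_that_fills(10, 300))   # 10 slots
print(idle_rate_that_fills(50, 300))   # MaxClients 50 from the original report
```

With a 10 second hold time instead of 300, the same arrival rate would occupy thirty times fewer slots, which is why the length of the idle period matters more than the preconnect itself.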
Apr 20 2012,
@#14: Should we repost the bug with different wording? Now I see that my expectation that Chrome would react to some very special headers would be a point of misuse for server admins. But rsleevi's suggestion in comment #7 was very relevant: when an SSL server returns the header "Connection: close", the client must not leave any idle connections and Chrome must terminate all current idle connections to that server's port. And to be perfect, Chrome should remember this setting for the server:port combination until it gets a "Connection: keep-alive" header from the server.
Apr 20 2012,
This is a "me too" response. I do have to ask what the point of pre-connections is. Seems like an over-optimization. We've seen similar problems with aggressively configured wget's. Are there any proxy solutions out there that limit connections based on dynamic behavior? I haven't found any good Apache modules that do "the right thing". Thanks, Rob
Apr 20 2012,
@16: Is your issue strictly due to Apache 1.3? I'm surprised that Apache 1.3 can only handle 10 connections, that sounds wrong to me. In any case, if it's specific to Apache 1.3, then I think we have to simply ask you to upgrade your environment. Apache 1.3 was end of life'd nearly 2 years ago and Apache 2 has been out for almost a decade. @17: Thanks for bringing rsleevi's suggestion back up. I think it is possibly reasonable. I guess it depends on how often sites use Connection: close in a reasonable manner. If lots of important web sites use it incorrectly, then I would consider it reasonable for Chromium to continue to preconnect, despite Connection: close. But I guess it makes sense to err on the side of being conservative here since Connection: close is a reasonable signal that the server is resource constrained. jar@, WDYT? @18: Preconnect makes the web significantly faster. See http://www.belshe.com/2011/02/10/the-era-of-browser-preconnect/ for details.
Apr 20 2012,
@19: It's not that Apache 1.3 can only handle 10 concurrent connections, it's that my backend, which incidentally runs on and cannot easily be separated from Apache 1.3, can only handle 10 concurrent connections without causing the host to run out of memory. Apache provides two functions in my environment. It is both the container for my backend app, and also, since this is the easiest configuration to set up, the front-end web server. No matter what container I put my backend in, it will only be able to handle 10 concurrent connections unless it is completely redesigned. However, I could (and, indeed, should) separate the front and backend; I could stand up a separate front-end web server that accepted connections from the internet and proxied requests to my backend. Since the front-end would not be preconnecting to my backend, I would not starve backend connections, and since the front-end's resource footprint would be small, it could easily handle a large number of concurrent connections. This is the viable workaround I was referring to in @16. Put another way, the issue is not that I should upgrade away from Apache 1.3, the issue is that I should separate my front and backend systems. However, this would require a fairly significant amount of work for me. If Chrome would recognize my environment was not able to handle pre-connections, I could put off this work in favour of more urgent tasks for a bit longer. Additionally, I can imagine situations where it may not be possible to raise the maximum number of concurrent connections. I would hope Chrome could detect when it's communicating with a server that has limited connection slots, and configure itself so it doesn't perform what amounts to a DoS against that server. Thank you, Dan Sterling
Apr 20 2012,
@20: Thanks for the explanation. That makes much more sense. Note that the way the network predictor system works is it analyzes how many concurrent URLs we are loading for www.foo.com, and uses that to feed back into how many connections it thinks we need to load the page, and will preconnect them when revisiting the site. Therefore, *ROUGHLY* speaking, we will never learn to preconnect X connections unless we have previously seen ourselves using X connections. Also, if the server is overloaded, why does it not return an error code? Chromium will understand 5XX error codes and will back off before retrying. For httpbis work on this matter, check out http://trac.tools.ietf.org/wg/httpbis/trac/ticket/255. Chromium probably ought to try to do better to identify cases where servers are poorly designed like this and thus cannot handle much concurrency at all. I'm skeptical we will prioritize this very highly since this is really the 1% or the .1% or whatever of servers. In summary, Chromium should definitely do a better job detecting when we've overpreconnected and thus have wasted idle connections. But in terms of pure server overload, it's probably best if the server indicates its overload situation and instructs the client to back off. And yes, there's the open question of whether or not we should use Connection: close as a signal not to preconnect. I'm tentatively in favor of adopting rsleevi's suggestion here, although I'd like to see data on this impact.
Apr 20 2012,
@21: I do appreciate that Chrome tries to only preconnect when it's likely the connection will be used. In my case, users may generate a fairly large number of connections for a period, and then leave a tab idle for a long period of time. This causes preconnect to open 2, 4 or possibly more connections that then sit idle until they hit the server-side timeout.

You're right that it would be best if Apache served a 503 when MaxClients is hit. Instead, new connections simply time out. Do you know if Chrome backs off when it sees a socket connection timeout, as it would if it saw a 503? Also, the clients that are causing issues may not see a timeout or 503, since they already have connections they can use, and can always immediately reconnect after using a preconnected socket, since the act of using it frees a slot.

As for using Connection: Close as a signal, perhaps it could be used to implement less aggressive preconnects, rather than completely disabling them. For example, preconnected sockets could be limited to 1 or 2, and/or have a short timeout, say 10 seconds? I appreciate the balance between optimizing for speed and respecting the server's Connection header; that is, I understand the desire to ignore the fact that "Close" may mean "I don't want idle sockets" -- so perhaps a balance could be struck. Certainly what I'm suggesting would be harder to implement, though.

One final thought -- the number of servers on the "open web" that are impacted by idle sockets may be small, but there may be more servers on the closed or semi-closed web (e.g. intranet servers, or servers with a restrictive robots.txt) that are more impacted by this. Given that, it may not be easy to collect data regarding Chrome's interactions with those servers.
Apr 20 2012,
willchan,jar: Just so it's not lost in the discussion, I think comment #13 raises a real point about there being a probable bug/design issue with regards to the preconnect logic. jar indicated in comment #5 that preconnected-but-idle sockets should disconnect within ~10 seconds (typically). It sounds like the act of the SSL handshake is throwing off (for the TCP socket pool) the IsConnectedAndIdle() calculation, and the "10 seconds for preconnected sockets" logic isn't being applied to the SSL pool. This is what I was trying to capture in comment http://code.google.com/p/chromium/issues/detail?id=85229#c7 , and which jar hinted at in http://code.google.com/p/chromium/issues/detail?id=87121#c19
Apr 20 2012,
There's also been a change, to improve battery life on mobile, where we only run the 10 second timer on Windows (it has to be run on Windows because we don't read data on "idle" sockets, and keeping unread data around too long on XP can result in BSODs). On other platforms, we now only check for idle sockets that need to be closed when something requests a new socket, which could have implications for servers with low connection limits.
Apr 20 2012,
@22: First, let me say I appreciate the rational discourse here. You seem very reasonable and make very valid points. To your first point about temporary spikes, I agree that that is bad. I characterize that as us learning the appropriate number of connections incorrectly. We should fix that. As for connection timeouts, no, we do not retry. Now that you mention it, connection timeouts are a good signal and we should feed that back into the network predictor subsystem so it learns to connect fewer. As for the Connection: Close comment, I should note that we do timeout preconnected sockets that are idle soon. They should be closed within 10-20 seconds (we set the timeout at 10s for unused idle sockets and have a 10s periodic timer to reap timed out sockets). As to the open web vs intranets, I agree about that. It may be the case that, for intranet servers, we should simply disable preconnect. Preconnect's primary use is in mitigating the initial RTTs in connection establishment. In intranets, where RTTs are low, perhaps it's best to simply disable preconnect. Note my comment applies to intranets, not the public servers with restrictive robots.txt. Just to be clear, we recognize we're making tradeoffs here. Clearly preconnect is suboptimal for some fraction of our users. We should fix any obvious bugs, as have been pointed out by yourself and others on this thread. But any global changes where there aren't good signals to identify resource-constrained servers must be evaluated against the significant overall benefit for the vast majority of the open web. As I noted, the benefits of preconnect are quite substantial, so we're very unlikely to adopt solutions that would dramatically reduce its effectiveness. But we definitely do want to fix any bugs and will happily take suggestions for good signals to clamp down or outright disable preconnect for certain servers.
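The "10-20 seconds" figure above follows from combining a 10 s idle timeout with a cleanup timer that only fires every 10 s. The toy model below works through that arithmetic in Python. It is not Chromium's code, and it assumes the timer ticks at t = 0, 10, 20, ... and closes any unused socket whose idle time has reached the timeout at that tick.

```python
import math

IDLE_TIMEOUT_S = 10   # idle timeout for unused sockets, as stated in the comment
REAP_PERIOD_S = 10    # period of the cleanup timer, as stated in the comment

def close_time(opened_at):
    """Time of the first timer tick at or after the socket's idle deadline."""
    deadline = opened_at + IDLE_TIMEOUT_S
    return math.ceil(deadline / REAP_PERIOD_S) * REAP_PERIOD_S

# Lifetime of a never-used socket for a few hypothetical opening times.
for opened_at in (0.0, 0.1, 5.0, 9.9):
    print(opened_at, close_time(opened_at) - opened_at)
```

Under these assumptions a socket opened just after a tick lives almost 20 s, and one opened exactly on a tick lives 10 s. On platforms where the periodic timer does not run, as described in the previous comment, this bound would not apply.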
Apr 20 2012,
rsleevi/mmenke: Thanks for making these points. I think we're at the stage now where the thread is getting long and we've identified several areas that clearly need fixing. We should file separate bugs for the individual issues and mark them as blocking this bug. Ryan, can you file a bug for the IsConnectedAndIdle() issue for sockets with SSL handshakes fooling our "previously used" check? Later on today, I'll go through the bug and note other issues and file bugs for them unless someone else beats me to them.
Apr 20 2012,
#21: "Also, if the server is overloaded, why does it not return an error code?"

Apache can't do much about this. Once all the server slots are used up, game over. If you have 10 slots, because you are running fat servers on a relatively small application, and, say, three Chrome users click at the same time, all the slots go away instantly. At that point, you can return no resources, but you have three users locking up your 10 servers. Like I mentioned, I see Chrome here "behaving badly" and if I could do something with BrowserMatch, I would, but there's no way to distinguish between a pre-connect request and a regular one unless, say, Chrome put something in an X-* header for pre-connects.

#22: "servers with a restrictive robots.txt"

What could we put in our robots.txt to stop pre-connections? I think the "intranet" point is a bit of a red herring. It's the "large application" problem in small environments. We size our system based on what (up until now) was a normal mechanism. Browsers opened connections when they wanted something. With pre-connections, you have to size your system for 4x the number of connections, and you can assume that most of the time only a small percentage of the connections are doing something. That's, I'm pretty sure, how Google web servers work. However, afaik, Apache doesn't support this concept out of the box. I would love it to store and forward in its proxy, but it doesn't do that. Rather, it immediately opens a back-end connection as soon as a front-end connection opens. This probably could be avoided by having a better proxy, but afaik, there isn't an OSS proxy that supports store-and-forward(?). Thanks, Rob
Jun 20 2012,
Poke after two months. I still can't find any other bugs referring to this one, so I think no one beat willchan to it two months ago. Now that my services run on new hardware and software, and are again open to the Chrome browser, I can see what harm Chrome preconnection could do to poorly designed systems. I attached an Apache server status screenshot and can guarantee that all Apache processes/threads in the "Reading Request" state are waiting on Chrome preconnected connections. I see no harm here for our system because the Apache application module uses a shared connection pool to back-end resources, but in the previous version of the same module every Apache child had its own connection, and a situation like this would lead to resource exhaustion.
Feb 12 2013,
I put together a quick and dirty Perl script to monitor Apache 1.3 using the server-status URL, and kill httpd processes that are serving preconnections when a threshold is reached. This works around the issue for me for now. Here's the script: https://gist.github.com/eqhmcow/4774549
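For readers who want the gist of that monitoring idea without the Perl, here is a minimal Python sketch. It is not the linked script: it assumes the machine-readable form of the server-status page (the `?auto` variant), where the `Scoreboard:` line marks a slot in "Reading Request" with `R`. The function names, the sample text, and the threshold are illustrative assumptions, and the actual killing of processes is left out.

```python
def reading_slots(status_text: str) -> int:
    """Count scoreboard slots in the 'R' (Reading Request) state."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            # Everything after the colon is one character per server slot.
            return line.split(":", 1)[1].count("R")
    return 0  # no scoreboard line found


def over_threshold(status_text: str, threshold: int) -> bool:
    """True when at least `threshold` slots are waiting for a request line."""
    return reading_slots(status_text) >= threshold


# Hypothetical sample of server-status?auto output.
sample = (
    "Total Accesses: 120\n"
    "BusyServers: 8\n"
    "IdleServers: 2\n"
    "Scoreboard: RRRW_R..RR_W....\n"
)

if over_threshold(sample, threshold=5):
    print("too many slots stuck in Reading Request:", reading_slots(sample))
```

A real monitor would fetch the status page periodically and map busy slots to process IDs before acting, which this sketch does not attempt.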
Mar 10 2013,
Jul 1 2013,
We have an embedded webserver running on a MicroBlaze on a Virtex-6, with lwIP and limited resources. These speculative preconnects use too many resources in our case, and I would vote to add a startup parameter to Chrome so it doesn't do this, or HTTP headers or something similar to specify not to use it or to set a maximum number of speculative preconnect sockets. I will continue to tweak my C code to try to provide enough resources for Chrome, but in the end it may not be possible.
Jul 2 2013,
We have the same problem in our embedded products. We have modified the source code of the webserver to send a "408: Request Timeout" and to close the socket. But it doesn't work! It seems that the browser ignores both the status code and the TCP FIN. The only thing left to do seems to be a TCP reset (RST), but that's not so nice. Please consider disconnecting the socket after a 408 error code and adjusting the type of connections after this answer. Could this be the solution?
Jul 2 2013,
re: comment 32: sending 408 "response." Until the browser sends a request, it won't listen for (try to read) a response. As a result, jamming a 408 into the socket before getting a request won't induce a teardown. In fact, it will leave some buffered data in the remote (client) end of the socket, waiting to be read. More typical is to tear down the socket if you don't get a request in 10 seconds.

re: comment 31: Putting a limit on the maximum number of speculative preconnect sockets. The speculative preconnects are already bounded (restricted) by the rule to never have more than 6 connections to a single host. That may be higher than you desire, but there is a clear limit. Finding a way (header proclamation? other?) to further constrain this limit, especially for preconnects, seems reasonable.
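As a rough illustration of the "tear down the socket if no request arrives in 10 seconds" approach, here is a minimal Python sketch of the server side. The helper name `read_request_or_close` and the 4096-byte read size are assumptions made for the example; a real server would keep reading until the full request has arrived.

```python
import socket
from typing import Optional


def read_request_or_close(conn: socket.socket, timeout_s: float = 10.0) -> Optional[bytes]:
    """Wait up to timeout_s for the first bytes of a request.

    Returns the bytes read, or None after closing the connection if the
    client sent nothing in time or closed its end first.
    """
    conn.settimeout(timeout_s)
    try:
        data = conn.recv(4096)
    except socket.timeout:
        data = b""  # nothing arrived before the deadline
    if not data:
        # No request: free the slot by closing, without writing a 408 first.
        conn.close()
        return None
    return data
```

Closing without writing a response avoids leaving unread data in the client's socket buffer, which is the situation described above for an unsolicited 408.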
Oct 14 2013,
I never had this problem on my servers with CentOS 5 (kernel 2.6.18) and Apache. As soon as I moved up to CentOS 6 (kernel 2.6.32), I got tons of held reading requests and 408s all over the place. My average concurrent requests per server have gone up 500%, which throws off all my monitors and scalability. To expect system admins to adjust timeouts for this is completely unreasonable. Who knows how many permutations of dynamic content serving are out there and the time required to serve such content. Personally, I believe this is a combination Chrome/Apache/kernel problem and all parties need to get involved to either fix it or do away with it. Speaking frankly, I think speculative preconnects are an abomination. Hogging web server resources in case you MIGHT do something? That's just plain wrong any way you slice it. This is 1 step away from a DoS attack and I can't believe there hasn't been an uproar over it.
May 22 2014,
The issue we are seeing is that pre-connect seems to have recently gotten more aggressive. Due to our 15 second HTTP request timeout (a connection must send a request in less than 15 seconds) our customers are seeing more 408s. Ideally we could find a transparent way to avoid the customer noticing these 408s. Frankly, some sort of reasonable disconnect/reconnect scheme on 408 (perhaps on focus) would be fine for us.
May 23 2014,
Sending an HTTP response when there's no HTTP request sounds buggy. Why don't you just close the connection?
May 23 2014,
Browsers will show a blank page or an error page if the connection is closed while the HTTP request is in flight, so this still has the potential for a poor user experience.
May 23 2014,
There's no way for a browser to be sure if a stale socket was timed out by the server or not if a connection is just closed. I'd assume browsers retry the request in that case - Chrome certainly does.
May 23 2014,
Chrome's tendency to open many connections without closing them requires workarounds that affect all browsers, so saying Chrome still works in this case partially misses the point.
May 23 2014,
Actually, I said other browsers probably do this, too.
May 23 2014,
So Chrome causes an issue that other browsers probably handle OK; at the very least, citation needed? At worst (and to be clear, Chrome can cause this to happen), Chrome eats up all the available server slots and no other browsers can even connect.
May 23 2014,
If the server sends a FIN to a browser, the browser can work out that no further request will be answered on that socket. This can be used to time out a socket from the server side.
May 23 2014,
Per my comment, there's no way for a browser to know if a stale socket that was closed was timed out by the server, or was closed because the server was unhappy for some other reason, such as not liking the original HTTP request. Other browsers work; therefore, presumably they retry in this case (since some servers do time out sockets aggressively), or don't use stale sockets (or don't use stale sockets that have never been used before). So there's most likely no problem with timing out unused sockets (or used sockets).
May 23 2014,
Upgrade your server to SPDY / HTTP/2 and then you can send a GOAWAY frame to gracefully shut down the connection and notify the peer of the last accepted request (which eliminates this race). It will also lead browsers to only open a single connection, thereby solving this multiple connection issue.
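To make the GOAWAY suggestion concrete, here is a minimal Python sketch of how an HTTP/2 GOAWAY frame is laid out on the wire, following the frame format in the HTTP/2 specification (9-byte frame header, then a 31-bit last stream ID and a 32-bit error code). The function name `build_goaway` is an assumption for the example; SPDY uses a different framing, and a real server would rely on its HTTP/2 library rather than hand-built bytes.

```python
import struct

GOAWAY_TYPE = 0x7  # HTTP/2 frame type code for GOAWAY
NO_ERROR = 0x0     # error code used for a graceful shutdown


def build_goaway(last_stream_id: int, error_code: int = NO_ERROR) -> bytes:
    """Serialize an HTTP/2 GOAWAY frame with no debug data."""
    # Payload: reserved bit + 31-bit last stream ID, then 32-bit error code.
    payload = struct.pack("!II", last_stream_id & 0x7FFFFFFF, error_code)
    length = len(payload)
    # Frame header: 24-bit length, 8-bit type, 8-bit flags,
    # reserved bit + 31-bit stream ID (GOAWAY is always sent on stream 0).
    header = struct.pack("!BHBBI", length >> 16, length & 0xFFFF, GOAWAY_TYPE, 0, 0)
    return header + payload


# Tell the peer that stream 5 was the last one accepted.
frame = build_goaway(last_stream_id=5)
print(frame.hex())
```

The last stream ID is what removes the race: the client knows exactly which requests were accepted and can safely retry any later ones on a new connection.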
May 23 2014,
Right, the problem is that this issue only affects old (or simple, such as embedded) servers. Saying that the fix for this bug is that old servers should stop existing definitely misses the point.
May 23 2014,
Yes, good idea... But unfortunately our embedded server with 256 kB of RAM and 512 kB of code space (yes, kilobytes) can't manage anything other than HTTP 1.0. We can have only 8 to 10 sockets. They are OK for 10 to 15 simultaneous users on other browsers, but if one user uses Chrome...
May 23 2014,
Since a 408 also contains a "Connection: close" header, wouldn't it be enough to check for 408 responses on the preconnect sockets and close them, as the RFC requires? I think this would already fix most of those issues, without hitting the race condition. Anyway, this is what Mozilla did 9 years ago about this: https://bugzilla.mozilla.org/show_bug.cgi?id=248827
May 23 2014,
408's kind of weird - I don't recall anything in the HTTP spec about reading a response before request headers were even issued. We don't actually have anything sitting around to try and read from the stream before the request was issued, so this would be a surprisingly major architectural change. And as already noted, in the connection-keep-alive case, servers generally just close the socket (possibly at the same time a new request was issued), and browsers have to be able to handle that anyway. And then what do we do if we get some other response code? Just wait around for another request to come in, and then randomly assign the received response to that request? Just close the socket? That having been said, retrying on 408 may be reasonable, though there are still a whole slew of questions if we did that (What do network extensions see? What does devtools see? What does NavigationTiming mean in this case? etc.)
May 25 2014,
Let's not lose sight of the main issue here.

The problem: Chrome opens many speculative connections, overwhelming a server and causing a DoS. Chrome does not close these sockets for many tens of seconds or more.

Possible solutions:
* user can tell Chrome not to overwhelm a given server (somehow? probably difficult or unintuitive)
* server can tell Chrome not to preconnect (via a header? or by closing sockets instead of letting pre-connect sockets stick around forever?) and Chrome can act on this by no longer preconnecting to that server
* Chrome can otherwise learn not to make speculative connections to servers (somehow? probably difficult)

Right now I'm continuing to work around this issue by using a script to hit Apache 1.3's server-status page, parsing it, and using signals to kill processes that are being held open by preconnect. If I don't do this, users using Chrome quickly use all available server slots, causing a DoS for all users except those Chrome users with open connections. This, as recently noted in this bug's comments, is also a problem for embedded servers with a limited ability to accept multiple connections.
May 25 2014,
Hrm... On Windows, we close never-used sockets after about 10 seconds, using a timer. On other platforms, we close never-used sockets after 10 seconds, but only when a socket is being requested (this is to save battery life, primarily on mobile, by only ramping up the radio when needed). Used sockets have a much longer timeout. I'm assuming we're talking about non-Windows clients here?
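The difference between the timer-based and on-demand cleanup can be pictured with a toy Python model of the non-Windows behavior described above. This is not Chrome's code: the class `LazyIdlePool`, its method names, and the use of plain numbers for time are assumptions made for illustration; only the 10-second figure comes from the comment.

```python
from typing import Dict, Optional

UNUSED_IDLE_TIMEOUT_S = 10.0  # figure quoted in the comment above


class LazyIdlePool:
    """Toy pool of never-used sockets that are expired only on demand."""

    def __init__(self) -> None:
        self._opened_at: Dict[str, float] = {}  # socket name -> time opened

    def add(self, name: str, now: float) -> None:
        """Record a freshly preconnected, never-used socket."""
        self._opened_at[name] = now

    def request(self, now: float) -> Optional[str]:
        """Drop expired sockets, then hand out a remaining one (or None)."""
        expired = [n for n, t in self._opened_at.items()
                   if now - t >= UNUSED_IDLE_TIMEOUT_S]
        for name in expired:
            del self._opened_at[name]  # a real client would close the socket here
        if not self._opened_at:
            return None
        name = next(iter(self._opened_at))
        del self._opened_at[name]
        return name

    def open_count(self) -> int:
        """Sockets still held open, including ones past their timeout."""
        return len(self._opened_at)


pool = LazyIdlePool()
pool.add("a", now=0.0)
pool.add("b", now=8.0)
print(pool.open_count())  # both stay open until something asks for a socket
print(pool.request(now=12.0))
```

In this model an expired socket is only noticed when `request` is called, which matches the server-side observation that an idle preconnect can sit open far longer than 10 seconds if the user does nothing.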
May 25 2014,
Oh, and Windows has the different behavior because XP has a crash issue when sockets are kept around with unread data, and we don't try to read from sockets while they're not being used by a request. If it weren't for the XP crash issue, we'd use the same behavior everywhere.
May 26 2014,
The poor handling of 408 responses when the server's closing a never-used speculative connection seems tangential to the main thrust of this issue (server impacts and management of speculative connections), so I've opened #377581 for that.
May 27 2014,
Thanks for forking the 408 issue to a separate thread. I've commented there and acknowledged the lack of 408 support in Chromium. That's related to this issue here, but is distinct, so please keep the 408 discussion over there.
Apr 27 2015,
So, 4 years and Chrome/Chromium is still configured to DoS low-resource servers? I'm surprised. (And disappointed as the operator of a tiny VPS effectively DoS'ed by my own 4 users...)
Aug 22 2015,
Just ran into this problem when trying to reload pages served from a WiFi module with a low-resource HTTP server (only supports a single HTTP request at a time). It would be really good if Chrome could somehow be told that it shouldn't try to open a bunch of sockets which are all going to fail.
Aug 24 2015,
Issue 523234 has been merged into this issue.
Oct 19 2015,
Hi, I also ran into the speculative connection problem with a low-resource server in an IoT application. One possible solution could be for Chrome to NOT open speculative connections if the main server port is something other than 80 (for HTTP) or if connecting to a server on the same local LAN segment as the browser. For an internet-accessible embedded server it's a bad idea anyway to be publicly accessible on port 80, because web crawlers and other creatures can easily overwhelm the already low resources. Also, the majority of accesses to embedded IoT servers (like appliances and other connected household items) happen within the local LAN.
Oct 19 2016,
This issue has been available for more than 365 days, and should be re-evaluated. Please re-triage this issue. The Hotlist-Recharge-Cold label is applied for tracking purposes, and should not be removed after re-triaging the issue. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Jun 13 2017,
The perfect solution, I think, is to have DNS prefetch but no TCP preconnect. However, it seems Chrome wants to be the fastest browser even at the cost of poor implementation.