
Issue 377581

Starred by 47 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2014
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Chromium does not handle 408 responses

Reported by smad...@stackoverflow.com, May 26 2014

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36

Example URL:

Steps to reproduce the problem:
1. Connecting client visits site, speculative connections created and left open.
2. Remote server closes never-used connections after configured timeout interval by sending a 408 HTTP error, with the FIN flag set.  This is the default behavior for HAProxy.
3. Operating system ACKs the FIN packet, but Chrome doesn't see it or the 408 error, leaving it in the receive buffer ( https://code.google.com/p/chromium/issues/detail?id=85229#c33 )
4. User sends a new request. For roughly 1 second after the 408 is received, a new request can still be assigned to this connection, which the server has already closed (the 408 has already been received; it is not in flight). Chrome sends the request while the 408 response sits in the receive buffer, then uses that 408 response as the apparent response to the request.
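
The sequence in steps 2 to 4 can be reproduced in miniature without a browser. The sketch below is only an illustration, not Chrome's code: it assumes a local `socket.socketpair()` is an acceptable stand-in for the TCP connection, uses a half-close (`shutdown(SHUT_WR)`) in place of the FIN, and sends an abbreviated 408 message rather than HAProxy's exact default response. The function names are invented for the example.

```python
import socket

# Abbreviated stand-in for the proxy's timeout response (assumption: the
# real HAProxy message carries more headers and an HTML body).
STALE_408 = b"HTTP/1.0 408 Request Time-out\r\nConnection: close\r\n\r\n"


def naive_request(conn: socket.socket, request: bytes) -> bytes:
    """Send a request and treat whatever is read next as its response.

    The receive buffer is never inspected before sending.
    """
    conn.sendall(request)
    return conn.recv(4096)


def reproduce_stale_408() -> bytes:
    """Return the status line a naive client sees on a timed-out idle socket."""
    client, server = socket.socketpair()
    try:
        # Steps 2-3: the server times out the never-used connection. The 408
        # and the half-close now sit unread on the client side.
        server.sendall(STALE_408)
        server.shutdown(socket.SHUT_WR)

        # Step 4: the client finally sends a request on that same socket and
        # reads the buffered 408 as if it answered the request.
        reply = naive_request(
            client, b"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n"
        )
        return reply.split(b"\r\n", 1)[0]
    finally:
        client.close()
        server.close()
```

Calling `reproduce_stale_408()` returns the 408 status line even though the server never saw the request, which is the behavior described in step 4.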

What is the expected behavior?
Realize that the connection has been closed by the server and don't give the user the 408 error as the apparent result of their request.

RFC 2616 Section 8.1.4:

Clients and servers SHOULD both constantly watch for the other side of the transport close, and respond to it as appropriate.

This means that clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2).

What went wrong?
Chrome does not properly respond to the closure of the server connection per RFC if the closure also contains an HTTP response, and does not properly handle re-sending the request through a working connection, instead incorrectly assuming that the error response is the response to a newly sent request.

Did this work before? Yes I believe this problem was not present in 34 and older.

Chrome version: 35.0.1916.114  Channel: stable
OS Version: 6.3
Flash Version: Shockwave Flash 13.0 r0

See also:
https://code.google.com/p/chromium/issues/detail?id=85229#c48
http://blog.haproxy.com/2014/05/26/haproxy-and-http-errors-408-in-chrome/
http://marc.info/?l=haproxy&m=140111366111398&w=2
 
neverused.png (22.4 KB)
badrequest.png (58.9 KB)
chromedebugger.png (17.8 KB)
Labels: -Cr-Internals-Network -OS-Windows Cr-Internals-Network-HTTP
Status: Available
Summary: Chromium does not handle 408 responses (was: Chrome not obeying connection closure with 408 error from server closing speculative connections)
I agree that it's problematic that Chromium does not handle 408s, irrespective of this specific preconnect issue. I've retitled the bug accordingly. 408s by themselves should never result in a user visible error page.

Comment 3 by james@wheare.org, May 27 2014

Having this exact issue on irccloud.com too. Just for clarification: if the 408 response contains no body does Chrome still fail to detect the connection close? I'm wondering if configuring HAProxy to send an empty response would be a viable workaround for us. Not sure if there's a way to disable sending *any* response headers in that case, but the body should be overridable.

Comment 4 by james@wheare.org, May 27 2014

Also, re #2: to emphasise the real problem here: showing a 408 error page would be a complete non-issue without this broken preconnect behaviour, which results in users complaining that our site is down https://twitter.com/gryzzly/status/471229530611277826

Comment 5 by james@wheare.org, May 27 2014

Followup to my question in #3: just read the "see also" links and yes, an empty response appears to be a workaround. Add this to HAProxy conf

    errorfile 408 /dev/null

Cc: sdayala@chromium.org
Labels: Needs-Feedback
smadden@, thanks for filing the issue. This seems to be a network-related issue.
Most of the issues raised against M35 (35.0.1916.114, stable channel) are fixed in the latest Chrome versions: M36, 36.0.1985.18 (Official Build 271517), and the latest canary, M37, 37.0.2014.2 (Official Build 272841). Can you please try the versions mentioned above and let me know if you still encounter the same issue?

Note: please make sure that your Chrome settings are the default browser settings.
If possible, send me a demo video if the problem persists.

Labels: -Needs-Feedback
James, 85229 is probably the correct issue for discussing preconnect related issues. This specific bug is for 408 support.

sdayala: no more user feedback is needed. I've triaged the bug already.

Comment 8 by mmenke@chromium.org, May 27 2014

We check used idle sockets with IsConnectedAndIdle before using them (and reused ones with IsConnected, with a 10 second timeout).  I'm a bit surprised this sounds so reproducible, as both situations seem like they should be rather racy.
I suspect we hit the race when we send a request and they send a 408 before receiving the request. So it won't trip up the IsConnectedAndIdle() check.
Checking 36.0.1985.18, the problem with speculative connections actually is resolved for servers which wait until after 10s to time out the never-used connection. The reason for this is that the 10 second client side timeout that was not occurring in 35 (see comment #1, image neverused.png) is now happening again in 36 (see attached image); the server won't send a 408 in this case as the client has closed the connection before the server.

However, the problem ought to still be present if the server's 408ing and closing the connection before that 10 seconds, so it's probably still worth making the handling more robust.

re: race condition, see comment #1 attachment badrequest.png; it's not much of a race due to the socket not being read until after the request is sent.  In the packet capture (run on the client), the 408 arrives nearly a second before the request tries to use the socket, then perceives the 408 response that's been sitting buffered for nearly a second as a response to the just-sent request.

Checking the socket buffer for a 408 response before using the socket would narrow (but not eliminate) the time window in which a user could hit this problem; they'd have to have the request and 408 response bytes cross on the wire.
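
A minimal sketch of such a pre-use check, assuming a plain blocking socket and using `select` with a zero timeout plus `MSG_PEEK`. The function name and the three state labels are made up for illustration; this is not how Chrome's socket pool is implemented.

```python
import select
import socket


def idle_socket_state(conn: socket.socket) -> str:
    """Classify an idle socket without consuming any bytes from it.

    Returns "idle" (nothing pending), "unread-data" (bytes are buffered),
    or "closed" (the peer has closed and nothing is buffered).
    """
    # Zero timeout: poll the socket, never block.
    readable, _, _ = select.select([conn], [], [], 0)
    if not readable:
        return "idle"
    # MSG_PEEK leaves the data in the receive buffer for a later read.
    peeked = conn.recv(1, socket.MSG_PEEK)
    return "unread-data" if peeked else "closed"
```

In this sketch a buffered 408 followed by a close reports only "unread-data"; the close becomes visible once the data has been read. A result of "idle" also cannot rule out a 408 that is still in flight, which is the remaining window mentioned above.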

Full support of automatically retrying a request on 408 as Will seems to be leaning toward would resolve both conditions.
36.png (17.4 KB)
Owner: mmenke@chromium.org
Status: Assigned
I agree that we should support this; I was just trying to figure out why it sounded so reproducible.  Timing out the connection in <= 10 seconds certainly explains it, since we don't check for unread data when we have a socket that was never used (this was a fairly recent change, to make preconnect work with SPDY, where we could receive a settings frame before sending any requests).

I expect this to be a pretty easy fix, just need to make sure we can't end up in endless retry loops (Need to retry on stale unused sockets and previously used sockets, but not on fresh sockets).
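
The retry rule described here could look roughly like the sketch below. It is an illustration under stated assumptions, not the eventual patch: `SocketState`, `Transport`, and `send_with_408_retry` are hypothetical names, and the `max_attempts` cap is an extra guard added for the example on top of the "never retry on a fresh socket" rule.

```python
from enum import Enum
from typing import Callable, Tuple


class SocketState(Enum):
    FRESH = "fresh"              # just connected, request sent immediately
    UNUSED_IDLE = "unused idle"  # connected earlier, never carried a request
    REUSED_IDLE = "reused idle"  # already carried at least one request


# A transport sends the request and reports which kind of socket it used
# together with the HTTP status it received.
Transport = Callable[[str], Tuple[SocketState, int]]


def send_with_408_retry(send: Transport, request: str, max_attempts: int = 3) -> int:
    """Retry 408s that arrive on idle sockets; surface them on fresh ones."""
    status = 0
    for _ in range(max_attempts):
        state, status = send(request)
        # A 408 on a fresh socket is a real answer to this request, so it is
        # returned as-is. That is what prevents an endless retry loop.
        if status != 408 or state is SocketState.FRESH:
            return status
    return status
```

Because every retry eventually lands on a fresh socket once the idle ones are used up, the loop ends even without the attempt cap; the cap only bounds the worst case.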
> "I suspect we hit the race when we send a request and they send a 408 before receiving the request."

Is there a way to "cleanly" resolve this in HTTP/1? 

With SPDY & HTTP/2 the server can report the "last stream ID" via GOAWAY to indicate the last processed stream (or lack thereof) and determine if the 408 belongs to the request we just fired or if it was already in flight... But in HTTP/1 if the client and server race the request/response, I don't think we ever have enough information to make a meaningful call? ... Unless there is another/dedicated HTTP error status that can distinguish "I've received request bytes but you didn't finish in the allotted time" from "I did not receive any request bytes hence this timeout error"?
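
To make the GOAWAY point concrete, here is a small sketch of how a client might use the reported last stream ID. The function name is hypothetical, and the sketch assumes stream IDs are plain integers that the client tracks itself.

```python
from typing import Iterable, List


def streams_safe_to_retry(open_stream_ids: Iterable[int], last_stream_id: int) -> List[int]:
    """Return the streams the server reported as not processed.

    Streams with an ID above the last stream ID in GOAWAY were never acted
    on by the server, so their requests can be sent again elsewhere.
    """
    return sorted(sid for sid in open_stream_ids if sid > last_stream_id)
```

For example, with open streams 1, 3, 5, 7 and a GOAWAY reporting last stream ID 3, streams 5 and 7 can be retried. HTTP/1 has no equivalent field, which is the gap this comment is pointing at.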

With just the 408 + HTTP/1, I don't see how we can "solve" this.. We can tighten the conditions under which this race happens, but not eliminate it? Seems like for HTTP/1 the sane(r) approach would be for the server to close the TCP connection after issuing 408 -- this wouldn't resolve the issue, but it's an easy and immediate fix that tightens the window where this race can occur (no need to wait for any browser "fixes"). Also, I suspect this behavior is true of all browsers, since all of us preconnect sockets, etc.

Cc: igrigo...@chromium.org
> Seems like for HTTP/1 the sane(r) approach would be for the server to close the TCP connection after issuing 408

This is what is happening. In the badrequest.png screenshot, it shows the FIN being set on the 408 response packet.
re: closing the connection, see the badrequest.png image in comment #1 - the server does set the FIN flag on the packet that includes the 408 response, but that isn't respected.

With regard to effectively solving this in HTTP/1, see the second RFC2616 quote in the report; as long as the request is idempotent, the request should be silently retried.

So, under HTTP/1 this issue can be reduced to only being bubbled up to the user if a) the 408 and the user's request cross on the wire, and b) the verb for the request is non-idempotent.
I'm not sure we need to check if it's non-idempotent.  If we're getting a 408, presumably the server didn't handle the request, anyway.

There's a similar race when reusing sockets in general, and we always retry in that case, even though we may get an RST, which is rather ambiguous.
Our test for whether or not a connection is closed may not be noticing the fin when there's still unread data on the socket...  May be worth looking into fixing that as well, if that's the case.
Not sure what the exact standards status of this doc is (it was linked in the HAProxy mailing list), but: http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-26#section-6.3.1

   A user agent MUST NOT automatically retry a request with a non-
   idempotent method unless it has some means to know that the request
   semantics are actually idempotent, regardless of the method, or some
   means to detect that the original request was never applied

Comment 19 by Deleted ...@, May 27 2014

@mmenke: the purpose of the 408 is *exactly* to tell you that you can safely retry, because it guarantees the request was not and will not be processed, which is different from a sole connection close which prevents you from retrying a non-idempotent request.
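
The distinction drawn in this comment can be written down as a small decision function. This is a sketch of the commenter's argument only, not of what Chrome does; the function name and the two failure labels are invented for the example, and the idempotent method list follows RFC 2616 section 9.1.2.

```python
# Methods RFC 2616 section 9.1.2 treats as idempotent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"})


def may_retry_automatically(method: str, failure: str) -> bool:
    """Decide whether a failed request may be resent without asking the user.

    failure is "408" or "connection-closed" (labels made up for this sketch).
    """
    if failure == "408":
        # The server states it did not and will not process the request,
        # so resending is safe whatever the method.
        return True
    if failure == "connection-closed":
        # A bare close says nothing about whether the request was applied.
        return method.upper() in IDEMPOTENT_METHODS
    raise ValueError(f"unknown failure kind: {failure!r}")
```

Under this reading a POST that receives a 408 may be retried, while a POST whose connection simply closed may not.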

Comment 20 by Deleted ...@, May 27 2014

One additional piece of information: if the 408 response + FIN happens on a connection where Chrome subsequently requests an image to be incorporated in the website (<img src="abc.png" />), the whole Chrome process (tab) freezes completely and doesn't recover by itself (current stable, V35). After a while Chrome kills the tab with "He's Dead, Jim!":
https://support.google.com/chrome/answer/1270364?hl=en

Memory and CPU load stays stable however while the process is freezing.

Comment 21 by Deleted ...@, May 27 2014

Also, concerning the close detection, checking for FIN with data still unread is not enough, as you may randomly get an asynchronous RST when starting to send if the connection is fully closed on the other side.

I stand corrected, thanks again Willy!
smadden@stackoverflow.com:  Unfortunately, strict adherence to that would make it impossible to ever send a POST on anything but a fresh socket, since we have no visibility into the server's timeout logic.

tribuslukas58:  Since a renderer process is locking up, and HTTP logic is all in the renderer process, that's a completely separate issue.  Please file a new bug for it.
tribuslukas58:  Certainly better checking for a socket being closed won't fix the problem, but a better check may help reduce the occurrence of similar issues.

Comment 25 by Deleted ...@, May 27 2014

@mmenke: for the POST on non-fresh socket, that's perfectly true and is exactly the purpose of the 408, which is to inform the client that there's no risk retrying.

Comment 26 by Deleted ...@, May 27 2014

@mmenke: do you have a simple way to decide to retry upon a 408 ? It seems you already have this ability when you detect that the connection died after sending the request, so if you can apply the same logic when you find that you got a 408, you should be able to seamlessly get rid of the issue.
willy:  Indeed, that's exactly what I plan to do (See comment #11).

I think getting a better signal about an idle socket being closed is also worth investigating, however.

Comment 28 by Deleted ...@, May 27 2014

I remember that I had to silently close the idle persistent connections in haproxy as most servers do, precisely because some browsers used to display the 408 on persistent connections (the bug that Mozilla fixed in 2004). I don't know what's the current state of the deployed browser ecosystem regarding this though. And it's a shame because 408 provides all the details allowing a POST to be safely sent over a keep-alive connection, which is particularly interesting for login pages over 3G links!

Comment 29 by Deleted ...@, May 27 2014

Concerning comment #11, it might be worth retrying on fresh sockets as well if you manage to store a retry counter somewhere. The reason is that many sites still run with very low request timeouts inherited from the age of pre-forked servers, and these servers are sensitive to packet drops on the request path. Some normal users occasionally get a 408 on the first request for sites running with a 5s request timeout. A highly loaded ADSL line or a 3G connection under poor radio conditions can easily trigger this. And again, it's better to hide all the transport details from the user as much as possible if it's safe to retry.
Willy:  Certainly a valid point, but I consider better handling of servers that very aggressively timeout unused connections a separate issue, which may or may not be worth tackling - I think we'd want to gather stats on cases that look somewhat like that, before taking any action.

Comment 31 by Deleted ...@, May 27 2014

Yes I think that's reasonable. Also, on this specific case (request over fresh connection) I think you're not the only browser to report the 408 error, so your efforts won't be enough to improve the web's reliability :-)

Project Member

Comment 32 by bugdroid1@chromium.org, Jun 4 2014

------------------------------------------------------------------
r274760 | mmenke@chromium.org | 2014-06-04T10:55:54.640025Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/http/http_network_transaction_unittest.cc?r1=274760&r2=274759&pathrev=274760
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/http/http_network_transaction.cc?r1=274760&r2=274759&pathrev=274760
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_log_event_type_list.h?r1=274760&r2=274759&pathrev=274760

Retry requests on reused sockets when receiving 408 responses.

408s indicate a socket was left idle for too long before
sending a request.

It's possible these errors are being surfaced to users more often
than previously due to https://codereview.chromium.org/169643006,
for servers that very aggressively time out never-used sockets.

BUG= 377581 

Review URL: https://codereview.chromium.org/303443011
-----------------------------------------------------------------
Project Member

Comment 33 by bugdroid1@chromium.org, Jun 4 2014

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d58ceea83f332d8958b10e2f21ae38ad15026e45

commit d58ceea83f332d8958b10e2f21ae38ad15026e45
Author: mmenke@chromium.org <mmenke@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>
Date: Wed Jun 04 10:55:54 2014

Retry requests on reused sockets when receiving 408 responses.

408s indicate a socket was left idle for too long before
sending a request.

It's possible these errors are being surfaced to users more often
than previously due to https://codereview.chromium.org/169643006,
for servers that very aggressively time out never-used sockets.

BUG= 377581 

Review URL: https://codereview.chromium.org/303443011

git-svn-id: svn://svn.chromium.org/chrome/trunk/src@274760 0039d316-1c4b-4281-b951-d872f2087c98


Status: Fixed
Labels: M-37

Comment 36 Deleted

Project Member

Comment 37 by bugdroid1@chromium.org, Jun 5 2014

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/bling/chromium.git/+/d58ceea83f332d8958b10e2f21ae38ad15026e45

commit d58ceea83f332d8958b10e2f21ae38ad15026e45
Author: mmenke@chromium.org <mmenke@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>
Date: Wed Jun 04 10:55:54 2014

Comment 38 by Deleted ...@, Jun 16 2014

Just a quick update, guys.

I'm not actually sure if the updates above mean that the issue is supposed to have been fixed, but I can confirm it persists.

Thank you very much for your efforts!
It should be fixed in Chrome 37.

Comment 40 by Deleted ...@, Jun 16 2014

Aah, thank you so much for your reply! 

As I see currently I've been testing it with Version 35.0.1916.153m which is one idea older I guess. It still shows Chrome as being up to date so is the 37. still to be implemented? 

Thanks again! Really appreciate fixing the nasty bugger!
We have a 6 week cycle, and each version spends about 6 weeks in beta before release.  Chrome 36 is currently in beta.  So that gives you less than 12 weeks until it hits stable channel.

Comment 42 by Deleted ...@, Jun 17 2014

So that's how it's working.. 

Thank you for the clarification!
 Issue 386571  has been merged into this issue.

Comment 44 by Deleted ...@, Jan 15 2015

Hi,

As of this date with Chromium version 39.0.2171.95 m, the problem is still there (or has reappeared).

-dennis
Please file a new bug.

Comment 46 by tank1...@gmail.com, Jan 15 2015

And post link here, please
Also, worth noting we don't retry for fresh sockets.  If we create a socket and instantly send a request over it, only to get a 408, we don't retry.  If you're doing a synthetic test where you just send a 408 to the initial request and see if we retry, we won't.

If we create a socket, wait a while, and then send a request, we do.  We also do if we create a socket, successfully send a request over it, and then try to reuse the socket (Regardless of whether there's any delay between the two requests or not).

Comment 48 by Deleted ...@, Jul 19 2015

This is still happening to me.

amberfx11:  Please file a new bug, with a description of the problem.
Components: Internals>Network
Components: -Internals>Network>HTTP
