QUIC - when QUIC_SESSION wins we orphan TCP job, but we create HTTP2_SESSION |
||||
Issue description
QUIC - when QUIC_SESSION wins we orphan TCP job, but we create HTTP2_SESSION which does nothing other send flow control messages.
The following is an example:
18: HTTP_STREAM_JOB
t=398 [st= 0] +HTTP_STREAM_JOB [dt=405]
--> alternative_service = "Uninitialized :0"
--> original_url = "...."
...
t=398 [st= 0] HTTP_STREAM_JOB_DELAYED
--> resume_after_ms = 300
t=690 [st=292] HTTP_STREAM_JOB_ORPHANED
t=702 [st=304] +HOST_RESOLVER_IMPL_REQUEST [dt=0]
--> address_family = 0
...
t=802 [st=404] HTTP2_SESSION_POOL_IMPORTED_SESSION_FROM_SOCKET
--> source_dependency = 29 (HTTP2_SESSION)
...
29: HTTP2_SESSION
t= 801 [st= 0] HTTP2_SESSION_INITIALIZED
--> protocol = "h2"
--> source_dependency = 28 (SOCKET)
t= 801 [st= 0] HTTP2_SESSION_SEND_SETTINGS
--> settings = ["[id:3 flags:0 value:1000]","[id:4 flags:0 value:6291456]"]
t= 802 [st= 1] HTTP2_STREAM_UPDATE_RECV_WINDOW
--> delta = 15663105
--> window_size = 15728640
t= 802 [st= 1] HTTP2_SESSION_SENT_WINDOW_UPDATE_FRAME
--> delta = 15663105
--> stream_id = 0
...
t= 826 [st= 25] HTTP2_SESSION_RECEIVED_WINDOW_UPDATE_FRAME
--> delta = 983041
--> stream_id = 0
t= 826 [st= 25] HTTP2_SESSION_UPDATE_SEND_WINDOW
--> delta = 983041
--> window_size = 1048576
...
t=141278 [st=140477]
,
Mar 28 2016
> If we orphan the TCP job, how do we create a HTTP2 session, :-). Though we call orphan routine, it disconnects the TCP connection only if there is a blocking job. If there is no blocking_job, TCP job is not orphaned and the TCP job continues (I have found some cases, where it could do certificate verification in some net-internals logs). I am guessing it might have been done to keep a warm http (I am guessing this code was implemented when we were trying HTTP vs SPDY) connection. I will find the root cause of this issue this week. https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_stream_factory_impl_job.cc&rcl=1459164587&l=333
,
Mar 28 2016
Delay TCP is already working fairly well, but it sounds like this might improve it further.
,
Jun 21 2016
Ryan and I were talking about this issue: One possible reason could be delay_tcp_race time estimate is not long enough and the main job proceeds to create a socket. https://drive.google.com/open?id=0BxE5tbyxGBZVTXFHd1hvMlZzVHM is the an example of old net-internal logs. 111: HTTP_STREAM_JOB https://accounts.google.com/ Start Time: 2016-01-15 16:23:29.256 t=14316 [st= 1] HTTP_STREAM_JOB_DELAYED --> resume_after_ms = 113 t=14473 [st=158] +HOST_RESOLVER_IMPL_REQUEST [dt=0] --> address_family = 0 --> allow_cached_response = true --> host = "gfe-vl.l.google.com:443" --> is_speculative = false t=14473 [st=158] HOST_RESOLVER_IMPL_IPV6_REACHABILITY_CHECK --> cached = true --> ipv6_available = true t=14473 [st=158] HOST_RESOLVER_IMPL_CACHE_HIT t=14473 [st=158] -HOST_RESOLVER_IMPL_REQUEST t=14474 [st=159] +SOCKET_POOL [dt=170] t=14569 [st=254] HTTP_STREAM_JOB_ORPHANED t=14643 [st=328] SOCKET_POOL_BOUND_TO_CONNECT_JOB --> source_dependency = 126 (CONNECT_JOB) t=14644 [st=329] SOCKET_POOL_BOUND_TO_SOCKET --> source_dependency = 128 (SOCKET) .... t=14648 [st=333] HTTP2_SESSION_POOL_IMPORTED_SESSION_FROM_SOCKET --> source_dependency = 134 (HTTP2_SESSION) t=14653 [st=338] -HTTP_STREAM_JOB 126: CONNECT_JOB ssl/accounts.google.com:443 Start Time: 2016-01-15 16:23:29.415 t=14474 [st= 0] +SOCKET_POOL_CONNECT_JOB [dt=169] --> group_name = "ssl/accounts.google.com:443" t=14474 [st= 0] +SOCKET_POOL_CONNECT_JOB_CONNECT [dt=169] t=14474 [st= 0] TCP_CLIENT_SOCKET_POOL_REQUESTED_SOCKET --> host_and_port = "accounts.google.com:443" t=14474 [st= 0] +SOCKET_POOL [dt=73] t=14547 [st= 73] SOCKET_POOL_BOUND_TO_CONNECT_JOB --> source_dependency = 127 (CONNECT_JOB) 114: QUIC_SESSION accounts.google.com Start Time: 2016-01-15 16:23:29.261 ... t= 14320 [st= 0] QUIC_SESSION_CRYPTO_HANDSHAKE_MESSAGE_SENT --> CHLO< ... t= 14565 [st= 245] QUIC_SESSION_CRYPTO_HANDSHAKE_MESSAGE_RECEIVED --> SHLO< ... t= 14571 [st= 251] QUIC_HTTP_STREAM_SEND_REQUEST_HEADERS --> :authority: accounts.google.com ... Our delay estimate was 113ms. But it took longer and we have orphaned the job at 254ms (because we didn't do 0RTT and we send request headers at 251ms). Our RTT estimate is almost correct (1.5 * ~70 ms). Main job started proceeding after 113ms and gets a socket from socket_pool at 159ms and does a CONNECT.
,
Jun 21 2016
In that case, the delay just isn't long enough, I agree. If this is the core problem, then there are two options I can think of: 1) Cancel the HTTP2 job when either a second CHLO is sent or an SHLO is received. - This ensures an unnecessary HTTP2 connection isn't created when the QUIC handshake is progressing. 2) Use 1.5 * handshake time instead of 1.5 * RTT, since in this case, we spend a full RTT waiting for cert validation. I suspect this would be less reliable, though.
,
Jun 21 2016
When I look at requests for play.google.com that were delayed, I notice that a few were delayed by 107ms or so, but many are delayed by 5 or 20ms, which seems much smaller than an RTT. Any idea why?
,
Jun 21 2016
Also, we delay HTTP jobs even when there's an HTTP2 session already open. That doesn't sound ideal for end user latency. 6732: HTTP_STREAM_JOB https://hangouts.google.com/ 6735: HTTP_STREAM_JOB https://hangouts.google.com/
,
Jun 22 2016
Hi Ian, I was going through old internal logs created in 2016-01-15. Since then we had fixed " if the TCP job hasn't started, then delay the job by the given delay when it starts." bug which was fixed on 01/23. https://codereview.chromium.org/1586513002/ Wanted to share with Ryan what I had seen earlier. Will see if I could duplicate the problem you had mentioned in comment#4 (my guess is above CL fixed that bug).
,
Jun 22 2016
Good to know, I should have looked at the log date.
,
Aug 31 2016
,
Apr 24 2017
,
Apr 25 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot |
||||
►
Sign in to add a comment |
||||
Comment 1 by ianswett@google.com
, Mar 27 2016