QUIC not marked as broken if 0-RTT was attempted and cached RTT is large |
|||||||||||
Issue descriptionIt appears that currently QUIC is only marked as broken by http_stream_factory_impl_job_controller.cc when the QUIC job fails. It is *not* marked as broken by quic_stream_factory.cc if the handshake is not confirmed. This seems like a significant regression.
,
Feb 13 2017
Hm. I can't seem to reproduce this. Repro steps: Launch chrome and use host-resolver rules to map the server to a particular IP Fetch a URL from the server Observe Alt-Svc mapping, and QUIC_SESSION created. use iptables to drop incoming UDP packets from that IP Reload URL Observe 0-RTT QUIC_SESSION created which has requests sent but receives no response Observe request fail and is retried Observe new 0-RTT QUIC_SESSION created in which handshake-confirmation is not required. When this session closes, Alt-Svc is marked as broken, as expected. :/
,
Feb 14 2017
,
Feb 14 2017
,
Feb 14 2017
After much looking at the code and the net-logs, I think we understand why QUIC is not marked as broken in this case. To avoid false positives, QUIC is only marked broken when a QUIC job fails and a TCP job succeeds. In the net-log from the user, the TCP jobs do not actually succeed. Instead, the appear to be stuck for ~30 seconds until they are cancelled. So that explains why QUIC is not marked as broken. However, it does not explain why the TCP jobs are hung. However, QUIC jobs are given a head-start on the TCP job by approximately 1.5 times the SRTT estimate for that server. Looking at the code, out best hypothesis is that the cache SRTT estimate for the server in question is larger than 30 seconds. This means that the TCP job has not had time to start before it is finally killed. This suggests a couple of fixes: 1) We should cap the head start to some sane value (5 seconds?) 2) When a QUIC connection fails, the TCP job should resume immediately and not wait for the head start timer to expire.
,
Feb 14 2017
To #5: Great work figuring this out! On the head start question: 5 seconds seems way more leeway than we need. Most RTTs (I think > 90th percentile) fall within 750ms. Is 1 second adequate?
,
Feb 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca commit d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca Author: rch <rch@chromium.org> Date: Wed Feb 15 06:50:11 2017 Add net-log entries when HttpStream jobs are forced to wait, and when they're delayed and resumed. BUG= 691688 Review-Url: https://codereview.chromium.org/2699433003 Cr-Commit-Position: refs/heads/master@{#450608} [modify] https://crrev.com/d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca/net/http/http_stream_factory_impl_job.cc [modify] https://crrev.com/d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca/net/http/http_stream_factory_impl_job_controller.h [modify] https://crrev.com/d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca/net/log/net_log_event_type_list.h
,
Feb 15 2017
AH-HAH! I remembered that Chrome sends the SRTT estimate in the IRTT field of the CHLO which means I can go look at the net-internals to see if the trace in fact has a crazy high SRTT estimate. Indeed it does! IRTT: 1088931519 That value is in microseconds, which corresponds to 1088 seconds. This completely explains the observed behavior! The proposed fixes in comment 5 will definitely fix the problem. Looking at other QUIC sessions in that log, I see some other sessions with crazy SRTT estimate (and some with reasonable SRTT estimates).
,
Feb 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5 commit ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5 Author: zhongyi <zhongyi@chromium.org> Date: Wed Feb 15 15:23:05 2017 Resume the main job immediately when QUIC job fails, not wait for the head start timer to exprie BUG= 691688 Review-Url: https://codereview.chromium.org/2693043008 Cr-Commit-Position: refs/heads/master@{#450705} [modify] https://crrev.com/ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5/net/http/http_stream_factory_impl_job_controller.h [modify] https://crrev.com/ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5/net/http/http_stream_factory_impl_job_controller_unittest.cc [modify] https://crrev.com/ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5/net/http/http_stream_factory_test_util.cc
,
Feb 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0 commit 326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0 Author: zhongyi <zhongyi@chromium.org> Date: Thu Feb 16 19:54:24 2017 Cap the delay time for the main job when racing with QUIC for delayed tcp case so that if http_server_properties has cached a inappropriate srtt will not block the main job for a extremely long time. BUG= 691688 Review-Url: https://codereview.chromium.org/2690393005 Cr-Commit-Position: refs/heads/master@{#451066} [modify] https://crrev.com/326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0/net/http/http_stream_factory_impl_job_controller_unittest.cc
,
Feb 16 2017
The code landed in comment#9 and #10 should fix the issue. Do we need to merge the fix in comment 9?
,
Feb 16 2017
It's probably easiest to merge all 3 to make the patches apply cleanly. Can you request a merge?
,
Feb 16 2017
Sure. Shall we merge this to M57, which is in Beta currently and will go stable in a month?
,
Feb 16 2017
,
Feb 16 2017
Please tag with affected OSs. Thanks.
,
Feb 16 2017
,
Feb 17 2017
,
Feb 17 2017
Your change meets the bar and is auto-approved for M57. Please go ahead and merge the CL to branch 2987 manually. Please contact milestone owner if you have questions. Owners: amineer@(clank), cmasso@(bling), ketakid@(cros), govind@(desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Feb 17 2017
Please merge your change to M57 branch 2987 before 5:00 PM PT Friday (02/17), so we can pick it up for next week Beta release. Thank you.
,
Feb 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e5aaefbb2c1f86245d1130c7c645d193555f49f8 commit e5aaefbb2c1f86245d1130c7c645d193555f49f8 Author: Zhongyi Shi <zhongyi@chromium.org> Date: Fri Feb 17 20:51:42 2017 Add net-log entries when HttpStream jobs are forced to wait, and when they're delayed and resumed. BUG= 691688 Review-Url: https://codereview.chromium.org/2699433003 Cr-Commit-Position: refs/heads/master@{#450608} (cherry picked from commit d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca) Review-Url: https://codereview.chromium.org/2699833007 . Cr-Commit-Position: refs/branch-heads/2987@{#579} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/http/http_stream_factory_impl_job.cc [modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/http/http_stream_factory_impl_job_controller.h [modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/log/net_log_event_type_list.h
,
Feb 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3 commit b0fbd102ec86c933c893ee3e7d45675f53e4ebe3 Author: zhongyi <zhongyi@chromium.org> Date: Fri Feb 17 22:05:04 2017 Revert of Add net-log entries when HttpStream jobs are forced to wait, and when they're delayed and resumed. (patchset #1 id:1 of https://codereview.chromium.org/2699833007/ ) Reason for revert: Revert this issue as it blocks the Beta build( dependent on https://chromium.googlesource.com/chromium/src/+/9255195315efa671012f0bd0da1afe685d70b2f6). Will fix the cl and re-merge. Original issue's description: > Add net-log entries when HttpStream jobs are forced to wait, and when they're delayed and resumed. > > BUG= 691688 > > Review-Url: https://codereview.chromium.org/2699433003 > Cr-Commit-Position: refs/heads/master@{#450608} > (cherry picked from commit d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca) > > Review-Url: https://codereview.chromium.org/2699833007 . > Cr-Commit-Position: refs/branch-heads/2987@{#579} > Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} > Committed: https://chromium.googlesource.com/chromium/src/+/e5aaefbb2c1f86245d1130c7c645d193555f49f8 TBR= # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 691688 Review-Url: https://codereview.chromium.org/2701053002 Cr-Commit-Position: refs/branch-heads/2987@{#582} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/http/http_stream_factory_impl_job.cc [modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/http/http_stream_factory_impl_job_controller.h [modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/log/net_log_event_type_list.h
,
Feb 17 2017
The commit in comment #21 reverts the merge as the CL which was trying to be merged on comment #20 has dependency uncaught during the merge. Revert the merge to unblock beta build. I will retry the merge with dependency resolved. In addition, per comment #12, all the three commits in comment #7, 9, 10 need to be merged.
,
Feb 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/71fe65787c774c74c90c397ac90293c75aad6a5f commit 71fe65787c774c74c90c397ac90293c75aad6a5f Author: Zhongyi Shi <zhongyi@chromium.org> Date: Fri Feb 17 22:56:09 2017 Resume the main job immediately when QUIC job fails, not wait for the head start timer to exprie BUG= 691688 Review-Url: https://codereview.chromium.org/2693043008 Cr-Commit-Position: refs/heads/master@{#450705} (cherry picked from commit ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5) Review-Url: https://codereview.chromium.org/2701913003 . Cr-Commit-Position: refs/branch-heads/2987@{#583} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/71fe65787c774c74c90c397ac90293c75aad6a5f/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/71fe65787c774c74c90c397ac90293c75aad6a5f/net/http/http_stream_factory_impl_job_controller.h [modify] https://crrev.com/71fe65787c774c74c90c397ac90293c75aad6a5f/net/http/http_stream_factory_impl_job_controller_unittest.cc [modify] https://crrev.com/71fe65787c774c74c90c397ac90293c75aad6a5f/net/http/http_stream_factory_test_util.cc
,
Feb 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/75d8442e1d3f6d4358da176aa0092c03c76b975d commit 75d8442e1d3f6d4358da176aa0092c03c76b975d Author: zhongyi <zhongyi@chromium.org> Date: Fri Feb 17 23:52:20 2017 Revert of Resume the main job immediately when QUIC job fails, not wait for the head start timer to exprie (patchset #1 id:1 of https://codereview.chromium.org/2701913003/ ) Reason for revert: Unittests has dependency on cls which is not merged. Original issue's description: > Resume the main job immediately when QUIC job fails, not wait for the head start timer to exprie > > BUG= 691688 > > Review-Url: https://codereview.chromium.org/2693043008 > Cr-Commit-Position: refs/heads/master@{#450705} > (cherry picked from commit ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5) > > Review-Url: https://codereview.chromium.org/2701913003 . > Cr-Commit-Position: refs/branch-heads/2987@{#583} > Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} > Committed: https://chromium.googlesource.com/chromium/src/+/71fe65787c774c74c90c397ac90293c75aad6a5f TBR= # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 691688 Review-Url: https://codereview.chromium.org/2707583002 Cr-Commit-Position: refs/branch-heads/2987@{#584} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_impl_job_controller.h [modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_impl_job_controller_unittest.cc [modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_test_util.cc
,
Feb 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ecda574641950e9fa2902aeb4bf1aac97843ad2c commit ecda574641950e9fa2902aeb4bf1aac97843ad2c Author: Zhongyi Shi <zhongyi@chromium.org> Date: Sat Feb 18 00:05:28 2017 Reland the code with fixed depency in unittests. Resume the main job immediately when QUIC job fails, not wait for the head start timer to exprie BUG= 691688 Review-Url: https://codereview.chromium.org/2693043008 Cr-Commit-Position: refs/heads/master@{#450705} (cherry picked from commit ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5) R=rch@chromium.org Review-Url: https://codereview.chromium.org/2705753002 . Cr-Commit-Position: refs/branch-heads/2987@{#585} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/ecda574641950e9fa2902aeb4bf1aac97843ad2c/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/ecda574641950e9fa2902aeb4bf1aac97843ad2c/net/http/http_stream_factory_impl_job_controller.h [modify] https://crrev.com/ecda574641950e9fa2902aeb4bf1aac97843ad2c/net/http/http_stream_factory_impl_job_controller_unittest.cc [modify] https://crrev.com/ecda574641950e9fa2902aeb4bf1aac97843ad2c/net/http/http_stream_factory_test_util.cc
,
Feb 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8b44f98e742ec88474cadfce805f44f59da49800 commit 8b44f98e742ec88474cadfce805f44f59da49800 Author: Zhongyi Shi <zhongyi@chromium.org> Date: Sat Feb 18 00:33:39 2017 Cap the delay time for the main job when racing with QUIC for delayed tcp case so that if http_server_properties has cached a inappropriate srtt will not block the main job for a extremely long time. BUG= 691688 Review-Url: https://codereview.chromium.org/2690393005 Cr-Commit-Position: refs/heads/master@{#451066} (cherry picked from commit 326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0) R=rch@chromium.org Review-Url: https://codereview.chromium.org/2703943002 . Cr-Commit-Position: refs/branch-heads/2987@{#586} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/8b44f98e742ec88474cadfce805f44f59da49800/net/http/http_stream_factory_impl_job_controller.cc [modify] https://crrev.com/8b44f98e742ec88474cadfce805f44f59da49800/net/http/http_stream_factory_impl_job_controller_unittest.cc
,
Feb 18 2017
We are dropping merge the commit in comment #7 as it has heavy dependency on an unmerged CL, which only adds more logging. The other two CLs has a slight dependency on an unmerged CL, I have fixed the conflicts, verified both should build and test successfully on a local release branch. Since it takes a long time to build, I will standby until the build succeeds in case anything fires. Sorry about the land-revert pattern with drover.
,
Mar 24 2017
|
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by rch@chromium.org
, Feb 13 2017