New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 691688 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug



Sign in to add a comment

QUIC not marked as broken if 0-RTT was attempted and cached RTT is large

Project Member Reported by rch@chromium.org, Feb 13 2017

Issue description

It appears that currently QUIC is only marked as broken by http_stream_factory_impl_job_controller.cc when the QUIC job fails. It is *not* marked as broken by quic_stream_factory.cc if the handshake is not confirmed. This seems like a significant regression.
 

Comment 1 by rch@chromium.org, Feb 13 2017

The change to not mark QUIC as broken when 0-RTT fails appears to have landed here:

https://codereview.chromium.org/305213002

This was changed to mark as "recently" broken which should have prevented requests from being attached to subsequent connections, but this appears to not be working.

Comment 2 by rch@chromium.org, Feb 13 2017

Hm. I can't seem to reproduce this. Repro steps:

Launch chrome and use host-resolver rules to map the server to a particular IP
Fetch a URL from the server
Observe Alt-Svc mapping, and QUIC_SESSION created.
use iptables to drop incoming UDP packets from that IP
Reload URL
Observe 0-RTT QUIC_SESSION created which has requests sent but receives no response
Observe request fail and is retried
Observe new 0-RTT QUIC_SESSION created in which handshake-confirmation is not required. When this session closes, Alt-Svc is marked as broken, as expected. :/

Comment 3 by yye...@google.com, Feb 14 2017

Cc: yyefet@chromium.org

Comment 4 by yye...@google.com, Feb 14 2017

Labels: Hotlist-Enterprise

Comment 5 by rch@chromium.org, Feb 14 2017

Cc: zhongyi@chromium.org
After much looking at the code and the net-logs, I think we understand why QUIC is not marked as broken in this case. To avoid false positives, QUIC is only marked broken when a QUIC job fails and a TCP job succeeds. In the net-log from the user, the TCP jobs do not actually succeed. Instead, the appear to be stuck for ~30 seconds until they are cancelled. So that explains why QUIC is not marked as broken.

However, it does not explain why the TCP jobs are hung. However, QUIC jobs are given a head-start on the TCP job by approximately 1.5 times the SRTT estimate for that server. Looking at the code, out best hypothesis is that the cache SRTT estimate for the server in question is larger than 30 seconds. This means that the TCP job has not had time to start before it is finally killed.

This suggests a couple of fixes:
1) We should cap the head start to some sane value (5 seconds?)
2) When a QUIC connection fails, the TCP job should resume immediately and not wait for the head start timer to expire.

Comment 6 by jri@chromium.org, Feb 14 2017

To #5: Great work figuring this out! On the head start question: 5 seconds seems way more leeway than we need. Most RTTs (I think > 90th percentile) fall within 750ms. Is 1 second adequate?

Comment 8 by rch@chromium.org, Feb 15 2017

AH-HAH! I remembered that Chrome sends the SRTT estimate in the IRTT field of the CHLO which means I can go look at the net-internals to see if the trace in fact has a crazy high SRTT estimate. Indeed it does!

IRTT: 1088931519

That value is in microseconds, which corresponds to 1088 seconds. This completely explains the observed behavior! The proposed fixes in comment 5 will definitely fix the problem.

Looking at other QUIC sessions in that log, I see some other sessions with crazy SRTT estimate (and some with reasonable SRTT estimates).
Project Member

Comment 10 by bugdroid1@chromium.org, Feb 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0

commit 326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0
Author: zhongyi <zhongyi@chromium.org>
Date: Thu Feb 16 19:54:24 2017

Cap the delay time for the main job when racing with QUIC for delayed
tcp case so that if http_server_properties has cached a inappropriate
srtt will not block the main job for a extremely long time.

BUG= 691688 

Review-Url: https://codereview.chromium.org/2690393005
Cr-Commit-Position: refs/heads/master@{#451066}

[modify] https://crrev.com/326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0/net/http/http_stream_factory_impl_job_controller_unittest.cc

The code landed in comment#9 and #10 should fix the issue. Do we need to merge the fix in comment 9?

Comment 12 by rch@chromium.org, Feb 16 2017

It's probably easiest to merge all 3 to make the patches apply cleanly. Can you request a merge?
Labels: MERGE
Sure. Shall we merge this to M57, which is in Beta currently and will go stable in a month?
Labels: Merge-Request-57
Please tag with affected OSs.  Thanks.
Labels: OS-All

Comment 17 by rch@chromium.org, Feb 17 2017

Summary: QUIC not marked as broken if 0-RTT was attempted and cached RTT is large (was: QUIC not marked as broken if 0-RTT was attempted)
Project Member

Comment 18 by sheriffbot@chromium.org, Feb 17 2017

Labels: -Merge-Request-57 Hotlist-Merge-Approved Merge-Approved-57
Your change meets the bar and is auto-approved for M57. Please go ahead and merge the CL to branch 2987 manually. Please contact milestone owner if you have questions.
Owners: amineer@(clank), cmasso@(bling), ketakid@(cros), govind@(desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Please merge your change to M57 branch 2987 before 5:00 PM PT Friday (02/17), so we can pick it up for next week Beta release. Thank you.
Project Member

Comment 20 by bugdroid1@chromium.org, Feb 17 2017

Labels: -merge-approved-57 merge-merged-2987
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e5aaefbb2c1f86245d1130c7c645d193555f49f8

commit e5aaefbb2c1f86245d1130c7c645d193555f49f8
Author: Zhongyi Shi <zhongyi@chromium.org>
Date: Fri Feb 17 20:51:42 2017

Add net-log entries when HttpStream jobs are forced to wait, and when they're delayed and resumed.

BUG= 691688 

Review-Url: https://codereview.chromium.org/2699433003
Cr-Commit-Position: refs/heads/master@{#450608}
(cherry picked from commit d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca)

Review-Url: https://codereview.chromium.org/2699833007 .
Cr-Commit-Position: refs/branch-heads/2987@{#579}
Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}

[modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/http/http_stream_factory_impl_job.cc
[modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/http/http_stream_factory_impl_job_controller.h
[modify] https://crrev.com/e5aaefbb2c1f86245d1130c7c645d193555f49f8/net/log/net_log_event_type_list.h

Project Member

Comment 21 by bugdroid1@chromium.org, Feb 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3

commit b0fbd102ec86c933c893ee3e7d45675f53e4ebe3
Author: zhongyi <zhongyi@chromium.org>
Date: Fri Feb 17 22:05:04 2017

Revert of Add net-log entries when HttpStream jobs are forced to wait, and when they're delayed and resumed. (patchset #1 id:1 of https://codereview.chromium.org/2699833007/ )

Reason for revert:
Revert this issue as it blocks the Beta build( dependent on https://chromium.googlesource.com/chromium/src/+/9255195315efa671012f0bd0da1afe685d70b2f6). Will fix the cl and re-merge.

Original issue's description:
> Add net-log entries when HttpStream jobs are forced to wait, and when they're delayed and resumed.
>
> BUG= 691688 
>
> Review-Url: https://codereview.chromium.org/2699433003
> Cr-Commit-Position: refs/heads/master@{#450608}
> (cherry picked from commit d27cf4ed111ab7f3bdab607fe48e4446d6cf15ca)
>
> Review-Url: https://codereview.chromium.org/2699833007 .
> Cr-Commit-Position: refs/branch-heads/2987@{#579}
> Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}
> Committed: https://chromium.googlesource.com/chromium/src/+/e5aaefbb2c1f86245d1130c7c645d193555f49f8

TBR=
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 691688 

Review-Url: https://codereview.chromium.org/2701053002
Cr-Commit-Position: refs/branch-heads/2987@{#582}
Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}

[modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/http/http_stream_factory_impl_job.cc
[modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/http/http_stream_factory_impl_job_controller.h
[modify] https://crrev.com/b0fbd102ec86c933c893ee3e7d45675f53e4ebe3/net/log/net_log_event_type_list.h

The commit in comment #21 reverts the merge as the CL which was trying to be merged on comment #20 has dependency uncaught during the merge. Revert the merge to unblock beta build. 

I will retry the merge with dependency resolved. In addition, per comment #12, all the three commits in comment #7, 9, 10 need to be merged.
Project Member

Comment 24 by bugdroid1@chromium.org, Feb 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/75d8442e1d3f6d4358da176aa0092c03c76b975d

commit 75d8442e1d3f6d4358da176aa0092c03c76b975d
Author: zhongyi <zhongyi@chromium.org>
Date: Fri Feb 17 23:52:20 2017

Revert of Resume the main job immediately when QUIC job fails, not wait for the head start timer to exprie (patchset #1 id:1 of https://codereview.chromium.org/2701913003/ )

Reason for revert:
Unittests has dependency on cls which is not merged.

Original issue's description:
> Resume the main job immediately when QUIC job fails, not wait for the head start timer to exprie
>
> BUG= 691688 
>
> Review-Url: https://codereview.chromium.org/2693043008
> Cr-Commit-Position: refs/heads/master@{#450705}
> (cherry picked from commit ff2bfbfd76f3c43e5618addd34c82fed42d2a6f5)
>
> Review-Url: https://codereview.chromium.org/2701913003 .
> Cr-Commit-Position: refs/branch-heads/2987@{#583}
> Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}
> Committed: https://chromium.googlesource.com/chromium/src/+/71fe65787c774c74c90c397ac90293c75aad6a5f

TBR=
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 691688 

Review-Url: https://codereview.chromium.org/2707583002
Cr-Commit-Position: refs/branch-heads/2987@{#584}
Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}

[modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_impl_job_controller.h
[modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_impl_job_controller_unittest.cc
[modify] https://crrev.com/75d8442e1d3f6d4358da176aa0092c03c76b975d/net/http/http_stream_factory_test_util.cc

Project Member

Comment 25 by bugdroid1@chromium.org, Feb 18 2017

Project Member

Comment 26 by bugdroid1@chromium.org, Feb 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8b44f98e742ec88474cadfce805f44f59da49800

commit 8b44f98e742ec88474cadfce805f44f59da49800
Author: Zhongyi Shi <zhongyi@chromium.org>
Date: Sat Feb 18 00:33:39 2017

Cap the delay time for the main job when racing with QUIC for delayed tcp case so that if http_server_properties has cached a inappropriate srtt will not block the main job for a extremely long time.

BUG= 691688 

Review-Url: https://codereview.chromium.org/2690393005
Cr-Commit-Position: refs/heads/master@{#451066}
(cherry picked from commit 326a7fdb3a840dd926b4b4ff14ca7fe24d87ffc0)

R=rch@chromium.org

Review-Url: https://codereview.chromium.org/2703943002 .
Cr-Commit-Position: refs/branch-heads/2987@{#586}
Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}

[modify] https://crrev.com/8b44f98e742ec88474cadfce805f44f59da49800/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/8b44f98e742ec88474cadfce805f44f59da49800/net/http/http_stream_factory_impl_job_controller_unittest.cc

We are dropping merge the commit in comment #7 as it has heavy dependency on an unmerged CL, which only adds more logging. 

The other two CLs has a slight dependency on an unmerged CL, I have fixed the conflicts, verified both should build and test successfully on a local release branch. Since it takes a long time to build, I will standby until the build succeeds in case anything fires. Sorry about the land-revert pattern with drover.

Comment 28 by rch@chromium.org, Mar 24 2017

Status: Fixed (was: Started)

Sign in to add a comment