New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 872011 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 24
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-08-16
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

Migrate after successful probing may cause a connection being closed with INTERNAL_ERROR

Project Member Reported by zhongyi@chromium.org, Aug 7

Issue description

Recent data collected from cronet applications indicates that there's higher rate of INTERNAL_ERROR in experiment turning on connection migration on path degrading. 

This is caused by QuicChromiumClientSession::OnProbeNetworkSucceeded which is called on the stack of QuicConnection::ProcessUdpPacket() when processing the probing response packet on the new path. QCCS::OnProbeNetworkSucceeded will call QCCS::MigrateToSocket, which eventually calls QuicChromiumPacketReader::StartReading. If there are synchronous reads, this will then run into a corner case: the outmost flusher is living on the stack of ProcessUdpPacket for the initial probing response packet, and there are new packets being process before the stack returns. If an ACK is queued, by some of the synchronous read, and a subsequent synchronous read will blow up in QuicConnection::OnUnauthenticatedHeader()[1].

So the stack looks like follows:
QuicChromiumPacketReader::ProcessReadResult() for packet 1: probing response in probing reader
  QuicConnection::ProcessUdpPacket() for packet 1
    ScopedPacketFlusher flusher <------ outmost flusher
    QuicFramer::ProcessPacket for packet 1
      QuicChromiumClientSession::OnConnectivityProbeReceived
        ...
        QuicChromiumClientSession::OnProbeNetworkSucceeded
          QuicChromiumClientSession::MigrateToSocket()
            QuicChromiumPacketReader::StartReading return sync'ly
            QuicChromiumPacketReader::ProcessReadResult for packet 2
                ...
                ScopedPacketFlusher flusher <--- inner flusher, no flush
                .... cause ACK frame queued
            QuicChromiumPacketReader::ProcessReadResult retunr for packet 2
            QuicChromiumPacketReader::StartReading return sync'ly
            QuicChromiumPacketReader::ProcessReadResult for packet 3
              ...
              QuicConnection::OnUnauthenticatedHeader blows up for packet 3







< return Quic

[1]https://cs.chromium.org/chromium/src/net/third_party/quic/core/quic_connection.cc?gsn=ProcessUnauthenticatedHeader&g=0&l=684
 
The reason this is triggered on probing is the outmost flusher must live on the stack while processing new packets. Without probing logic, a packet is always read after the stack to process the previous packet returns. 

To trigger this bug, the client has sent a number of connectivity probes before receiving the first probing response. And server is sending multiple connectivity probe reponses in a row, which arrive on the client at the same time. 
Project Member

Comment 2 by bugdroid1@chromium.org, Aug 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d3d5f5023f75dfda721f62fb40e5ee80e3ce16e3

commit d3d5f5023f75dfda721f62fb40e5ee80e3ce16e3
Author: Zhongyi Shi <zhongyi@chromium.org>
Date: Fri Aug 10 00:22:22 2018

Do not force a started reader to start reading when migrate to a probing socket.

A QuicChromiumPacketReader::StartReading() should only be called once.
It will read again automatically when the previous read result has been
processed. Calling StartReading on a started reader may break the
invariant and cause bugs like  http://crbug.com/872011 

Bug:  872011 
Change-Id: I6dde0a3daacb9ed9b6d7d4781de3c132e0a6aad7
Reviewed-on: https://chromium-review.googlesource.com/1166340
Commit-Queue: Zhongyi Shi <zhongyi@chromium.org>
Reviewed-by: Ryan Hamilton <rch@chromium.org>
Cr-Commit-Position: refs/heads/master@{#581977}
[modify] https://crrev.com/d3d5f5023f75dfda721f62fb40e5ee80e3ce16e3/net/quic/quic_chromium_client_session.cc
[modify] https://crrev.com/d3d5f5023f75dfda721f62fb40e5ee80e3ce16e3/net/quic/quic_chromium_client_session.h
[modify] https://crrev.com/d3d5f5023f75dfda721f62fb40e5ee80e3ce16e3/net/quic/quic_chromium_client_session_test.cc
[modify] https://crrev.com/d3d5f5023f75dfda721f62fb40e5ee80e3ce16e3/net/quic/quic_stream_factory_test.cc

Cc: mef@chromium.org
NextAction: 2018-08-16
Status: Fixed (was: Assigned)
Change on Comment 2 landed in 70.0.3518.0. The fix will be picked up by Cronet release. 
The NextAction date has arrived: 2018-08-16
Should we merge back to M69?
Good point! Cronet releases really frequently, but not the gmscore? 
Labels: Merge-Request-69
Status: Assigned (was: Fixed)
Project Member

Comment 8 by sheriffbot@chromium.org, Aug 17

Labels: -Merge-Request-69 Merge-Review-69 Hotlist-Merge-Review
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Pls apply appropriate OSs label.
And also provide merge justification, why it is needed and how safe it is to merge to M69 this late in release cycle.
Labels: OS-Android
The code technically only affects Android as Android is the only platform supports connection migration. And this change only potentially affects Cronet users who are enrolled in the experiment. 

This change addresses a corner case and should not cause new crash as QUIC connection will close the connection if there're errors during process read result. We'd like to request a fix to be merged to M69 as we expect this feature will be released to GMSCore users where gmscore release cycle is relatively long. 
Labels: -Hotlist-Merge-Review -Merge-Review-69 Merge-Approved-69
Approving for merge to M69, branch 3497.
Project Member

Comment 12 by sheriffbot@chromium.org, Aug 21

Cc: benmason@chromium.org
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 13 by bugdroid1@chromium.org, Aug 21

Labels: -merge-approved-69 merge-merged-3497
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/527f9e15f6942c29270888f5cf415911917c6df2

commit 527f9e15f6942c29270888f5cf415911917c6df2
Author: Zhongyi Shi <zhongyi@chromium.org>
Date: Tue Aug 21 23:53:50 2018

Do not force a started reader to start reading when migrate to a probing socket.

A QuicChromiumPacketReader::StartReading() should only be called once.
It will read again automatically when the previous read result has been
processed. Calling StartReading on a started reader may break the
invariant and cause bugs like  http://crbug.com/872011 

(cherry picked from commit d3d5f5023f75dfda721f62fb40e5ee80e3ce16e3)

Bug:  872011 
Change-Id: I6dde0a3daacb9ed9b6d7d4781de3c132e0a6aad7
Reviewed-on: https://chromium-review.googlesource.com/1166340
Commit-Queue: Zhongyi Shi <zhongyi@chromium.org>
Reviewed-by: Ryan Hamilton <rch@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#581977}
Reviewed-on: https://chromium-review.googlesource.com/1184305
Reviewed-by: Zhongyi Shi <zhongyi@chromium.org>
Cr-Commit-Position: refs/branch-heads/3497@{#758}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/527f9e15f6942c29270888f5cf415911917c6df2/net/quic/chromium/quic_chromium_client_session.cc
[modify] https://crrev.com/527f9e15f6942c29270888f5cf415911917c6df2/net/quic/chromium/quic_chromium_client_session.h
[modify] https://crrev.com/527f9e15f6942c29270888f5cf415911917c6df2/net/quic/chromium/quic_chromium_client_session_test.cc
[modify] https://crrev.com/527f9e15f6942c29270888f5cf415911917c6df2/net/quic/chromium/quic_stream_factory_test.cc

Status: Fixed (was: Assigned)

Sign in to add a comment