New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 594226 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug



Sign in to add a comment

Brief flashes of ERR_QUIC_PROTOCOL_ERROR after resuming from suspend and transitioning networks

Project Member Reported by rsleevi@chromium.org, Mar 11 2016

Issue description

I recently observed this over the past week, but failed to have the foresight to capture net-internals log. I'll try to reproduce over this coming weekend, but after discussing with rch@, jri@m and zhongyi@, filing the bug now.

While traveling, I was frequently transitioning wireless networks (hotel, airport, conference, tethering), and frequently sleeping/resuming (close the laptop / open the laptop). During this time, I had GMail, Google Calendar, and Chromium Codereview open as tabs.

Whenever I'd resume (which coincided with a network transition), I'd briefly see a net-error page showing ERR_QUIC_PROTOCOL_ERROR. The time this page spent on screen was often under a second - and a reload of the page quickly caused it to disappear and the expected content to load.

From discussing with QUIC folks, we were wondering if QUIC is perhaps triggering different logic than HttpNetworkTransactions' logic for handling when the underlying StreamSocket was a keep-alive socket to be reused.
 
Which Chrome version was this seen on?
48.0.2564.116. Apparently I like old software and I cannot lie.

Comment 3 by jri@chromium.org, Mar 11 2016

Labels: Needs-Feedback
Thanks for reporting this, Ryan. Looking forward to seeing the net-internals when you have them. I'm marking this as Needs-Feedback for now.
Ping! Have you seen this again Ryan?
Sorry, was in the process of replacing my MacBook. I'll be doing some more extensive traveling over the next few weeks, and I will make sure to grab net-logs as I do the wifi transition dance again.
Project Member

Comment 6 by sheriffbot@chromium.org, Mar 29 2016

Labels: -Needs-Feedback Needs-Review
Owner: jri@chromium.org
Thank you for providing more feedback. Adding requester "jri@chromium.org" for another review and adding "Needs-Review" label for tracking.

For more details visit https://sites.google.com/a/chromium.org/dev/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Needs-Review Needs-Feedback
Owner: ----
Resetting to myself ;)
Still seeing this in stable. Judging by my chrome://histograms, I've seen it 3 times in this browsing instance alone, in the main frame, and several hundreds for subresources.

The unfortunate irony is that each time it's happened, it's been right after I've closed chrome://net-internals because it was getting too sluggish in Chrome (and reset didn't help). It seems consistent if I allow the laptop to go into a sleep state, and then resume, it'll have that hiccup. This is independent of any Wifi network migration, so it should be reproducible in the office.

I'll keep trying to repro, but I believe it should have ample opportunity for others to try and repro and see. Having gmail open is the reliable source of this.
Project Member

Comment 9 by sheriffbot@chromium.org, Apr 16 2016

Labels: -Needs-Feedback Needs-Review
Owner: rsleevi@chromium.org
Thank you for providing more feedback. Adding requester "rsleevi@chromium.org" for another review and adding "Needs-Review" label for tracking.

For more details visit https://sites.google.com/a/chromium.org/dev/issue-tracking/autotriage - Your friendly Sheriffbot
The flicker is very likely autoreload which has an unfortunate tendency to mask transient errors.

Looking at UMA, QUIC is a major source of autoreloads that aren't obviously network-related. We probably should have Finch watching these histograms...

https://uma.googleplex.com/p/chrome/histograms/?endDate=04-17-2016&dayCount=1&histograms=Net.AutoReload.ErrorAtSuccess&fixupData=true&showMax=true&filters=channel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
Labels: -Needs-Review Needs-Feedback
Owner: ----

Comment 12 Deleted

Labels: -Res Restrict-View-Google
rsleevi:  Are you still seeing this?

Adding Restrict-View-Google (Which I assume is what tbansal tried to do)
Labels: -Restrict-View-Google
Yes, I still see this every day, and yet never can capture it with net-internals.

I deleted tbansal's comment because I don't believe we should Restrict-View-Google this bug.
Project Member

Comment 15 by sheriffbot@chromium.org, Jun 24 2016

Labels: -Needs-Feedback Needs-Review
Owner: rsleevi@chromium.org
Thank you for providing more feedback. Adding requester "rsleevi@chromium.org" for another review and adding "Needs-Review" label for tracking.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 16 by rch@chromium.org, Jun 28 2016

Labels: -Needs-Review
Owner: rch@chromium.org
Labels: Needs-Feedback
Missed the nag bit without the needs-feedback :) Will try to get logs again :)
Ping?  rsleevi@, are we waiting on something from you?

I have a 7.5mb netlog demonstrating this on 54.0.2840.71. Who wants to investigate? :)
Project Member

Comment 20 by sheriffbot@chromium.org, Nov 10 2016

Labels: -Needs-Feedback Needs-Review
Thank you for providing more feedback. Adding "Needs-Review" label for tracking.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 21 by rch@chromium.org, Dec 15 2016

Ah-hah! Looking at the net-internals shared offline, I can see what's happening.

After waking from suspend (I assume), we try to send a request which fails with QUIC_PROTOCOL_ERROR:

722216: URL_REQUEST
t=77472 [st=1]     -HTTP_STREAM_REQUEST
t=77472 [st=1]      HTTP_STREAM_REQUEST_BOUND_TO_QUIC_SESSION
                    --> source_dependency = 721390 (QUIC_SESSION)
t=77472 [st=1]     +HTTP_TRANSACTION_SEND_REQUEST  [dt=1]
t=77472 [st=1]        HTTP_TRANSACTION_QUIC_SEND_REQUEST_HEADERS
...
t=77473 [st=2]     -HTTP_TRANSACTION_SEND_REQUEST
                    --> net_error = -356 (ERR_QUIC_PROTOCOL_ERROR)

Looking at the QUIC session, we see that it's failing with a write error:

721390: QUIC_SESSION
...
t=  77473 [st=  39506]    QUIC_CHROMIUM_CLIENT_STREAM_SEND_REQUEST_HEADERS
                          --> :authority: mail.google.com
                              :method: GET
                              :path: /mail/u/0/
                              :scheme: https
                              ...
t=  77473 [st=  39506]    QUIC_SESSION_STREAM_FRAME_SENT
                          --> fin = false
                          --> length = 895
                          --> offset = "8144"
                          --> stream_id = 3
t=  77473 [st=  39506]    QUIC_SESSION_CLOSED
                          --> from_peer = false
                          --> quic_error = 27 (QUIC_PACKET_WRITE_ERROR)
t=  77473 [st=  39506]    QUIC_SESSION_CLOSED
                          --> from_peer = false
                          --> quic_error = 27 (QUIC_PACKET_WRITE_ERROR)

Drilling down to the socket, we can see why:

t=1069457 [st=1031490] 
721389: UDP_SOCKET
...
t=77473 [st=39507]    UDP_SEND_ERROR
                      --> net_error = -109 (ERR_ADDRESS_UNREACHABLE)
t=77473 [st=39507] -SOCKET_ALIVE

So the root of the problem is that this request is being made before the network is ready, apparently. But that's not terribly satisfying.

I *think* that if QUIC were not being used, then TCP sockets would be closed on suspend and we'd try to create a new connection which would also fail. I wonder what's different.

QUIC isn't currently wired up to close connections on suspend, but we could consider doing this. (We almost did this to deal with an Android problem, but this signal does not exist on Android). I'll cook up a CL to do this. What do you think, sleevi?
rch: That may paper over the symptom, but I'm curious from our past discussions (and in the original description), if it's still not an issue similar to:

"we were wondering if QUIC is perhaps triggering different logic than HttpNetworkTransactions' logic for handling when the underlying StreamSocket was a keep-alive socket to be reused."

That is, why is the new request (Event 721390) not treating the QUIC_SESSION_CLOSED 'similar to' when a new HTTP/1 socket is pulled from the pool, attempted to be used, but for various reasons (e.g. timeout, network issue, etc) is no longer viable. In the H/1 case, a new socket is grabbed and the request attempted.

The obvious con to "always treat the 2nd and subsequent stream on an H/2 & QUIC 'as if' they were reused connections from a pool" would be that it would potentially cause a lot more request reissuance, but it would do so in a non-user-observable way, right?

Closing based on suspect seems like it would work, but be pretty heavy-handed - it's worst behaviour is when everything is fine. The above proposal (making H/2&QUIC like H/1) seem like they would only exhibit the worst behaviour when things were REALLY wrong.

Comment 23 by rch@chromium.org, Dec 15 2016

Yeah, I thought about that. So I looked at HandleIOError in HttpNetworkTransaction and it does not appear to handle ERR_ADDRESS_UNREACHABLE:

https://cs.chromium.org/chromium/src/net/http/http_network_transaction.cc?rcl=1481818704&l=1504

I might be misreading, but it seemed like ERR_ADDRESS_UNREACHABLE would not be handled by this code and would bubble up the stack. Am I looking in the wrong place?
No, you're reading correctly, but I think it's because the UDP vs TCP errors will surface differently. That is, a reused TCP socket (even on resume) "shouldn't" result in ERR_ADDRESS_UNREACHABLE if it had been disconnected (it'd manifest as ERR_SOCKET_NOT_CONNECTED or _RESET/_CLOSED/_ABORTED), but the QUIC code, being UDP based, would.

That's why I think it manifests when QUIC is enabled - even under H/2-over-TLS, it manifests as _RESET/_CLOSED/_ABORTED - but QUIC, being UDP, wouldn't, and so manifests more fully up the stack.

Comment 25 by rch@chromium.org, Dec 16 2016

sleevi: Looking at the QUIC_SESSION event in net_internals, I see an apparent 17s pause before the failed request is sent. Presumably, much more than 17s actually elapsed.

t=  60060 [st=  22093]    QUIC_SESSION_STOP_WAITING_FRAME_RECEIVED
                          --> sent_info = {"least_unacked":"68"}
t=  77473 [st=  39506]    QUIC_CHROMIUM_CLIENT_STREAM_SEND_REQUEST_HEADERS

If so, I think this will be fixed (worked around) by https://codereview.chromium.org/2487973003, which is in M56, I think. I'd be curious if this happens for you with that version.

Comment 26 by rch@chromium.org, Dec 27 2016

Labels: -Needs-Review Needs-Feedback
Network triager here, friendly ping.   Stable channel has been at M56 for some time.   Comment #25 suggests the issue may be now fixed.

Comment 28 by rch@chromium.org, Mar 9 2017

I believe this should be fixed. sleevi: can you confirm?
@rch: Do you recall a CL that fixed it? I can't remember having seen it recently though :)
Project Member

Comment 30 by sheriffbot@chromium.org, Mar 10 2017

Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "rch@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 31 by rch@chromium.org, Mar 10 2017

Status: Fixed (was: Unconfirmed)
Apparently I believed this was fixed (worked around) by https://codereview.chromium.org/2487973003, according to comment #25.

Since you haven't seen it recently, and I think it's fixed, I'm going to go ahead and mark this as fixed. (Yay!) But definitely feel free to reopen if you see this happen again.

Sign in to add a comment