Brief flashes of ERR_QUIC_PROTOCOL_ERROR after resuming from suspend and transitioning networks |
|||||||||||||||
Issue descriptionI recently observed this over the past week, but failed to have the foresight to capture net-internals log. I'll try to reproduce over this coming weekend, but after discussing with rch@, jri@m and zhongyi@, filing the bug now. While traveling, I was frequently transitioning wireless networks (hotel, airport, conference, tethering), and frequently sleeping/resuming (close the laptop / open the laptop). During this time, I had GMail, Google Calendar, and Chromium Codereview open as tabs. Whenever I'd resume (which coincided with a network transition), I'd briefly see a net-error page showing ERR_QUIC_PROTOCOL_ERROR. The time this page spent on screen was often under a second - and a reload of the page quickly caused it to disappear and the expected content to load. From discussing with QUIC folks, we were wondering if QUIC is perhaps triggering different logic than HttpNetworkTransactions' logic for handling when the underlying StreamSocket was a keep-alive socket to be reused.
,
Mar 11 2016
48.0.2564.116. Apparently I like old software and I cannot lie.
,
Mar 11 2016
Thanks for reporting this, Ryan. Looking forward to seeing the net-internals when you have them. I'm marking this as Needs-Feedback for now.
,
Mar 24 2016
Ping! Have you seen this again Ryan?
,
Mar 28 2016
Sorry, was in the process of replacing my MacBook. I'll be doing some more extensive traveling over the next few weeks, and I will make sure to grab net-logs as I do the wifi transition dance again.
,
Mar 29 2016
Thank you for providing more feedback. Adding requester "jri@chromium.org" for another review and adding "Needs-Review" label for tracking. For more details visit https://sites.google.com/a/chromium.org/dev/issue-tracking/autotriage - Your friendly Sheriffbot
,
Mar 29 2016
Resetting to myself ;)
,
Apr 15 2016
Still seeing this in stable. Judging by my chrome://histograms, I've seen it 3 times in this browsing instance alone, in the main frame, and several hundreds for subresources. The unfortunate irony is that each time it's happened, it's been right after I've closed chrome://net-internals because it was getting too sluggish in Chrome (and reset didn't help). It seems consistent if I allow the laptop to go into a sleep state, and then resume, it'll have that hiccup. This is independent of any Wifi network migration, so it should be reproducible in the office. I'll keep trying to repro, but I believe it should have ample opportunity for others to try and repro and see. Having gmail open is the reliable source of this.
,
Apr 16 2016
Thank you for providing more feedback. Adding requester "rsleevi@chromium.org" for another review and adding "Needs-Review" label for tracking. For more details visit https://sites.google.com/a/chromium.org/dev/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 20 2016
The flicker is very likely autoreload which has an unfortunate tendency to mask transient errors. Looking at UMA, QUIC is a major source of autoreloads that aren't obviously network-related. We probably should have Finch watching these histograms... https://uma.googleplex.com/p/chrome/histograms/?endDate=04-17-2016&dayCount=1&histograms=Net.AutoReload.ErrorAtSuccess&fixupData=true&showMax=true&filters=channel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
,
Apr 21 2016
,
Jun 23 2016
rsleevi: Are you still seeing this? Adding Restrict-View-Google (Which I assume is what tbansal tried to do)
,
Jun 23 2016
Yes, I still see this every day, and yet never can capture it with net-internals. I deleted tbansal's comment because I don't believe we should Restrict-View-Google this bug.
,
Jun 24 2016
Thank you for providing more feedback. Adding requester "rsleevi@chromium.org" for another review and adding "Needs-Review" label for tracking. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 28 2016
,
Sep 28 2016
Missed the nag bit without the needs-feedback :) Will try to get logs again :)
,
Oct 11 2016
Ping? rsleevi@, are we waiting on something from you?
,
Nov 2 2016
I have a 7.5mb netlog demonstrating this on 54.0.2840.71. Who wants to investigate? :)
,
Nov 10 2016
Thank you for providing more feedback. Adding "Needs-Review" label for tracking. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 15 2016
Ah-hah! Looking at the net-internals shared offline, I can see what's happening.
After waking from suspend (I assume), we try to send a request which fails with QUIC_PROTOCOL_ERROR:
722216: URL_REQUEST
t=77472 [st=1] -HTTP_STREAM_REQUEST
t=77472 [st=1] HTTP_STREAM_REQUEST_BOUND_TO_QUIC_SESSION
--> source_dependency = 721390 (QUIC_SESSION)
t=77472 [st=1] +HTTP_TRANSACTION_SEND_REQUEST [dt=1]
t=77472 [st=1] HTTP_TRANSACTION_QUIC_SEND_REQUEST_HEADERS
...
t=77473 [st=2] -HTTP_TRANSACTION_SEND_REQUEST
--> net_error = -356 (ERR_QUIC_PROTOCOL_ERROR)
Looking at the QUIC session, we see that it's failing with a write error:
721390: QUIC_SESSION
...
t= 77473 [st= 39506] QUIC_CHROMIUM_CLIENT_STREAM_SEND_REQUEST_HEADERS
--> :authority: mail.google.com
:method: GET
:path: /mail/u/0/
:scheme: https
...
t= 77473 [st= 39506] QUIC_SESSION_STREAM_FRAME_SENT
--> fin = false
--> length = 895
--> offset = "8144"
--> stream_id = 3
t= 77473 [st= 39506] QUIC_SESSION_CLOSED
--> from_peer = false
--> quic_error = 27 (QUIC_PACKET_WRITE_ERROR)
t= 77473 [st= 39506] QUIC_SESSION_CLOSED
--> from_peer = false
--> quic_error = 27 (QUIC_PACKET_WRITE_ERROR)
Drilling down to the socket, we can see why:
t=1069457 [st=1031490]
721389: UDP_SOCKET
...
t=77473 [st=39507] UDP_SEND_ERROR
--> net_error = -109 (ERR_ADDRESS_UNREACHABLE)
t=77473 [st=39507] -SOCKET_ALIVE
So the root of the problem is that this request is being made before the network is ready, apparently. But that's not terribly satisfying.
I *think* that if QUIC were not being used, then TCP sockets would be closed on suspend and we'd try to create a new connection which would also fail. I wonder what's different.
QUIC isn't currently wired up to close connections on suspend, but we could consider doing this. (We almost did this to deal with an Android problem, but this signal does not exist on Android). I'll cook up a CL to do this. What do you think, sleevi?
,
Dec 15 2016
rch: That may paper over the symptom, but I'm curious from our past discussions (and in the original description), if it's still not an issue similar to: "we were wondering if QUIC is perhaps triggering different logic than HttpNetworkTransactions' logic for handling when the underlying StreamSocket was a keep-alive socket to be reused." That is, why is the new request (Event 721390) not treating the QUIC_SESSION_CLOSED 'similar to' when a new HTTP/1 socket is pulled from the pool, attempted to be used, but for various reasons (e.g. timeout, network issue, etc) is no longer viable. In the H/1 case, a new socket is grabbed and the request attempted. The obvious con to "always treat the 2nd and subsequent stream on an H/2 & QUIC 'as if' they were reused connections from a pool" would be that it would potentially cause a lot more request reissuance, but it would do so in a non-user-observable way, right? Closing based on suspect seems like it would work, but be pretty heavy-handed - it's worst behaviour is when everything is fine. The above proposal (making H/2&QUIC like H/1) seem like they would only exhibit the worst behaviour when things were REALLY wrong.
,
Dec 15 2016
Yeah, I thought about that. So I looked at HandleIOError in HttpNetworkTransaction and it does not appear to handle ERR_ADDRESS_UNREACHABLE: https://cs.chromium.org/chromium/src/net/http/http_network_transaction.cc?rcl=1481818704&l=1504 I might be misreading, but it seemed like ERR_ADDRESS_UNREACHABLE would not be handled by this code and would bubble up the stack. Am I looking in the wrong place?
,
Dec 15 2016
No, you're reading correctly, but I think it's because the UDP vs TCP errors will surface differently. That is, a reused TCP socket (even on resume) "shouldn't" result in ERR_ADDRESS_UNREACHABLE if it had been disconnected (it'd manifest as ERR_SOCKET_NOT_CONNECTED or _RESET/_CLOSED/_ABORTED), but the QUIC code, being UDP based, would. That's why I think it manifests when QUIC is enabled - even under H/2-over-TLS, it manifests as _RESET/_CLOSED/_ABORTED - but QUIC, being UDP, wouldn't, and so manifests more fully up the stack.
,
Dec 16 2016
sleevi: Looking at the QUIC_SESSION event in net_internals, I see an apparent 17s pause before the failed request is sent. Presumably, much more than 17s actually elapsed.
t= 60060 [st= 22093] QUIC_SESSION_STOP_WAITING_FRAME_RECEIVED
--> sent_info = {"least_unacked":"68"}
t= 77473 [st= 39506] QUIC_CHROMIUM_CLIENT_STREAM_SEND_REQUEST_HEADERS
If so, I think this will be fixed (worked around) by https://codereview.chromium.org/2487973003, which is in M56, I think. I'd be curious if this happens for you with that version.
,
Dec 27 2016
,
Mar 9 2017
Network triager here, friendly ping. Stable channel has been at M56 for some time. Comment #25 suggests the issue may be now fixed.
,
Mar 9 2017
I believe this should be fixed. sleevi: can you confirm?
,
Mar 10 2017
@rch: Do you recall a CL that fixed it? I can't remember having seen it recently though :)
,
Mar 10 2017
Thank you for providing more feedback. Adding requester "rch@chromium.org" to the cc list and removing "Needs-Feedback" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Mar 10 2017
Apparently I believed this was fixed (worked around) by https://codereview.chromium.org/2487973003, according to comment #25. Since you haven't seen it recently, and I think it's fixed, I'm going to go ahead and mark this as fixed. (Yay!) But definitely feel free to reopen if you see this happen again. |
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by cbentzel@chromium.org
, Mar 11 2016