Improve QUIC Delay TCP logic to account for DNS resolution |
|||||||
Issue descriptionDNS resolution can take a long time. QUIC's "Delay TCP logic" delays a TCP job using past SRTT estimates, which do not account for DNS resolution time. This leads to spurious TCP/TLS handshakes as seen in this NetLog (https://drive.google.com/a/google.com/file/d/0B_9WseIqRoaFZzFQSC1MZGpmb1k/view?usp=sharing) Since TCP and QUIC jobs have their own DNS resolution paths, I wonder whether we could unifying those two. If that's not feasible, we can look into only resuming TCP job after QUIC job has finished DNS resolution.
,
Sep 1 2017
Nice find! We (well, you, iirc) unified the proxy resolution logic into the JobController so that neither job starts until proxy resolution is complete. It seems plausible to me that we could do something similar here for host resolution. I think plumbing the host resolution results down into the socket pool layer might be tricky, but plumbing it into QUIC should be easy. So we could possibly resolve it there, and then start the jobs. Alternatively, the transport connect job has a host_resolution_callback() that it invokes (on the Job, iirc) when it resolves the host. This callback currently checks to see if there is a SpdySession which has been created since the Transport job was created. We could extend/change this callback to also check QuicSessions, though I've not thought too deeply about how challenging this might be to implement. We could probably plumb a similar callback down the the QuicStreamFactory Job to call back up when the host is resolved, and start/un-pause the TCP job then. (The Job would probably ping the N Requests, and each Request could ping the associated Job) Of course, the simplest thing would be to increase the delay to compensate for the DNS resolution time. Though the attached net-internals the DNS resolution takes 6 seconds which is probably larger than any value we'd be likely to pick.
,
Sep 1 2017
Thanks, Ryan. I am also in favor of separating out DNS resolution of TCP ConnectJob. This will help us to address Issue 761487 (where the TCP ConnectJob's 250ms timeout doesn't account for DNS resolution). Since we are on the same page, I will mark this as triaged and assign myself. I will see if we can do approach #1 without requiring too much modification on socket pools.
,
Sep 26 2017
I started a discussion thread on net-dev@: https://groups.google.com/a/chromium.org/d/msg/net-dev/PwMhnUDK8CE/qBh-FfA0BgAJ
,
Nov 10 2017
xunjieli: Can you summarize the conclusion of that thread here on the bug? Do you think it's in a state where there is a straightforward implementation?
,
Nov 10 2017
Thanks, Ryan. Yes, in the discussion thread, we arrived at a more-or-less straightforward implementation. To summarize, making QUIC's "delay TCP" logic aware of DNS is worth investigating. The underlying problem is that DNS is slow, which is a problem that the network stack team is currently addressing (e.g. through DNS over HTTP, AsyncDNS). So this bug is of lower priority. To investigate further, the approach is have a signaling interface for Quic to unpause TCP job if DNS is taking a long time (This is also the idea proposed by rch@ in #2). Gather UMA data to see how much a difference this improvement makes, and whether to roll out this new behavior by default. (Un-assigning myself. I didn't find time to work on this)
,
Nov 13 2017
wangix: this would be great to get working
,
Nov 13 2017
,
Dec 15 2017
,
Dec 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7c5d11a8ac216b6a139914029edf0a8b338b65d8 commit 7c5d11a8ac216b6a139914029edf0a8b338b65d8 Author: Yixin Wang <wangyix@chromium.org> Date: Thu Dec 21 19:50:54 2017 Improve QUIC/TCP race logic to not start TCP job until after QUIC host resolution Currently, HttpStreamFactoryImpl::JobController races a QUIC and a TCP job. Modify HttpStreamFactoryImpl::JobController to only starts the TCP job after the QUIC job finishes DNS resolution. If the QUIC job doesn't succeed/fail immediately, then HttpStreamFactoryImpl::JobController will start the TCP job after a set delay.However, if the QUIC job is taking too long because DNS resolution is slow, then it makes little sense to start the TCP job since it too has to do DNS resolution. Now, JobController will wait for the QUIC job to finish DNS resolution before starting the delay timer, after which the TCP job will start. Bug: 761483 Change-Id: I61cc77073e7ed8f9261ac7fed0f5e0510bbb4aad Reviewed-on: https://chromium-review.googlesource.com/821851 Reviewed-by: Ryan Hamilton <rch@chromium.org> Commit-Queue: Yixin Wang <wangyix@chromium.org> Cr-Commit-Position: refs/heads/master@{#525777} [modify] https://crrev.com/7c5d11a8ac216b6a139914029edf0a8b338b65d8/net/http/http_stream_factory_impl_job.cc [modify] https://crrev.com/7c5d11a8ac216b6a139914029edf0a8b338b65d8/net/http/http_stream_factory_impl_job.h [modify] https://crrev.com/7c5d11a8ac216b6a139914029edf0a8b338b65d8/net/http/http_stream_factory_impl_job_controller_unittest.cc [modify] https://crrev.com/7c5d11a8ac216b6a139914029edf0a8b338b65d8/net/quic/chromium/quic_network_transaction_unittest.cc [modify] https://crrev.com/7c5d11a8ac216b6a139914029edf0a8b338b65d8/net/quic/chromium/quic_stream_factory.cc [modify] https://crrev.com/7c5d11a8ac216b6a139914029edf0a8b338b65d8/net/quic/chromium/quic_stream_factory.h [modify] https://crrev.com/7c5d11a8ac216b6a139914029edf0a8b338b65d8/net/quic/chromium/quic_stream_factory_test.cc
,
Dec 21 2017
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by xunji...@chromium.org
, Sep 1 2017