Very large default timeout in DnsConfigService
Project Member Reported by email@example.com, Apr 20 2012
In the absence of an explicit timeout: option in /etc/resolv.conf, res_ninit initializes the timeout to **30 seconds**. This is unreasonably large and leads to poor performance whenever a DNS packet is lost. I think we should cap the timeout at 5 seconds.
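Here is a minimal sketch of the proposed cap, in C++ since that is what net/dns is written in. The function and constant names are hypothetical and are not DnsConfigService code. The sketch assumes the OS value has already been read. On glibc and BSD-style resolvers, res_ninit typically stores it in the `retrans` field of `res_state`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical cap, matching the 5-second default of the Linux and BSD resolvers.
constexpr std::chrono::seconds kMaxDnsTimeout{5};

// Returns the timeout to use, given the value reported by the system
// resolver configuration. Values above the cap are clamped; smaller
// explicit values (e.g. "timeout:2" in resolv.conf) are kept as-is.
std::chrono::seconds CapDnsTimeout(std::chrono::seconds os_timeout) {
  assert(os_timeout.count() > 0);
  return std::min(os_timeout, kMaxDnsTimeout);
}
```

With this cap, the 30-second value from res_ninit on Mac becomes 5 seconds, and an administrator's shorter setting still wins.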
Apr 20 2012,
Having seen many network activities take longer than 5 seconds, I'm curious to better understand what we will do when we "time out". Reading about other systems, some speculatively assume that the network configuration is similar or identical to what was seen previously or recently. If we want to improve performance, it is not clear that changing the timeout for this class of setup matters as much as speculation, i.e., proactively using guesses in races against various initializations. Bottom line: please add comments about what will happen when the timeout expires, and how this will lead to a better user experience. Thanks!
Apr 20 2012,
When we time out, we retry with a doubled timeout. If that also times out, we give up (assuming the default "attempts:2") and call getaddrinfo. Obviously, with an initial timeout of 30s, this is a miserable situation. (Note that this affects Mac only.) Although we are currently focusing on correctness rather than performance of [net/dns], I think 30s of latency after a packet loss is unreasonable. The long-term plan is to tune the timeout to the observed response time.
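The retry schedule described above can be sketched as follows, to show why a 30s initial timeout is so costly. The sketch assumes a single nameserver, a timeout that exactly doubles on each retry, and no response at all. The function name is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Total time spent waiting before giving up and falling back to
// getaddrinfo: the first attempt waits `initial_timeout`, and each
// later attempt waits twice as long as the previous one.
std::chrono::seconds WorstCaseWaitBeforeFallback(
    std::chrono::seconds initial_timeout, int attempts) {
  assert(initial_timeout.count() > 0);
  assert(attempts > 0);
  std::chrono::seconds total{0};
  std::chrono::seconds current = initial_timeout;
  for (int i = 0; i < attempts; ++i) {
    total += current;  // wait out this attempt
    current *= 2;      // the next attempt uses a doubled timeout
  }
  return total;
}
```

Under these assumptions, an initial timeout of 30 seconds with attempts:2 means 30s + 60s = 90 seconds before getaddrinfo is even tried. With a 5-second initial timeout, the worst case is 15 seconds.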
Apr 21 2012,
On Windows, the lack of a UDP response to a DNS query "times out" in 1 second, triggering a retry (when we use getaddrinfo()). I *think* that Mac retries at least that fast. When you "time out", is this during use of the same resolver that getaddrinfo() would be using? I think we should understand what getaddrinfo() on Mac is currently doing, and try to be at least as aggressive. FWIW: *IF* this is all about the timeout of a single UDP query (as I mentioned above), then 5 seconds is way too slow. With 1-2% packet loss, a 5-second timeout would add a 50-100 ms expected delay (0.01 × 5 s to 0.02 × 5 s), which is not good. Again, I'm not completely clear about which timeout this is talking about, and I'd welcome clarification in this bug. If, for instance, we're not using the default external resolver, I'd be very interested in racing resolutions (ours vs. getaddrinfo()) much, much sooner than 5 seconds or even 1 second.
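The 50-100 ms figure follows from a first-order model. With loss rate p, the client waits one extra full timeout with probability p, so the expected added delay is roughly p × timeout. The hypothetical helper below makes the arithmetic explicit. It ignores loss of the retry itself, which is negligible at these rates.

```cpp
#include <cassert>
#include <chrono>

// First-order estimate of the expected extra latency caused by packet
// loss: with probability `loss_rate` the query or its response is lost,
// and the client waits a full `timeout` before retrying.
std::chrono::duration<double, std::milli> ExpectedExtraDelay(
    double loss_rate, std::chrono::duration<double> timeout) {
  assert(loss_rate >= 0.0 && loss_rate <= 1.0);
  return loss_rate * timeout;  // seconds, converted implicitly to milliseconds
}
```

For example, `ExpectedExtraDelay(0.01, std::chrono::seconds(5))` is about 50 ms. The same loss rate with a 1-second timeout costs about 10 ms.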
Apr 21 2012,
To make this clear: this bug applies to DnsConfigService, a service which reads the system configuration so that our DNS resolver knows the nameserver addresses, suffix search list and other parameters configured by the user or administrator. I have no doubt that Mac's getaddrinfo retries faster than 30s. I believe that res_ninit (which we use to obtain the resolver configuration) reports a 30s timeout because of a bug in Mac's version of libresolv (30s is, coincidentally, the maximum allowed value). My proposal is to conservatively cap the timeout at 5 seconds (the default timeout on Linux and BSD) so that people using --enable-async-dns are not unnecessarily penalized in case of packet loss. I suspect 1 second would be fine as well, although I'm concerned it would introduce a non-negligible probability of spurious timeouts on slow links. Again, there are plans to tune the timeout to the smallest practical value, using observed response times as feedback. However, we are currently focusing on correctness, i.e., making sure that for the vast majority of queries we do not need to fall back to getaddrinfo, and that we can identify queries that will most likely fail over DNS. To further clarify, we don't currently race our DNS resolver against getaddrinfo, in order to keep the number of outstanding DNS queries at the typical level. We really want to avoid upsetting middleboxes at this time.
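The long-term plan of tuning the timeout from observed response times could look something like the sketch below. It borrows TCP's retransmission-timeout estimator (RFC 6298: smoothed RTT plus four times the RTT variance) as a stand-in. That technique, the class name and the clamp bounds are assumptions for illustration, not the actual net/dns design. It requires C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical adaptive timeout, modelled on RFC 6298:
// timeout = SRTT + 4 * RTTVAR, clamped to [min_timeout, max_timeout].
class AdaptiveDnsTimeout {
 public:
  using Ms = std::chrono::duration<double, std::milli>;

  AdaptiveDnsTimeout(Ms initial, Ms min_timeout, Ms max_timeout)
      : current_(initial), min_(min_timeout), max_(max_timeout) {
    assert(min_timeout <= max_timeout);
  }

  // Feeds one observed query/response round-trip time.
  void OnResponse(Ms rtt) {
    if (!has_sample_) {
      // First sample: SRTT = R, RTTVAR = R / 2.
      srtt_ = rtt;
      rttvar_ = rtt / 2.0;
      has_sample_ = true;
    } else {
      // beta = 1/4 and alpha = 1/8; RTTVAR is updated using the old SRTT.
      rttvar_ = 0.75 * rttvar_ + 0.25 * Ms(std::abs((srtt_ - rtt).count()));
      srtt_ = 0.875 * srtt_ + 0.125 * rtt;
    }
    current_ = std::clamp(srtt_ + 4.0 * rttvar_, min_, max_);
  }

  Ms timeout() const { return current_; }

 private:
  Ms current_;
  Ms min_;
  Ms max_;
  Ms srtt_{0};
  Ms rttvar_{0};
  bool has_sample_ = false;
};
```

The lower clamp guards against the spurious timeouts on slow links mentioned above. The upper clamp preserves the 5-second cap. RFC 6298 also adds a clock-granularity term, which is omitted here.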
Apr 22 2012,
SGTM. This should be a one-time configuration (or at least once per network change?), so I guess the 5-second max is not too painful. IMO, you should land a histogram soon (even if we're not tuning yet) to see how many users see large values. Thanks for the clarification!
Aug 15 2012,
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=151626

------------------------------------------------------------------------
r151626 | firstname.lastname@example.org | 2012-08-15T01:26:45.094652Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/dns/dns_config_service.cc?r1=151626&r2=151625&pathrev=151626
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/dns/dns_config_service.h?r1=151626&r2=151625&pathrev=151626
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/dns/dns_config_service_posix.cc?r1=151626&r2=151625&pathrev=151626

[net/dns] Hardcode DnsConfig.timeout to 1 second.

Currently, it is read from OS (if available) or set to 5 seconds (on Windows). In some cases the OS provides ridiculous values (30s on Mac). This change hard-codes the timeout to 1 second, which is the default used by getaddrinfo on Windows and Mac.

BUG=124437

Review URL: https://chromiumcodereview.appspot.com/10826212
------------------------------------------------------------------------
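Here is a before/after sketch of the behaviour the commit message describes. The function names are hypothetical. The OS value is modelled as a `std::optional` that is empty when the OS does not provide one, as on Windows. This is not the code from r151626.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical "before": use the OS value when one is available,
// otherwise fall back to 5 seconds (the Windows case).
std::chrono::seconds TimeoutBeforeChange(
    std::optional<std::chrono::seconds> os_timeout) {
  if (os_timeout) {
    assert(os_timeout->count() > 0);
    return *os_timeout;  // may be as large as 30s on Mac
  }
  return std::chrono::seconds(5);
}

// Hypothetical "after": the OS value is ignored and the timeout is
// always 1 second, the getaddrinfo default on Windows and Mac.
std::chrono::seconds TimeoutAfterChange(
    std::optional<std::chrono::seconds> /*os_timeout*/) {
  return std::chrono::seconds(1);
}
```

Hard-coding sidesteps the bad Mac value entirely, at the cost of also ignoring explicit timeout: settings an administrator may have configured.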
Mar 10 2013,
Apr 29 2013,