
Issue 124437

Issue metadata

Status: Fixed
Owner: ----
Closed: Apr 2013
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Very large default timeout in DnsConfigService

Project Member Reported by, Apr 20 2012

Issue description

In the absence of an explicit timeout: option in /etc/resolv.conf, res_ninit initializes the timeout to **30 seconds**. This is unreasonably large and will lead to bad performance.

I think we should cap the timeout at 5 seconds.
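The proposed cap can be sketched as follows. This is a minimal illustration only, not how DnsConfigService obtains its configuration (the report says that comes from res_ninit). The function name, the text-parsing approach, and the `reported_default` parameter are hypothetical; only the `options timeout:N` syntax of resolv.conf and the 5 second cap come from the discussion.

```python
import re

MAX_TIMEOUT_SECONDS = 5  # cap proposed in this report


def effective_timeout(resolv_conf_text, reported_default):
    """Return the timeout in seconds after applying the proposed cap.

    resolv_conf_text: contents of a resolv.conf-style file.
    reported_default: value the system resolver reports when no explicit
        "timeout:" option is present (30 on the affected Macs, per the report).
    Both names are hypothetical and exist only for this sketch.
    """
    timeout = reported_default
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "options":
            continue
        for option in fields[1:]:
            match = re.fullmatch(r"timeout:(\d+)", option)
            if match:
                timeout = int(match.group(1))
    # Never trust a value above the cap, whether explicit or defaulted.
    return min(timeout, MAX_TIMEOUT_SECONDS)
```

With no explicit option and a reported default of 30, the sketch yields 5; an explicit `options timeout:2` is left at 2.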


Comment 1 by, Apr 20 2012

Having seen many network activities take longer than 5 seconds, I'm curious to better understand what we will do when we "time out".

Reading about other systems, some speculatively assume that the network configuration is similar or identical to what was seen previously or recently.  If we want to talk about improving performance, it is not clear that changing the timeout for this class of setup is as significant as some speculation and proactive use of guesses in races against the various initializations.

Bottom line: Please add comments about what will happen when the timeout expires, and how this will lead to a better user experience.  Thanks!

Comment 2 by, Apr 20 2012

When we time out, we try again with a doubled timeout. If that also times out, we give up (assuming the default "attempts:2") and call getaddrinfo. Obviously, with an initial timeout of 30s, this is a miserable situation. (Note this is on Mac only.)
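To make the cost concrete, here is a small sketch of the retry schedule described above: each attempt waits twice as long as the previous one, and the fallback to getaddrinfo happens only after every attempt has timed out. The function names are hypothetical, and the sketch models only the waiting time, not the actual query logic in [net/dns].

```python
def attempt_timeouts(initial_timeout, attempts=2):
    """Per-attempt timeouts in seconds, doubling after each timeout."""
    return [initial_timeout * 2 ** i for i in range(attempts)]


def worst_case_wait(initial_timeout, attempts=2):
    """Seconds spent waiting before giving up and calling getaddrinfo."""
    return sum(attempt_timeouts(initial_timeout, attempts))


if __name__ == "__main__":
    for initial in (30, 5, 1):
        print(initial, attempt_timeouts(initial), worst_case_wait(initial))
```

Under these assumptions, an initial timeout of 30s means 30s + 60s = 90s before the fallback, versus 15s with a 5s timeout and 3s with a 1s timeout.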

Although we are currently focusing on correctness rather than performance of [net/dns], I think 30s latency after packet loss is unreasonable. The long term plan is to tune the timeout to the observed response time.

Comment 3 by, Apr 21 2012

On Windows, the lack of a UDP response to a DNS query "times out" in 1 second, instigating a retry (when we use getaddrinfo()). I *think* that Mac retries at least that fast.

When you "time out," is this intended to be during use of the same resolver that getaddrinfo() would be using?

I think we should understand what getaddrinfo() on Mac is currently doing, and try to be at least as aggressive.

FWIW: *IF* this is all about the time-out of a single UDP query (as I mentioned above), then 5 seconds is way too slow.  With 1-2% packet loss, this 5 seconds would add a 50-100ms expected delay, which is not good.
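The 50-100ms figure is a first-order estimate: the probability that the first query is lost multiplied by the time wasted waiting for it. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the estimate ignores losses on the retry itself.

```python
def expected_added_delay_ms(loss_rate, timeout_seconds):
    """First-order expected extra latency per query, in milliseconds.

    Assumes a lost query costs exactly one full timeout before the retry
    and that the retry succeeds; repeated losses are ignored.
    """
    return loss_rate * timeout_seconds * 1000.0


if __name__ == "__main__":
    for timeout in (5, 1):
        for loss in (0.01, 0.02):
            print(timeout, loss, round(expected_added_delay_ms(loss, timeout), 1))
```

By the same estimate, a 1 second timeout at 1-2% loss adds roughly 10-20ms per query.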

Again, I'm not completely clear about which time-out this is talking about, and I'd welcome clarification in this bug.

If, for instance, we're not using the default external resolver, I'd be very interested in racing resolutions (ours vs getaddrinfo()) much much much sooner than 5 seconds or even 1 second.

Comment 4 by, Apr 21 2012

To make this clear: this bug applies to DnsConfigService, a service which reads the system configuration so that our DNS resolver knows nameserver addresses, suffix search list and other parameters configured by the user or administrator.

I have no doubt that Mac's getaddrinfo will retry faster than 30s. I believe that res_ninit (which we use to obtain the resolver configuration) reports a 30s timeout because of some bug in Mac's version of libresolv (30s is, coincidentally, the maximum allowed value). My proposal is to conservatively cap the timeout at 5 seconds (the default timeout on Linux and BSD) so that people using --enable-async-dns are not unnecessarily penalized in case of packet loss. I suspect 1 second would be fine as well, although I'm concerned it would introduce a non-negligible probability of spurious timeouts on slow links.

Again, there are plans to tune the timeout to the smallest practical value, using observed response time as feedback. However, we are currently focusing on correctness, i.e., making sure that for the vast majority of queries we do not need to fall back to getaddrinfo, and that we can identify queries that will most likely fail over DNS.

To further clarify, we don't currently race our DNS with getaddrinfo in order to keep the number of outstanding DNS queries at the typical level. We really want to avoid upsetting middleboxes at this time.

Comment 5 by, Apr 22 2012

SGTM... this should be a one-time configuration (or at least once per network change?), so I guess the 5 seconds max is not too painful.  IMO, you should land a histogram soon (even if we're not tuning) to see how many users see large values.

Thanks for the clarification!

Comment 6 by, Aug 15 2012

The following revision refers to this bug:

r151626 | | 2012-08-15T01:26:45.094652Z

Changed paths:

[net/dns] Hardcode DnsConfig.timeout to 1 second.

Currently, it is read from the OS (if available) or set to 5 seconds (on Windows). In some cases the OS provides ridiculous values (30s on Mac).

This change hard-codes the timeout to 1 second, which is the default used by getaddrinfo on Windows and Mac.

BUG= 124437 

Review URL:

Comment 7 by, Mar 10 2013

Labels: -Area-Internals -Internals-Network-DNS Cr-Internals Cr-Internals-Network-DNS

Comment 8 by, Apr 29 2013

Status: Fixed