Remote hosts retrying downed DNS server

Mon Nov 5 01:23:34 UTC 2001

> 
> Chuck,
> As far as I can ascertain, in some scenarios a resolving BIND server will
> always retry a server which is down first.  I haven't seen this myself, but
> the algorithm used in BIND 8 and earlier looks suspect to me.  I haven't
> looked at the BIND 9 code, but I suspect it's similar.  This, BTW, is just
> theory: although I do a lot of work with non-existent servers, which are
> therefore always down, I haven't ever seen this, but then I also haven't
> looked for it.  When BIND first picks up a set of NS records, it seems to
> allocate a random RTT to each, between 1 and roughly 25 milliseconds.  If a
> server is down when queried, then the stored RTT is multiplied by 1.2 using
> integer arithmetic.

	Well, it is 1.2x + 0.7 * the measured round-trip time (rtrip) of
	the server that answered, where x is the stored RTT (start).

	start 0, rtrip 0, result 0
	start 1, rtrip 0, result 1
	start 1, rtrip 1, result 1
	start 1, rtrip 2, result 1
	start 2, rtrip 0, result 2
	start 2, rtrip 1, result 2
	start 3, rtrip 0, result 3
	start 3, rtrip 1, result 3
	start 4, rtrip 0, result 4

	Which means only the above conditions fail to penalise a
	failing server.  All the servers have to be close for this
	to be a problem.
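
	The cases listed above can be checked with a short sketch.  Note
	that truncating 1.2 * start + 0.7 * rtrip literally gives 2 for
	start 1, rtrip 2, so it does not reproduce the list.  The list is
	reproduced if the rtrip weight is 0.3 (that is, 1 - 0.7) and a
	stored RTT of 0 is replaced by 1.2 * rtrip.  Both of those are
	assumptions inferred from the list itself, not taken from the BIND
	source, and the function names are made up for illustration.

```python
import math

def penalised_rtt(start, rtrip, rtrip_weight=0.3, zero_special=True):
    """Truncated penalty for a server that failed to answer.

    start -- stored RTT of the failing server
    rtrip -- measured round trip of the server that did answer
    Both keyword defaults are assumptions inferred from the list of cases.
    """
    if zero_special and start == 0:
        return math.trunc(1.2 * rtrip)
    # math.trunc mirrors a C cast to int: it drops the fractional part.
    return math.trunc(1.2 * start + rtrip_weight * rtrip)

def unpenalised_cases(max_start=10, max_rtrip=10, **kw):
    """(start, rtrip) pairs whose truncated result equals start."""
    return [(s, r)
            for s in range(max_start + 1)
            for r in range(max_rtrip + 1)
            if penalised_rtt(s, r, **kw) == s]

for s, r in unpenalised_cases():
    print(f"start {s}, rtrip {r}, result {penalised_rtt(s, r)}")
```

	Passing rtrip_weight=0.7 and zero_special=False shows what the
	literal formula would give instead.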

	Note that 1.2 and 0.7 do not have exact binary representations, so
	the result may vary across compilers and compiler options.
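
	A small sketch of why this matters, using a plain multiply by 1.2
	and a start value of 5.  The double nearest to 1.2 is slightly
	below 1.2, so the outcome depends on whether the product is rounded
	back to a double before truncation.  That a compiler keeping the
	product at higher precision is the cause of any particular
	observed difference is an assumption here, not something verified
	against a BIND build.

```python
import math
from fractions import Fraction

exact = Fraction(1.2)          # exact value of the double nearest to 1.2
assert exact < Fraction(6, 5)  # it lies slightly below 1.2

product_exact = exact * 5      # product with no intermediate rounding
product_double = 1.2 * 5       # product rounded back to a double

# Truncation of the unrounded product versus the rounded one.
print(math.trunc(product_exact), math.trunc(product_double))
```

	The two truncated values differ by one, which is enough to decide
	whether a stored RTT of 5 is ever increased.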

	Note however this will not prevent the other servers being
	tried.

	Mark

> With my compiler, this means that if the original random RTT is 5 or
> less (20% chance), then that server's RTT will never be incremented,
> usually ensuring that it always has the lowest RTT of the servers for the
> domain.  I have seen mention of some subtleties concerning the way the
> servers are selected.  I understand that the calculated RTTs are then put
> into RTT bands, so that in some cases different RTTs are considered
> equidistant, but I haven't looked at that bit.
> Since this is only my evaluation of the situation, I'd be very interested
> in comments from ISC or Nominum.
> rgds
> Marc TXK
> 
> From:    Chuck Sterling <csterlin at zianet.com>
> Sent by: bind-users-bounce at isc.org
> Date:    31/10/2001 02:49
> To:      comp-protocols-dns-bind at moderators.isc.org
> Subject: Remote hosts retrying downed DNS server
> 
> When one of my three DNS servers at work goes down, there is a small
> percentage of external hosts that retry that address over and over, to
> the exclusion of the other two servers that are still up. Most external
> hosts will try once and then move on to a working server, not retrying
> the down one. All three are listed in the "whois" record and all three
> handle reverse DNS lookups.
> 
> I'm interested in understanding what causes these retries. At least some
> of them must be DNS servers, judging by the ns1... and ns2... style
> domain names their addresses resolve to. And at least two are in what I
> would have to think of as "trusted" domains. I could understand a client
> workstation having only one domain name server in its search list, but
> doubt that external hosts would be using our DNS machines rather than
> their own, or that these external hosts are simple clients. Last time
> this happened, we were having trouble sending and receiving e-mail to at
> least two external sites, to the extent that some mail was not delivered
> at all and others were delayed several hours (where we expect delays of
> only a minute or so). This sorta pissed off a few users that missed
> important calls...
> 
> I pulled the plug on one machine this morning and captured about 1/2
> hour of snoop output at the victimlan side of our firewall to get a
> small pile of addresses for hosts hitting our DNS machines, and some
> were doing the retry bit, as expected.
> 
> Now that I have a list of suspects, is there a program, or some command
> in dig, etc., that I can run to retrieve DNS server info, such as what
> version of BIND or whatever substitutes for it, and so forth, from these
> remote machines? Seems like I've read about this sort of thing but have
> not had occasion to pay attention before... I'm wondering if a
> particular DNS implementation is at fault. I'd wonder if it was a
> problem on my end if not for there being no excessive retries from most
> of the remote hosts; I'm guessing they are working right and the others
> are munged.
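
	One common way to ask a server for its version is a TXT query for
	version.bind in the CHAOS class, for example
	"dig @ns1.example.com version.bind chaos txt" (the host name is a
	placeholder).  Many servers hide or refuse this, so an empty or
	made-up answer proves little.  The sketch below only builds the
	query packet to show what is being asked.  The function name and
	query id are made up for illustration, and sending the packet to
	UDP port 53 is left out.

```python
import struct

def build_version_query(query_id=0x1234):
    """Build a DNS query for version.bind, class CHAOS, type TXT."""
    # Header: id, flags (0 = standard query, no recursion desired),
    # one question, no answer, authority or additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # Question name as length-prefixed labels, ended by a zero byte.
    qname = b"".join(bytes([len(label)]) + label
                     for label in (b"version", b"bind")) + b"\x00"
    # QTYPE 16 = TXT, QCLASS 3 = CH (CHAOS).
    question = qname + struct.pack("!HH", 16, 3)
    return header + question

packet = build_version_query()
print(packet.hex())
```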
> 
> Thanks,
> Chuck Sterling
> csterlin at zianet.com
> 
> 
> 
> 
> 
--
Mark Andrews, Internet Software Consortium
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark.Andrews at isc.org