anycasting, DNS client retry/failover

Fri Mar 6 22:44:01 UTC 2009

Hi Gordon,

I am running several BIND 9.4.x nameservers, both inside and outside.

Inside, I can see my clients, various Linux systems, query ns1; when no
answer arrives within a second, they query ns2, the next entry in
/etc/resolv.conf.

So ns2 will send the same request ns1 did, but one second later, and
to another randomly chosen nameserver, or maybe to the same one.

I guess the queries seen outside will be

sec 00 ns1-1
sec 01 ns1-2  ns2-1
sec 02 ns1-3  ns2-2

If ns1-2 or ns2-1 answers, then my client will not ask ns1-1 again.

Nevertheless, suppose ns2-1 answers but ns1-2 does not: then ns1 will
continue with ns1-3, and only if all of them are dead will it
come back to ns1-1.
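
The timeline above can be reproduced with a small Python sketch. It only
models my guess, not what BIND really does: the function name
outside_timeline, the one-second gap between a resolver's own retries, and
the one-second delay before the client moves on to ns2 are all assumptions
for illustration, and it assumes no outside server answers.

```python
def outside_timeline(resolvers, attempts, stagger=1):
    """Return {second: [labels]} of outside queries, assuming no answers.

    A label such as "ns1-2" means the second outside server tried by ns1.
    stagger is the assumed delay before the client fails over to the next
    resolver, and also the assumed gap between a resolver's own retries.
    """
    timeline = {}
    for index, resolver in enumerate(resolvers):
        start = index * stagger            # ns2 starts one second after ns1
        for attempt in range(1, attempts + 1):
            second = start + (attempt - 1) * stagger
            timeline.setdefault(second, []).append(f"{resolver}-{attempt}")
    return timeline


# Print one row per second, in the same layout as the table above.
for second, queries in sorted(outside_timeline(["ns1", "ns2"], 3).items()):
    print(f"sec {second:02d} " + "  ".join(queries))
```

With three attempts per resolver this prints the three rows shown above plus
a fourth row for second 03, where only ns2-3 is still outstanding.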

Most likely ns1-1 will only see a second query if the query was for
a nonexistent domain.

You have lost one query, but after a random time another query will
come, and ns1-1 (maybe the same server as ns2-3) will get another chance.

/etc/resolv.conf is not random. ns1 always gets the first query, and ns2
only gets queries that were not answered within a second.
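
As a rough illustration of that fixed ordering, here is a minimal Python
sketch of the client behaviour I observe. It is a model under assumptions,
not the real resolver code: the function name pick_server, the one-second
timeout, and a single attempt per server are simply taken from what I see
on my clients.

```python
def pick_server(servers, alive, timeout=1.0):
    """Walk the resolv.conf list in order; return (server, seconds waited).

    servers: nameservers in resolv.conf order, e.g. ["ns1", "ns2"]
    alive:   set of servers that currently answer
    timeout: assumed wait per unanswered query (one second, as observed)
    """
    waited = 0.0
    for server in servers:
        if server in alive:
            return server, waited      # answered; no later server is asked
        waited += timeout              # no answer: move on to the next one
    return None, waited                # every configured server was silent
```

With both servers up, pick_server(["ns1", "ns2"], {"ns1", "ns2"}) always
picks ns1 with no wait; with only ns2 up, every query pays the one-second
timeout before it reaches ns2.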

So if my internal ns1 and ns2 were anycasted, that would make a big
difference.

Outside, the difference does not look so big, as DNS already randomises.

Kind regards
Peter

Gordon A. Lang wrote:
> I have just implemented DNS anycasting on our inside network using Cisco
> content switches to monitor the health of the servers and to advertise
> an OSPF route when the back-end services are alive.  I have three CSS's
> simultaneously advertising the same service address to the network, and
> clients get routed to the nearest one.  It works great.
> 
> Anyone else try this?
> 
> When I was testing, I sent 2000 queries per second from two sources
> simultaneously on diverse parts of the network, and proceeded to start
> disconnecting and reconnecting cables on the content switches to see how
> well it all worked.  No matter what I did, I could not seem to lose more
> than 10 packets per link-state-change (which is very good in my mind). 
> But when I stopped the services on the actual servers, it took up to 5
> seconds before the content switch registered the fault (because the
> keepalives are currently configured for every 5 seconds), and I lost
> thousands of queries in those few seconds.
> 
> I am considering reducing the keepalive period to improve this fault
> response, but I'd like to get a better understanding of the DNS client
> behavior when its queries go unanswered.
> 
> From what I recall, the typical DNS client will send a single query packet
> to its first-configured dns resolver and wait 1 second for a response. 
> If no response comes, the DNS client sends a second query to the same
> dns resolver and waits either 1 second or 2 seconds, depending on if the
> client is progressive or not, for a response.  If still no response
> comes, most DNS clients will ask the same dns resolver one last time,
> and wait either 1 more second or 4 seconds, depending on the client. 
> And perhaps some non-progressive DNS clients try a fourth time.  If
> still no response comes, then the DNS client starts from the beginning
> with the second-configured DNS resolver.
> 
> If this is true, then I would think a keepalive period of 3 seconds
> ought to divert queries away from dead servers fast enough to satisfy
> the vast majority of DNS client requests before failing over to the
> second-configured dns resolver.
> 
> Any comments?
> 
> And despite what I have read about DNS clients over the years, what I
> have experienced in real life has left me uncertain about what really
> happens. Typically, prior to this anycast deployment, when our
> first-configured dns resolver went down, users complained about waiting
> 60 to 90 seconds before their web pages would come up.  That does not
> make sense to me because I thought the second-configured resolver would
> be used within a few seconds.
> 
> Can anyone suggest why real life doesn't reflect what is written?
> 
> Thanks.
> 
> -- 
> Gordon A. Lang
> 
> _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users

-- 
Peter and Karin Dambier
Cesidian Root - Radice Cesidiana
Rimbacher Strasse 16
D-69509 Moerlenbach-Bonsweiher
+49(6209)795-816 (Telekom)
+49(6252)750-308 (VoIP: sipgate.de)
mail: peter at peter-dambier.de
http://www.peter-dambier.de/
http://iason.site.voila.fr/
https://sourceforge.net/projects/iason/
ULA= fd80:4ce1:c66a::/48