DNS Query Behavior with Global Forwarders Statement

Wed Aug 13 03:42:09 UTC 2008

On 12 Aug 2008, at 19:46:37, Kevin Darcy wrote:

> Merton Campbell Crockett wrote:
>> My corporate network consists of roughly 100 different sites located
>> throughout North America.  At each site there is a Network Management
>> System (NMS) running ISC BIND and DHCP.  Each NMS is the master name
>> server for the forward and reverse DNS zones assigned to the site.
>>
>> No NMS has direct access to the Internet; each one forwards all DNS
>> queries to a regional name server that has access to the Internet.
>> The forwarders are defined as follows.
>>
>> 	options {
>> 		...
>> 		forward only;
>> 		forwarders { 10.73.2.6; 10.10.2.6; 10.35.2.6; };
>> 		...
>> 	};
>>
>> The order in which the forwarders are listed changes depending upon
>> the region in which the site is located.
>>
>> I was asked to look at a problem involving name resolution at several
>> sites.  I had expected to see all DNS queries being forwarded to the
>> "closest" regional name server.  What I found using tcpdump was that
>> all name servers in the list were being used in a round-robin fashion,
>> i.e. I would see a group of queries sent to the first name server, the
>> second name server was used for the next group, and the third was used
>> for the next group before the cycle restarted.
>>
>> Is this an artifact of the -P2 changes or was the use of RTT dropped
>> for some other reason?
>>
>>
> My understanding is that the RTT-based forwarder selection is "banded",
> so that if a bunch of forwarders' RTTs all fall within the same "band"
> they'll be used either randomly or in a strict round-robin fashion.
>
> Is the latency of the network in question sufficiently high that a
> "close" regional forwarder might end up being "banded" with forwarders
> that are physically much further away?

One NMS is co-located with the regional name server.  The average RTT
between the two systems is 0.75 ms; the RTTs to the other regional name
servers are 70 ms and 110 ms.  Although one would expect the NMS to
send all queries to the co-located server, it uses it only 1/3 of the
time.

At a site with no co-located regional name server, the average RTTs to
the regional name servers are 14 ms, 75 ms, and 110 ms.  Again, one
would expect the server only 14 ms away to be favored, but it is
selected only 1/3 of the time.
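To put a number on what the even split costs, here is a rough calculation of the expected per-query forwarding RTT at each site if queries are spread evenly across the three forwarders versus always sent to the nearest one.  It uses only the averages quoted above and ignores retries, caching, and the daytime congestion, so treat it as a back-of-the-envelope figure.

```python
# Measured average RTTs in milliseconds, taken from the two sites above.
SITES = {
    "co-located site": [0.75, 70.0, 110.0],
    "remote site": [14.0, 75.0, 110.0],
}


def mean_rtt_round_robin(rtts):
    """Expected per-query RTT if every forwarder gets an equal share."""
    return sum(rtts) / len(rtts)


def mean_rtt_nearest(rtts):
    """Expected per-query RTT if the lowest-RTT forwarder is always chosen."""
    return min(rtts)


for name, rtts in SITES.items():
    rr = mean_rtt_round_robin(rtts)
    best = mean_rtt_nearest(rtts)
    print(f"{name}: round-robin {rr:.2f} ms, nearest-only {best:.2f} ms")
```

At the co-located site the even split works out to about 60 ms per forwarded query instead of under 1 ms; at the remote site it is about 66 ms instead of 14 ms.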

I just rechecked the times above.  During the business day, some of
them rise into the 200 ms range due to congestion.

Can the "banding" be adjusted in some way?
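To see how wide a band would have to be to explain an even 1/3 split, here is a toy model of RTT banding.  It assumes a band number of floor(RTT / band width), with round-robin among the forwarders in the lowest band.  This is only an illustration of the idea Kevin describes, not BIND's actual selection code; `band_ms` is a made-up parameter, and I don't know of any option that exposes such a setting.

```python
import math
from itertools import cycle


def lowest_band(rtts, band_ms):
    """Return indices of forwarders whose RTT falls in the lowest band.

    Hypothetical model: band number = floor(rtt / band_ms).
    This illustrates "banding"; it is not BIND's implementation.
    """
    bands = [math.floor(rtt / band_ms) for rtt in rtts]
    best = min(bands)
    return [i for i, b in enumerate(bands) if b == best]


def pick_sequence(rtts, band_ms, n):
    """Forwarder indices chosen for n queries, round-robin within the lowest band."""
    candidates = cycle(lowest_band(rtts, band_ms))
    return [next(candidates) for _ in range(n)]


rtts = [0.75, 70.0, 110.0]  # co-located site, ms
for band_ms in (50, 100, 128):
    print(band_ms, lowest_band(rtts, band_ms))
```

Under this model, all three forwarders at either site share one band only when the band is wider than 110 ms.  Anything narrower would keep favoring the nearest server, so a coarse band is one possible explanation for what tcpdump shows.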

> I would further speculate that the "clumping" you're seeing (a bunch of
> queries to one forwarder, followed by a bunch of queries to the next
> forwarder in the list, etc.), might be the result of multiple worker
> threads following the same round-robin sequence. But that's pure
> speculation on my part; I haven't looked at the code to confirm this.
> For all I know, you're running on uniprocessor boxes, or didn't even
> compile with threads enabled...
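Kevin's guess can be illustrated with a toy model.  If each worker thread keeps its own round-robin position and all threads start at the same forwarder, their queries interleave on the wire as runs aimed at the same server.  A single shared position would instead alternate query by query.  This models the hypothesis only; the thread count is an assumption, and I have not checked how BIND's worker threads actually pick a forwarder.

```python
from itertools import groupby

FORWARDERS = ["10.73.2.6", "10.10.2.6", "10.35.2.6"]


def lockstep_stream(n_threads, rounds, forwarders=FORWARDERS):
    """Per-thread round-robin positions, all starting at index 0 (hypothetical)."""
    positions = [0] * n_threads
    stream = []
    for _ in range(rounds):
        for t in range(n_threads):  # threads take turns sending one query each
            stream.append(forwarders[positions[t] % len(forwarders)])
            positions[t] += 1
    return stream


def shared_stream(n_threads, rounds, forwarders=FORWARDERS):
    """One round-robin position shared by every thread, for comparison."""
    position = 0
    stream = []
    for _ in range(rounds * n_threads):
        stream.append(forwarders[position % len(forwarders)])
        position += 1
    return stream


def runs(stream):
    """Collapse consecutive identical destinations into (destination, count) pairs."""
    return [(dest, len(list(group))) for dest, group in groupby(stream)]


print(runs(lockstep_stream(n_threads=4, rounds=4)))
print(runs(shared_stream(n_threads=4, rounds=4)))
```

With four threads the lockstep model produces runs of four queries per forwarder, which resembles the clumping seen in the capture; the shared-counter model produces no runs at all.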

I'm not sure about threading on the NMS.  IT configured the systems.

Merton Campbell Crockett
m.c.crockett at roadrunner.com