selecttest tool

Walter Gould gouldwp at auburn.edu
Thu Aug 14 15:46:18 UTC 2008


JINMEI Tatuya wrote:
> I don't know the answer to this question, but your operational
> environment seems to be extraordinary in some points:
>
> - it's acting both as an authoritative and as a caching server
> - as an authoritative server, it's managing a pretty large number of
>   zones (which may require resource-consuming operations such as zone
>   transfers)
> - as a caching server, it seems to be handling a high volume of
>   queries (several thousands concurrent clients)
>
> While we've worked hard on P2 to make it as scalable as possible while
> keeping it as conservative as possible, this environment may just
> exceed the ability of the conservative implementation.
>
> I know operators don't like a radical solution, but I'd really like
> you to give the beta version a try.  At least the next beta versions
> (which will hopefully be released later this week or early next week)
> should be much more stable than the currently available ones, and
> should not be as "radical" as you might think.
>
> ---
> JINMEI, Tatuya
> Internet Systems Consortium, Inc.
>
>   

I have found my problem. Your above statement "it seems to be handling a 
high volume of queries (several thousands concurrent clients)" was right 
on target. I decided to look more closely at the traffic that was 
hitting our server (I know our number of recursive clients didn't use to 
be in the thousands).

Using dnstop (a pretty useful tool) and tcpdump, we found that four 
spam-filtering servers on campus were performing many thousands of 
recursive lookups against our primary DNS server. While this was 
happening during peak hours (9am to 3pm), our DNS server couldn't 
keep up with the recursive requests. Unintentionally, it was being DoS'd.
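
For anyone who wants to do a similar tally by hand, here is a rough sketch 
in Python of counting queries per source address from tcpdump's text 
output. It is not what dnstop does internally. It assumes IPv4 lines in 
the usual "tcpdump -n" layout ("IP src.port > dst.53: ..."); the function 
name top_talkers and the sample addresses are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed line layout (tcpdump -n, IPv4 only):
#   15:46:18.000001 IP 192.0.2.10.40001 > 192.0.2.1.53: 1234+ A? example.com. (29)
# Only packets whose destination port is 53 (queries) are counted.
QUERY_RE = re.compile(
    r"\bIP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.53: "
)


def top_talkers(lines, n=5):
    """Return the n source addresses that sent the most DNS queries."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            counts[match.group(1)] += 1  # group 1 is the client address
    return counts.most_common(n)


if __name__ == "__main__":
    # Read tcpdump text output from standard input and print the busiest clients.
    for address, queries in top_talkers(sys.stdin, n=10):
        print(f"{queries:8d}  {address}")
```

It could be fed with something like 
"tcpdump -n -r capture.pcap 'udp dst port 53' | python top_talkers.py", 
where capture.pcap and top_talkers.py are placeholder file names.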

Once we notified the admins who maintain these spam-filtering servers 
that they were overloading our server, they changed their servers to 
distribute DNS resolution across two or three other campus DNS servers 
as well as the primary server that I admin. Since they did that, 
performance on our primary server has been much better and the number of 
recursive clients has been in the 60-100 range.

I have to believe that they changed their DNS settings to point 
primarily to our server at about the same time that the Kaminsky 
vulnerability was disclosed. I know that before that time frame we 
never had an issue with high numbers of recursive clients.

Thank you Jinmei and the others on the BIND mailing list for your help 
in trying to diagnose and solve my problem. I am sorry to have bothered 
you all when it was really a "me" problem. ISC - you guys rock. Keep up 
the great work!!

Walter

-- 
Walter P. Gould
Info. Tech. Specialist
Office of Information Technology
Auburn University, AL



