BIND 9.x caching performance under heavy loads

Thu Mar 10 09:34:43 UTC 2005

On Mon, 7 Mar 2005 roy.mongiovi at bellsouth.com wrote:

> We've got a number of caching-only nameservers, some running redhat AS
> 2.1 and some 3.0.  The AS 2.1 servers are bind 9.2.1 (standard redhat),
> although we've tried 9.2.3, and 9.2.2p3.  The AS 3.0 servers are
> running bind 9.2.4, also standard redhat.  We're in the process of
> upgrading everything to 3.0 in an attempt to have a better platform for
> dealing with this problem.
> 
> We also see the slowdown after 24 hours operation.  We're currently
> keeping things under control with a nightly restart of bind, but
> obviously that's not an ideal situation.
> 
> We don't restrict cache size.  It's a pretty vanilla caching-only
> nameserver as far as bind goes.  We've got four dual processor IBM
> x335s with 2.8 gigahertz processors and 8 gig of ram.  A load balancer
> distributes incoming requests to 4 of those servers.  Each of those
> servers forwards requests to two back-end servers that get to the
> internet around the load balancer.  It's built that way because of the
> load balancer, but that's another story.
> 
> After about 24 hours of operation, the CPU busy starts to gradually
> climb on the four front end servers.  At the same time, the back end
> servers slow down in processing incoming queries.  Although their CPU
> busy doesn't climb, the UDP receive queue for bind fills up.  For some
> reason, it just doesn't process requests as fast as it previously had.
> 
> External load is about the same each day, and restarting bind clears up
> the problem so I think it has to be some sort of bug in bind.  The
> really interesting part is that these servers went into operation in
> April 2004, and we didn't see any problems until the end of October.
> Since October, however, we've had this problem and it seems to be
> escalating.
> 
> I'm trying the "cleaning-interval 0;" fix now.  We'll know tomorrow if
> that helps.  I'm also going to be building a non-thread, non-ipv6
> version to see if that can handle our load and if it fixes the problem.

If you're only using IPv4, you might consider passing the "-4" option to
named; it saves time when resolving. On a dual-CPU machine, "-n 2"
helps (BIND should be able to detect the number of CPUs on its own,
but I still specify it manually in the startup arguments).

Also, BIND's performance and memory usage depend to some extent on
your OS. For example, on FreeBSD 5.3 and higher and NetBSD 2.0 and
higher, BIND is compiled against libpthread, which lets the kernel
schedule BIND's work and handle it nicely. So I'd say it's not just a
BIND issue. That said, I'm running BIND 9.3.0; maybe it resolved some
issues from previous versions related to the problems you describe?
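
To make the startup arguments concrete, here is a rough sketch of the
kind of invocation I mean. It is only an illustration: how the flags
get passed depends on your init script, and you should check named(8)
for your version to confirm that "-4" is supported there (I'm going by
the 9.3 series I run).

```shell
# Sketch only, not a drop-in startup line.
#   -4    use IPv4 only, so named does not try IPv6 transport
#   -n 2  create two worker threads, one per CPU on a dual-CPU box
named -4 -n 2
```

Any other flags your startup script already passes would stay as they
are; these two are simply added to the same command line.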
Bye,

Mipam.