Bind9 Crazy-high CPU on Linux

Mark Andrews Mark_Andrews at isc.org
Fri Jan 19 01:58:28 UTC 2007


> Thank you for all the references and help.
> 
> I upgraded to 9.4rc1 with the following results:
> 
> Massive jump in memory usage (about double).  The named process now shows a
> memory footprint of close to 900MB where before 500MB would kill it.
> 
> CPU stays between 20-25% and spikes to 30-35% during a cleaning interval
> which lasts only a minute or two.
> 
> Previously with 9.3.2, rndc status showed upwards of 2,000 recursive
> clients.  Now it shows fewer than 500 at any given time and the output
> format has changed:
> 
> recursive clients: 463/3900/4000
> 
> The server has been up a little over 36 hours.
> 
> I also noted three new items in the named.stats file.  "Duplicate" and
> "dropped" are new values.  Does anyone know how to fit them into the
> greater scheme?  For example recursion can be subtracted from a combined
> total of success, referral, nxrrset, nxdomain and failure to generate a
> percentage.  Where do the new values fit?
> 
> success 30428464
> referral 2099872
> nxrrset 6270659
> nxdomain 16121686
> recursion 29924813
> failure 8892309
> duplicate 1101621

	Duplicate queries are ones where an identical query was
	received (source address and port, qname, qtype, qclass)
	while an existing query was being resolved.
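
	As a rough illustration of that definition (not named's actual
	code), a duplicate check can be modelled as a set of in-flight
	queries keyed on the five fields listed above.  The class and
	method names are hypothetical, and lower-casing the qname is an
	assumption based on DNS names comparing case-insensitively.

```python
from typing import NamedTuple, Set


class QueryKey(NamedTuple):
    """The fields that make two queries 'identical' per the text above."""
    src_addr: str
    src_port: int
    qname: str
    qtype: str
    qclass: str


def make_key(src_addr: str, src_port: int, qname: str,
             qtype: str, qclass: str) -> QueryKey:
    # Assumption: normalise the name, since DNS names are case-insensitive.
    return QueryKey(src_addr, src_port, qname.lower(), qtype, qclass)


class InFlightQueries:
    """Hypothetical tracker for queries that are still being resolved."""

    def __init__(self) -> None:
        self.pending: Set[QueryKey] = set()
        self.duplicates = 0  # plays the role of the "duplicate" counter

    def arrive(self, key: QueryKey) -> bool:
        """Return True if the query is new, False if it is a duplicate."""
        if key in self.pending:
            self.duplicates += 1
            return False
        self.pending.add(key)
        return True

    def resolved(self, key: QueryKey) -> None:
        """Resolution finished; later identical queries are no longer duplicates."""
        self.pending.discard(key)
```

	A retransmission from the same address and port while the first
	query is still pending increments the counter; the same query
	arriving after resolution has finished does not.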

> dropped 159814

	Excessive recursive queries for <qname, qtype, qclass> other
	than duplicate queries.  "Excessive" is self-adjusting within
	10:100 queries.  Successful resolution of a query for which
	there were drops raises the number of simultaneous recursive
	clients allowed for a given <qname, qtype, qclass> tuple (provided
	another successful query has not already raised the threshold).
	This is then decayed with a timer.
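
	A toy model of that behaviour, purely to make the description
	concrete.  It is not taken from the BIND source.  Reading
	"10:100" as a floor of 10 and a ceiling of 100 is an assumption,
	as are the raise step, the decay step and every name below.

```python
class TupleLimit:
    """Toy per-<qname, qtype, qclass> limit on simultaneous recursive clients."""

    MIN_LIMIT = 10    # assumed floor (the "10" in 10:100)
    MAX_LIMIT = 100   # assumed ceiling (the "100" in 10:100)

    def __init__(self, raise_step: int = 10, decay_step: int = 1) -> None:
        self.limit = self.MIN_LIMIT
        self.active = 0             # clients currently waiting on this tuple
        self.dropped = 0            # plays the role of the "dropped" counter
        self.drops_pending = False  # drops seen since the last raise
        self.raise_step = raise_step  # assumed value
        self.decay_step = decay_step  # assumed value

    def arrive(self) -> bool:
        """Admit a non-duplicate query, or drop it if the tuple is at its limit."""
        if self.active >= self.limit:
            self.dropped += 1
            self.drops_pending = True
            return False
        self.active += 1
        return True

    def success(self) -> None:
        """A query for this tuple resolved successfully."""
        self.active = max(0, self.active - 1)
        # Raise the limit only if there were drops and no earlier
        # success has already raised it for them.
        if self.drops_pending:
            self.limit = min(self.MAX_LIMIT, self.limit + self.raise_step)
            self.drops_pending = False

    def decay(self) -> None:
        """Called from a timer: drift the limit back toward the floor."""
        self.limit = max(self.MIN_LIMIT, self.limit - self.decay_step)
```

	In this model a burst beyond the limit is dropped, the first
	success after the drops widens the limit once, and the timer
	narrows it again when the burst has passed.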

	From the above figures, it looks like your client base is
	making lots of queries which fail to resolve.
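
	For reference, the percentage calculation described in the
	question can be sketched as below, using the counters quoted
	above.  Whether "duplicate" and "dropped" belong inside the
	combined total is not stated in this thread, so the sketch
	leaves them out of it and compares them to the recursion count
	only as an assumed point of reference.

```python
# Counters quoted above from named.stats.
stats = {
    "success": 30428464,
    "referral": 2099872,
    "nxrrset": 6270659,
    "nxdomain": 16121686,
    "recursion": 29924813,
    "failure": 8892309,
    "duplicate": 1101621,
    "dropped": 159814,
}

# The combined total used in the question.
ANSWERED = ("success", "referral", "nxrrset", "nxdomain", "failure")


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


total = sum(stats[name] for name in ANSWERED)
recursion_pct = percent(stats["recursion"], total)
failure_pct = percent(stats["failure"], total)
nxdomain_pct = percent(stats["nxdomain"], total)

# Assumption: recursion is used as the denominator for the new counters.
duplicate_vs_recursion = percent(stats["duplicate"], stats["recursion"])
dropped_vs_recursion = percent(stats["dropped"], stats["recursion"])

print(f"total answered:       {total}")
print(f"recursion:            {recursion_pct:.2f}% of total")
print(f"failure:              {failure_pct:.2f}% of total")
print(f"nxdomain:             {nxdomain_pct:.2f}% of total")
print(f"duplicate/recursion:  {duplicate_vs_recursion:.2f}%")
print(f"dropped/recursion:    {dropped_vs_recursion:.2f}%")
```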
 
> -Matt
>  
> 
> > -----Original Message-----
> > From: bind-users-bounce at isc.org 
> > [mailto:bind-users-bounce at isc.org] On Behalf Of Stefan Puiu
> > Sent: Tuesday, January 16, 2007 8:13 AM
> > To: Schlosser, Matt D.
> > Cc: bind-users at isc.org
> > Subject: Re: Bind9 Crazy-high CPU on Linux
> > 
> > Hi,
> > 
> > On 1/15/07, Schlosser, Matt D. <mschlosser at eschelon.com> wrote:
> > > The machines run between 800 and 1,000 queries/second for both
> > > authoritative and recursive zones.  After 12-24 hours, the CPU will
> > > spike to 100% and sit there while the machine times out any more
> > > queries.  The only resolution is to restart bind.
> > 
> > I haven't personally experienced this, but I've seen it reported quite
> > a few times on this list. IIRC, it's been reported that the cache
> > cleaning can be quite heavy sometimes, so you might want to adjust the
> > cleaning interval.
> > 
> > Also, recompiling BIND with internal malloc support was reported to
> > help (this requires editing a header file IIRC). That part seems to be
> > detailed here:
> > 
> > http://groups.google.com/group/comp.protocols.dns.bind/browse_thread/thread/c830e65e2247c630/bfe3178894e98351?lnk=gst&q=jinmei+internal+malloc&rnum=1#bfe3178894e98351
> > 
> > No idea why running on Windows would make a difference.
> > 
> > Look in the archives, I believe it's been quite well covered.
> > 
> > Stefan.
> > 
> > 
> 
> 
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark_Andrews at isc.org


