bind-9.3.2 / CPU issue.

Jan Gyselinck bind-users at b0rken.net
Mon Aug 21 10:02:51 UTC 2006


A bit off-topic for this list, but you might want to try the PowerDNS
recursor (www.powerdns.com).  It's new on the scene (and Open Source), but
it doesn't fail when too many SERVFAILs occur, nor does it slow down with a
big cache.  Depending on the queries you get, you'll see a big to huge
improvement in the load your servers can handle.  It has been running in
production here for more than a month (one box doing 500 q/s, the other
800 q/s at peak times, on quite dated hardware), and it's going to stay
that way.

--
Jan Gyselinck


On Mon, Aug 21, 2006 at 10:59:57AM +0200, mbrandeis at 013barak.net.il wrote:
> Hello all, 
> 
> Just wanted to inform you we've had the same problem described here for half a year.
> 
> After replacing the daemon with djbdns (dnscache), the problem went away permanently.
> 
> For now I've had to downgrade to bind 8.4.7 to avoid this problem.
> 
> After digging into the texts, I found that the working assumption is that this problem is directly related to the cache size.
> 
> Once the cache grows beyond some size X, most of the CPU goes to *time* / *local-time* calls (use strace/truss to check that).
> 
> The probable reason is that the server performs tons of timestamp checks just to determine whether entries in the cache have expired.
> 
> So, after some time Z the cache grows beyond size X, and the load of keeping it up to date and comparing each TTL against the current time costs too many resources.
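
To make the theory above a bit more concrete, here is a toy Python sketch of a TTL cache. It is not BIND's code or design; the class and method names are made up for illustration. It only shows why reading the clock once per entry during an expiry sweep scales with cache size, while reading it once per sweep does not.

```python
import time


class TtlCache:
    """Toy cache storing an absolute expiry time per entry (hypothetical)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}      # name -> (value, expires_at)
        self.clock_reads = 0    # counts how often the clock was consulted

    def _now(self):
        self.clock_reads += 1
        return self._clock()

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._now() + ttl)

    def sweep_per_entry(self):
        """Read the clock for every entry: cost grows with cache size."""
        expired = [n for n, (_, exp) in self._entries.items()
                   if exp <= self._now()]
        for n in expired:
            del self._entries[n]
        return len(expired)

    def sweep_once(self):
        """Read the clock once and reuse the value for the whole sweep."""
        now = self._now()
        expired = [n for n, (_, exp) in self._entries.items() if exp <= now]
        for n in expired:
            del self._entries[n]
        return len(expired)
```

With N entries, sweep_per_entry() consults the clock N times and sweep_once() consults it once. Whether named actually behaves like the first variant is exactly the open question in this thread.
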
> 
> I don't know how true that is, but it's the best theory I've heard so far.
> 
> As I said, I just downgraded to bind 8.4.7.  I'll be thrilled to hear of a different solution (one that doesn't sacrifice the cache size).
> 
> Best Regards,
> Meron Brandeis
> System Unix
> Barak ITC, Ltd.
> 
> -----Original Message-----
> From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org]On
> Behalf Of Pawel Rogocz
> Sent: Sunday, August 20, 2006 1:48 AM
> Cc: bind-users at isc.org
> Subject: Re: bind-9.3.2 / CPU issue.
> 
> 
> This issue has been troubling us for almost two years now, since we
> deployed BIND9.
> 
> We have a bunch of nameds running behind a load balancer, each getting on
> average 1k DNS queries per second.
> 
> We currently run with watchdogs which kill named if it starts using 100%
> CPU.
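
For anyone wanting a similar stopgap, a watchdog of the kind described above could be sketched roughly like this in Python. It assumes Linux's /proc/<pid>/stat layout (utime and stime as fields 14 and 15); the function names, threshold, interval and strike count are all hypothetical and not taken from Pawel's setup.

```python
import os
import signal
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval, hz):
    """CPU use over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / hz / interval


def read_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


def watchdog(pid, threshold=95.0, interval=5.0, strikes=6):
    """Send SIGTERM once CPU use stays above threshold for `strikes` samples."""
    hz = os.sysconf("SC_CLK_TCK")
    over = 0
    before = read_ticks(pid)
    while True:
        time.sleep(interval)
        after = read_ticks(pid)
        busy = cpu_percent(before, after, interval, hz)
        # Count consecutive busy samples; any quiet sample resets the count.
        over = over + 1 if busy >= threshold else 0
        if over >= strikes:
            os.kill(pid, signal.SIGTERM)
            return
        before = after
```

Requiring several consecutive busy samples avoids killing named over a short, legitimate burst. Restarting the daemon afterwards is left to whatever supervises it.
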
> 
> Just recently I noticed that when named enters this state, it starts
> replying with erroneous data.
> 
> For example, cached data never gets its TTL decreased:
> www.sun.com always has a TTL of 900. Also, queries of type ANY
> against authoritative data intermittently fail with a SERVFAIL error.
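
One way to check for the stuck-TTL symptom described above is to query the same cached name twice, a few seconds apart, and compare the TTLs. The Python sketch below only parses answer lines in the format printed by `dig +noall +answer`; the helper names and the two-second minimum wait are my own assumptions. Note that authoritative answers legitimately return the same TTL every time, so the check only makes sense for cached data.

```python
def answer_ttls(dig_answer_text):
    """Extract (name, type, ttl) from `dig +noall +answer` style lines."""
    records = []
    for line in dig_answer_text.splitlines():
        fields = line.split()
        # Expect: name, ttl, class, type, rdata...; skip comments and blanks.
        if len(fields) < 5 or line.startswith(";"):
            continue
        if fields[1].isdigit():
            records.append((fields[0], fields[3], int(fields[1])))
    return records


def ttl_looks_frozen(ttl_before, ttl_after, waited_seconds):
    """True if a cached TTL did not change although time has passed."""
    return waited_seconds >= 2 and ttl_after == ttl_before
```

A TTL that jumps back up is not covered here; that can simply mean the record expired and was fetched again.
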
> 
> We also see an increased number of Udp InErrors in /proc/net/snmp when
> named enters this state.
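
If you want to track that counter over time, /proc/net/snmp can be parsed with a few lines of Python. This is a minimal sketch assuming the usual layout of that file (a header line of field names per protocol, followed by a line of values); the function name is hypothetical.

```python
def parse_proc_net_snmp(text):
    """Parse /proc/net/snmp content into {protocol: {counter: value}}."""
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "Udp: <names...>" then "Udp: <values...>".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            raise ValueError(f"mismatched lines for {proto!r}")
        stats[proto] = dict(zip(names.split(),
                                (int(n) for n in numbers.split())))
    return stats
```

On a live box you would pass it open("/proc/net/snmp").read() and look at ["Udp"]["InErrors"]. The counter is cumulative, so compare two readings taken some time apart rather than looking at the absolute value.
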
> 
> We have run with all sorts of Linux 2.2/2.4 kernels and the problem was
> always there. We currently run 9.3.2 with internal malloc enabled.
> 
> 
> Pawel
> 
> 
> On Tue, Aug 15, 2006 at 02:53:43PM -0700, Kelsey Cummings wrote:
> > FWIW, I've seen similar behavior on some of our recursive servers in
> > specific roles.  The only thing that might be unusual about our config is
> > that a very high portion of the requests are going to forwarded zones.
> > 
> > It's been a consistent problem for us through all versions of bind 9 - we've
> > had to use bind 8 to keep them stable.  We suspected it could be a problem
> > with our compiler/libraries but the problem consistently occurs regardless
> > of what distribution or version we try to run.  (All linux.)
> > 
> > It seems to be load related - only affects two of our internal recursors
> > that do ~1k reqs/sec whereas our other more lightly loaded servers don't
> > exhibit the same exact symptoms (although they also have been known to spin
> > on the CPU.)
> > 
> > -- 
> > Kelsey Cummings - kgc at corp.sonic.net      sonic.net, inc.
> > System Architect                          2260 Apollo Way
> > 707.522.1000                              Santa Rosa, CA 95407
> > 
> 
> 


