BIND 9.x caching Performance under heavy loads

Wed May 11 17:04:32 UTC 2005

We had the same problem: bind runs fine for a day, then slows down and
stops processing incoming packets.  Our servers are dual Pentium,
hyperthreading enabled, 8 GB RAM, running Red Hat (originally AS 2.1,
now 3.0).  We run bind "-n 2", which seems to give the best performance
even with hyperthreading enabled.

Our servers are quite busy.  The symptom we see is that CPU usage
climbs steadily, although memory usage does not.  Eventually we get to
the point where we aren't receiving incoming packets fast enough to
avoid dropping them.  Restarting bind fixes the problem.  If we restart
every 24 hours, we don't see the problem.

We tried some of the config changes recommended here, without effect.
Limiting the cache size and/or turning off cache cleaning changed the
memory footprint but didn't prevent bind from having problems after
running for 24 hours.
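
For reference, the two changes described above correspond roughly to
the following named.conf options.  This is only a sketch: the 300M
value is borrowed from the quoted message below, and setting
cleaning-interval to 0 is assumed to be how cache cleaning was turned
off; the exact settings tried are not stated.

	options {
		// cap the memory used for the cache (illustrative value)
		max-cache-size 300M;
		// 0 disables periodic cache cleaning (assumed setting)
		cleaning-interval 0;
	};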

I was also worried about the comment that the internal memory allocator
is slow in a multi-CPU environment, but benchmarking in the lab
actually shows its performance to be better (more consistent) than
using system malloc.  We're currently in the process of testing bind
9.2.5, compiled with ISC_MEM_USE_INTERNAL_MALLOC.  No configuration
changes have been made: we're using the default cache cleaning interval
and unlimited cache size.  The internal malloc version has been running
for a week now without a restart, which would have been completely
impossible for the system malloc version.  I'm cautiously optimistic.

Is this a Linux-only problem?  Bind runs on so many systems; how could
a problem like this fly under the radar for so long?  How do we get to
the bottom of this so it gets fixed in a better way?

Roy Mongiovi

Bernhard Schmidt wrote:
> On 2005-03-31, Bernhard Schmidt <berni at birkenwald.de> wrote:
>
> > But still, the CPU hog is there, and the CPU hog is not explainable
> > with the general difference in speed. After all, it is running fine
> > for some days until CPU usage explodes. I'm running the non-threaded
> > version now since one day, if it keeps running this way until next
> > week the bug is related to threading.
>
> I got some new results, maybe someone could point me to the options we
> might have.
>
> The bug appeared with the multithreaded version as well as with the
> non-threaded version. We removed the biiiig RBL zone from the server
> and the bug still appeared. So far we did only find _one_ option which
> makes our problem disappear:
>
> 	max-cache-size 300M;
>
> if we set it to unlimited or to something higher than 300M (for
> example 400M) BIND starts burning the CPU within two days. Remind you
> that this server has 4GB of RAM and is dedicated to this BIND process.
> No ulimits are set.
>
> Our next try will probably be to compile with
> ISC_MEM_USE_INTERNAL_MALLOC. I read there are some (performance)
> issues with this flag and running threaded, would it be wise to use
> the non-threaded version and basically ignore three of four CPUs?
> 
> Thanks
> Bernhard