memory leak or other problem

Craig Cocca craigc at uia.net
Tue Sep 11 15:36:45 UTC 2007


> I am running CentOS 4.5 and the latest version of Bind for that
> distribution bind-9.2.4-24. On one of my servers there is not enough
> RAM in the system and about every 24 hours I have to restart the bind
> server. After I restart everything runs great until I get to about
> 80% memory usage.
>
> The same problem happens on my other 2 servers but they have 2 and 4 GB
> of memory so this takes much longer. When the memory usage gets to
> around 80% or more the dns server stops responding and I get a
> notification from my nagios monitoring system.
>
> Has anybody experienced this and if so what have you done to fix it? I
> would like to just let these servers run and not have to worry about
> them. I do realize I could just restart them every day but this is only
> gonna work as long as they don't have to be restarted more than once a
> day which for now is ok. I would like to find a more permanent solution
> and more insight into what is causing this.
>
>


We are seeing EXACTLY the same problem on our BIND installation on
FreeBSD 6.1 (BIND 9.4.1-P1). Things run fine until we get to about
80% memory usage, then named starts eating up more and more CPU
resources. Eventually, the daemon stops responding to recursive
lookups and has to be restarted (in fact, we had to come up with a
monitoring system to detect when this happens and restart named,
which is not good!). Recently, we increased the amount of memory
allocated to named and, just as in Andrew's post, this increased the
time between restarts (from one or more times per day to about once
every three days) but did not fix the underlying problem.

So far, we've tried the following suggested fixes, but to no avail:

1)  Set the kern.maxdsiz and kern.dfldsiz tunables in /boot/loader.conf
to allow FreeBSD to allocate more than 512 MB to a process
2)  Changed our named.conf to decrease the cache size, making sure
that it was smaller than both the named datasize and the FreeBSD max
datasize
3)  Decreased the cache TTL to allow records to expire more quickly
(in the hope that this might stave off the ever-growing memory
footprint)

The next thing we were thinking about doing was adding more memory
(we currently have 1 GB, thinking about going to 2 GB), but I think
this would have the same effect as it did for Andrew: just prolonging
the inevitable crash.
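
For anyone wanting to try items 2 and 3, the named.conf side might look
roughly like the sketch below. The option names are standard BIND 9
options, but every value is an illustrative placeholder rather than a
setting from a real configuration, and using max-cache-ttl and
max-ncache-ttl to shorten cache lifetimes is an assumption about how the
TTL was lowered.

```
options {
    // Cap the resolver cache (illustrative value). Keep this below
    // "datasize", which in turn must stay below the kernel's
    // per-process data size limit.
    max-cache-size 256M;

    // Maximum data segment size named may use (illustrative value).
    datasize 512M;

    // Upper bounds, in seconds, on how long positive and negative
    // answers are kept in the cache (illustrative values).
    max-cache-ttl 86400;
    max-ncache-ttl 3600;
};
```

As described above, settings of this kind only changed how long named
ran between restarts; they did not stop the growth.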

If the named developers could comment on this issue and on what is
being done to fix it, I think everyone in the community would
appreciate it. I'm seeing more and more posts about this online, so it
seems to be a real problem.

For those who are having this problem, here is the "watcher" script  
that we use to monitor and restart named as necessary:

------------------------------------------------------------------------

#!/usr/bin/perl
# Watcher for named
# Restarts named if it dies on us
# Developed by ULTIMATE Internet Access (craigc at uia.net)

use strict;
use warnings;

my $myip = "<enter your nameserver IP here>";
# List running named processes, filtering out the grep command itself
my $named_status = `/bin/ps ax | /usr/bin/grep 'named' | /usr/bin/grep --invert-match 'grep'`;

# Test a recursive lookup against the nameserver
my $lookup_query = `/usr/bin/host yahoo.com <enter your nameserver hostname here>`;
my $memory_check = `/usr/bin/top -b | /usr/bin/grep -e 'Mem:'`;

my $named_memory_check = `/usr/bin/top -b | /usr/bin/grep 'named'`;

# Get timestamp
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime time;

$year += 1900;
$mon++;
if($mon<10) { $mon = "0$mon"; }
if($mday<10) { $mday = "0$mday"; }
if($hour<10) { $hour = "0$hour"; }
if($min<10) { $min = "0$min"; }
if($sec<10) { $sec = "0$sec"; }

my $timestamp = "[$year/$mon/$mday $hour:$min:$sec] ";

# If BIND is dead, let's get it going again
if(!$named_status)
         {
         `/usr/local/sbin/named -4`;
         print "$timestamp BIND was restarted (process died).\n";
         }
elsif($lookup_query !~ /has address/i)
         {
         `/usr/bin/killall -9 named`;
         `/usr/local/sbin/named -4`;
         print "$timestamp BIND was restarted (named stopped responding).\n";
         }
else
         {
         print "$timestamp BIND is already running.\n";
         }

# Print out the memory check to the log
print $memory_check . $named_memory_check . "\n$lookup_query\n------------\n";

------------------------------------------------------------------------
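
The watcher runs once and exits, so it has to be invoked periodically.
A minimal sketch, assuming cron is used for scheduling; the script
path, the log path and the five-minute interval are all hypothetical
and should be adjusted to the local setup:

```
# Hypothetical crontab entry: run the watcher every five minutes and
# append everything it prints to a log file.
*/5 * * * * /usr/local/bin/named_watcher.pl >> /var/log/named_watcher.log 2>&1
```

Because the script prints its status and memory figures to standard
output, redirecting that output to a file gives a simple history of
restarts and memory usage over time.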

Thanks,

Craig Cocca



