bind9 is taking little breaks for some reason.

Martin McCormick martin at dc.cis.okstate.edu
Wed Jun 27 15:20:56 UTC 2007


	I think I may have found something significant, so I am
asking the list whether this rings any bells.

	I got to looking at the files in /var/named/db and noticed that we had
several journal files that had grown to over a gigabyte in size.
Altogether, we had 5.18363e+09 bytes, or over 5 GB, of journals,
with 2 or 3 of them in the 1.5-GB range.
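	A quick way to repeat this survey is sketched below. It is only a sketch: it assumes journals are the regular files whose names end in ".jnl" in a single directory (such as /var/named/db here) and does not descend into subdirectories. The names journal_sizes and report are made up for illustration.

```python
import os

def journal_sizes(directory):
    """Map each *.jnl file in `directory` to its size in bytes."""
    sizes = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Only regular files whose names end in .jnl are counted as journals.
        if name.endswith(".jnl") and os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes

def report(directory):
    """Print journals largest first, then the total in bytes."""
    sizes = journal_sizes(directory)
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        # Decimal gigabytes (1e9 bytes), matching the "GB" figures above.
        print(f"{size / 1e9:8.2f} GB  {name}")
    # The 'g' format prints large totals in the same style as 5.18363e+09.
    print(f"total: {sum(sizes.values()):g} bytes")
```

Calling report("/var/named/db") prints one line per journal and a total at the end.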

	Our present server is no slouch, but these are really
huge files so I sort of took a deep breath, stopped bind, moved
the journals away to a backup directory for safekeeping, and
then restarted bind.
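	The "move the journals aside" step might look like the sketch below. It assumes named has already been stopped; per the BIND 9 rndc documentation, "rndc stop" saves recent dynamic-update and IXFR changes to the zone files before exiting, which is what makes it reasonably safe to set the journals aside. The function name move_journals and the backup path are hypothetical.

```python
import os
import shutil

def move_journals(db_dir, backup_dir):
    """Move every *.jnl file from db_dir into backup_dir; return the moved names.

    Assumes named is NOT running (e.g. it was stopped with `rndc stop`).
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(db_dir)):
        src = os.path.join(db_dir, name)
        if name.endswith(".jnl") and os.path.isfile(src):
            dst = os.path.join(backup_dir, name)
            # Refuse to overwrite a journal saved by an earlier cleanup.
            if os.path.exists(dst):
                raise FileExistsError(dst)
            shutil.move(src, dst)
            moved.append(name)
    return moved
```

Zone files themselves are left in place; named creates fresh journals as new dynamic updates arrive after the restart.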

	We have not had a single timeout in the roughly 2 hours since.
In the slightly more than 8 hours between midnight and the last
failure of DHCP to update bind, there were 961 failures, all in
clusters. Between 00:04 and 00:16:46, 120 systems couldn't be
registered because of these timeouts. Then everything was grand
until 01:34, when another cluster started.
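	Counting failures "in clusters" like this can be done by grouping timestamps that fall within a fixed gap of one another. The sketch below assumes the failure times have already been pulled out of the DHCP logs as datetime objects; the function name cluster_failures and the 5-minute gap are arbitrary choices for illustration, not part of the original analysis.

```python
from datetime import datetime, timedelta

def cluster_failures(timestamps, gap=timedelta(minutes=5)):
    """Group failure times into clusters.

    A new cluster starts whenever the time since the previous failure
    exceeds `gap`. Returns a list of (start, end, count) tuples.
    """
    clusters = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][1] <= gap:
            # Extend the current cluster to include this failure.
            start, _, count = clusters[-1]
            clusters[-1] = (start, t, count + 1)
        else:
            # Gap too large (or first failure): open a new cluster.
            clusters.append((t, t, 1))
    return clusters
```

Summing the count field over all clusters gives the overall failure total, and each (start, end) pair gives a window like 00:04 to 00:16:46.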

	Yesterday, the platform running bind couldn't even run an
rndc stats command against itself, which rules out the network. :-)

	I think the problem had to do with the size of the .jnl files.
I ran journalprint on the biggest one, and the zone serial
numbers at the beginning appear to date from sometime in 2005.
Our logs only go back to November 2005, so this journal has been
growing for a _long_ time.

	I am sorry for taking so much bandwidth to describe what
appears to be happening, but problem-solving mode isn't pretty.

	The problem could be related to anything from the way
FreeBSD 4.11 handles large file pointers to the fact that the
server uses a RAID configuration that may not be quite fast
enough to keep up with updates to a file that large.

	I imagine the new server, which will be about half again
as fast, will keep up longer, but it looks like a good idea to
trim the journals properly every 6 months to a year.
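	A periodic check could flag journals before they reach the gigabyte range. This is a minimal sketch with a hypothetical 100-MB per-journal budget; the function name oversized_journals and the limit are assumptions to adjust per site. Newer BIND 9 releases also document a max-journal-size option that discards the oldest transactions automatically; whether the installed version supports it is worth checking in that version's ARM.

```python
import os

# Hypothetical budget: 100 MB (decimal) per journal. Pick a value that suits the site.
DEFAULT_LIMIT = 100 * 1000 * 1000

def oversized_journals(directory, limit=DEFAULT_LIMIT):
    """Return sorted names of *.jnl files in `directory` larger than `limit` bytes."""
    result = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip non-journal files and anything that is not a regular file.
        if name.endswith(".jnl") and os.path.isfile(path):
            if os.path.getsize(path) > limit:
                result.append(name)
    return sorted(result)
```

Run from cron, a non-empty result could be the cue to schedule the stop, move, and restart cleanup described above.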

Martin McCormick WB5AGZ  Stillwater, OK 
Systems Engineer
OSU Information Technology Department Network Operations Group



More information about the bind-users mailing list