reconfig times

Wed May 25 04:20:30 UTC 2005

In article <d700tn$1kge$1 at sf1.isc.org>, Kelsey Cummings <kgc at sonic.net> 
wrote:

> I've been trying to track down a couple of performance-related problems
> with bind9 and have found at least one thing that's causing us trouble 
> today.
> 
> Initially, when one of my name servers has just been started, it
> consistently processes a reconfig in under a second, usually around
> 0.6 seconds.  That is a perfectly acceptable time to reload and validate
> the config and start answering requests again.  However, after a number
> of days of operation (these are also recursive servers), the server
> suddenly starts to take longer to come back from a reconfig, until it
> eventually takes longer than 10 seconds, which causes my anycast system
> to withdraw routes to the servers.
> 
> It appears that all of the time is spent in dumping and reloading the
> entire red/black tree to check it for consistency with the new
> configuration.  What puzzles me is that there is no apparent linear
> relation between the cache size and the length of time the reconfig
> takes to finish.  If anything, it appears to depend on how long the
> server has been in operation.

My guess is that it has to do with the locality of the data in the 
cache.  In some cases the data that needs to be reloaded is spread far 
and wide, so there's lots of page thrashing; other times, it's close 
together so there isn't as much paging.

I suggest running vmstat during the reconfigs to see what's happening 
with paging.
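
As a rough illustration of that check, here is a minimal Python sketch that
samples vmstat once a second while timing "rndc reconfig", then totals the
swap-in/swap-out columns. It assumes the Linux (procps) vmstat layout with
"si" and "so" columns; other systems name their paging columns differently.
The helper names (parse_vmstat, swap_activity, timed_reconfig) are made up
for this example, and the time it reports is only how long rndc takes to
return, which approximates the outage rather than measuring it exactly.

```python
import subprocess
import time


def parse_vmstat(text):
    """Return one dict per numeric sample row, keyed by vmstat column name.

    Assumes the Linux procps layout, where the column-name row contains
    'si' (swapped in) and 'so' (swapped out).
    """
    lines = [ln.split() for ln in text.strip().splitlines()]
    # Find the row of column names; the banner row above it has no 'si'/'so'.
    header_idx = next(
        i for i, cols in enumerate(lines) if "si" in cols and "so" in cols
    )
    header = lines[header_idx]
    rows = []
    for cols in lines[header_idx + 1:]:
        # Keep only fully numeric rows (vmstat may repeat its headers).
        if len(cols) == len(header) and all(c.lstrip("-").isdigit() for c in cols):
            rows.append(dict(zip(header, map(int, cols))))
    return rows


def swap_activity(rows):
    """Sum the swap-in and swap-out rates over the given samples."""
    return sum(r["si"] for r in rows), sum(r["so"] for r in rows)


def timed_reconfig(samples=15):
    """Run 'rndc reconfig' while vmstat samples once per second.

    Returns (seconds until rndc returned, vmstat sample rows).
    """
    vm = subprocess.Popen(
        ["vmstat", "1", str(samples)], stdout=subprocess.PIPE, text=True
    )
    start = time.monotonic()
    subprocess.run(["rndc", "reconfig"], check=True)
    elapsed = time.monotonic() - start
    out, _ = vm.communicate()  # waits for vmstat to finish its samples
    # The first vmstat row is an average since boot, so drop it.
    return elapsed, parse_vmstat(out)[1:]
```

Calling timed_reconfig() on a freshly started server and again on one that
has been up for several days, then passing the returned rows to
swap_activity(), should show whether the slow reconfigs coincide with heavy
paging.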

-- 
Barry Margolin, barmar at alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***