Recommended setup with large cache memory

Fri Sep 9 06:16:29 UTC 2005

Hi,

I am an ISP hostmaster, and our DNS resolvers get about 4 million 
queries per 10 minutes during the daytime. There are eight 
servers behind a load balancer. The performance tuning options 
are set like this:

         options {
                 // ... other options ...
                 max-cache-size 1024M;
                 cleaning-interval 15;
                 max-cache-ttl 604800;
                 max-ncache-ttl 300;
                 interface-interval 0;
                 recursive-clients 4096;
         };
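
For scale, the daytime rate above can be turned into queries per 
second. This is a minimal sketch in shell arithmetic. It assumes 
the load balancer spreads traffic evenly across the eight 
servers, which the post does not actually state.

```sh
#!/bin/sh
# Back-of-envelope query rates for the setup described above.
# Assumption: the load balancer distributes queries evenly.
# POSIX shell arithmetic is integer-only, so results are truncated.
queries_per_window=4000000   # daytime volume, from the post
window_seconds=600           # 10 minutes
servers=8                    # resolvers behind the load balancer

total_qps=$((queries_per_window / window_seconds))
per_server_qps=$((total_qps / servers))

echo "total: ${total_qps} queries/s"
echo "per server: ${per_server_qps} queries/s"
```

That works out to roughly 6,700 queries per second in total, or a 
little over 800 per resolver at peak.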

One trick for keeping CPU time within reasonable limits is to 
build with the compile-time option ISC_MEM_USE_INTERNAL_MALLOC, 
like this:

export CFLAGS="-DISC_MEM_USE_INTERNAL_MALLOC=1"
./configure --prefix=/foobar/... [etc] && make
make install

With the configuration above, the resolvers don't crash or drop 
packets, and they use about 10-20% of CPU time. Each server has 
two AMD Opteron(tm) 1800 MHz CPUs, and named is started with the 
-n 2 option (one worker thread per CPU). The cache size on every 
resolver seems to be about 500 megabytes, and it stays at that 
figure day after day.

-- 
    Sami Kerola
    http://www.iki.fi/kerolasa/

08.09.2005 20:40, Kevin Darcy <kcd at daimlerchrysler.com>:

> Well, I don't have any direct experience in this area, since our 
> nameservers' caches only grow to less than 50 MB even when 
> cleaning-interval is set to 0 (I guess our users have a 
> relatively narrow set of names they want to resolve, compared 
> to, say, a typical ISP's). Having thus disclaimed, I think that 
> _theoretically_ you could try cleaning *more* frequently (each 
> cleaning should then have less impact on performance because 
> there are fewer entries to purge on each pass) and/or 
> setting/lowering max-cache-ttl and/or max-ncache-ttl to prevent 
> your cache getting clogged up by RRs with unreasonably-large TTL 
> or negative-caching-TTL values -- of course, setting or lowering 
> max-cache-ttl/max-ncache-ttl is going to increase your network 
> and CPU usage, as the cache hit ratio drops and the server must 
> then fetch RRs more often, so tune these values very carefully. 
> You could dump your cache (assuming you have enough disk space 
> :-) some time when it's mature, and analyze it to see how full 
> it is of these long-lived entries, to get a handle on how much 
> benefit you would get from tuning max-cache-ttl/max-ncache-ttl. 
> For extra credit, you could cross-reference this analysis 
> against a querylog analysis, to get a handle on what your cache 
> hit ratio is, and what it would most likely drop to if those 
> long-lived entries were capped.
>
> You could also consider some sort of clustering/load-balancing 
> solution, but again, as with max-cache-ttl/max-ncache-ttl 
> tuning, the issue there is what impact the lower cache hit 
> ratio is going to have, and whether that impact is even worse 
> than just having the nameserver get slow once in a while when 
> it's doing its cleaning...
>
>                                                            - Kevin
>
> P.S. It would be nice if there was a way to initiate a cache 
> cleaning through rndc. Then cleaning-interval could be set to 0 
> in named.conf and a cron job could be set up to do the cleaning 
> on a set time-of-day schedule, during hours when the impact on 
> customers would be minimal. In a clustered/load-balanced 
> scenario, the cleanings could be staggered and perhaps even the 
> cluster/load-balancer could be intentionally "failed over" while 
> each instance is cleaning itself, so as to make things 
> completely transparent to the users.
>
> Attila Nagy wrote:
>
>> I would like to ask what the recommended setup is when a large 
>> cache memory is configured for the named process. In this case, 
>> large means about 1-1.5 GB.
>>
>> The problem is that if I set max-cache-size to a high value, 
>> the purging of old records (by default, cleaning-interval runs 
>> every 60 minutes) takes minutes even on a fast machine (a 2.6 
>> GHz Opteron).
>>
>> During this time the server responds very slowly, if at all.
>>
>> If I set cleaning-interval to 0 (or increase it to days, for 
>> example), memory use grows above the configured limit. I guess 
>> that if it reaches the physical memory limit, it will begin 
>> swapping, which I don't want to see.
>>
>> Wouldn't a simple LRU, or even a random-drop method, be better 
>> than this regular pruning, which bogs down the machine for 
>> minutes?
>>
>> Will this change in 9.4?
>>
>> Thanks,