Recommended setup with large cache memory

Fri Sep 9 12:26:26 UTC 2005

Brad Knowles wrote:
>> BTW, I am not the only one:
>> http://lists.freebsd.org/pipermail/freebsd-current/2004-December/044565.html
> 	First off, Jinmei only tested BIND 9.3.0, not 9.3.1.  There have 
> been a number of improvements made in 9.3.1, some of which may have 
> come from his testing.
Yes, I am aware of that. When I upgraded that machine to 9.3.1, its peak
CPU usage dropped from around 60% to 30% under the same load.

> 	Secondly, he tested on FreeBSD 5.3, and there was a paradigm 
> shift in the way FreeBSD handled SMP between 4.x and 5.x, a process 
> which is not expected to be mostly complete until we get sometime 
> into 6.x-RELEASE.  Meanwhile, you should be using Linux instead, as 
> Jinmei himself shows at the bottom of that post.  It seems that Linux 
> handles the mutexes that BIND uses much better than FreeBSD does.
Do you have benchmarks comparing Linux and FreeBSD, with and without
threading? As you say, the above results are old, and both systems have
evolved since.

BTW, we run some nameservers (BIND 9.3.1) on multiprocessor
Solaris/SPARC machines, and the effect seems to be similar: turning on
threading does not improve performance.

> 	Feel free to make any source code modifications you want, but 
> please at least submit your code back to ISC for their consideration.
I would be glad to have the time to do this. :)

>> See http://www.danga.com/memcached/ for details.
> 	I am familiar with memcached.
What is your opinion on using that to store the cached data?

>> You can run and use multiple memcached machines and if one fails, there
>> is no problem. There is no SPF, if I am right.
> 	You mean SPOF?  Yes, for the data you've lost, there is 
> definitely a SPOF -- the machine that crashed.
Call it SPOF or SPoF, whichever you like.
Should I care? It's a cache. If the needed records are not available, the
nameserver goes out to the network and does a query.
Is that a SPOF?

To me, a SPOF is a component whose failure makes the service unavailable.
Can a removed entry in a cache (or any number of removed entries, while
the service is still available) be a SPOF?
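To make the fallback argument concrete, here is a minimal cache-aside
sketch. `CacheNode` and `resolve_upstream` are hypothetical stand-ins (a
dict-backed node that can be marked dead, and a placeholder for real
recursion), not memcached's or BIND's API, and TTL handling is omitted.
It only shows that an unreachable cache node degrades to a cache miss
rather than to an outage.

```python
class CacheNode:
    """Hypothetical stand-in for one memcached server (dict-backed)."""

    def __init__(self):
        self.store = {}
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise ConnectionError("cache node down")
        return self.store.get(key)

    def set(self, key, value):
        if not self.alive:
            raise ConnectionError("cache node down")
        self.store[key] = value


def resolve_upstream(qname, qtype):
    """Hypothetical placeholder for a real recursive lookup."""
    return f"answer for {qname}/{qtype}"


def lookup(node, qname, qtype):
    """Cache-aside lookup: try the cache, fall back to the network."""
    key = f"{qname}/{qtype}"
    try:
        cached = node.get(key)
    except ConnectionError:
        cached = None  # a dead cache node is treated exactly like a miss
    if cached is not None:
        return cached, "cache"
    answer = resolve_upstream(qname, qtype)
    try:
        node.set(key, answer)
    except ConnectionError:
        pass  # cannot store the answer, but the client still gets it
    return answer, "network"


if __name__ == "__main__":
    node = CacheNode()
    print(lookup(node, "mx.domain.com", "MX"))  # ('answer for mx.domain.com/MX', 'network')
    print(lookup(node, "mx.domain.com", "MX"))  # ('answer for mx.domain.com/MX', 'cache')
    node.alive = False
    print(lookup(node, "mx.domain.com", "MX"))  # ('answer for mx.domain.com/MX', 'network')
```

Losing the cache node costs an extra upstream query per lookup (more
latency and upstream load), but every query is still answered.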

>> That's what we are using. But if we -say- have four machines with four
>> gigs of RAM in each of them, this is simply wasting of resources.
> 	And people who look at their big expensive drive arrays with all 
> those disks think that they are wasting everything, if they don't 
> stripe all their data across them all.
Sorry, but I don't understand this. Do people buy big, expensive drive
arrays for squid caches?
Is an entry that can be retrieved from the network at any time so
valuable that it needs extra protection beyond its integrity?

> 	Everyone always seems to ignore reliability and tries to shoot 
> for absolute maximum performance -- until they have a catastrophic 
> failure.  Then they wish they'd gone for high availability and fault 
> resilience, instead.
What do you think?

>> If we could have four caches with 512 MB of RAM and four machines with 4
>> gigs, I would have 16 GB of cache, which is unique to the whole cluster,
>> so there is no multiple instances of the same data, and there is no such
>> a problem that my nameserver of the IP 1.1.1.1 gives different answers
>> for subsequent queries.
> 	Uhh, excuse me?  How are you calculating these numbers?  How are 
> you coming to these conclusions?
Which conclusions?
Different answers:
if you put a virtual IP address with a load balancer in front of a number
of caches and the balancer spreads the queries among them, the answers
will be inconsistent.
For example, the first query for mx.domain.com may get a negative answer
(because the cache it hits holds a cached negative entry for that name),
while the next one, landing on a different cache, returns an IP address.
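One way to get the single, non-duplicated cache described above is for
every frontend to map each key to exactly one backend with a consistent
hash ring, which many memcached clients can be configured to do (an
assumption about the client, not something stated in this thread). In
this minimal sketch the node names, the `qname/qtype` key format and the
replica count are hypothetical. Because every frontend computes the same
owner for mx.domain.com, they all see the same cached answer; four 4 GB
backends then hold roughly 16 GB of distinct entries, and removing one
backend only remaps the keys that backend owned.

```python
import bisect
import hashlib


def _hash(value):
    # Stable 32-bit ring position from MD5; Python's built-in hash() is
    # salted per process, so different frontends would disagree.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, for balance
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # rare hash collision: keep the existing owner
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if self._owner.get(point) != node:
                continue  # point was skipped on add (collision)
            del self._points[bisect.bisect_left(self._points, point)]
            del self._owner[point]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["cache1", "cache2", "cache3", "cache4"])
    # Every frontend building the same ring picks the same backend.
    print(ring.node_for("mx.domain.com/A"))
```

A plain `hash(key) % N` would also give every frontend the same owner,
but losing one backend would then remap most keys instead of only that
backend's share.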

>> Also, if a backed server disappears, I don't care about the loss of 4
>> GBs of cached RRs. What is important to maintain the working state, which
>> is the case in this setup.
> 	Hell, yes -- you do care about the loss of that data.  And where 
> do you think that the working state is being maintained, anyway?
Is this an issue with memcached?

> 	BIND can be plenty fast.  See Jinmei's post that you yourself 
> quoted.  Rick Jones has also gotten some very high performance out of 
> BIND.  Multiple different sources have pushed it to do 20-30k queries 
> per second, or more.
With a simple queryperf "benchmark" I could do about 35k qps on a
uniprocessor (UP) machine when querying the same (cached) A record. This
performance doesn't really change with the size of the cache in use.

BTW, a commercial product could handle the same (production) load with
about 3% CPU usage, while BIND still ate about 30-40% on that machine
(after the upgrade to 9.3.1).

> 	Yes, there are programs out there that are faster than BIND (see 
> my own performance testing at 
> <http://www.shub-internet.org/brad/papers/dnscomparison/>), but BIND 
> can still do quite nicely.
I've already seen your paper. I think it would be interesting to repeat 
that experiment.

-- 
Attila Nagy                                   e-mail: Attila.Nagy at fsn.hu
Adopt a directory on our free software   phone @work: +361 371 3536
server! http://www.fsn.hu/?f=brick             cell.: +3630 306 6758