Performance Test Metrics for DNS server performance
Matt Simerson
mpsimerson at hostpro.com
Fri May 4 20:52:56 UTC 2001
Hi dns'ers,
I'm building an enterprise DNS solution. This solution will need to pass a
QA performance evaluation. Our QA lab doesn't have a well-defined procedure
for this kind of testing, so I need to define a set of test metrics. The
things we'll be testing are BIND 8 and djbdns on three platforms, comparing
security, performance, and reliability.
I've spent the last couple of days scouring the net for methodologies for
testing DNS server performance. From all that, I've settled on what I think
is a pretty reliable set of test metrics. Dnsfilter seems to be the only
tool out there designed to kick a name server in the jaw, so I've given it a
first go-round. For now the focus is on performance, so I figure three
separate tests would be appropriate:
Recursive query test - simulate client connections doing queries, using
dnsfilter against local IP space.
Raw caching performance - look up 10,000 cacheable queries, then repeat
the lookup.
Authoritative queries - randomly query data the server is authoritative for.
I've run a set of initial tests and come up with some very interesting
numbers. What I'm looking for is other tests and test methods that will
ensure the accuracy and fairness of the tests. I'm also hoping I can get
some valuable feedback on interpreting the results.
For this initial test, here's what I used:
DNS server: PIII 600, 512MB, Intel Pro 100+ (100BaseT, full-duplex)
FreeBSD 4.3, Mylex AR250 RAID 5 on 5 disks w/32MB cache.
BIND 8.2.3-REL - unlimited - (options "files unlimited")
Dnscache-1.0.5 - 290MB - (MAXUDP 2000, -o2500)
DNS client: Dual PIII 850, 1GB, Intel Pro 100+ (100BaseT, full-duplex)
FreeBSD 4.3, Mylex ER2000 RAID 5 on 3 disks w/32MB cache.
1) cat iplist | dnsfilter -c 1000 -l 100000
2) cat iplist | dnsfilter -c 10000 -l 100000
3) cat iplist | dnsfilter -c 100000 -l 100000
Dnsfilter is a simple program that takes a list of IPs on STDIN and does a
lookup on each one. It inserts the hostname or an error into each query line
and writes it to STDOUT. It supports two options: -c is the number of
simultaneous queries, and -l is how many lines of input to read ahead.
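The filter contract described above can be sketched roughly like this. To
keep it self-contained, a hard-coded lookup table stands in for the real
resolver, and the -c concurrency is omitted; the "=" and ":" output markers
are my assumption about dnsfilter's format, inferred from the greps used
later to tally results:

```shell
#!/bin/sh
# Hypothetical sketch of the dnsfilter I/O contract. A lookup table
# plays the resolver here; the real tool does asynchronous PTR lookups.
lookup() {
  case "$1" in
    10.0.0.1) echo "host1.example.com" ;;
    10.0.0.2) echo "host2.example.com" ;;
    *) return 1 ;;
  esac
}

resolve_filter() {
  # Read one IP per line on stdin; append the hostname on success,
  # or an error marker on failure, and write the line to stdout.
  while read -r ip; do
    if name=$(lookup "$ip"); then
      printf '%s=%s\n' "$ip" "$name"
    else
      printf '%s:timed out\n' "$ip"
    fi
  done
}

printf '10.0.0.1\n192.0.2.9\n' | resolve_filter
```

Running this prints one line per input IP: the first resolves, the second
falls through to the error branch.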
The file "iplist" is a compilation of 90,112 IP addresses that my company
owns, all local (to our NOC). The answers will all be found on our three
local name servers (Solaris & BIND 8), which are on the same LAN. By
limiting the test to our IP space, I'm limiting skew from network
conditions, etc. The only real factor I have no control over is the load on
the "real" name servers I'm querying, so I've staggered the timing of the
tests to limit variance.
The above commands (1-3) were each dumped into a shell script and then
executed in the following fashion:
<start dnscache on server>
time ./iplist-runtest-1000.sh > iplist-1000.out1
time ./iplist-runtest-1000.sh > iplist-1000.out2
time ./iplist-runtest-1000.sh > iplist-1000.out3
<stop dnscache on server and start BIND, repeat test>
<stop BIND, start dnscache, start test>
time ./iplist-runtest-10000.sh > iplist-10000.out1
time ./iplist-runtest-10000.sh > iplist-10000.out2
time ./iplist-runtest-10000.sh > iplist-10000.out3
<stop dnscache, start BIND, repeat test>
<stop BIND, start dnscache, start test>
time ./iplist-runtest-100000.sh > iplist-100000.out1
time ./iplist-runtest-100000.sh > iplist-100000.out2
time ./iplist-runtest-100000.sh > iplist-100000.out3
<stop dnscache, start BIND, repeat test>
After the test runs it was simple to do a few greps on the output to check
the number of successes, timeouts, and temporary failures:
grep = iplist-*.out* | wc -l
grep : iplist-*.out* | grep time | wc -l
grep : iplist-*.out* | grep temp | wc -l
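To sanity-check that tallying approach, here's a self-contained run against
a five-line mock of the output (the "=" and ":" markers are my assumption
about dnsfilter's output format, inferred from the patterns being grepped
for):

```shell
#!/bin/sh
# Build a tiny mock of dnsfilter-style output: '=' marks a completed
# lookup, ':' plus error text marks a timeout or temporary failure.
cat > sample.out <<'EOF'
10.1.1.1=www.example.com
10.1.1.2:timed out
10.1.1.3=mail.example.com
10.1.1.4:temporary failure
10.1.1.5:timed out
EOF

ok=$(grep -c = sample.out)                  # completed lookups
to=$(grep : sample.out | grep -c time)      # timeouts
tf=$(grep : sample.out | grep -c temp)      # temporary failures
echo "$ok completed, $to timed out, $tf temp fail"
# prints: 2 completed, 2 timed out, 1 temp fail
```

Using grep -c instead of piping through wc -l gives the same counts without
the leading whitespace some wc implementations emit.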
I expected the second and third runs to be substantially faster than the
first due to the caching nature of the servers. This turned out not to be
the case; I'm assuming that's because the majority of the time is spent on
lookups that time out or fail. In total I did 18 test runs (each of the
three aforementioned tests, three times, against each name server).
After all the tests were run I compiled the data into a spreadsheet. I also
expected that as the number of simultaneous requests increased, the query
success rate would rapidly diminish. This turned out to be the case with
BIND but not with dnscache.
So, here are the averaged results:

dnscache-1.0.5 - 290MB - 90,112 requests

simultaneous  time(s)  completed  timed out  temp fail
       1,000      964     40,308      9,281     10,073
      10,000      976     40,496      8,928     11,132
     100,000      875     40,786      7,562     11,816

BIND 8.2.3-REL - 6-8MB - 90,112 requests

simultaneous  time(s)  completed  timed out  temp fail
       1,000     1144     18,966      5,203     47,638
      10,000     1157     14,299      4,899     54,236
     100,000     1200     12,771      5,185     56,575
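For reference, the completion rates those averages imply (completed divided
by the 90,112 requests per run) can be computed directly; the percentages
below are just arithmetic on the table, not new measurements:

```shell
#!/bin/sh
# Completion rate as a percentage of the 90,112 requests per run.
rate() { awk -v c="$1" 'BEGIN { printf "%.1f\n", 100 * c / 90112 }'; }

rate 40308   # dnscache, 1,000 simultaneous   -> 44.7
rate 18966   # BIND, 1,000 simultaneous       -> 21.0
rate 12771   # BIND, 100,000 simultaneous     -> 14.2
```

These match the roughly 45% figure for dnscache and the 14-22% range for
BIND discussed below.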
While watching the CPU load on the DNS server, dnscache regularly chewed up
30-40% of the CPU while answering queries. That still doesn't seem quite as
punishing as having several hundred mail servers pointed at dnscache (as one
of my production caches does). Dnsfilter on the client was chewing up
massive portions of one of the 850MHz CPUs for parts of the test and
appeared idle at other times; I'm not sure why. BIND was very friendly to
the CPU and very seldom climbed above 5-6%.
While doing the queries, the time taken to resolve was not terribly
different between the two servers (~20% higher for BIND). The value of the
output, however, appears to vary wildly. Dnscache consistently resolved
about 45% of our IP space, while BIND 8 ranged between 14% and 22%.
BIND's performance didn't seem to suffer under load until you looked at the
output: as the numbers indicate, the more simultaneous connections, the less
useful the output. These numbers aren't quite what I was expecting. It would
appear BIND has some load-shedding behavior built in that just starts
failing queries as the load ramps up?
Again, these are initial tests, designed only to evaluate test
methodologies and to determine an accurate way to measure DNS server
performance. Does this sound like a reasonable way to test? Is there a
better way? Any suggestions or comments are welcome.
Matt