DNS performance testing - FreeBSD & Solaris - BIND & djbdns

Matt Simerson mpsimerson at hostpro.com
Thu Jun 21 21:29:02 UTC 2001


The reference hardware for FreeBSD systems is the HP LPr with a single
700MHz PIII CPU and 1GB RAM. Solaris SPARC** systems are E420s with 2GB of
RAM and a single 450MHz CPU. 

NOTES:

* Much of the Solaris testing is incomplete due to difficulties: first with
hardware (shipping damage), then with OS tuning (requiring help from Sun
engineers), and finally a bonehead error by a sysadmin that cost several of
the days needed for the Solaris machines to chew through the Class A testing.

** It must be noted that we don't have a tremendous amount of Sun expertise
within our department. The Suns arrived preconfigured from the guys in our
company who do have lots of expertise with them; I installed dnscache,
BIND 8, top, and a few other utilities to monitor the system. Tuning was
done by them with the help (and lack thereof) of a couple of Sun engineers.

*** This testing is almost entirely limits checking. It is in addition to
the tests I posted earlier to this list reflecting the real-world
performance testing I was doing. The following are excerpts from the QA test
documents produced as part of our testing.


5.	Features/Items To Be Tested

Number  Name                  Description
1       Caching performance   Publicly available DNS servers will NOT be
                              caching servers. Caching servers will be
                              available ONLY to DNS requests from our IP
                              space.
2       Query performance     How many queries can each server reliably
                              serve in a given unit of time?
3       Propagation time      How long it takes from the initial entry of
                              a DNS change before that change is available
                              on the public Internet.
4       Uptime                How reliable is the server? Are there any
                              known DoS attacks?
5       Security              A comprehensive security audit will be
                              performed on each system. Host and
                              application security will be tested
                              thoroughly.
7       System manageability  How easy is it for a system administrator to
                              make changes? How much time is necessary for
                              routine administration tasks?
8       Scalability           Use load balancers to determine how load
                              scales across multiple servers.

 
6.	Testing Strategy / Approach

1. Caching performance. 

For testing caching performance, we'll use a DNS client program named
dnsfilter running on 1 to 3 client machines to rapidly query the name
servers. Dnsfilter takes a list of IP addresses as input and outputs each
IP address alongside the hostname returned by the DNS. I wrote a program
that generates an input file suitable for dnsfilter. An input file named
"iplist-a" will contain an entire Class A block of IP addresses and a
second file named "iplist-b" will contain an entire Class B block of IP
addresses. 
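The generator program itself isn't included in this post. A minimal sketch
of what it might look like (the function name, argument convention, and
example prefix are my assumptions, not the author's):

```shell
# Hypothetical sketch of the iplist generator: emit every address in a
# /16 ("Class B") block, one per line, suitable as dnsfilter input.
gen_iplist_b() {
    prefix=$1                      # e.g. "216.122"
    c=0
    while [ "$c" -le 255 ]; do
        d=0
        while [ "$d" -le 255 ]; do
            echo "$prefix.$c.$d"
            d=$((d + 1))
        done
        c=$((c + 1))
    done
}

# gen_iplist_b 216.122 > iplist-b    # 65,536 addresses
```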

A dnswall has been set up on 216.122.69.110. The program is named walldns
and is capable of answering over 6,000 DNS requests per second, which is
well beyond the limits of all the DNS servers and caches we are testing.
Each caching name server (dnscache, BIND 8, BIND 9) will be configured to
forward all requests to the dnswall. The resolv.conf file on the caching
name servers will contain no "search" or "domain" keywords and will contain
only one nameserver entry, pointing all DNS resolution at the currently
active DNS server on that machine. The resolver will be configured to use
only DNS for resolution.
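The resolv.conf described above reduces to a single line (127.0.0.1 here is
an assumption; it would be whichever address the active cache listens on):

```
nameserver 127.0.0.1
```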

To begin the test, start up the first DNS server to be tested. For dnscache,
the command "services start" will start it up. For BIND 8, the command "ndc
start" will work. BIND 9 must be started manually. Open an SSH session to
the DNS client and get ready to start testing. After starting up the caching
server, start the monitoring program "watchme.sh". Watchme is a script that
records the next 5 minutes of activity on the server, outputting the CPU
and memory usage of each server to a file named "watchme.out". We'll then
run a batch of tests and record the output. So, here's exactly what to do:

Server:
   # script                      ; records the session
   # services start              ; starts up dnscache
   # watchme.sh                  ; monitors server CPU and RAM usage

Client: 
  # script
  # time runtest-b.sh > b1.out
  # time runtest-b.sh > b2.out
  # time runtest-b.sh > b3.out
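watchme.sh isn't reproduced in this post. A rough sketch of what such a
monitor might do (the function name, sample interval, and choice of ps
columns are my assumptions):

```shell
# Hypothetical watchme.sh-style monitor: sample the CPU and memory usage
# of a named process at a fixed interval, appending each sample to
# watchme.out. 60 samples at 5-second intervals covers 5 minutes.
watch_proc() {
    name=$1; samples=$2; interval=$3
    : > watchme.out
    i=0
    while [ "$i" -lt "$samples" ]; do
        # pcpu = %CPU, rss = resident memory (KB), comm = command name
        ps axo pcpu,rss,comm | awk -v p="$name" '$3 ~ p' >> watchme.out
        sleep "$interval"
        i=$((i + 1))
    done
}

# watch_proc dnscache 60 5
```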

Record the following from watchme.out: the starting and ending RAM values
and the maximum CPU usage.
Record the real time from the time output.
Grep the output files to count successful queries ("grep '=' b?.out | wc
-l").
Calculate the accuracy and qps. Accuracy = success / total, where total is
the number of lines in the file iplist-b. Qps (queries per second) =
success / time. 
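Those two calculations can be scripted. A sketch, assuming dnsfilter marks
each answered query with an "=" in its output (the helper name and argument
order are mine):

```shell
# Hypothetical helper: derive success count, accuracy, and qps from a
# dnsfilter output file, the total number of queries, and the elapsed
# wall-clock time in seconds.
calc_stats() {
    file=$1; total=$2; seconds=$3
    success=$(grep -c '=' "$file")   # answered queries contain "="
    awk -v s="$success" -v t="$total" -v sec="$seconds" \
        'BEGIN { printf "success=%d accuracy=%.1f qps=%d\n", s, 100*s/t, s/sec }'
}

# calc_stats b1.out 65536 16
```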

A table is included below to record the data. Once the first run is done for
dnscache, clear the cache server with "svc -t /service/dnscache", start the
watchme script, and then run the test again on the clients, but with the
Class A iplist:

Server:
   # svc -t /service/dnscache    ; restarts dnscache
   # watchme.sh                  ; monitors server CPU and RAM usage

Client: 
  # time runtest-a.sh > a1.out
  # time runtest-a.sh > a2.out
  # time runtest-a.sh > a3.out

Record the results, then proceed to the next DNS server (BIND 8) and
finally to BIND 9. Follow exactly the same tests as above, except use the
BIND start and stop commands documented above.

65,536 queries (Class B) - 1 client - FreeBSD
Cache       start RAM  end RAM  CPU   Time  success  Accuracy  qps
dnscache	100	100	52%	30	65,536	100	2,184
dnscache	100	100	57%	16	65,536	100	4,096
dnscache	100	100	51%	16	65,536	100	4,096
							
BIND 8	2	8	38%	13	65,536	100	5,041
BIND 8	8	8	82%	30	65,536	100	2,185
BIND 8	8	8	87%	33	65,536	100	1,986
							
BIND 9	5	10	95%	72	65,536	100	910
BIND 9	10	10	61%	23	65,536	100	2,850
BIND 9	10	10	64%	23	65,536	100	2,850


65,536 queries (Class B) - 1 client - Solaris
Cache       start RAM  end RAM  CPU   Time  success  Accuracy  qps
dnscache	100	100	76%	76	65,536	100	862
dnscache	100	100	80%	33	65,536	100	1986
dnscache	100	100	80%	34	65,536	100	1927
							
BIND 8	3	9	35%	53	65,536	100	1237
BIND 8	9	9	88%	51	65,536	100	1285
BIND 8	9	9	92%	51	65,536	100	1285


16,711,680 queries (Class A) - 1 client - FreeBSD
Cache       start RAM  end RAM  CPU   Time      success     Accuracy  qps
dnscache	100	100	85%	2:14:35	16,777,216	*	2,087
dnscache	100	100	85%	2:15:41	16,777,216	*	2,071
dnscache	100	100	85%	2:24:45	16,777,216	*	1,942
							
BIND 8	2	gok	85%	23:31:00	CRASH	*	FAILED
BIND 8					*	*	
BIND 8					*	*	
							
BIND 9	5	gok	96%	22:29:55	FAILED	*	FAILED
BIND 9					*	*	
BIND 9					*	*	

**** Solaris testing cancelled due to time constraints.

 
Once that batch of tests is run, we'll do the exact same series of tests
again, except with three clients hitting the name servers at the same time. 

65,536 queries (Class B) - 3 clients - FreeBSD
Cache       start RAM  end RAM  CPU   Time  success  Accuracy  qps
dnscache	100	100	75	63	196,608		3,121
dnscache	100	100	65	50	196,608		3,932
dnscache	100	100	65	50	196,608		3,932
							
BIND 8	2	8	85	44	196,608		4,468
BIND 8	8	8	95	86	196,608		2,286
BIND 8	8	8	94	83	196,608		2,369
							
BIND 9	5	12	95	108	196,608		1,820
BIND 9	12	12	94	69	196,608		2,849
BIND 9	12	12	94	66	196,608		2,979

**** Solaris testing cancelled due to time constraints.


2. Query performance. 

For BIND, cache performance is the same as query performance because BIND
reads all the zone files in at start time and serves them from its cache.
For djbdns, we need to test the query performance of tinydns, which will be
serving all the authoritative data. We'll do this by creating a data file
for tinydns that serves all the reverse space within 216.122 and pointing
dnsfilter at it (just like in the cache testing). Record all the following
data from each run:

Server:
   # services start
   # watchme.sh                  ; monitors server CPU and RAM usage

Client: 
  # time dnsfilter < iplist-b > b1.out
  # time dnsfilter < iplist-b > b2.out
  # time dnsfilter < iplist-b > b3.out

Record the results, then proceed to the next DNS server (BIND 8) and
finally to BIND 9. Follow exactly the same tests as above, except use the
BIND start and stop commands documented above.
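The tinydns data file described above can be generated mechanically. A
sketch (the hostnames and domain are my assumptions), using tinydns-data's
"=" line type, which creates an A record plus the matching PTR so that
every reverse lookup in the block resolves:

```shell
# Hypothetical generator for a tinydns data file covering 216.122/16.
# Each "=" line yields an A record and the matching in-addr.arpa PTR.
gen_tinydns_data() {
    awk 'BEGIN {
        for (c = 0; c <= 255; c++)
            for (d = 0; d <= 255; d++)
                printf "=host-%d-%d.example.com:216.122.%d.%d\n", c, d, c, d
    }'
}

# gen_tinydns_data > data && tinydns-data    # compile data.cdb for tinydns
```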

65,536 queries (Class B) - FreeBSD - 1 client
Server      start RAM  end RAM  CPU   Time  qps
tinydns	1	1	72	18	3641
tinydns	1	1	74	18	3641
tinydns	1	1	72	18	3641

65,536 queries (Class B) - Solaris - 1 client
Server      start RAM  end RAM  CPU   Time  qps
tinydns	2	2	96	159	412
tinydns	2	2	97	159	412
tinydns	2	2	96	159	412


In contrast to dnscache, tinydns isn't significantly affected by logging.
Whereas by default dnscache chewed through disk space like it was a blue
light special, tinydns is fairly lightweight. I first ran the tests with
logging on (because I forgot to turn it off) and the times for the runs
were 19 seconds each. With logging off, they dropped to 18 seconds. Full
logging with tinydns is not as expensive as I had guessed.

As we can see, tinydns doesn't run well on Solaris at all. It screams
along on FreeBSD, so I'm guessing the difference is due to FreeBSD's
filesystem caching being more aggressive than Solaris's.


