DNS server caching performance test results.

Brad Knowles brad.knowles at skynet.be
Thu May 17 20:03:53 UTC 2001


At 11:59 AM -0600 5/17/01, Matt Simerson wrote:

>  It's possible, but it shouldn't be. Many sites use forwarding, so its
>  forwarding mechanism darned well better work. If it doesn't, that's
>  something I'd rather learn about in testing rounds than in production.
>  As it turns out, it worked just fine.

	Forwarding is evil, and should be avoided if at all possible.  It 
just doesn't work the way people expect it to, and it causes far too 
many weirdnesses if/when it goes wrong.

	So, how can you be *sure* that forwarding worked just fine in 
your instance, unless you turn it off and run another test to 
compare?  And even if it didn't screw you up this time, how can you 
be sure that it won't screw you up in the future?

	IMO, it's better to just turn off forwarding altogether.

>  That seems to be a moot point because after they (forward) query the data
>  once from walldns, they have the data cached anyway. Either way works in
>  pretty much the same fashion. My way removes the additional query for
>  216.in-addr.arpa.

	Right, but if they don't forward (hint), then they need to find 
out who is serving that zone, and unless you're going to register 
your "walldns" server as authoritative with the TLD nameservers, the 
only other way to do that is to have the caching nameservers think 
that they are primary for the parent zone, and then have them 
delegate the child zone themselves to the "walldns" server.
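	To make that concrete, here's a rough sketch of the parent zone 
file that option would imply -- all hostnames are made up, and you'd 
adjust the child label and TTLs for your own setup:

; Hypothetical 216.in-addr.arpa zone on the caching nameservers,
; which claim authority for the parent and delegate one child zone
; to the "walldns" box.  Every name below is a placeholder.
$TTL 3600
@       IN  SOA  cache1.example.net. hostmaster.example.net. (
                 2001051701 ; serial
                 3600       ; refresh
                 900        ; retry
                 604800     ; expire
                 3600 )     ; negative-answer TTL
        IN  NS   cache1.example.net.
; delegate 152.216.in-addr.arpa to the walldns server:
152     IN  NS   walldns.example.net.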

	And what happens when you point your clients directly at the 
"walldns" server itself?  What kind of performance can you get out of 
it when you don't have to go through the caching nameservers?

>  Funny you should ask. :-)  I actually did this while testing dnsfilter. I
>  posted the results to the djbdns mailing list because the behavior didn't
>  seem quite right to me. I would have expected that increasing the number
>  of parallel queries would continually increase the qps rating until the
>  maximum ability of the name server was reached. That turned out not to be
>  the case. I tested with values ranging from 1 to 10,000 and found any value
>  between 5 and 20 to be the optimum number. Not coincidentally, 10 is the
>  default.

	I'm not too surprised.  At some point, all those parallel threads 
would have to start tripping over themselves, and you'd have more and 
more of them that are blocked on the client side, waiting for the CPU 
to become available, as opposed to waiting on the server for the 
answer to the question they had asked.
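	The shape of that measurement is easy to sketch.  The harness 
below stands in a fake resolver (a fixed sleep) for the real DNS 
lookup, so it will show throughput rising with parallelism but not 
the contention plateau you saw -- a real resolver against a real 
server is where the 5-to-20 sweet spot would appear.  Everything 
here is made up for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_resolve(name):
    """Stand-in for a real DNS lookup; sleeps to mimic server latency."""
    time.sleep(0.005)  # assume roughly a 5 ms round trip
    return name

def measure_qps(parallelism, total_queries=200):
    """Fire total_queries lookups at the given parallelism; return qps."""
    names = ["host%d.example.com" % i for i in range(total_queries)]
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(fake_resolve, names))
    elapsed = time.monotonic() - start
    return total_queries / elapsed

if __name__ == "__main__":
    for n in (1, 5, 10, 100):
        print("parallelism %4d: %.0f qps" % (n, measure_qps(n)))
```

	Swap fake_resolve for an actual query function and sweep the 
parallelism values, and you'd reproduce the 1-to-10,000 test.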

>>  	Fascinating.  I wonder if BIND 8 is returning the answer before
>>  storing it in the cache, while the other two programs are storing it
>>  in the cache first and then returning the answer?
>
>  I'm guessing that's exactly what BIND 8 does. How else could you explain
>  numbers like that?

	Someone more familiar with the code would have to answer that 
question.  Mark?

>  Briefly. I don't recall why I stopped looking at it, but we determined early
>  on that it wasn't well suited for what we're doing.

	I'd be interested to know what those reasons were.  I'm not 
familiar with it myself, but I am curious.

>  I mostly agree with you. However, I think that most people don't care
>  exactly how efficient it is, they want to know how much memory it's going to
>  use.

	I disagree.  I think that people are interested in both.

>        What I've provided gives them some decent rules of thumb (like take
>  whatever BIND 8 is using and add 20% for BIND 9). Your question has made me
>  curious though so I've added a new field to my test spreadsheet. I can watch
>  BIND via top or ps aux to see how much RAM it's grown by and dnscache logs
>  how many bytes of data it's written to cache so I'll record that data next
>  time.

	Cool.  I'll be interested to see those results.
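	For the record, the sampling itself is trivial to script.  This 
sketch reads VmRSS out of /proc (so it's Linux-only; on the BSDs 
you'd shell out to ps instead), and the interval and duration are 
just placeholders:

```python
import time

def rss_kb(pid):
    """Resident set size of a process in KB, read from /proc (Linux)."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is in kB
    return 0

def watch_memory(pid, seconds=60, interval=1.0):
    """Sample RSS once per interval; return the list of samples in KB."""
    samples = []
    for _ in range(int(seconds / interval)):
        samples.append(rss_kb(pid))
        time.sleep(interval)
    return samples

if __name__ == "__main__":
    import os
    samples = watch_memory(os.getpid(), seconds=3)
    print("growth over run: %d KB" % (samples[-1] - samples[0]))
```

	Point it at the named or dnscache PID for the length of a test 
run and the first/last samples give you the growth figure directly.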

>  Can you define this test methodology better?  Am I just keeping track of the
>  third client's output?

	Yes.  The other two are just there to provide background noise 
and to try to push the server hard enough to make it feel seriously 
stressed.  You still want to gather the server-side statistics to see 
how many queries were answered during the testing period, but the 
numbers you should be most interested in are the client-side numbers 
when the server was being stressed by the other two clients.

>                         Am I running the test in client 3 at the same time as
>  1 and 2 are cycling through?

	Yes.

>                                If so, I'm just going to see extended times for
>  client 3 to resolve the data. I have to keep track (on all clients) of how
>  many queries are being answered and correlate that number to elapsed test
>  time to get a meaningful qps rating.

	Not necessarily.  Track the one client, and the server-side performance.
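	The arithmetic for the server-side rating is just total answered 
queries over elapsed wall time -- summed across clients if you want 
the aggregate figure.  The numbers below are invented:

```python
def qps(per_client_answered, elapsed_seconds):
    """Aggregate qps: answered queries across all clients / elapsed time."""
    return sum(per_client_answered) / elapsed_seconds

# e.g. three clients over a hypothetical 300-second run:
# qps([45000, 44100, 9800], 300.0)  ->  ~329.7 qps
```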

>  Nope, but it can be for the next one...  I think I'll write a little script
>  that records the CPU % of the name server process every second during the
>  run. So what's the best number to record, MAX utilization?

	I'd record all four numbers -- system busy %, user busy %, wait %, idle %.
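	Something like this would do it on Linux, by diffing two reads of 
/proc/stat (note it folds nice into user, and ignores the irq/softirq 
fields some kernels report, so treat the split as approximate):

```python
import time

def cpu_times():
    """Aggregate CPU jiffies from the first line of /proc/stat (Linux)."""
    with open("/proc/stat") as f:
        fields = f.readline().split()  # "cpu user nice system idle iowait ..."
    vals = [int(v) for v in fields[1:]]
    user = vals[0] + vals[1]               # user + nice
    system = vals[2]
    idle = vals[3]
    wait = vals[4] if len(vals) > 4 else 0  # iowait, if the kernel reports it
    return user, system, idle, wait

def cpu_percentages(interval=1.0):
    """Sample twice; return (user%, system%, wait%, idle%) over the interval."""
    u1, s1, i1, w1 = cpu_times()
    time.sleep(interval)
    u2, s2, i2, w2 = cpu_times()
    du, ds, di, dw = u2 - u1, s2 - s1, i2 - i1, w2 - w1
    total = (du + ds + di + dw) or 1
    return tuple(100.0 * d / total for d in (du, ds, dw, di))

if __name__ == "__main__":
    print("user/system/wait/idle: %.1f%% %.1f%% %.1f%% %.1f%%"
          % cpu_percentages())
```

	Run it in a once-a-second loop for the duration of the test and 
you get all four numbers in one log.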

>                                                               Disk activity is
>  pretty meaningless since each of the systems has 32MB of cache on the RAID
>  controller and the output files are 3MB in size.

	I wouldn't be so sure.  I'd track it anyway, just to see if there 
is unexpected filesystem activity.

>  The test machines I have for this are all single procs. :-(  Uh, hmmm,
>  what's this dual 700 under my other desk doing next week?  Hmmm, maybe that
>  will generate some fun numbers. :-) Hi ho, hi ho, off to the NOC I go.

	I'll be interested to see the numbers with a multi-processor server.

-- 
Brad Knowles, <brad.knowles at skynet.be>

/*        efdtt.c  Author:  Charles M. Hannum <root at ihack.net>          */
/*       Represented as 1045 digit prime number by Phil Carmody         */
/*     Prime as DNS cname chain by Roy Arends and Walter Belgers        */
/*                                                                      */
/*     Usage is:  cat title-key scrambled.vob | efdtt >clear.vob        */
/*   where title-key = "153 2 8 105 225" or other similar 5-byte key    */

dig decss.friet.org|perl -ne'if(/^x/){s/[x.]//g;print pack(H124,$_)}'
