deploying DNS in large ISP

Duane Powers duane at uberLAN.net
Wed Jul 4 22:35:26 UTC 2001


Brad Knowles wrote:

> At 10:51 AM -0400 7/4/01, ray at doubleclick.net wrote:
> 
>>  Hm, perhaps Mr. Powers was asking about authoritative nameservice (and
>>  not caching resolvers)?


Well, actually, no. We've got three authoritative nameservers, but 
right now they're not my primary concern (not to say they won't need 
attention later). At the moment I'm more concerned with the caching 
resolvers.
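
(By "caching resolvers" I mean a caching-only named on each box. A 
minimal sketch of the kind of config I have in mind, in BIND 8 syntax; 
the netblocks below are placeholders, not our real addressing:

options {
        directory "/var/named";
        // this box only resolves; it is authoritative for nothing
        recursion yes;
        // answer only our own customers (placeholder netblocks)
        allow-query { 10.0.0.0/8; 172.16.0.0/12; };
        // a resolver has no zones worth transferring
        allow-transfer { none; };
};

// root hints so the resolver can find the root servers on its own
zone "." {
        type hint;
        file "named.root";
};

Nothing exotic; each box is meant to be independent and disposable.)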


> 
>>  I set up the authoritative DNS system for my employer. We used a
>>  lot of smaller Sun systems (e.g. Netra T-1, E220R, etc.) with a
>>  stripped-down O/S, running a single instance of BIND 8.2.x per
>>  node; in front of each cluster of nameservers, we use a hardware
>>  load-balancer capable of handling UDP "transactions". The theory of
>>  operation: use enough nodes per cluster that the failure of one or
>>  even two nodes would not render the cluster unusable (overloaded).
>>  So each node should be sized to handle 200% of its normal share of
>>  the load, with a minimum cluster size of 4 nodes.
> 
> 
>     For authoritative nameservice, I do not believe that this kind of 
> operation is necessary.  Most recursive/caching nameservers out there 
> seem to handle failure of authoritative nameservers pretty well. With 
> BIND 8 on DEC Alpha hardware in 1996, you could easily sustain 2000 
> queries per second (which is about what the root nameservers were 
> sustaining, and IIRC, the DEC Alpha configuration was the genesis for 
> RFC 2010 "Operational Criteria for Root Name Servers"), and this was
> more than enough for most applications.
> 


wow. <g>
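
That sizing rule works out neatly, by the way: with four nodes each 
normally carrying a quarter of the load but sized for half of it, any 
two survivors can still absorb the whole thing.

For anyone who wants to sanity-check a queries-per-second figure like 
that against their own hardware, here is the sort of crude probe I'd 
start with (Python, standard library only; the server address and 
query name are placeholders). It keeps only one query in flight, so it 
really measures round-trip latency rather than peak capacity; a 
purpose-built load generator (queryperf, for instance) keeps many 
queries outstanding:

#!/usr/bin/env python
"""Crude serial DNS probe: send A queries over UDP, count answers/sec."""
import socket
import struct
import time

def build_query(name, qid):
    # 12-byte header: ID, flags (RD bit set), QDCOUNT=1, other counts 0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # question: length-prefixed labels, then QTYPE=A (1), QCLASS=IN (1)
    labels = b"".join(bytes([len(label)]) + label.encode("ascii")
                      for label in name.split("."))
    return header + labels + b"\x00" + struct.pack(">HH", 1, 1)

def probe(server, name="example.com", seconds=5.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)
    answered = 0
    qid = 0
    deadline = time.time() + seconds
    while time.time() < deadline:
        qid = (qid + 1) & 0xFFFF
        sock.sendto(build_query(name, qid), (server, 53))
        try:
            sock.recv(512)   # any reply counts; no parsing here
            answered += 1
        except socket.timeout:
            pass             # dropped query -- keep going
    return answered / seconds

if __name__ == "__main__":
    # 192.0.2.1 is a placeholder; point it at a resolver you own
    print("%.0f answered queries/sec" % probe("192.0.2.1"))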


<snip>


>>  At first we tried a software product, Resonate Central Dispatch,
>>  but CD cannot load-balance UDP (so it's no good for DNS). Later we
>>  tested Alteon equipment, but for some reason could not get it to
>>  work. Finally we settled on ArrowPoint CS-100/CS-200s (the company
>>  has since been acquired by Cisco; I believe you can still get the
>>  CS-200 or CS-800). No problems with the CS-* series, unless you
>>  don't like IOS. Make sure to keep up with the IOS updates!
> 
> 
>     Right.  Arrowpoint.  We tried those at Skynet.  There's a reason 
> why we had two of them sitting on the shelf as of the time I left.  A 
> friend/former co-worker of mine at AOL also did some testing on them. 
> His comment after just a few hours of testing: "Oh, it has an 
> internal hard disk that it depends on for booting and operating -- 
> poor switch.  RIP."


Thanks for the update on this.


> 
>     I would encourage anyone who is serious about looking into L4 
> switches to take a close look at the Load Balancing Resources web site 
> at <http://www.vegan.net/lb/>, and the archives of the load-balancing 
> mailing list.
> 



Just as a follow-up: currently we are deploying resolvers into each of 
our POPs and using our RADIUS system to assign the resolver IPs to our 
customers (roughly as sketched below). I don't necessarily disagree 
with this approach; I just wonder whether it would be better to build 
a much larger, more fault-tolerant load-balanced system (or really 
three of them, in geographically diverse areas), so that if one box is 
impacted (crashes <g>) the service continues unimpaired, or at least 
minimally impaired, rather than one POP being substantially degraded. 
That does, of course, depend on the resilience of the network the 
traffic has to cross. I don't know that I like the idea of 20-30 
resolver IPs for an area the size of California; it just doesn't seem 
clean to me. What do you guys think?
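
For reference, "assigning the IPs" on the RADIUS side amounts to 
something like the users-file entry below. This is Livingston-style 
syntax and assumes your dictionary defines the RFC 2548 Microsoft VSAs 
and that the NAS honors them; the addresses are placeholders for 
whichever POP's resolvers the customer should get:

# hand dial-in customers the local POP's resolvers
DEFAULT Auth-Type = System
        Service-Type = Framed-User,
        Framed-Protocol = PPP,
        MS-Primary-DNS-Server = 172.16.10.53,
        MS-Secondary-DNS-Server = 172.16.11.53

(A Cisco NAS would want the same thing expressed as a vendor AV pair, 
I believe.)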

Thanks for the input thus far, it's been quite helpful.

Duane Powers





