Root zone timeout and workarounds?

Kevin Darcy kcd at daimlerchrysler.com
Wed Feb 21 04:02:08 UTC 2001


Okay, so you're talking about other nameservers on the Internet, not stub
resolvers, timing out trying to resolve names in your domain when 4 out of your 5
registered nameservers are unavailable. I suppose this isn't that surprising; 4
out of 5 is a serious outage. Of course, if the remote nameservers are running
BIND or something like it, they should quickly learn which of your servers are
not answering and stop querying them. Have you tried *successive* queries to
those remote nameservers? Do they eventually stop timing out? Perhaps this is
just a temporary effect.
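To make "quickly learn" concrete, here is a toy model loosely based on the smoothed round-trip-time (RTT) server selection that BIND-style caching nameservers use. The penalty and smoothing constants are made-up values for illustration, not BIND's, and the sketch omits the aging a real resolver uses to let a penalized server's estimate recover. With one live server out of five, each dead server costs one timeout the first time it is tried; after that, lookups go straight to the live one.

```python
import random

# Made-up constants for this sketch; BIND's real values and formulas differ.
TIMEOUT_PENALTY_MS = 2000.0  # RTT sample charged to a server that never answered
KEEP = 0.7                   # weight kept from the previous RTT estimate


def smooth(old, sample):
    """Exponentially weighted moving average of round-trip times."""
    return KEEP * old + (1 - KEEP) * sample


def simulate(servers, alive, lookups, rtt_ms=40.0, seed=1):
    """Return the number of timeouts suffered by each successive lookup.

    Each lookup tries servers in order of lowest smoothed RTT until one
    answers. A dead server's estimate jumps toward the timeout penalty,
    so later lookups try it after the live ones.
    """
    rng = random.Random(seed)
    # Small random starting estimates, so untried servers get probed early.
    srtt = {s: rng.uniform(0.0, 32.0) for s in servers}
    timeouts_per_lookup = []
    for _ in range(lookups):
        timeouts = 0
        for server in sorted(servers, key=srtt.get):
            if server in alive:
                srtt[server] = smooth(srtt[server], rtt_ms)
                break
            srtt[server] = smooth(srtt[server], TIMEOUT_PENALTY_MS)
            timeouts += 1
        timeouts_per_lookup.append(timeouts)
    return timeouts_per_lookup


history = simulate(["ns", "ns2", "ns3", "ns4", "ns5"], alive={"ns5"}, lookups=20)
print(sum(history), history[10:])  # prints: 4 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

In this model the cost of the outage is paid once per dead server, which is why repeating the queries is a useful test.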

If this temporary effect is unacceptable, then you may be able to increase your
availability by, paradoxically, reducing the number of registered nameservers. If,
for example, you cut back to 3 nameservers in 2 different locations, then if
the larger location goes down -- thus making 2 of the nameservers unavailable --
convergence should be faster with 1/3 of your nameservers available than with only
1/5. Of course, with only 3 nameservers, you'll probably have more service
interruptions (because, network problems aside, any *machine* problem, including
just a reboot, will take out 1/3 of your nameserving capacity), plus you're not
spreading your normal query load over as many servers, so you may run into
capacity problems. It's a tradeoff.
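The convergence argument can be put in numbers under a deliberately crude model: assume a remote cache that knows nothing about your servers tries them in uniformly random order without repeats, and that the client gives up after a fixed number of server timeouts. The two-attempt budget used below is a hypothetical figure, not a BIND default. The sketch compares 1 live server out of 5 with 1 out of 3.

```python
from fractions import Fraction
from math import comb


def p_answer_within(n_servers, n_alive, max_tries):
    """Chance that a random-order resolver reaches a live server within
    max_tries attempts (servers tried without repeats)."""
    n_dead = n_servers - n_alive
    tries = min(max_tries, n_servers)
    # The lookup fails only if every one of the first `tries` picks is dead.
    p_all_dead = Fraction(comb(n_dead, tries), comb(n_servers, tries))
    return 1 - p_all_dead


def expected_timeouts(n_servers, n_alive):
    """Expected number of dead servers tried before the first live one."""
    return Fraction(n_servers - n_alive, n_alive + 1)


for n in (5, 3):
    print(n, p_answer_within(n, 1, 2), expected_timeouts(n, 1))
# 5 2/5 2
# 3 2/3 1
```

Under these assumptions, a single survivor out of five is reached within two tries only 40% of the time, roughly in line with the 4-in-10 success rate reported in the quoted message below; with one out of three it rises to about 67%, and the expected number of timeouts halves.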

Ultimately, of course, your best availability would be achieved by having
*every* registered nameserver be in a different location and/or on a different
network link. But that can be difficult to achieve economically and logistically.
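The point about locations can be illustrated with a simple model: treat each site as failing independently with some probability p (the 1% below is purely illustrative), and ignore individual machine failures. The zone is then unreachable only when every site hosting one of its nameservers is down at once, so what matters is the number of distinct sites, not the number of servers. The site names are hypothetical.

```python
from fractions import Fraction


def p_zone_unreachable(server_sites, p_site_down):
    """Chance that all registered nameservers are down at once.

    Model (an assumption, not a measurement): whole sites fail
    independently with probability p_site_down, and machines never
    fail on their own.
    """
    return p_site_down ** len(set(server_sites))


p = Fraction(1, 100)  # hypothetical per-site outage probability
four_plus_one = ["hq", "hq", "hq", "hq", "offsite"]  # 4 internal + 1 off-site
five_sites = ["site1", "site2", "site3", "site4", "site5"]
print(p_zone_unreachable(four_plus_one, p))  # 1/10000
print(p_zone_unreachable(five_sites, p))     # 1/10000000000
```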


- Kevin

denon wrote:

> At 09:00 PM 2/19/2001 -0500, you wrote:
>
> >When you say the "resolvers" are timing out, do you mean caching nameservers
> >doing recursive lookups, or do you mean stub resolvers?
>
> Excuse my lack of terminology .. but here's what's happening, hopefully I'm
> answering your question:
>
> Say I have foo.com registered with NSI. I've also registered hosts ns, ns2,
> ns3, ns4 and ns5.foo.com.
>
> They're listed for foo.com at NSI in that order, ns5 being the off-site one
> and ns1-4 being the ones on our network.
>
> When I take ns1-4 down, I pick a random remote nameserver (say,
> ns.yahoo.com), one that I know doesn't have it cached/etc. Then I try to
> resolve SomeRandomArecord.foo.com off it. These resolves are what are
> timing out. It doesn't matter what remote NS I pick, I have similar results
> .. occasionally it'll resolve, usually it times out ..
>
> Am I making sense? I hope so ..
>
> >Perhaps you should consider putting
> >the remote server second or third in the list to reduce the possibility of
> >timeout.
>
> You're probably right. I was under the impression that the root
> servers picked the nameservers at random (perhaps weighted by uptime or
> past success).
>
> >In some versions of BIND 8 there was a "rotate" resolver option which
> >would cause the stub resolver to rotate the nameserver list for each
> >query. But that option appears to be gone as of BIND 9, so I wouldn't
> >rely on it.
>
> Is this an issue with the root servers? Surely they're not running generic
> BIND 8 .. :)
>
> Thanks for your ideas Kevin. I hope I've clarified things a little.
>
> >denon wrote:
> >
> > > I've been digging through the archives, Usenet and a variety of
> > > other tech docs in search of an answer to my question. I haven't come up
> > > with anything, but if this is a "frequently asked question", please
> > > don't hesitate to point me to a URL.
> > >
> > > Here's the situation: I need a relatively highly redundant DNS system
> > > (who doesn't? :). On an Internet domain, as a test, I've listed 5
> > > nameservers. One of them is at a remote location, and the other 4 are at
> > > various places within our internal network. Because the internal network
> > > is all in the same geographic area, there's a "good chance" all 4 here
> > > would go down at the same time. We don't presently have the facilities for
> > > more than one off-site nameserver, but I think it's safe to rely on just one.
> > >
> > > The problem is this: when I take down the 4 internal nameservers (by
> > > "take down" I mean ndc stop, not just dropping the zone), the 5th
> > > nameserver outside responds just fine. However, I think most resolvers are
> > > timing out before they reach it. Shouldn't the root servers respond faster
> > > than the resolver times out? While the 4 are down, if you resolve something
> > > 10 times in a row, maybe 6 times it'll time out and 4 times it'll resolve
> > > (assuming you resolve something different from the same zone each time, so
> > > nothing is cached).
> > >
> > > Is this a common problem? If all 4 of the internal nameservers go down,
> > > will the 5th be of any use?
> > >
> > > I'd appreciate any insight you can give me, TIA.
> > >
> > > Best Regards.
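The test described in this thread (stop ns1-4, then ask a remote caching nameserver for names it cannot have cached) can be scripted. This Python sketch uses only the standard library: it builds a bare RFC 1035 query with the recursion-desired bit set, asks for a random 12-letter label under the zone so the answer cannot come from cache, and tallies timeouts separately from NOERROR, NXDOMAIN and SERVFAIL replies. The helper names are hypothetical, and it does no retransmission or TCP fallback.

```python
import random
import socket
import string
import struct
import time

QTYPE_A, QCLASS_IN = 1, 1
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name, query_id):
    """Minimal RFC 1035 query for an A record with recursion desired (RD) set."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_A, QCLASS_IN)


def random_name(zone, rng):
    """A fresh 12-letter label under `zone`, so no cache can already hold it."""
    label = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
    return f"{label}.{zone}"


def ask(sock, server, port, name, query_id, timeout):
    """Send one query; return the reply's RCODE, or None on timeout."""
    sock.sendto(build_query(name, query_id), (server, port))
    deadline = time.monotonic() + timeout
    while (remaining := deadline - time.monotonic()) > 0:
        sock.settimeout(remaining)
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            break
        # Ignore late replies to earlier queries: match on the query ID.
        if len(reply) >= 4 and struct.unpack("!H", reply[:2])[0] == query_id:
            return reply[3] & 0x0F  # RCODE is the low 4 bits of byte 3
    return None


def probe(server, zone, trials=10, timeout=5.0, port=53, seed=None):
    """Ask `server` about `trials` random names in `zone`; tally the outcomes."""
    rng = random.Random(seed)
    tally = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(trials):
            rcode = ask(sock, server, port, random_name(zone, rng),
                        rng.randrange(0x10000), timeout)
            if rcode is None:
                outcome = "timeout"
            else:
                outcome = RCODE_NAMES.get(rcode, f"rcode {rcode}")
            tally[outcome] = tally.get(outcome, 0) + 1
    return tally
```

For example, probe("192.0.2.53", "foo.com", trials=10) would repeat the 10-lookup experiment; 192.0.2.53 is a documentation placeholder, so substitute a recursive server you are permitted to query. An NXDOMAIN counts as a success here, since the remote server had to reach one of the zone's nameservers to get it.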





More information about the bind-users mailing list