Root zone timeout and workarounds?

Kevin Darcy kcd at daimlerchrysler.com
Wed Feb 21 22:44:52 UTC 2001


denon wrote:

> At 12:36 AM 2/21/2001 -0500, you wrote:
>
> >denon wrote:
> >
> > > At 11:02 PM 2/20/2001 -0500, you wrote:
> > >
> > > >Okay, so you're talking about other nameservers on the Internet, not stub
> > > >resolvers, timing out trying to resolve names in your domain when 4 out of
> > > >your 5 registered nameservers are unavailable.
> > >
> > > Right.
> > >
> > > >I suppose this isn't that surprising; 4 out of 5 is a serious outage. Of
> > > >course, if the remote nameservers are running BIND or something like it,
> > > >they should quickly adapt to the outage. Have you tried sending
> > > >*successive* queries to those remote nameservers? Do they eventually stop
> > > >timing out?
> > >
> > > Yes, but after way too many attempts ... I almost gave up trying myself,
> > > before I realized, "oh, it worked that time .. "
> >
> >Perhaps those remote nameservers don't have an adaptive algorithm (???) Do you
> >know for sure that they are running BIND?
>
> No clue, but I can't bank on anything. They're Internet users from all over
> the world .. using every nameserver imaginable.

Right, but most of those nameservers are BIND. Those that are not should at least
have an adaptive algorithm similar to BIND's. I would expect non-adaptive
nameservers to form a very small minority.
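
To make "adaptive" concrete, here is a minimal sketch in Python. It is a
simplified model and not BIND's actual code: the resolver keeps a running
round-trip estimate per server, tries the lowest estimates first, and adds a
penalty to any server that times out. The constants, the two-attempt limit
per lookup, and the function name are assumptions made for illustration; the
server names follow the foo.com example from earlier in the thread.

```python
import random

LIVE_RTT = 0.05        # seconds; assumed round-trip time of the server that is up
TIMEOUT_PENALTY = 2.0  # seconds added to a server's estimate when it times out

def simulate(servers_up, attempts_per_lookup=2, lookups=8, seed=0):
    """Return one boolean per lookup: True if some server answered in time."""
    rng = random.Random(seed)
    # Cold start: every server gets a small random estimate, so the
    # initial preference order is arbitrary.
    srtt = {name: rng.uniform(0.0, 0.01) for name in servers_up}
    outcomes = []
    for _ in range(lookups):
        answered = False
        # Try the servers with the lowest estimates first, but give up
        # after a fixed number of attempts (the client is waiting).
        for name in sorted(srtt, key=srtt.get)[:attempts_per_lookup]:
            if servers_up[name]:
                # Blend the measured RTT into the running estimate.
                srtt[name] = 0.7 * srtt[name] + 0.3 * LIVE_RTT
                answered = True
                break
            # Remember the failure so this server sorts last from now on.
            srtt[name] += TIMEOUT_PENALTY
        outcomes.append(answered)
    return outcomes

# Hypothetical outage: the four on-site servers are down, ns5 is up.
servers = {"ns": False, "ns2": False, "ns3": False, "ns4": False, "ns5": True}
print(simulate(servers))
```

In this model each dead server is penalized once and then avoided, so a
resolver sees a few failed lookups at first and then settles on the one
server that answers. A non-adaptive resolver has no such memory and keeps
paying the timeout.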

> > > >If this temporary effect is unacceptable, then you may be able to increase
> > > >your availability by, paradoxically, reducing the number of registered
> > > >nameservers. If, for example, you reduced down to 3 nameservers in 2
> > > >different locations, then if the larger location goes down -- thus making 2
> > > >of the nameservers unavailable -- convergence should be faster with 1/3 of
> > > >your nameservers available than with only 1/5.
> > >
> > > Tacky, I know .. but one of the reasons we have 4 nameservers on-site, and
> > > one off-site, is due to the fact that we want a majority of the requests to
> > > come to our network. Namely, because there will be a lot of them, and we
> > > don't want to soak the remote link's bandwidth with dns requests.
> >
> >If the remote nameserver is answering significantly more slowly than the
> >others, then other nameservers on the Net should adapt to that fact and send
> >it fewer queries. Of course, this assumes, yet again, that those other
> >nameservers are BIND or have an adaptive algorithm like BIND's.
>
> But each nameserver would have to 'fail' first before it learns, right?
> That's pretty unacceptable, considering thousands of them would have to fail
> before things stabilized for all the users.

The more-heavily-used nameservers would converge faster than the less-heavily-used
ones, so I think you're overstating the overall impact on the user community.
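
A back-of-the-envelope bound shows why volume matters. The sketch below
assumes a simplified adaptive resolver that gives up on a lookup after a
fixed number of attempts and never retries a server that has already timed
out while the outage lasts. Those assumptions, the function name, and the
default numbers are illustrative and are not measured BIND behaviour.

```python
def worst_case_timeout_fraction(lookups, dead_servers=4, attempts_per_lookup=2):
    """Upper bound on the share of lookups that time out during convergence."""
    # Each failed lookup uses up attempts_per_lookup dead servers that the
    # resolver then avoids, so the number of failed lookups is bounded.
    max_failed = dead_servers // attempts_per_lookup
    return min(lookups, max_failed) / lookups

# A rarely used resolver versus two busier ones.
for lookups in (2, 20, 2000):
    print(lookups, worst_case_timeout_fraction(lookups))
```

Under these assumptions the number of failed lookups per resolver is capped
by the number of dead servers, not by the number of users behind it, so the
affected share of lookups shrinks as a resolver's query volume grows.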

> Or are you talking about the root servers?

No, the root servers would never ask your servers anything. They're non-recursive.
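
A toy model of who asks whom may help here. Everything in it is hard-coded
and hypothetical (the server labels, the single delegation per level, the
192.0.2.10 documentation address); it only illustrates that a non-recursive
server hands back a referral and the caching nameserver does all the asking.

```python
# Referral handed out by each non-recursive server: (zone, next server to ask).
DELEGATIONS = {
    "root-server": ("com.", "com-server"),
    "com-server": ("foo.com.", "ns5.foo.com."),
}
# Data held by the authoritative server for the zone.
AUTHORITATIVE = {"ns5.foo.com.": {"www.foo.com.": "192.0.2.10"}}

def resolve(qname):
    """Follow referrals from the root down; return (answer, trace)."""
    server = "root-server"
    trace = []  # (who sent the query, who received it)
    while True:
        trace.append(("caching-resolver", server))
        if server in AUTHORITATIVE:
            return AUTHORITATIVE[server].get(qname), trace
        zone, next_server = DELEGATIONS[server]
        if not qname.endswith(zone):
            return None, trace  # this server has no referral for the name
        server = next_server

answer, trace = resolve("www.foo.com.")
print(answer)
for sender, receiver in trace:
    print(sender, "->", receiver)
```

Every query in the trace is sent by the caching nameserver; the root server
only ever appears on the receiving side.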


- Kevin

> > > >Ultimately, of course, your best availability would be achieved by having
> > > >*every* registered nameserver be in a different location and/or on a
> > > >different network link. But that can be difficult to achieve economically
> > > >and logistically.
> > >
> > > Exactly, I wish we could pull it off economically .. but this project just
> > > doesn't merit it unfortunately.
> > >
> > > Thanks again for your time on this ... any ideas where I should head from
> > > here? Or any better way to weight requests with the root servers, so I can
> > > > have fewer NSs listed?
> >
> >Not really. In a perfect world, this should all be adaptive, so that wouldn't
> >be necessary.
> >
> >You could accomplish a certain degree of "weighting" by having the NS records
> >in your zone be a superset of those in the parent's delegations. Nameservers
> >querying your domain immediately after a restart/reload, or when your domain's
> >NS records expire from their caches, will only know about the delegated
> >nameservers, therefore the delegated nameservers would tend to get more
> >traffic (assuming all other things are equal, particularly, assuming that they
> >all answer equally quickly). But having an NS-set mismatch like that can
> >sometimes cause glue-record problems, and, besides, I don't see that it would
> >help in your situation, since leaving the remote nameserver out of your
> >delegations would leave your domain unresolvable if the network link to the
> >other nameservers was unavailable.
> >
>
> Nod, if the internal network were to go down entirely, we'd still be dead
> in the water ..
>
> >- Kevin
> >
> > > >denon wrote:
> > > >
> > > > > At 09:00 PM 2/19/2001 -0500, you wrote:
> > > > >
> > > > > > >When you say the "resolvers" are timing out, do you mean caching
> > > > > > >nameservers doing recursive lookups, or do you mean stub resolvers?
> > > > >
> > > > > > Excuse my lack of terminology .. but here's what's happening, hopefully
> > > > > > I'm answering your question:
> > > > >
> > > > > > Say I have foo.com registered with NSI. I've also registered hosts ns,
> > > > > > ns2, ns3, ns4, ns5.foo.com.
> > > > > >
> > > > > > They're listed on foo.com, at NSI, in that order. NS5 being the off-site,
> > > > > > ns1-4 being the ones on our network.
> > > > >
> > > > > > When I take ns1-4 down, I pick a random remote nameserver (say,
> > > > > > ns.yahoo.com), one that I know doesn't have it cached/etc. Then I try to
> > > > > > resolve SomeRandomArecord.foo.com off it. These resolves are what are
> > > > > > timing out. It doesn't matter what remote NS I pick, I have similar
> > > > > > results .. occasionally it'll resolve, usually it times out ..
> > > > >
> > > > > Am I making sense? I hope so ..
> > > > >
> > > > > > >Perhaps you should consider putting
> > > > > > >the remote server second or third in the list to reduce the possibility
> > > > > > >of timeout.
> > > > >
> > > > > > You're probably right, I guess I was under the impression that the root
> > > > > > servers picked the nameservers at random (random, weighted by uptime past
> > > > > > success, I guess).
> > > > >
> > > > > > >In some versions of BIND 8 there was a "rotate" resolver option which
> > > > > > >would cause the stub resolver to rotate the nameserver list for each
> > > > > > >query. But that option appears to be gone as of BIND 9, so I wouldn't
> > > > > > >rely on it.
> > > > >
> > > > > > Is this an issue with the root servers? Surely they're not running
> > > > > > generic bind8 .. :)
> > > > >
> > > > > Thanks for your ideas Kevin. I hope I've clarified things a little.
> > > > >
> > > > > >denon wrote:
> > > > > >
> > > > > > > > I've been digging through the archives, usenet as well as a variety of
> > > > > > > > other tech docs in search of the answer for my question.  I haven't
> > > > > > > > come up with any results, but if this is a "frequently asked
> > > > > > > > question", please don't be afraid to throw me to a url.
> > > > > > >
> > > > > > > > Here's the situation we've got: I need a relatively highly redundant
> > > > > > > > dns system (who doesn't? :). On an Internet domain, as a test, I've
> > > > > > > > listed 5 nameservers. One of the nameservers is at a remote location,
> > > > > > > > and the other 4 are at various places within our internal network.
> > > > > > > > Due to the fact that the internal network is all geographically in the
> > > > > > > > same area, there's a "good chance" all 4 here would go down at the
> > > > > > > > same time. We don't presently have the facilities for more than one
> > > > > > > > off-site, but I think it's safe to rely on just one.
> > > > > > >
> > > > > > > > The problem is this: When I take down the 4 internal nameservers (when
> > > > > > > > I say take down, I mean ndc stop, not just drop the zone), the 5th
> > > > > > > > nameserver outside responds just fine. However, I think most
> > > > > > > > resolvers are timing out before it does. Shouldn't the root servers
> > > > > > > > respond faster than the resolver times out? While the 4 are down, if
> > > > > > > > you resolve something 10 times in a row, maybe 6 times it'll time out,
> > > > > > > > and 4 times it'll resolve. (assuming you resolve something different
> > > > > > > > from the same zone each time .. not caching/etc.).
> > > > > > >
> > > > > > > > Is this a common problem? If all 4 of the internal nameservers go
> > > > > > > > down, will the 5th be of any use?
> > > > > > >
> > > > > > > I'd appreciate any insight you can give me, TIA.
> > > > > > >
> > > > > > > Best Regards.




