Root zone timeout and workarounds?

Tue Feb 20 02:00:15 UTC 2001

When you say the "resolvers" are timing out, do you mean caching nameservers
doing recursive lookups, or do you mean stub resolvers? By default, stub
resolvers try the servers in their nameserver list in order, waiting out a
timeout on each one before moving on to the next. If you have the 4 colocated
servers first in that list, followed by the remote server, then I'm not
surprised that the queries are timing out. Perhaps you should consider putting
the remote server second or third in the list to reduce the possibility of
timeout. In some versions of BIND 8 there was a "rotate" resolver option which
would cause the stub resolver to rotate the nameserver list for each query. But
that option appears to be gone as of BIND 9, so I wouldn't rely on it.
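
To put rough numbers on the ordering problem, here is a minimal Python sketch
of a single in-order pass over a nameserver list. It is only a model: the
function name, the server names, the 5000 ms per-server timeout and the 200 ms
answer time are all made up for illustration, and retries are ignored.

```python
from typing import List, Optional, Tuple


def time_to_first_answer(
    servers: List[Tuple[str, bool]],
    per_server_timeout_ms: int,
    answer_latency_ms: int,
) -> Tuple[Optional[str], int]:
    """Walk the list in order, as in one pass of a stub resolver.

    servers: (name, is_up) pairs in nameserver-list order.
    Returns (answering server or None, milliseconds elapsed).
    """
    elapsed = 0
    for name, is_up in servers:
        if is_up:
            return name, elapsed + answer_latency_ms
        # A dead server costs the full timeout before the next one is tried.
        elapsed += per_server_timeout_ms
    return None, elapsed


# Hypothetical lists: the same 5 servers, remote one last vs. second.
remote_last = [("ns1", False), ("ns2", False), ("ns3", False),
               ("ns4", False), ("remote", True)]
remote_second = [("ns1", False), ("remote", True), ("ns2", False),
                 ("ns3", False), ("ns4", False)]

print(time_to_first_answer(remote_last, 5000, 200))    # ('remote', 20200)
print(time_to_first_answer(remote_second, 5000, 200))  # ('remote', 5200)
```

With the remote server last, a client that gives up after a few seconds never
sees the answer; moving it up the list cuts the wait to a single timeout.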

If the "resolvers" to which you referred are caching nameservers, then I don't
understand why you're seeing the behavior you're seeing -- caching nameservers
should quickly penalize "down" servers and home in on the remaining "up" server,
even if it historically responds to queries much more slowly than the others. I'm
assuming, however, that all 5 of the servers are listed in the NS records for
each of the zones and that the remote nameserver is, in fact, responding
authoritatively for all of the zones it should be.
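
The adaptive behavior I would expect can be sketched with a toy model in
Python. This is not BIND's actual server-selection code; the function name, the
penalty value and the smoothing factor are assumptions chosen only to show the
idea of keeping a per-server round-trip estimate and penalizing servers that
fail to answer.

```python
from typing import Dict, List, Set


def timeouts_per_query(
    rtt_ms: Dict[str, float],
    down: Set[str],
    queries: int,
    penalty_ms: float = 10000.0,  # assumed penalty for a non-answer
    alpha: float = 0.3,           # assumed smoothing factor
) -> List[int]:
    """Count how many dead servers each query hits before it is answered.

    rtt_ms maps each server name to its real round-trip time when it is up.
    """
    # Every server starts with the same estimate, so none is preferred yet.
    estimate = {name: 0.0 for name in rtt_ms}
    result = []
    for _ in range(queries):
        timeouts = 0
        # Try servers from the lowest estimated RTT to the highest.
        for name in sorted(estimate, key=lambda n: (estimate[n], n)):
            if name in down:
                # Penalize a server that did not answer so it sorts last.
                estimate[name] += penalty_ms
                timeouts += 1
                continue
            # Blend the observed RTT into the running estimate.
            estimate[name] = (1 - alpha) * estimate[name] + alpha * rtt_ms[name]
            break
        result.append(timeouts)
    return result


# Hypothetical setup: 4 fast local servers that are down, 1 slow remote one.
rtt = {"ns1": 10.0, "ns2": 10.0, "ns3": 10.0, "ns4": 10.0, "remote": 300.0}
print(timeouts_per_query(rtt, {"ns1", "ns2", "ns3", "ns4"}, 5))
# [4, 0, 0, 0, 0]
```

In this model only the first query pays for the dead servers; after that the
slow remote server has the best estimate and is tried first. A real
implementation also lets the penalties decay so that recovered servers get
retried, which this sketch leaves out.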

To further troubleshoot a lack of adaptive behavior between your caching
nameservers and your authoritative nameservers, you may have to crank up
debugging while in failure mode and laboriously interpret the debugging output.

Another thing just occurred to me: are these "resolvers" by any chance
*forwarding* servers running pre-BIND-8.2.3 code? There were shortcomings in the
forwarding code prior to BIND 8.2.3...

- Kevin

denon wrote:

> I've been digging through the archives, usenet as well as a variety of
> other tech docs in search of the answer for my question.  I haven't come up
> with any results, but if this is a "frequently asked question", please
> don't be afraid to throw me to a url.
>
> Here's the situation: I need a relatively highly redundant DNS system
> (who doesn't? :). On an Internet domain, as a test, I've listed 5
> nameservers. One of the nameservers is at a remote location, and the other
> 4 are at various places within our internal network. Because the internal
> network is all geographically in the same area, there's a "good chance" all
> 4 here would go down at the same time. We don't presently have the
> facilities for more than one off-site, but I think it's safe to rely on
> just one.
>
> The problem is this: When I take down the 4 internal nameservers (when I
> say take down, I mean ndc stop, not just drop the zone), the 5th nameserver
> outside responds just fine. However, I think most resolvers are timing out
> before it does. Shouldn't the root servers respond faster than the resolver
> times out? While the 4 are down, if you resolve something 10 times in a
> row, maybe 6 times it'll time out, and 4 times it'll resolve. (assuming you
> resolve something different from the same zone each time .. not caching/etc.).
>
> Is this a common problem? If all 4 of the internal nameservers go down,
> will the 5th be of any use?
>
> I'd appreciate any insight you can give me, TIA.
>
> Best Regards.