nameserver fails to complete forwarded requests

Wed Oct 25 23:33:27 UTC 2000

I think this is just an artifact of BIND 8's lack of "query restart". In
simplistic terms, what this means is that named loses track of which answers
belong to which queries, and so it just relies on the client to re-send a
fresh query, at which time it can be answered directly from cache.
Inefficient, yes, but implementing query restart would have required a
herculean effort, given how convoluted the BIND 8 code is/was.
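To make the idea concrete, here is a toy model of the behaviour described
above. It is a hypothetical sketch, not BIND code: the class, its methods,
and the "addr-of-..." placeholder answers are all invented for illustration.
A resolver without query restart performs the nameserver address lookups
that a referral requires, then forgets the original question. When the
client retransmits, the addresses are already cached, so the answer comes
back at once.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResolver:
    """Hypothetical model of a recursive resolver; not BIND's implementation."""
    query_restart: bool
    cache: dict = field(default_factory=dict)   # nameserver -> address
    sent: list = field(default_factory=list)    # log of upstream queries

    def resolve(self, qname, referral_ns):
        # Assume the root has already returned a referral naming referral_ns.
        # Look up (and cache) the address of each referred nameserver.
        for ns in referral_ns:
            self.sent.append(("A", ns))
            self.cache[ns] = f"addr-of-{ns}"    # pretend the answer arrived
        # Resume the original query only if the resolver can restart it.
        if self.query_restart:
            return self._ask_authority(qname, referral_ns)
        return None  # original query dropped; the client must retransmit

    def retransmit(self, qname, referral_ns):
        # Nameserver addresses are cached now, so go straight to an authority.
        return self._ask_authority(qname, referral_ns)

    def _ask_authority(self, qname, referral_ns):
        ns = referral_ns[0]
        self.sent.append(("query", qname, self.cache[ns]))
        return f"answer-for-{qname}"


servers = ["ns1.example.com", "ns2.example.com", "ns3.example.com"]
old = ToyResolver(query_restart=False)
print(old.resolve("www.example.com", servers))     # None: query forgotten
print(old.retransmit("www.example.com", servers))  # answered from cache
```

With `query_restart=True` the first call returns the answer directly,
which matches what the original poster expected the ENS to do.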

BIND 9's much-improved architecture supposedly implements query restart.

- Kevin

bobthepooch at my-deja.com wrote:

> I have the following name resolution topology:
>
> [Host] queries-> [Internal Nameserver (INS)]
>        forwards-> [External Nameserver (ENS)] queries-> Internet
>
> INS=Solaris 2.6 with ISC BIND 8.2.2-P5
> ENS=Solaris 7 with ISC BIND 8.2.2-P5 (I've also tried a Solaris 2.6 box)
>
> Occasionally the ENS appears to "forget" forwarded requests.  Using
> snoop, I determined the following:
>
> o Host sends A record query to INS
> o INS forwards to ENS
> o ENS sends request to root nameserver
> o root nameserver sends answer with 3 authoritative nameservers (and
> associated A records)
> o ENS sends the root nameserver A record queries for each of the 3
> nameservers
> o root nameserver sends answers to each of the 3 A record requests
>
> This all happens in less than a second.  But then the ENS does *NOTHING*
> regarding the original query.  I see no query sent to any of the 3
> authoritative nameservers.
>
> After about 59 seconds, the INS resends the request (same DNS ID).  The
> ENS then very promptly sends the request to one of the authoritative
> nameservers (learned a minute before!), receives an answer, and relays
> that back to the INS (which in turn sends a response back to the client
> host). By now, however, the application on the client host has given up
> waiting for an answer and has indicated that the hostname was not found.
>
> I have summarized some of the "turn-around" times of the ENS, and a
> majority are < 1 second, but many are ~59 seconds, meaning that it
> didn't respond until poked again by the INS.  I also have seen many in
> the 15-30 second range, which appear to be users retrying the website
> after receiving "host not found".  This forces a query from the INS to
> the ENS sooner than the normal timeout, and an almost immediate answer
> is given.
>
> I originally thought that the problem had to do with root nameservers,
> but if I remove forwarders from the INS configuration (so that it hits
> root nameservers directly), then the timeouts disappear entirely (and my
> users are much happier.)  And the packet capture is the real smoking
> gun.
>
> So why are the forward requests just dropped by the ENS???
>
> If I point my clients directly to the ENS, it also works, but this is
> because the original client is more persistent (retries after 1, 2, and
> 4 seconds in the case of NT) than a forwarding nameserver.  (I
> determined this from packet captures as well.)  The INS doesn't forward
> queries retransmitted from the client to the ENS, but instead sticks to
> its built-in timeouts for forwarding.  But when a client resolves
> directly from the ENS, it keeps querying several times in a short
> interval until it gets an answer.
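The timing argument in the last paragraph can be checked with a little
arithmetic. The sketch below assumes, as the packet captures suggest, that
the ENS only answers a retransmitted copy of the query, so the earliest
answer arrives with the first resend. It reads the NT client's "1, 2, and
4 seconds" as waits between successive sends and uses the observed ~59
second forwarder resend; the 15 second application timeout is a
hypothetical value, and network latency is ignored.

```python
from itertools import accumulate

# Hypothetical application timeout, in seconds (not taken from the post).
CLIENT_GIVES_UP_AFTER = 15


def send_times(intervals):
    """Times (s) at which a query is sent, given the waits between sends."""
    return [0] + list(accumulate(intervals))


def answer_time(intervals):
    """If the first copy is 'forgotten', the answer comes with the first resend."""
    times = send_times(intervals)
    return times[1] if len(times) > 1 else None


if __name__ == "__main__":
    for label, intervals in [("client -> ENS directly", [1, 2, 4]),
                             ("INS forwarding to ENS", [59])]:
        t = answer_time(intervals)
        print(label, t, "ok" if t < CLIENT_GIVES_UP_AFTER else "too late")
```

This prints `client -> ENS directly 1 ok` and `INS forwarding to ENS 59 too
late`: the same dropped first query costs about a second when the client
retransmits itself, but roughly a minute when the forwarder's retry timer
is the only thing that will resend it.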