bind-8.2.2p5 stops responding
Tom Throckmorton
tthrockmorton at hbs.edu
Tue Jan 4 20:58:00 UTC 2000
Dave,
I'm having a similar problem w/ 8.2.2p5 on Sol 2.5.1, 2.6 and 7. After some
time, resolution of only *some* external hosts will fail. A restart seems to
remedy the problem. Last time it happened (this morning =8^p) I did a dumpdb
and here's what I found:
(before the restart - resolving fails)
...
169053: djinteractive 29986 IN NS dns1.djinteractive.com. ;Cr=auth
169054: 29986 IN NS dns2.djinteractive.com. ;Cr=auth
169055: 29986 IN A 207.50.249.32 ;Cr=auth
169056: 29986 IN A 207.50.249.33 ;Cr=auth
169057: 29986 IN A 207.50.249.34 ;Cr=auth
169058: 29986 IN A 207.50.249.31 ;Cr=auth
...
(after the restart - resolving works again)
...
13336: djinteractive 86395 IN NS dns1.djinteractive.com. ;Cr=auth
13337: 86395 IN NS dns2.djinteractive.com. ;Cr=auth
...
13643: $ORIGIN djinteractive.com.
13644: dns2 86395 IN A 207.50.249.2 ;Cr=addtnl
13645: dns1 86395 IN A 207.50.249.1 ;Cr=addtnl
13646: www 86395 IN A 207.50.249.34 ;Cr=auth
13647: 86395 IN A 207.50.249.31 ;Cr=auth
13648: 86395 IN A 207.50.249.32 ;Cr=auth
13649: 86395 IN A 207.50.249.33 ;Cr=auth
...
It seems that the cache is dropping the A record for dns1 and dns2 (it has a
2d expiration), but for some reason, can't re-fetch those....hmmm <head
scratching>... one more thing, reverse lookups on the nameserver itself
(207.50.249.2) always fail, *and* the authority for the reverse domain is
different than for the forward. Bingo, lameness!
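
For anyone who wants to check their own dump for the same symptom (NS records cached, but no A record cached for the nameserver names), here is a rough Python sketch. It assumes a simplified record layout like the excerpts above: an optional "NNN: " line-number prefix, optional owner name, then TTL, class, type and rdata, with ";" starting a comment. The function name and the `default_origin` parameter are made up for illustration, and a real named dump has more record types and layout quirks than this handles.

```python
import re

LINE_NO = re.compile(r"^\d+:\s*")  # optional "169053: " style prefix


def qualify(name, origin):
    """Turn a possibly relative name into a lower-case absolute name."""
    if name.endswith("."):
        return name.lower()
    if name == "@":
        return origin.lower()
    if origin == ".":
        return (name + ".").lower()
    return (name + "." + origin).lower()


def ns_targets_without_address(dump_text, default_origin="."):
    """Return NS target names that have no A record anywhere in the dump.

    Assumption: an owner name never consists only of digits, so a leading
    all-digit field is taken to be the TTL of a continuation record.
    """
    origin = default_origin
    owner = None
    ns_targets = set()
    names_with_a = set()
    for raw in dump_text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop ";Cr=..." comments
        fields = LINE_NO.sub("", line).split()
        if not fields:
            continue
        if fields[0].upper() == "$ORIGIN":
            if len(fields) > 1:
                origin = fields[1]
            continue
        if fields[0].isdigit():               # owner omitted: reuse the last one
            rec_owner, rest = owner, fields
        else:
            rec_owner, rest = qualify(fields[0], origin), fields[1:]
        # Skip anything that does not look like "ttl class type rdata".
        if rec_owner is None or len(rest) < 4 or not rest[0].isdigit():
            continue
        owner = rec_owner
        rtype, rdata = rest[2].upper(), rest[3]
        if rtype == "NS":
            ns_targets.add(qualify(rdata, origin))
        elif rtype == "A":
            names_with_a.add(owner)
    return sorted(ns_targets - names_with_a)
```

Feeding it the text of a dump taken while resolution is failing should list the nameserver names whose addresses have dropped out of the cache; an empty list means every NS target seen in the dump also has an A record there.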
So it looks as if, the first time through, bind gets the addtnl info (A records
for the NS) back in the course of a normal resolution, caches the answer, and
offers (non-authoritative) replies until expiry (2d). At that point it tries to
re-validate, but since it already has the NS names, it tries to do a reverse
lookup on that name and then contact it directly, which always fails.
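
A quick way to see the forward/reverse mismatch from the outside is something like the sketch below. It goes through the system resolver via Python's socket module, so it only shows whether the PTR lookup works from your host; it says nothing about what named does internally. The function name and the `forward`/`reverse` parameters are invented for illustration (they exist so the lookups can be swapped out when testing).

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname, then try a reverse lookup on each address.

    Returns None if the forward lookup fails, otherwise a list of
    (address, ptr_name) pairs where ptr_name is None when the reverse
    lookup for that address fails.
    """
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:          # socket.gaierror / socket.herror are OSErrors
        return None
    results = []
    for addr in addresses:
        try:
            ptr_name, _aliases, _addrs = reverse(addr)
        except OSError:
            ptr_name = None  # no PTR answer for this address
        results.append((addr, ptr_name))
    return results


# Example call (needs working DNS, so it is left commented out):
# print(check_forward_reverse("dns2.djinteractive.com"))
```

An address that comes back paired with None is one where the forward lookup works but the reverse lookup does not, which is the situation described above for 207.50.249.2.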
What's the solution? Adjust the negative cache? Contact them? Anyone?
Dave Wreski wrote:
> Hi all. I posted a message a week or so ago about bind-8.2.2p5 on Solaris
> 2.6 ceasing to respond to specific queries, and failing with "host unknown"
> preventing it from even falling over to another nameserver.
>
> It seems that if we have network problems to a specific domain, say, for
> example, yahoo.com, if a query is performed in the time the network
> connection is down, once it is brought back up, it can no longer resolve
> that domain until named is stopped and restarted.
>
> What could be the reason for this?
>
> Thanks,
> Dave
--
Tom Throckmorton
Harvard Business School
ITG, Network Operations Center
throck at hbs.edu