bind-8.2.2p5 stops responding

Tom Throckmorton tthrockmorton at hbs.edu
Tue Jan 4 20:58:00 UTC 2000


Dave,

I'm having a similar problem w/ 8.2.2p5 on Sol 2.5.1, 2.6 and 7.  After some
time, resolution of only *some* external hosts will fail.  A restart seems to
remedy the problem.  Last time it happened (this morning =8^p) I did a dumpdb
and here's what I found:

(before the restart - resolving fails)

...
169053: djinteractive   29986   IN      NS      dns1.djinteractive.com. ;Cr=auth
169054:         29986   IN      NS      dns2.djinteractive.com. ;Cr=auth
169055:         29986   IN      A       207.50.249.32   ;Cr=auth
169056:         29986   IN      A       207.50.249.33   ;Cr=auth
169057:         29986   IN      A       207.50.249.34   ;Cr=auth
169058:         29986   IN      A       207.50.249.31   ;Cr=auth
...

(after the restart - resolving works again)

...
13336: djinteractive   86395   IN      NS      dns1.djinteractive.com. ;Cr=auth
13337:         86395   IN      NS      dns2.djinteractive.com. ;Cr=auth
...
13643: $ORIGIN djinteractive.com.
13644: dns2    86395   IN      A       207.50.249.2    ;Cr=addtnl
13645: dns1    86395   IN      A       207.50.249.1    ;Cr=addtnl
13646: www     86395   IN      A       207.50.249.34   ;Cr=auth
13647:         86395   IN      A       207.50.249.31   ;Cr=auth
13648:         86395   IN      A       207.50.249.32   ;Cr=auth
13649:         86395   IN      A       207.50.249.33   ;Cr=auth
...
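
For reading the second column of these dumps, here is a small Python sketch. It assumes that column is the remaining TTL in seconds, which the excerpts suggest but do not state; `format_ttl` is a hypothetical helper name, not part of BIND.

```python
def format_ttl(seconds):
    """Render a TTL given in seconds as days/hours/minutes/seconds."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


# TTL values taken from the two dump excerpts.
for ttl in (29986, 86395):
    print(ttl, "->", format_ttl(ttl))
```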

It seems that the cache is dropping the A records for dns1 and dns2 (they have
a 2d expiration) but, for some reason, can't re-fetch them....hmmm <head
scratching>... one more thing: reverse lookups on the nameserver itself
(207.50.249.2) always fail, *and* the authority for the reverse domain is
different from that for the forward.  Bingo, lameness!
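
The check described above (NS records still cached, but no A records for their targets) can be automated against a dump excerpt. What follows is only a rough Python sketch, not a full parser for named's dump format. The function names are hypothetical, the `NNNN:` prefixes are assumed to be line numbers added when the excerpts were extracted, and the default origin `com.` is an assumption because the excerpts elide the enclosing `$ORIGIN` line.

```python
import re

BEFORE = """
169053: djinteractive   29986   IN      NS      dns1.djinteractive.com. ;Cr=auth
169054:         29986   IN      NS      dns2.djinteractive.com. ;Cr=auth
169055:         29986   IN      A       207.50.249.32   ;Cr=auth
"""

AFTER = """
13336: djinteractive   86395   IN      NS      dns1.djinteractive.com. ;Cr=auth
13337:         86395   IN      NS      dns2.djinteractive.com. ;Cr=auth
13643: $ORIGIN djinteractive.com.
13644: dns2    86395   IN      A       207.50.249.2    ;Cr=addtnl
13645: dns1    86395   IN      A       207.50.249.1    ;Cr=addtnl
"""


def qualify(name, origin):
    """Make a name absolute relative to the current origin."""
    if name.endswith("."):
        return name.lower()
    return (name + "." if origin == "." else f"{name}.{origin}").lower()


def ns_without_address(dump_text, default_origin="com."):
    """Return NS target names that have no A record in the excerpt."""
    origin, owner = default_origin, None
    ns_targets, a_owners = set(), set()
    for raw in dump_text.splitlines():
        # Drop an "NNNN: " prefix and any trailing ";Cr=..." comment.
        line = re.sub(r"^\d+:\s?", "", raw).split(";", 1)[0].rstrip()
        if not line.strip() or line.strip() == "...":
            continue
        if line.startswith("$ORIGIN"):
            origin, owner = line.split()[1], None
            continue
        fields = line.split()
        # A line starting with whitespace reuses the previous owner name.
        if not line[0].isspace():
            owner = qualify(fields.pop(0), origin)
        if owner is None or len(fields) < 4:
            continue
        rtype, rdata = fields[2].upper(), fields[3]
        if rtype == "NS":
            ns_targets.add(qualify(rdata, origin))
        elif rtype == "A":
            a_owners.add(owner)
    return sorted(ns_targets - a_owners)


print(ns_without_address(BEFORE))  # both dns1 and dns2 lack A records
print(ns_without_address(AFTER))   # empty list: both addresses are cached
```

Run against a real dump, a non-empty result only flags candidates: an address may simply sit in a part of the file that was not fed to the function.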

So it looks as if, the first time through, bind gets the addtnl info (A records
for the NS) back in the course of a normal resolution, caches it, and offers
(non-authoritative) replies until expiry (2d).  At that point it tries to
re-validate, but since it still has the NS records, it tries to do a reverse
lookup on that name and then contact it directly, which always fails.

What's the solution?  Adjust the negative cache?  Contact them?  Anyone?

Dave Wreski wrote:

> Hi all.  I posted a message a week or so ago about bind-8.2.2p5 on Solaris
> 2.6 ceasing to respond to specific queries, and failing with "host unknown"
> preventing it from even falling over to another nameserver.
>
> It seems that if we have network problems to a specific domain, say, for
> example, yahoo.com, if a query is performed in the time the network
> connection is down, once it is brought back up, it can no longer resolve
> that domain until named is stopped and restarted.
>
> What could be the reason for this?
>
> Thanks,
> Dave

--
Tom Throckmorton
Harvard Business School
ITG, Network Operations Center
throck at hbs.edu




