Caching only nameserver fails to resolve external zones periodically

phn at icke-reklam.ipsec.nu
Mon May 17 18:07:17 UTC 2004


Curtis Rempel <curtis at telus.net> wrote:
> Hi,

> I've got a caching name server which also handles a zone (.lan) on an
> internal 192.168.1.0/24 network.   Both internal and external lookups work
> fine as I have a forwarder entry defined in 
> /var/named/chroot/etc/named.conf

> That is, until "something" happens which causes the external lookups to
> fail.  The internal zone resolution still works; however, as far as I can
> tell, the forwarder does not respond, and named then starts crawling
> through the root name servers and eventually gives up.

> Here's some sample output (from Fedora Core 1 Linux and bind 9.2.2.P3-9):

> When everything is working (i.e. immediately after a 'service named
> restart' command), the following 'host' command works.  However, when
> things aren't working, I get the following output:

> [root at vault root]# host www.telus.net
> ;; connection timed out; no servers could be reached

> This can be rectified by restarting the name server as above, but only for
> a while (the duration seems to vary), and then external lookups hang again.
> The internal zone information can still be resolved.

> When the system is not responding to external zone lookups, a tcpdump
> looks like this with the above 'host' command:

> 15:51:01.996338 vault.lan.33305 > ns7so.cg.shawcable.net.domain:  35946+ [1au] A? www.telus.net. (42) (DF)
> 15:51:03.728476 vault.lan.33305 > f.root-servers.net.domain:  50741 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:06.008121 vault.lan.33305 > 198.41.0.4.domain:  14024 [1au] A? www.telus.net. (42) (DF)
> 15:51:07.747854 vault.lan.33305 > G.ROOT-SERVERS.NET.domain:  52631 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:10.027489 vault.lan.33305 > 128.9.0.107.domain:  65124 [1au] A? www.telus.net. (42) (DF)
> 15:51:11.767237 vault.lan.33305 > 128.63.2.53.domain:  65468 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:14.046919 vault.lan.33305 > 192.33.4.12.domain:  65502 A? www.telus.net. (31) (DF)
> 15:51:15.786573 vault.lan.33305 > 192.36.148.17.domain:  32751 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:18.066210 vault.lan.33305 > d.root-servers.net.domain:  55260 A? www.telus.net. (31) (DF)
> 15:51:19.038994 laser.lan.1024 > vault.lan.domain:  27316 A? fsa.cpsc.ucalgary.ca. (50)
> 15:51:19.805969 vault.lan.33305 > k.root-servers.net.domain:  13778 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:22.085587 vault.lan.33305 > E.ROOT-SERVERS.NET.domain:  3376 A? www.telus.net. (31) (DF)
> 15:51:23.825310 vault.lan.33305 > 202.12.27.33.domain:  1688 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:26.104947 vault.lan.33305 > f.root-servers.net.domain:  844 A? www.telus.net. (31) (DF)
> 15:51:27.844754 vault.lan.33305 > j.root-servers.net.domain:  33190 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:30.124317 vault.lan.33305 > G.ROOT-SERVERS.NET.domain:  49363 A? www.telus.net. (31) (DF)
> 15:51:31.864043 vault.lan.33305 > l.root-servers.net.domain:  18756 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:34.143694 vault.lan.33305 > 128.63.2.53.domain:  4724 A? www.telus.net. (31) (DF)
> 15:51:35.883596 vault.lan.33305 > ns7so.cg.shawcable.net.domain:  2362+ PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:38.163051 vault.lan.33305 > 192.36.148.17.domain:  1181 A? www.telus.net. (31) (DF)
> 15:51:40.902620 vault.lan.33305 > 198.41.0.4.domain:  24263 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
> 15:51:42.182418 vault.lan.33305 > k.root-servers.net.domain:  22529 A? www.telus.net. (31) (DF)

> The first entry above (15:51:01) indicates that the request is being
> forwarded to the "forwarders" entry which resolves to
> ns7so.cg.shawcable.net

> When external resolution is working, this is the last entry as
> ns7so.cg.shawcable.net provides the answer.

> In a "hung" lookup, the output is as above: the first stop is the forwarder
> entry, then the root servers, and finally failure.
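
One way to read the trace above is to group the queries by question and
list where each one was sent. In tcpdump's DNS output a "+" after the query
ID means the recursion-desired flag is set, so queries handed to a forwarder
carry it while the iterative queries to the root servers do not. Below is a
minimal sketch of that grouping; the regular expression, the SAMPLE list
(four lines copied from the trace) and the summarize() helper are my own
illustration and only cover the line format shown in this post.

```python
import re

# tcpdump prints a DNS query roughly as:
#   <time> <src> > <dst>.domain:  <id>[+] [opts] <type>? <name> (<len>)
QUERY = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d\.\d+) \S+ > (?P<dst>\S+)\.domain:\s+"
    r"(?P<id>\d+)(?P<rd>\+?) (?:\[\w+\] )?(?P<qtype>[A-Z]+)\? (?P<qname>\S+)"
)

# Four lines taken verbatim from the trace in this thread.
SAMPLE = [
    "15:51:01.996338 vault.lan.33305 > ns7so.cg.shawcable.net.domain:  35946+ [1au] A? www.telus.net. (42) (DF)",
    "15:51:03.728476 vault.lan.33305 > f.root-servers.net.domain:  50741 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)",
    "15:51:06.008121 vault.lan.33305 > 198.41.0.4.domain:  14024 [1au] A? www.telus.net. (42) (DF)",
    "15:51:35.883596 vault.lan.33305 > ns7so.cg.shawcable.net.domain:  2362+ PTR? 182.181.179.142.in-addr.arpa. (46) (DF)",
]


def summarize(lines):
    """Map (qtype, qname) to a list of (destination, recursion_desired) in trace order."""
    seen = {}
    for line in lines:
        # Tolerate mail-quoted lines that still start with "> ".
        m = QUERY.match(line.lstrip("> ").strip())
        if m is None:
            continue  # not a DNS query line in the expected format
        key = (m["qtype"], m["qname"])
        seen.setdefault(key, []).append((m["dst"], m["rd"] == "+"))
    return seen


if __name__ == "__main__":
    for (qtype, qname), hops in summarize(SAMPLE).items():
        print(qtype, qname)
        for dst, rd in hops:
            print("   ", dst, "(recursive, i.e. forwarded)" if rd else "(iterative)")
```

Feeding it the whole capture shows, per question, which servers were tried
and in what order. Note that the capture contains only outgoing queries and
no replies at all, neither from the forwarder nor from the root servers.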

> Does anybody have any idea why this external name resolution is
> periodically failing like this?  Any suggestions for debugging info?

> It seems that external lookups can function fine for days and then quit,
> or sometimes for only minutes.

> Thanks!

> curtis at telus dot net (which the smarter spambots can likely figure out
> anyway...)

I see three issues here:

1/ the zone "telus.net" is badly configured in several respects: the nameservers
it is delegated to do not match the list of nameservers the servers themselves
report, the NS records have a very short ttl, etc.

2/ you are running a beta version of bind. Why? 9.2.3 has been available for
a long time.

3/ you state that you use forwarders. Why? Failure of the forwarders might
give the behaviour you observe.
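
To check point 3 the next time lookups hang, it may help to query the
forwarder directly, bypassing named, and see whether it answers at all.
Here is a rough sketch of such a probe using only the Python standard
library; build_query() and probe() are hypothetical helper names, the
3-second timeout is an arbitrary choice, and the server argument should be
the forwarder address from your own named.conf (it is not shown in this
thread).

```python
import random
import socket
import struct


def build_query(name, qtype=1, rd=True):
    """Build a minimal DNS query packet (RFC 1035); qtype 1 = A, class IN."""
    qid = random.randint(0, 0xFFFF)
    flags = 0x0100 if rd else 0x0000  # only the RD (recursion desired) bit
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return qid, header + qname + struct.pack("!HH", qtype, 1)


def probe(server, name, timeout=3.0):
    """Send one UDP query to server:53.

    Returns the response code (0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN, ...)
    or None if nothing usable came back within the timeout.
    """
    qid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable-host errors
            return None
    if len(data) < 12:
        return None  # too short to hold a DNS header
    rid, rflags = struct.unpack("!HH", data[:4])
    if rid != qid:
        return None  # reply does not belong to our query
    return rflags & 0x000F  # low four bits of the flags word are the RCODE
```

Calling probe() with the forwarder's address and "www.telus.net" while
lookups are hanging would tell you whether the forwarder itself has stopped
answering your host (None) or still replies. The packet built for
"www.telus.net" is 31 bytes, the same size as the plain A queries in the
trace.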



-- 
Peter Håkanson         
        IPSec  Sverige      ( At Gothenburg Riverside )
           Sorry about my e-mail address, but i'm trying to keep spam out,
	   remove "icke-reklam" if you feel for mailing me. Thanx.

