Caching only nameserver fails to resolve external zones periodically

phn at icke-reklam.ipsec.nu
Tue May 18 05:47:49 UTC 2004


Curtis Rempel <curtis at telus.net> wrote:
> On Mon, 17 May 2004 18:07:17 +0000, phn wrote:

>> Curtis Rempel <curtis at telus.net> wrote:
>>> Hi,
>> 
>>> I've got a caching name server which also handles a zone (.lan) on an
>>> internal 192.168.1.0/24 network.   Both internal and external lookups work
>>> fine as I have a forwarder entry defined in 
>>> /var/named/chroot/etc/named.conf
>> 
>>> That is, until "something" happens which causes the external lookups to
>>> fail.  Internal zone resolution still works; however, as far as I can
>>> tell, the forwarder does not respond, so named starts crawling through
>>> the root name servers and eventually gives up.
>> 
>>> Here's some sample output (from Fedora Core 1 Linux and bind 9.2.2.P3-9):
>> 
>>> When everything is working (i.e. immediately after a 'service named
>>> restart' command), the following 'host' command works.  However, when
>>> things aren't working, I get the following output:
>> 
>>> [root@vault root]# host www.telus.net
>>> ;; connection timed out; no servers could be reached
>> 
>>> This can be rectified by restarting the name server as above, but only for
>>> a while (the interval varies), and then external lookups hang again.  The
>>> internal zone information can still be resolved.
>> 
>>> When the system is not responding to external zone lookups, a tcpdump
>>> looks like this with the above 'host' command:
>> 
>>> 15:51:01.996338 vault.lan.33305 > ns7so.cg.shawcable.net.domain:  35946+ [1au] A? www.telus.net. (42) (DF)
>>> 15:51:03.728476 vault.lan.33305 > f.root-servers.net.domain:  50741 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:06.008121 vault.lan.33305 > 198.41.0.4.domain:  14024 [1au] A? www.telus.net. (42) (DF)
>>> 15:51:07.747854 vault.lan.33305 > G.ROOT-SERVERS.NET.domain:  52631 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:10.027489 vault.lan.33305 > 128.9.0.107.domain:  65124 [1au] A? www.telus.net. (42) (DF)
>>> 15:51:11.767237 vault.lan.33305 > 128.63.2.53.domain:  65468 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:14.046919 vault.lan.33305 > 192.33.4.12.domain:  65502 A? www.telus.net. (31) (DF)
>>> 15:51:15.786573 vault.lan.33305 > 192.36.148.17.domain:  32751 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:18.066210 vault.lan.33305 > d.root-servers.net.domain:  55260 A? www.telus.net. (31) (DF)
>>> 15:51:19.038994 laser.lan.1024 > vault.lan.domain:  27316 A? fsa.cpsc.ucalgary.ca. (50)
>>> 15:51:19.805969 vault.lan.33305 > k.root-servers.net.domain:  13778 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:22.085587 vault.lan.33305 > E.ROOT-SERVERS.NET.domain:  3376 A? www.telus.net. (31) (DF)
>>> 15:51:23.825310 vault.lan.33305 > 202.12.27.33.domain:  1688 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:26.104947 vault.lan.33305 > f.root-servers.net.domain:  844 A? www.telus.net. (31) (DF)
>>> 15:51:27.844754 vault.lan.33305 > j.root-servers.net.domain:  33190 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:30.124317 vault.lan.33305 > G.ROOT-SERVERS.NET.domain:  49363 A? www.telus.net. (31) (DF)
>>> 15:51:31.864043 vault.lan.33305 > l.root-servers.net.domain:  18756 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:34.143694 vault.lan.33305 > 128.63.2.53.domain:  4724 A? www.telus.net. (31) (DF)
>>> 15:51:35.883596 vault.lan.33305 > ns7so.cg.shawcable.net.domain:  2362+ PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:38.163051 vault.lan.33305 > 192.36.148.17.domain:  1181 A? www.telus.net. (31) (DF)
>>> 15:51:40.902620 vault.lan.33305 > 198.41.0.4.domain:  24263 PTR? 182.181.179.142.in-addr.arpa. (46) (DF)
>>> 15:51:42.182418 vault.lan.33305 > k.root-servers.net.domain:  22529 A? www.telus.net. (31) (DF)
>> 
>>> The first entry above (15:51:01) indicates that the request is being
>>> forwarded to the "forwarders" entry, which resolves to
>>> ns7so.cg.shawcable.net.
>> 
>>> When external resolution is working, this is the last entry, as
>>> ns7so.cg.shawcable.net provides the answer.
>> 
>>> In a "hung" lookup, the output is as above: the first stop is the
>>> forwarder, then the root servers, and finally failure.
>> 
>>> Does anybody have any idea why this external name resolution is
>>> periodically failing like this?  Any suggestions for debugging info?
>> 
>>> It seems that external lookups can function fine for days and then quit;
>>> other times they last only minutes before quitting.
>> 
>>> Thanks!
>> 
>>> curtis at telus dot net (which the smarter spambots can likely figure out
>>> anyway...)
>> 
>> I see three issues here:
>> 
>> 1/ the zone "telus.net" is badly configured in a number of ways (a mismatch
>> between the nameservers it is delegated to and the nameservers the servers
>> themselves list, very short TTLs on NS records, etc.).
>> 
>> 2/ you are running a beta version of bind. Why? 9.2.3 has been available for
>> a long time.
>> 
>> 3/ you state that you use forwarders. Why? Failure of the forwarders might
>> give the behaviour you observe.

> Thanks for your reply.

> 1/ - telus.net is only one zone I happened to use for the example.  As it
> turns out, any external zone lookup fails.
Even if it's unlikely that the faults in telus.net affect your specific
problem, you are supposed to fix it. The same goes for the other zones.


> 2/ - 9.2.2.P3-9 is what was "out of the box" on Fedora Core 1 and is the
> latest according to yum and rpmfind.net (latest RPM that is).  I am a
> little hesitant to download/compile bind from source for the latest, I
> would rather keep everything RPM if possible.
I don't care if it's "out-of-the-box" from a vendor, it's obsolete.

Downloading, compiling and installing bind is easy and painless. The only
thing you have to think about is where the vendor's binaries are located,
so that you can overwrite them.

> 3/ - This is my prime suspicion - that the forwarder IP is failing.
> However, here is some additional information I've since discovered: once
> named gets into this failed state where the host command does not respond
> correctly (i.e. it returns 'no servers could be reached'), I can specify
> the IP of the forwarder on the 'host' command as follows and query the
> forwarder directly and all works:

Have you tried removing the "forwarders" lines? That seems a logical test to do.
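A minimal sketch of that test, assuming the forwarder is set in the `options` block of /var/named/chroot/etc/named.conf (the exact layout of Curtis's file is a guess; 64.59.135.133 is the forwarder address he queries directly with `host`). BIND's default for the `forward` option is `first`: named asks the forwarders, and if they don't answer it falls back to iterating from the root servers itself, which is what the tcpdump trace shows. Commenting the forwarder out takes it out of the picture entirely:

```
options {
    // Assumed original setup: send recursive queries to the ISP's
    // server first. With the default "forward first;", named falls
    // back to querying the root servers when the forwarder is silent.
    // forwarders { 64.59.135.133; };
    // forward first;

    // Test: with no forwarders configured, named resolves external
    // names itself, starting from the root servers.
};
```

After a `service named restart`, if external lookups still fail periodically with no forwarders configured, the forwarder is not the cause; if the failures stop, suspicion moves back to the forwarder or the path to it.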


> # host telus.net  64.59.135.133
> Using domain server:
> Name: 64.59.135.133
> Address: 64.59.135.133#53
> Aliases:
>  
> www.telus.net is an alias for cityweb.telus.net.
> cityweb.telus.net has address 198.161.157.214

> If I then use 'host telus.net' immediately after, it still fails.  The
> only way to get the caching name server working again is a 'service named
> restart'.

> So perhaps the forwarder is not faulty at all, but bind is?

> If that suspicion is true, can you suggest some sort of logging I might
> enable to see if in fact bind is falling over?

> Thanks!
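On the logging question: one way to see what named's resolver is doing when lookups start failing is a dedicated channel in named.conf. This is a minimal sketch assuming BIND 9's `logging` statement; the channel name `resolver_debug` and the log file path are hypothetical. Because named runs chrooted here, the path is relative to /var/named/chroot, and that directory must be writable by the user named runs as.

```
logging {
    // Hypothetical channel; the path is interpreted inside the chroot.
    channel resolver_debug {
        file "/var/log/named-debug.log" versions 3 size 5m;
        severity dynamic;      // follow the server's current debug level
        print-time yes;        // timestamp each line
        print-category yes;
        print-severity yes;
    };
    // Resolver activity: forwarding, retries, timeouts.
    category resolver     { resolver_debug; };
    // Broken delegations such as the telus.net ones mentioned above.
    category lame-servers { resolver_debug; };
    // Everything else goes to this file and to syslog as usual.
    category default      { resolver_debug; default_syslog; };
};
```

With `severity dynamic`, the channel logs at whatever debug level the server is running at. Raising it with `rndc trace` once lookups start failing (and lowering it again with `rndc notrace`) should capture the resolver's behaviour at the moment of failure without filling the disk the rest of the time.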


-- 
Peter Håkanson         
        IPSec  Sverige      ( At Gothenburg Riverside )
           Sorry about my e-mail address, but I'm trying to keep spam out;
	   remove "icke-reklam" if you feel like mailing me. Thanx.


More information about the bind-users mailing list