Bind 9.4.2 not resolving one domain

Chris Buxton cbuxton at menandmice.com
Thu Sep 4 21:39:47 UTC 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Sep 4, 2008, at 12:23 PM, caio wrote:
> and here the result of:
>
> # rndc flush
> # (10 secs)
> # dig @dns2.mydomain.com www.yahoo.com.ar +time=20
[...]
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36748

That's interesting. So maybe it's not just a simple performance
problem (a generally slow link to somewhere). I also notice that the
SERVFAIL response was returned after almost 20 seconds.

> And after 2 minutes, I ran 2 parallel digs (one with +norec, one
> with recursion), and I do not know how to explain it, but both
> returned successful results, though with different query times...

That's perfectly normal. The recursive query has to wait for the
server to finish looking up the data, which in this case took a
reasonable amount of time (174 ms). The nonrecursive query returns
only what is already in cache, so it comes back in 0 ms (after rounding).
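
To illustrate the difference between the two queries: on the wire, a
recursive and a nonrecursive query differ in the RD (recursion desired)
bit of the DNS header, which dig's +norec clears. The following is only
a minimal sketch of that header bit using the RFC 1035 wire format; the
function names and the query ID are made up for the example, and it
builds the packet without sending anything.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit DNS flags word


def build_query(name, recursion_desired=True, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 format)."""
    flags = RD_FLAG if recursion_desired else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def has_rd_flag(packet):
    """Return True if the RD bit is set in a DNS packet's header."""
    (flags,) = struct.unpack("!H", packet[2:4])
    return bool(flags & RD_FLAG)


recursive = build_query("www.yahoo.com.ar")                 # like plain dig
nonrecursive = build_query("www.yahoo.com.ar", False)       # like dig +norec
print(has_rd_flag(recursive), has_rd_flag(nonrecursive))
```

With RD cleared, the server is being asked only for what it already
knows, which is why that query can be answered without any outbound
lookups.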

> If everything goes 'normally', the primary name server will start
> to fail after a while, and the secondary name server will keep
> resolving well..


That's interesting. It sounds like you're seeing intermittent, but
relatively frequent, outages affecting one or another of the sets of
authoritative name servers that must be queried to resolve the name.

named will wait several seconds for a response from an
authoritative name server, although after a short delay it will start
querying the other name servers in that set. This can mean several
queries outstanding in parallel. I believe the timeout for any
outbound query is 5 seconds, although I could easily be wrong.

The fact that you got a SERVFAIL after almost 20 seconds means that  
several steps (probably at least 3) in the chain were slow, and one  
step was very slow - too slow for named's patience.

You should run a packet sniffer on the server, or on a monitoring port  
of a switch that outbound queries would traverse, and keep trying  
until you get another such SERVFAIL. Then examine the packet log to  
see what remote servers are not answering. Something like dnscap would  
probably be ideal for this, although presently the dns-oarc.net server  
is down. When the website is working, you can find info here:
https://www.dns-oarc.net/tools/dnscap
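
Once you have a capture, the analysis amounts to matching outbound
queries with their responses and listing the servers that never
answered, or answered too late. Here is a rough sketch of that
matching step. It assumes the capture has already been parsed into
(timestamp, direction, server_ip, query_id) tuples; that record format
is hypothetical and is not the output of dnscap or any other tool, and
the 5-second default is only the timeout guessed at above.

```python
from collections import Counter


def unanswered_by_server(records, timeout=5.0):
    """Count outbound queries that got no reply within `timeout` seconds.

    `records` is a hypothetical, pre-parsed capture: an iterable of
    (timestamp, direction, server_ip, query_id) tuples, where direction
    is "out" for a query sent by named and "in" for a response.
    """
    pending = {}         # (server_ip, query_id) -> time the query was sent
    missing = Counter()  # server_ip -> number of unanswered or late queries
    for ts, direction, server, qid in sorted(records, key=lambda r: r[0]):
        key = (server, qid)
        if direction == "out":
            pending[key] = ts
        elif key in pending:
            sent = pending.pop(key)
            if ts - sent > timeout:
                missing[server] += 1  # a reply arrived, but too late
    # Anything still pending never received a reply at all.
    for server, _qid in pending:
        missing[server] += 1
    return dict(missing)


# Made-up sample data using documentation addresses (192.0.2.0/24).
capture = [
    (0.0, "out", "192.0.2.1", 1),
    (0.1, "in", "192.0.2.1", 1),
    (0.2, "out", "192.0.2.2", 2),   # never answered
    (1.0, "out", "192.0.2.3", 3),
    (9.0, "in", "192.0.2.3", 3),    # answered after 8 seconds
]
print(unanswered_by_server(capture))
```

Servers that show up repeatedly in the result across several SERVFAIL
incidents are the ones to investigate first.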

More generally, a solution that can graph query response times and  
failed queries might be interesting to you. For example, Men & Mice  
makes a commercial solution for this, the Men & Mice DNS Performance  
Monitor. Contact me off-list if you are interested.
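
As a very rough idea of what such monitoring tracks, the sketch below
summarizes a list of lookup results into a failure rate and a median
response time. The (elapsed_ms, rcode) sample format and the sample
values are assumptions made for the example; this is not how any
particular product works.

```python
from statistics import median


def summarize(samples):
    """Summarize (elapsed_ms, rcode) samples from repeated lookups.

    The sample format and rcode strings are assumptions for this sketch.
    """
    ok_times = [ms for ms, rcode in samples if rcode == "NOERROR"]
    failures = sum(1 for _ms, rcode in samples if rcode != "NOERROR")
    return {
        "count": len(samples),
        "failures": failures,
        "failure_rate": failures / len(samples) if samples else 0.0,
        "median_ok_ms": median(ok_times) if ok_times else None,
    }


# Made-up samples: two successful lookups and one slow SERVFAIL.
print(summarize([(174, "NOERROR"), (0, "NOERROR"), (19900, "SERVFAIL")]))
```

Collected at regular intervals, summaries like this make it easier to
see when the failures cluster.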

Chris Buxton
Professional Services
Men & Mice

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkjAVaMACgkQ0p/8Jp6Boi14lQCfeLoj3BM/MnJOl/2ncLaPwr/v
UCIAoIynq1DCuPTinVY/1vPjLR6aphZH
=Kdo6
-----END PGP SIGNATURE-----

