Bind 9.4.2 not resolving one domain
Chris Buxton
cbuxton at menandmice.com
Thu Sep 4 21:39:47 UTC 2008
On Sep 4, 2008, at 12:23 PM, caio wrote:
> and here the result of:
>
> # rndc flush
> # (10 secs)
> # dig @dns2.mydomain.com www.yahoo.com.ar +time=20
[...]
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36748
That's interesting. So maybe it's not just a simple performance
problem (a generally slow link to somewhere). I also notice that the
SERVFAIL response was returned after almost 20 seconds.
> And after 2 minutes, I ran 2 parallel digs (one with +norec, one
> recursive)..., and I do not know how to explain it.., but both returned
> successful results.., but with different query times...
That's perfectly normal. The recursive query has to wait for the
server to finish looking up the data, which in this case took a
reasonable amount of time (174 ms). The nonrecursive query only
returns what is in cache, which returns in 0 ms (after rounding).
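
To make the difference concrete, here is a minimal sketch of what the two digs send. The only difference on the wire is the RD (recursion desired) bit in the DNS header, as defined in RFC 1035. The function names build_query and timed_query are made up for illustration, the server address is whatever you pass in, and the code does not check that the reply ID matches the query, so treat it as a rough timing aid rather than a dig replacement.

```python
import socket
import struct
import time


def build_query(name, recursion_desired=True, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (hypothetical helper)."""
    # The RD bit is 0x0100 in the 16-bit flags field; +norec clears it.
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def timed_query(server, name, recursion_desired, timeout=20.0):
    """Send one UDP query and return (rcode, elapsed_ms).

    Raises socket.timeout if no reply arrives within `timeout` seconds.
    """
    packet = build_query(name, recursion_desired)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(4096)
        elapsed_ms = (time.monotonic() - start) * 1000.0
    # RCODE is the low 4 bits of the flags field (2 means SERVFAIL).
    rcode = struct.unpack("!H", reply[2:4])[0] & 0x000F
    return rcode, elapsed_ms
```

Calling timed_query(server, "www.yahoo.com.ar", True) and then the same with False against your own server should show the pattern described above: the recursive query waits for the lookup, while the nonrecursive one returns only whatever is already cached.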
> If everything goes 'normally', the primary name server will start
> to fail after a while, and the secondary name server will keep
> resolving well..
That's interesting. It sounds like you're seeing intermittent, but
relatively frequent, outages affecting one or another of the sets of
authoritative name servers that must be queried to look up the name.
named will wait for several seconds for a response from an
authoritative name server, although after a short delay it will start
querying the other name servers in that set. This can therefore mean
several parallel queries outstanding. I believe the timeout for any
outbound query is 5 seconds, although I could easily be wrong.
The fact that you got a SERVFAIL after almost 20 seconds means that
several steps (probably at least 3) in the chain were slow, and one
step was very slow - too slow for named's patience.
You should run a packet sniffer on the server, or on a monitoring port
of a switch that outbound queries would traverse, and keep trying
until you get another such SERVFAIL. Then examine the packet log to
see what remote servers are not answering. Something like dnscap would
probably be ideal for this, although presently the dns-oarc.net server
is down. When the website is working, you can find info here:
https://www.dns-oarc.net/tools/dnscap
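
Once you have a capture, the analysis step can be sketched roughly as follows. This assumes you have already reduced the packet log to simple records of the form (timestamp, kind, remote server, query ID), where kind is "query" or "response". That record format and the function name unanswered_by_server are my own invention for illustration, not dnscap output, and the 5-second default is only the timeout I guessed at above.

```python
from collections import defaultdict


def unanswered_by_server(records, timeout=5.0):
    """Count outbound queries per remote server that got no timely response.

    `records` is an iterable of (timestamp, kind, server, query_id) tuples,
    a hypothetical format; `timeout` is an assumed value in seconds.
    """
    records = list(records)

    # Index response timestamps by (server, query_id).
    responses = defaultdict(list)
    for ts, kind, server, qid in records:
        if kind == "response":
            responses[(server, qid)].append(ts)

    # A query counts as unanswered if no matching response arrived
    # between the time it was sent and `timeout` seconds later.
    missing = defaultdict(int)
    for ts, kind, server, qid in records:
        if kind != "query":
            continue
        replies = responses.get((server, qid), [])
        if not any(ts <= r <= ts + timeout for r in replies):
            missing[server] += 1
    return dict(missing)
```

Servers that show up with a high count around the time of a SERVFAIL are the ones to look at first.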
More generally, a solution that can graph query response times and
failed queries might be interesting to you. For example, Men & Mice
makes a commercial solution for this, the Men & Mice DNS Performance
Monitor. Contact me off-list if you are interested.
Chris Buxton
Professional Services
Men & Mice