Repeatable resolving failure

Scott Haneda lists at newgeo.com
Thu Oct 28 02:51:03 UTC 2004


Hello, I am having a tough time tracking down some DNS problems.  I am
relatively new to using BIND and its options, and equally new to dig.  I do,
however, have a fair understanding of how DNS works in general.  Forgive me
if I do not use the correct terminology.

Network Background
Several OS X servers run in a colo cabinet.  I have full control of the
cabinet; the router is managed by the colo provider.  I am assured that, for
now, the two machines I am testing are wide open with regard to ports.

My "workstation" is connected via Comcast, though I have access to other
machines in other states as well.

Problem.
On my workstation, I have my DNS server, ns1.hostwizard.com, listed in my
TCP/IP settings; it is this machine that I use to resolve hosts locally.
When I type a hostname into my browser (Safari, but I have tested others), it
thinks about it for 10 seconds or so, then fails.  If I simply reload that
hostname, it is then fine.

I have this issue when on my Comcast connection or if I TB2/VNC into my colo
machines and try it internally on the same subnet as the DNS server.  The
DNS server sees little load, 2-5 queries per second.

named -v
BIND 9.2.3
I am running this with QuickDNS as the front end to configuring it.

If I use an alternate NS in my TCP/IP settings, the issue goes away: using
Comcast's DHCP-supplied name servers, or a friend's, always resolves it.

As a test, I installed BIND 9.2.3 on another server.  I did not add any
zones, just turned it on, and then used that new server's IP as my TCP/IP DNS
setting.  The problem persists there as well.

I think I have ruled out it being a machine specific issue with that test.
I also think I have ruled out this being a Comcast issue as I have tested
this on various other networks and as long as my DNS is used, the problem
rears its head.

I seem to be able to load any websites I host in Safari just fine, so the
few hundred zones I have all seem to resolve quickly.


If I watch the BIND logs, I can sort of see what is happening.  As the
browser looks up the hostname, I see the request for the A record come in,
and I see it attempted many times.

Oct 27 19:41:53.614 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:54.325 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:55.025 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:55.731 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:57.248 queries: info: client 24.5.47.136#50511: query:
www.redlightcafe.com IN A
Oct 27 19:41:57.726 queries: info: client 24.5.47.136#50512: query:
www.redlightcafe.com IN AAAA

At this point, the browser alerts me to trouble.  A reload then logs this:
Oct 27 19:42:31.803 queries: info: client 24.5.47.136#50516: query:
www.redlightcafe.com IN A

At which point the page is rendered.
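
A rough Python sketch for making the retry pattern in the log above easier to
see.  It groups query-log entries by client address, source port, name, and
record type.  The working assumption (not confirmed) is that repeated entries
from the same source port are retransmissions of one lookup that never got an
answer.  The regex targets only the log format shown in this message, and the
function names are made up for illustration.

```python
import re
from collections import Counter

# Matches query-log entries in the format shown in this message.
# \s+ after "query:" tolerates the line wrap added by the mail client.
QUERY_RE = re.compile(
    r"client (?P<ip>[\d.]+)#(?P<port>\d+): query:\s+(?P<name>\S+) IN (?P<rtype>\S+)"
)


def count_queries(log_text):
    """Count how often each (ip, port, name, type) tuple appears in log_text."""
    counts = Counter()
    for m in QUERY_RE.finditer(log_text):
        counts[(m["ip"], int(m["port"]), m["name"], m["rtype"])] += 1
    return counts


def repeated_queries(log_text):
    """Keep only tuples seen more than once (assumed to be retransmissions)."""
    return {key: n for key, n in count_queries(log_text).items() if n > 1}


# The six entries from the failed page load, copied from the log above.
SAMPLE = """\
Oct 27 19:41:53.614 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:54.325 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:55.025 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:55.731 queries: info: client 24.5.47.136#50508: query:
www.redlightcafe.com IN A
Oct 27 19:41:57.248 queries: info: client 24.5.47.136#50511: query:
www.redlightcafe.com IN A
Oct 27 19:41:57.726 queries: info: client 24.5.47.136#50512: query:
www.redlightcafe.com IN AAAA
"""

if __name__ == "__main__":
    for key, n in repeated_queries(SAMPLE).items():
        print(n, key)
```

On the sample, the only repeated tuple is the A query from port 50508, seen
four times roughly 0.7 seconds apart.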

The only thing I have been able to learn about all this is the +trace option
to dig, which I think may lead to an answer, though I am not entirely sure.

rlc.net is a domain my DNS server should never have seen before.  I ran this
test while ssh'd into the alternate DNS machine I just set up, which gets no
other traffic at all.

dig rlc.net +trace

; <<>> DiG 9.2.2 <<>> rlc.net +trace
;; global options:  printcmd
.                       511574  IN      NS      E.ROOT-SERVERS.NET.
.                       511574  IN      NS      F.ROOT-SERVERS.NET.
.                       511574  IN      NS      G.ROOT-SERVERS.NET.
.                       511574  IN      NS      H.ROOT-SERVERS.NET.
.                       511574  IN      NS      I.ROOT-SERVERS.NET.
.                       511574  IN      NS      J.ROOT-SERVERS.NET.
.                       511574  IN      NS      K.ROOT-SERVERS.NET.
.                       511574  IN      NS      L.ROOT-SERVERS.NET.
.                       511574  IN      NS      M.ROOT-SERVERS.NET.
.                       511574  IN      NS      A.ROOT-SERVERS.NET.
.                       511574  IN      NS      B.ROOT-SERVERS.NET.
.                       511574  IN      NS      C.ROOT-SERVERS.NET.
.                       511574  IN      NS      D.ROOT-SERVERS.NET.
;; Received 340 bytes from 64.84.37.40#53(64.84.37.40) in 4 ms

net.                    172800  IN      NS      E.GTLD-SERVERS.net.
net.                    172800  IN      NS      F.GTLD-SERVERS.net.
net.                    172800  IN      NS      G.GTLD-SERVERS.net.
net.                    172800  IN      NS      H.GTLD-SERVERS.net.
net.                    172800  IN      NS      I.GTLD-SERVERS.net.
net.                    172800  IN      NS      J.GTLD-SERVERS.net.
net.                    172800  IN      NS      K.GTLD-SERVERS.net.
net.                    172800  IN      NS      L.GTLD-SERVERS.net.
net.                    172800  IN      NS      M.GTLD-SERVERS.net.
net.                    172800  IN      NS      A.GTLD-SERVERS.net.
net.                    172800  IN      NS      B.GTLD-SERVERS.net.
net.                    172800  IN      NS      C.GTLD-SERVERS.net.
net.                    172800  IN      NS      D.GTLD-SERVERS.net.
;; Received 510 bytes from 192.203.230.10#53(E.ROOT-SERVERS.NET) in 10 ms

rlc.net.                172800  IN      NS      ns1.musictoday.com.
rlc.net.                172800  IN      NS      ns1.rlcom.net.
rlc.net.                172800  IN      NS      ns2.musictoday.com.
rlc.net.                172800  IN      NS      ns2.rlcom.net.
;; Received 181 bytes from 192.12.94.30#53(E.GTLD-SERVERS.net) in 25 ms

dig: Couldn't find server 'ns1.musictoday.com': No address associated with
nodename
paris:~ haneda$ 

I seem to get repeated errors like this with the +trace option on the first
try.  The second try normally works, a third try almost always works, and I
have never had it take more than four tries.
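
The "fails first, works on a later try" pattern can be measured with a small
sketch like the one below.  It assumes the system resolver is pointed at the
DNS server under test, and that the name has not been looked up recently, so
the first attempt is uncached.  time_lookups and its parameters are
hypothetical names; the resolve argument exists only so the function can be
exercised without touching the network.

```python
import socket
import time


def time_lookups(name, attempts=4, resolve=socket.gethostbyname,
                 clock=time.monotonic):
    """Resolve `name` several times in a row.

    Returns one (ok, seconds, answer_or_error) tuple per attempt, in order.
    """
    results = []
    for _ in range(attempts):
        start = clock()
        try:
            answer = resolve(name)
            ok = True
        except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
            answer = str(exc)
            ok = False
        results.append((ok, clock() - start, answer))
    return results


def format_results(results):
    """Render the attempts as one human-readable line each."""
    lines = []
    for i, (ok, secs, answer) in enumerate(results, 1):
        status = "ok" if ok else "FAIL"
        lines.append(f"try {i}: {status} in {secs:.2f}s -> {answer}")
    return lines
```

Printing format_results(time_lookups("rlc.net")) from the workstation would
show whether only the first attempt is slow or failing.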

At the same time I ran the +trace above, I also tail -f'd on the bind logs
and got this:

Oct 27 19:45:30.959 queries: info: client 64.84.37.40#63231: query: . IN NS
Oct 27 19:45:30.970 queries: info: client 64.84.37.40#63232: query:
E.ROOT-SERVERS.NET IN A
Oct 27 19:45:30.994 queries: info: client 64.84.37.40#63233: query:
E.ROOT-SERVERS.NET IN A
Oct 27 19:45:30.996 queries: info: client 64.84.37.40#63234: query:
E.ROOT-SERVERS.NET IN AAAA
Oct 27 19:45:31.006 queries: info: client 64.84.37.40#63235: query:
E.ROOT-SERVERS.NET.hostwizard.com IN AAAA
Oct 27 19:45:31.071 queries: info: client 64.84.37.40#63237: query:
E.GTLD-SERVERS.net IN A
Oct 27 19:45:31.092 queries: info: client 64.84.37.40#63238: query:
E.GTLD-SERVERS.net IN A
Oct 27 19:45:31.095 queries: info: client 64.84.37.40#63239: query:
E.GTLD-SERVERS.net IN AAAA
Oct 27 19:45:31.113 queries: info: client 64.84.37.40#63240: query:
E.GTLD-SERVERS.net.hostwizard.com IN AAAA
Oct 27 19:45:31.168 queries: info: client 64.84.37.40#63242: query:
ns1.musictoday.com IN A
Oct 27 19:45:31.873 queries: info: client 64.84.37.40#63242: query:
ns1.musictoday.com IN A
Oct 27 19:45:32.579 queries: info: client 64.84.37.40#63242: query:
ns1.musictoday.com IN A
Oct 27 19:45:33.284 queries: info: client 64.84.37.40#63242: query:
ns1.musictoday.com IN A
Oct 27 19:45:33.990 queries: info: client 64.84.37.40#63243: query:
ns1.musictoday.com.hostwizard.com IN A
Oct 27 19:45:34.025 queries: info: client 64.84.37.40#63244: query:
ns1.musictoday.com IN A
Oct 27 19:45:34.733 queries: info: client 64.84.37.40#63244: query:
ns1.musictoday.com IN A
Oct 27 19:45:35.269 queries: info: client 64.84.37.40#63245: query:
ns1.musictoday.com IN AAAA
Oct 27 19:45:35.974 queries: info: client 64.84.37.40#63245: query:
ns1.musictoday.com IN AAAA
Oct 27 19:45:36.680 queries: info: client 64.84.37.40#63245: query:
ns1.musictoday.com IN AAAA
Oct 27 19:45:37.396 queries: info: client 64.84.37.40#63245: query:
ns1.musictoday.com IN AAAA
Oct 27 19:45:38.102 queries: info: client 64.84.37.40#63246: query:
ns1.musictoday.com.hostwizard.com IN AAAA

Some of it I just don't get, such as why it tried to resolve
ns1.musictoday.com.hostwizard.com.  I never asked for that, and it is not a
hostname I would ever deal with.
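
One possible explanation, offered as a guess only: a stub resolver with a
search domain configured will retry a failed name with that domain appended.
The sketch below shows that candidate-name expansion under resolv.conf-style
rules.  It assumes hostwizard.com is the search domain on the machine running
dig (inferred from the log, not checked against the actual configuration),
and the OS X resolver may differ in detail.  candidate_names is a
hypothetical helper, not part of any library.

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order.

    Follows resolv.conf-style rules: a name with at least `ndots` dots is
    tried as given first, then with each search domain appended; a name with
    fewer dots is tried against the search domains first.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    as_given = name + "."
    with_search = [f"{name}.{domain.rstrip('.')}." for domain in search_domains]
    if name.count(".") >= ndots:
        return [as_given] + with_search
    return with_search + [as_given]


if __name__ == "__main__":
    for candidate in candidate_names("ns1.musictoday.com", ["hostwizard.com"]):
        print(candidate)
```

If this is what is happening, the .hostwizard.com lookups would be a side
effect of the original query failing rather than a separate problem.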

At this point, I am at a 100% loss as to what the issue is and how to fix
it.  I have had about 10 clients that I allow to use my DNS for local
lookups who were essentially knocked offline as a result of this.  I have
since had them fall back on Comcast, Verizon, etc. for name servers.
However, these people often make many DNS changes and want to see those
changes quickly, which is why I generally add them to my allow list of
people who are able to use my DNS for queries.

Sorry for the length of this email; I felt it would help to give as much
detail as possible.  Hopefully one of you can shed some light on what is
happening.

Thanks again
-- 
-------------------------------------------------------------
Scott Haneda                                Tel: 415.898.2602
<http://www.newgeo.com>                     Fax: 313.557.5052
<scott at newgeo.com>                          Novato, CA U.S.A.



