Help with unresolvable domain (subdomain, actually)
Kevin Darcy
kcd at chrysler.com
Tue Mar 1 22:25:44 UTC 2011
I got a trouble ticket on this too.
From the looks of things, Cisco is using GSSes (Global Site Selectors) to
load-balance this site. A GSS returns SERVFAIL if all of the resources
behind the load-balancer are down (which it determines via a heartbeat
mechanism).
So I think this is a "simple" case of a website (or cluster) going down.
It was down earlier today, then up again; as of this writing, it is down
again.
DNS doesn't really have a response code of "requested resource not
available", so SERVFAIL is Cisco's closest approximation. It has the
drawback, however, of often making other sorts of problems appear to be
DNS problems. That's just a cross that we DNS admins have to bear...
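To illustrate the point about response codes, here is a minimal Python sketch that decodes the fixed 12-byte DNS header and maps the RCODE field to the names defined in RFC 1035. The helper names (`decode_header`, `encode_name`) are hypothetical, and the reply is built by hand to resemble the SERVFAIL shown in the quoted dig output; it is not a capture of Cisco's actual packet.

```python
import struct

# RCODE values 0-5 as defined in RFC 1035. None of them means
# "requested resource not available".
RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def decode_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header of a DNS message (hypothetical helper)."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),  # response bit
        "rd": bool(flags & 0x0100),  # recursion desired
        "ra": bool(flags & 0x0080),  # recursion available
        "rcode": RCODES.get(flags & 0x000F, "OTHER"),
        "counts": (qd, an, ns, ar),  # question/answer/authority/additional
    }


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels (hypothetical helper)."""
    labels = name.rstrip(".").split(".")
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


# Hand-built reply: qr and rd set, rcode 2, one question, no records.
flags = 0x8000 | 0x0100 | 2
reply = (
    struct.pack("!6H", 5165, flags, 1, 0, 0, 0)
    + encode_name("tools.cisco.com")
    + struct.pack("!2H", 1, 1)  # QTYPE=A, QCLASS=IN
)

header = decode_header(reply)
print(header["rcode"], header["counts"], len(reply))
# prints: SERVFAIL (1, 0, 0, 0) 33
```

The 33-byte length agrees with the "MSG SIZE rcvd: 33" in the dig output: a header, the echoed question, and nothing else.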
- Kevin
On 3/1/2011 4:08 PM, Mike Bernhardt wrote:
> I should add that tools.cisco.com was resolvable at one time, so either
> Cisco's behavior has changed, or our firewall's behavior has changed. We
> obviously haven't upgraded our BIND version in a while (9.4.3P3), so I don't
> think the problem is BIND.
>
> -----Original Message-----
> From: Mike Bernhardt [mailto:bernhardt at bart.gov]
> Sent: Tuesday, March 01, 2011 12:40 PM
> To: bind-users at lists.isc.org
> Subject: Help with unresolvable domain (subdomain, actually)
>
> For some reason, we can no longer resolve tools.cisco.com. There are several
> clues to the problem but I can't put them together. Here is some dig output.
> I know that the time stamps don't all match up below, but the results are
> typical:
>
> [root at ns1 ~]# dig +trace -b 148.165.3.10 tools.cisco.com
>
> ; <<>> DiG 9.4.3-P3 <<>> +trace -b 148.165.3.10 tools.cisco.com
> ;; global options: printcmd
> . 90550 IN NS i.root-servers.net.
> . 90550 IN NS h.root-servers.net.
> . 90550 IN NS e.root-servers.net.
> . 90550 IN NS d.root-servers.net.
> . 90550 IN NS j.root-servers.net.
> . 90550 IN NS k.root-servers.net.
> . 90550 IN NS l.root-servers.net.
> . 90550 IN NS g.root-servers.net.
> . 90550 IN NS f.root-servers.net.
> . 90550 IN NS a.root-servers.net.
> . 90550 IN NS m.root-servers.net.
> . 90550 IN NS c.root-servers.net.
> . 90550 IN NS b.root-servers.net.
> ;; Received 512 bytes from 148.165.3.10#53(148.165.3.10) in 0 ms
>
> com. 172800 IN NS l.gtld-servers.net.
> com. 172800 IN NS e.gtld-servers.net.
> com. 172800 IN NS k.gtld-servers.net.
> com. 172800 IN NS i.gtld-servers.net.
> com. 172800 IN NS m.gtld-servers.net.
> com. 172800 IN NS j.gtld-servers.net.
> com. 172800 IN NS a.gtld-servers.net.
> com. 172800 IN NS g.gtld-servers.net.
> com. 172800 IN NS c.gtld-servers.net.
> com. 172800 IN NS f.gtld-servers.net.
> com. 172800 IN NS b.gtld-servers.net.
> com. 172800 IN NS d.gtld-servers.net.
> com. 172800 IN NS h.gtld-servers.net.
> ;; Received 505 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms
>
> cisco.com. 172800 IN NS ns1.cisco.com.
> cisco.com. 172800 IN NS ns2.cisco.com.
> ;; Received 101 bytes from 192.54.112.30#53(h.gtld-servers.net) in 154 ms
>
> tools.cisco.com. 86400 IN NS rcdn9-14p-dcz05n-gss1.cisco.com.
> tools.cisco.com. 86400 IN NS rtp5-dmz-gss1.cisco.com.
> tools.cisco.com. 86400 IN NS sjck-dmz-gss1.cisco.com.
> tools.cisco.com. 86400 IN NS cax01-bb14-dcz01n-gss1.cisco.com.
> ;; Received 226 bytes from 64.102.255.44#53(ns2.cisco.com) in 75 ms
>
> ;; Received 33 bytes from 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com) in 47 ms
>
> Now, focusing in on rtp5-dmz-gss1.cisco.com for further analysis (just
> picked it out of the group):
> [root at ns1 ~]# dig -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com
>
> ; <<>> DiG 9.4.3-P3 <<>> -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com
> ; (1 server found)
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5165
> ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
>
> ;; QUESTION SECTION:
> ;tools.cisco.com. IN A
>
> ;; Query time: 75 msec
> ;; SERVER: 64.102.246.5#53(64.102.246.5)
> ;; WHEN: Tue Mar 1 12:22:57 2011
> ;; MSG SIZE rcvd: 33
>
>
> Here is the output of tcpdump on my server, querying the same server via
> nslookup elsewhere:
> [root at ns1 ~]# tcpdump host -i bond0 64.102.246.5 -n -p -vvv
> tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 96 bytes
> 12:14:53.373614 IP (tos 0x0, ttl 64, id 45237, offset 0, flags [none], proto: UDP (17), length: 61) 148.165.3.10.18673 > 64.102.246.5.domain: [bad udp cksum a78b!] 26095 A? tools.cisco.com. (33)
> 12:14:53.455684 IP (tos 0x0, ttl 54, id 7623, offset 0, flags [DF], proto: UDP (17), length: 61) 64.102.246.5.domain > 148.165.3.10.18673: [udp sum ok] 26095 ServFail- q: A? tools.cisco.com. 0/0/0 (33)
>
> Lastly, I see on our firewall log that we have a Checkpoint Smart Defense
> log entry due to its belief that Cisco is sending us a malformed query
> packet, and it's being dropped. I don't know why they're sending the query
> in the first place.
> Number: 2595791
> Date: 1Mar2011
> Time: 12:22:53
> Type: Log
> Action: Drop
> Service: domain-udp (53)
> Source Port: domain-udp
> Source: rtp5-dmz-gss1.cisco.com
> Destination: ns
> Protocol: udp
> Information: Packet info: Packet data size: 28
> Attack: Malformed Packet
> Attack Information: UDP length error
>
>
> Any ideas as to where the problem lies so I can pursue it further?