Seemingly random ServFail issues on a caching server

Florian CROUZAT gentoo at floriancrouzat.net
Thu Aug 25 12:48:40 UTC 2011


Hi list,

On a few domains (I'll use just one as an example here), I sometimes
get seemingly random ServFails when resolving names.
A client (192.168.147.2) asks my caching server (192.168.151.100) to resolve
a target (www.leclercdrive.fr).

Here are the relevant logs:

Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +


A tcpdump on the local side of the NS server shows the A request and the
immediate ServFail.
A tcpdump on the external side of the NS server shows no traffic at all in
this case, meaning the query fails internally and the server doesn't even
try to forward the A request to the Internet.

17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A?
www.leclercdrive.fr. (37)
17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail
0/0/0 (37)
17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A?
www.leclercdrive.fr. (37)
17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail
0/0/0 (37)
17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A?
www.leclercdrive.fr. (37)
17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail
0/0/0 (37)

A few minutes earlier or later, the same query works just fine:

17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A?
www.leclercdrive.fr. (37)
17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6
CNAME[|domain]
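
To put a number on "instant", the gap between each query and its reply can be
computed from the tcpdump timestamps in the two captures above. This is only a
small arithmetic sketch; the helper name delta_us is made up for illustration,
and the timestamps are copied verbatim from the captures.

```python
from datetime import datetime, timedelta

def delta_us(sent, received):
    """Microseconds between two tcpdump timestamps (HH:MM:SS.ffffff, same day)."""
    fmt = "%H:%M:%S.%f"
    gap = datetime.strptime(received, fmt) - datetime.strptime(sent, fmt)
    return gap // timedelta(microseconds=1)

# (query time, reply time) pairs copied from the captures above,
# keyed by reply type and DNS query ID.
pairs = {
    "ServFail 26340": ("17:14:19.377608", "17:14:19.378845"),
    "ServFail 52628": ("17:14:19.380607", "17:14:19.381383"),
    "ServFail 58933": ("17:14:19.382605", "17:14:19.383406"),
    "success  49610": ("17:15:58.736177", "17:15:58.784470"),
}

for label, (sent, received) in pairs.items():
    print(f"{label}: {delta_us(sent, received)} us")
```

The three ServFail replies come back after 1237, 776 and 801 microseconds,
while the successful reply takes 48293 microseconds (about 48 ms). Replies in
around a millisecond are consistent with the server answering without any
upstream round trip, which matches the empty capture on the external side.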

The TTL of the www.leclercdrive.fr entry is 300 seconds, which seems short
to me. Maybe the ServFail happens when a request is processed at the exact
moment the TTL reaches zero and the cache entry is being flushed? I tried
flushing the cache using rndc, but the first request after that worked just
fine (of course...).
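
One way to test the TTL-expiry idea would be to query the caching server at a
fixed interval for longer than one TTL and log the response code with a
timestamp, so that any ServFail can be lined up against the 300-second
boundary. The following is a minimal sketch using only the Python standard
library. The function names (build_query, parse_rcode, probe) and the default
interval, count and timeout are assumptions for illustration; the server
address and name are the ones from this post. It does not check that the
reply ID matches the query ID, which a more careful probe should do.

```python
import socket
import struct
import time

# Common DNS response codes (RFC 1035); anything else is printed as a number.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}

def build_query(name, query_id, qtype=1):
    """Build a DNS query packet for `name` (qtype 1 = A, class IN, RD set)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def parse_rcode(response):
    """Return the RCODE, i.e. the low 4 bits of the header flags word."""
    _query_id, flags = struct.unpack("!HH", response[:4])
    return flags & 0x000F

def probe(server, name, interval=1.0, count=600, timeout=2.0):
    """Send `count` A queries, one every `interval` seconds, and log each result."""
    for i in range(count):
        query = build_query(name, i & 0xFFFF)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.time()
            try:
                sock.sendto(query, (server, 53))
                data, _addr = sock.recvfrom(4096)
                rcode = parse_rcode(data)
                status = RCODES.get(rcode, str(rcode))
            except socket.timeout:
                status = "TIMEOUT"
            elapsed_ms = (time.time() - start) * 1000
        print(f"{time.strftime('%H:%M:%S')} {status} {elapsed_ms:.1f} ms")
        time.sleep(interval)
```

Running probe("192.168.151.100", "www.leclercdrive.fr") from the client would
cover two TTL periods with the defaults above. As a sanity check,
build_query("www.leclercdrive.fr", 26340) produces a 37-byte packet, the same
size tcpdump reports for the real queries.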

Any ideas/hints are welcome.

The DNS server runs BIND, package version 1:9.5.1.dfsg.P3-1+lenny1
cat /etc/debian_version => 5.0.4
(I have no control over the version of the tools)

Thank you.


----
Florian





