bind 8 slow when resolving new domains!
dap99 at i-55.com
Fri May 7 01:38:39 UTC 2004
On Thu, 06 May 2004 21:59:53 +0100, Simon Waters
<Simon at wretched.demon.co.uk> wrote:
>dap99 at i-55.com wrote:
>> I am having a big problem with slow internal DNS (named 8.3.7-REL on
>> FreeBSD 4.9).
>What no BIND 9?
We are currently using bind 8. No, no bind 9 at this time. :)
>> Also, we are using their two DNS servers as forwarders.
>
>"Red alert, captain".
Someone else mentioned that. I have since removed the forwarding
options in options {}.
>> The colo promises it's not them, but frankly I can't see how it's us.
>>
>> # tcpdump -n host ns2 and \( icmp or udp \)
>> 10:07:37.832611 192.168.42.78.53 > isp-dns1.53: 4240+ [1au] A?
>> www.altavista.com. (46)
>> 10:07:51.013213 192.168.42.78.53 > isp-dns2.53: 4240+ [1au] A?
>> www.altavista.com. (46)
>> 10:07:51.074160 isp-dns2.53 > 192.168.42.78.53: 4240 2/9/10
>> CNAME[|domain] (DF)
>> 10:07:51.074476 192.168.42.78.53 > isp-dns1.53: 17509+ [1au] A?
>> avatw.search.yahoo2.akadns.net. (59)
>> 10:07:51.131568 isp-dns1.53 > 192.168.42.78.53: 17509 1/9/10 (393)
>> (DF)
>
...
>with queries from thousands of clients (or even one or two busy email
>servers), you may save a few tenths of a second per query by using them,
>but at the cost of slow responses if and when things go wrong (and more
>to go wrong).
Good to know. Thanks!
>
>> forward only; // added while troubleshooting
>> forward first; // added while troubleshooting
>
>One of these only.... forward-first, if allowed by the firewall, always
>seemed the smarter option to me.
Removed entirely anyway, per your and others' recommendations.
>> ns2# nslookup www.looser.com
>
>Is "dig" broken on BSD ;)
No. Well, yes.
dig +trace doesn't show anything extra on FreeBSD. Man, I could
really use +trace right now! I installed bind9 on a test system and
didn't see any change. (dig comes with bind9, as far as I know.)
>> Any ideas? Also, why so many FormErr (am I sending out bunk DNS
>> queries?).
>
>EDNS0 is my first guess - although you can double check the tcpdump
>after reading the docs.
EDNS0 is "Extension Mechanisms for DNS"
(http://www.dns.pl/dnssec/rfc2671.txt). In the RFC I see this note:
5.3. Responders who do not understand these protocol extensions are
     expected to send a response with RCODE NOTIMPL, FORMERR, or
     SERVFAIL. Therefore use of extensions should be "probed" such
     that a responder who isn't known to support them be allowed a
     retry with no extensions if it responds with such an RCODE. If a
     responder's capability level is cached by a requestor, a new
     probe should be sent periodically to test for changes to
     responder capability.
But I'm using a stock bind8 with a routine enough options {} section.
Why would I be sending out unsupported query types!?
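For reference, the probe-and-retry rule in 5.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how named implements it: `send` is a hypothetical callable that delivers a raw query and returns the raw response, the helper names are made up for the example, and the 4096-byte advertised payload size is an arbitrary choice.

```python
import struct

# RCODEs that RFC 2671 section 5.3 treats as "extensions not understood".
RCODE_FORMERR, RCODE_SERVFAIL, RCODE_NOTIMPL = 1, 2, 4
FALLBACK_RCODES = {RCODE_FORMERR, RCODE_SERVFAIL, RCODE_NOTIMPL}


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qid, use_edns0, udp_size=4096):
    """Build an A/IN query; with use_edns0, append one OPT pseudo-record."""
    arcount = 1 if use_edns0 else 0
    # Header: id, flags (RD bit set), qdcount, ancount, nscount, arcount.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # type A, class IN
    msg = header + question
    if use_edns0:
        # OPT: root name, type 41, class = UDP payload size, TTL 0, RDLEN 0.
        msg += b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return msg


def rcode_of(response):
    """Return the RCODE (low four bits of the flags word) of a response."""
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F


def query_with_fallback(name, qid, send):
    """Probe with EDNS0 first; retry without it on NOTIMPL/FORMERR/SERVFAIL.

    `send` is a hypothetical transport: bytes in, response bytes out.
    """
    response = send(build_query(name, qid, use_edns0=True))
    if rcode_of(response) in FALLBACK_RCODES:
        response = send(build_query(name, qid, use_edns0=False))
    return response
```

With this encoding, an EDNS0 query for www.altavista.com comes to 46 bytes, the same size as the `[1au]` queries in the tcpdump output above, and a plain query for www.help.com comes to 30 bytes, matching the "MSG SIZE sent: 30" that dig reports. That would be consistent with the server adding an OPT record to its outgoing queries and getting FormErr back from servers that do not understand it.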
>> I would be happy to show selected output from named -d 3.
>
>"{r}ndc querylog" is friendlier and easier to understand than "-d 3" or
>"tcpdump", even if it does eat disk space on busy servers.
That's not producing much debug output (unless I'm missing something).
Here is a query:
# dig @ns2 www.help.com
; <<>> DiG 8.3 <<>> @ns2 www.help.com
; (1 server found)
;; res options: init recurs defnam dnsrch
;; res_nsend: Operation timed out
And my log:
May 6 20:35:04 ns2 named[61979]: XX+/192.168.42.70/help.com/A/IN
And then another attempt a few seconds later:
# dig @ns2 www.help.com
; <<>> DiG 8.3 <<>> @ns2 www.help.com
; (1 server found)
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50738
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 2, ADDITIONAL: 2
;; QUERY SECTION:
;; www.help.com, type = A, class = IN
;; ANSWER SECTION:
www.help.com. 4m53s IN CNAME abv-sfo1-x-redirect-vip.cnet.com.
abv-sfo1-x-redirect-vip.cnet.com. 5M IN CNAME abv-sfo1-x-redirect-rr.cnet.com.
abv-sfo1-x-redirect-rr.cnet.com. 5M IN A 206.16.0.29
abv-sfo1-x-redirect-rr.cnet.com. 5M IN A 206.16.0.28
;; AUTHORITY SECTION:
cnet.com. 1d23h59m53s IN NS ns.cnet.com.
cnet.com. 1d23h59m53s IN NS ns2.cnet.com.
;; ADDITIONAL SECTION:
ns.cnet.com. 1d23h59m1s IN A 216.239.126.10
ns2.cnet.com. 1d23h59m1s IN A 206.16.0.71
;; Total query time: 5067 msec
;; FROM: server-box to SERVER: 192.168.42.78
;; WHEN: Thu May 6 20:35:42 2004
;; MSG SIZE sent: 30 rcvd: 221
And the log:
May 6 20:35:22 ns2 named[61979]: XX+/192.168.42.70/www.help.com/A/IN
I just don't see what's going on. Why is it timing out then working?
I'm thinking that it's taking a long time for my bind to get a
response, and my resolver times out first. The second try (after 5
seconds) must have been lucky as the domain was resolved just in time.
What I can't determine is *why* this is happening.
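One way to put numbers on the "times out, then works" pattern is to time several consecutive lookups of the same name and compare the first (cold) attempt with the later (cached) ones. The sketch below is a rough aid, not a replacement for dig: `time_lookups` is a made-up helper, and it assumes the machine's resolver configuration already points at the server under test, since `socket.gethostbyname` uses the system resolver and cannot be aimed at a specific server.

```python
import socket
import time


def time_lookups(name, lookup=socket.gethostbyname, attempts=3, pause=0.0):
    """Look up `name` several times; return one (succeeded, seconds) per try."""
    results = []
    for attempt in range(attempts):
        start = time.monotonic()
        try:
            lookup(name)
            ok = True
        except OSError:  # socket.gaierror and timeouts are OSError subclasses
            ok = False
        results.append((ok, time.monotonic() - start))
        if pause and attempt < attempts - 1:
            time.sleep(pause)  # optional gap between attempts
    return results


# Example use (performs real lookups through the system resolver):
#   for ok, seconds in time_lookups("www.help.com"):
#       print("ok" if ok else "failed", f"{seconds:.3f}s")
```

If the first attempt fails or takes several seconds and the following ones return almost immediately, that points at slow recursion on the first query followed by answers served from the cache.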