Recursive bind becomes unresponsive with high load
Michael Brunnbauer
brunni at netestate.de
Thu Mar 31 15:57:00 UTC 2016
Hi all,
I am running BIND as a recursive resolver on a server that does massive
crawling with a multithreaded Java app. This server occasionally has to look up
hosts in our local zone netestate.de, for which it is not authoritative, and
those lookups tend to fail when the load is high (e.g. >1000 recursing
clients). This suggests some kind of congestion.
I have verified that the authoritative name servers for our local zone are not
hammered with requests from the BIND instance in question (adding a trailing .
to every hostname is important :-). I have also verified that lookups from the
crawlers for the local zone on the lo interface are not excessive. The problem
occurs even before max-cache-size is reached.
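To illustrate the trailing-dot remark: with "domain netestate.de" in resolv.conf, a stub resolver may retry a failed name with the local domain appended, so every dead crawl target can turn into an extra query for the local zone. A minimal Python sketch of the normalization follows. The crawler itself is a Java app, so this is only an illustration, and the helper names absolute_name and resolve_ipv4 are hypothetical.

```python
import socket


def absolute_name(hostname):
    """Return the hostname with exactly one trailing dot (an absolute name)."""
    return hostname.rstrip(".") + "."


def resolve_ipv4(hostname):
    """Resolve the absolute form of hostname to a sorted list of IPv4 addresses.

    Because the name ends in a dot, the stub resolver should not append
    the search domain from resolv.conf when the lookup fails.
    """
    infos = socket.getaddrinfo(absolute_name(hostname), None,
                               socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("www.netestate.de") would then query "www.netestate.de." and nothing else.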
Here is my setup:
max-cache-size 1610612736;
recursive-clients 6000;
minimal-responses yes;
Mar 31 14:04:51 bardolino named[1506]: starting BIND 9.10.3-P2 <id:f9be8b2> -t /etc/namedroot -u named
Mar 31 14:04:51 bardolino named[1506]: built with '--prefix=/usr/local/bind' '--with-openssl=/usr/lib/ssl' '--enable-threads' '--with-tuning=large'
Mar 31 14:04:51 bardolino named[1506]: ----------------------------------------------------
Mar 31 14:04:51 bardolino named[1506]: BIND 9 is maintained by Internet Systems Consortium,
Mar 31 14:04:51 bardolino named[1506]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
Mar 31 14:04:51 bardolino named[1506]: corporation. Support and training for BIND 9 are
Mar 31 14:04:51 bardolino named[1506]: available at https://www.isc.org/support
Mar 31 14:04:51 bardolino named[1506]: ----------------------------------------------------
Mar 31 14:04:51 bardolino named[1506]: adjusted limit on open files from 65536 to 1048576
Mar 31 14:04:51 bardolino named[1506]: found 4 CPUs, using 4 worker threads
Mar 31 14:04:51 bardolino named[1506]: using 2 UDP listeners per interface
Mar 31 14:04:51 bardolino named[1506]: using up to 21000 sockets
/etc/resolv.conf:
domain netestate.de
nameserver 127.0.0.1
options timeout:10 attempts:1
The problem also occurs with unchanged options (timeout:5 attempts:2).
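As a rough cross-check of the two resolv.conf variants, the sketch below parses the options line and multiplies timeout by attempts. It assumes a single nameserver and that every attempt waits for the full timeout, which is a simplification of what a real stub resolver does. The function names are hypothetical.

```python
RESOLV_CONF = """\
domain netestate.de
nameserver 127.0.0.1
options timeout:10 attempts:1
"""


def parse_resolver_options(text):
    """Read timeout and attempts from resolv.conf text.

    Starts from the unchanged values named above (timeout:5 attempts:2).
    """
    opts = {"timeout": 5, "attempts": 2}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "options":
            for item in fields[1:]:
                key, _, value = item.partition(":")
                if key in opts and value.isdigit():
                    opts[key] = int(value)
    return opts


def lookup_budget_seconds(text):
    """Approximate upper bound on how long one lookup waits (one nameserver)."""
    opts = parse_resolver_options(text)
    return opts["timeout"] * opts["attempts"]
```

Under these assumptions both variants give a lookup the same total budget of 10 seconds before it fails: one long attempt in the first case, two shorter ones in the second.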
I can control the number of DNS threads of my crawling app and have tested it
with up to about 3500 recursing clients, which results in a number of queries/s
of the same magnitude. With that setup, lookup errors for the local zone
occur very often (the TTL for the local zone is 10 minutes).
I would be grateful for advice on where to search or what to adjust.
Here is a statistics dump taken while running with about 1000 recursing
clients. A high number of failing queries may be expected, since we have a high
number of Chinese link farms in our database.
+++ Statistics Dump +++ (1459439461)
++ Incoming Requests ++
7329332 QUERY
++ Incoming Queries ++
7261964 A
1357 NS
4 CNAME
635 PTR
7 MX
65365 AAAA
++ Outgoing Queries ++
[View: default]
15552970 A
2022 NS
78 CNAME
30 PTR
7 MX
28796 AAAA
[View: _bind]
++ Name Server Statistics ++
7329332 IPv4 requests received
192360 requests with EDNS(0) received
4 TCP requests received
605 auth queries rejected
1 recursive queries rejected
7327981 responses sent
5 truncated responses sent
192358 responses with EDNS(0) sent
6063138 queries resulted in successful answer
6386951 queries resulted in non authoritative answer
115630 queries resulted in nxrrset
940424 queries resulted in SERVFAIL
208183 queries resulted in NXDOMAIN
6756330 queries caused recursion
3 duplicate queries received
348 queries dropped
606 other query failures
1000 recursing clients
7328722 UDP queries received
4 TCP queries received
++ Zone Maintenance Statistics ++
++ Resolver Statistics ++
[Common]
33 mismatch responses received
999 UDP queries in progress
1 TCP queries in progress
[View: default]
15583903 IPv4 queries sent
6182728 IPv4 responses received
201626 NXDOMAIN received
14456 SERVFAIL received
46 FORMERR received
138 EDNS(0) query failures
19648 truncated responses received
379 lame delegations received
8550865 query retries
9401889 query timeouts
15859 IPv4 NS address fetches
581 IPv4 NS address fetch failed
242332 queries with RTT < 10ms
307416 queries with RTT 10-100ms
5575709 queries with RTT 100-500ms
46819 queries with RTT 500-800ms
1560 queries with RTT 800-1600ms
8729 queries with RTT > 1600ms
1000 active fetches
523 bucket size
39623 REFUSED received
[View: _bind]
523 bucket size
++ Cache Statistics ++
[View: default]
7790005 cache hits
14 cache misses
660920 cache hits (from query)
6766805 cache misses (from query)
0 cache records deleted due to memory exhaustion
4272624 cache records deleted due to TTL expiration
1665005 cache database nodes
1064959 cache database hash buckets
446003517 cache tree memory total
365453290 cache tree memory in use
365453458 cache tree highest memory in use
360239104 cache heap memory total
6836800 cache heap memory in use
7262784 cache heap highest memory in use
[View: _bind (Cache: _bind)]
0 cache hits
0 cache misses
0 cache hits (from query)
0 cache misses (from query)
0 cache records deleted due to memory exhaustion
0 cache records deleted due to TTL expiration
0 cache database nodes
64 cache database hash buckets
284496 cache tree memory total
25096 cache tree memory in use
25096 cache tree highest memory in use
262144 cache heap memory total
576 cache heap memory in use
576 cache heap highest memory in use
++ Cache DB RRsets ++
[View: default]
1370430 A
119502 NS
12152 CNAME
3024 AAAA
347 DS
25980 RRSIG
24925 NSEC
28986 !A
5444 !AAAA
110806 NXDOMAIN
[View: _bind (Cache: _bind)]
++ ADB stats ++
[View: default]
1021 Address hash table size
6266 Addresses in hash table
1021 Name hash table size
7209 Names in hash table
[View: _bind]
1021 Address hash table size
1021 Name hash table size
++ Socket I/O Statistics ++
16276052 UDP/IPv4 sockets opened
19652 TCP/IPv4 sockets opened
1 Raw sockets opened
16275047 UDP/IPv4 sockets closed
26470 TCP/IPv4 sockets closed
711791 UDP/IPv4 socket bind failures
15564255 UDP/IPv4 connections established
12444 TCP/IPv4 connections established
6824 TCP/IPv4 connections accepted
163 UDP/IPv4 recv errors
1005 UDP/IPv4 sockets active
6829 TCP/IPv4 sockets active
1 Raw sockets active
++ Per Zone Query Statistics ++
--- Statistics Dump --- (1459439461)
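To make the dump easier to read, here is a small Python sketch that turns a few of the counters above into ratios. The SAMPLE text is copied from the dump. The parsing approach and the function names are assumptions for illustration and not part of BIND. Where a counter name repeats across views, the first occurrence is kept.

```python
import re

SAMPLE = """\
7329332 IPv4 requests received
940424 queries resulted in SERVFAIL
15583903 IPv4 queries sent
6182728 IPv4 responses received
9401889 query timeouts
16276052 UDP/IPv4 sockets opened
711791 UDP/IPv4 socket bind failures
"""


def parse_counters(dump):
    """Map counter names to values for every '<number> <name>' line."""
    counters = {}
    for line in dump.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            # Keep the first occurrence so later views do not overwrite it.
            counters.setdefault(match.group(2), int(match.group(1)))
    return counters


def share(counters, part, whole):
    """Return counters[part] as a fraction of counters[whole]."""
    return counters[part] / counters[whole]


counters = parse_counters(SAMPLE)
timeout_share = share(counters, "query timeouts", "IPv4 queries sent")
servfail_share = share(counters, "queries resulted in SERVFAIL",
                       "IPv4 requests received")
bind_failure_share = share(counters, "UDP/IPv4 socket bind failures",
                           "UDP/IPv4 sockets opened")
```

For this dump, roughly 60% of the outgoing queries timed out, about 13% of the incoming requests ended in SERVFAIL, and about 4% of the UDP socket opens hit a bind failure.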
Regards,
Michael Brunnbauer
--
++ Michael Brunnbauer
++ netEstate GmbH
++ Geisenhausener Straße 11a
++ 81379 München
++ Tel +49 89 32 19 77 80
++ Fax +49 89 32 19 77 89
++ E-Mail brunni at netestate.de
++ http://www.netestate.de/
++
++ Sitz: München, HRB Nr.142452 (Handelsregister B München)
++ USt-IdNr. DE221033342
++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel