Recursive bind becomes unresponsive with high load

Michael Brunnbauer brunni at netestate.de
Thu Mar 31 15:57:00 UTC 2016


Hi all,

I am running BIND as a recursive resolver on a server that does massive
crawling with a multithreaded Java app. This server occasionally has to do
lookups for hosts in our local zone netestate.de - for which it is not
authoritative - and those lookups tend to fail when the load is high
(e.g. >1000 recursing clients). This suggests some kind of congestion.

I have verified that the authoritative name servers for our local zone are not
hammered with requests from the BIND instance in question (appending a
trailing dot to every hostname is important :-). I have also verified that
lookups from the crawlers for the local zone on the lo interface are not
excessive. The problem occurs even before max-cache-size is reached.
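
A minimal sketch of the trailing-dot point, assuming the crawler hands its
hostnames to the system resolver: a name that ends in a dot is treated as
fully qualified, so the resolver should not retry it with the local domain
from /etc/resolv.conf appended. The helper name to_absolute is hypothetical
and only illustrates the idea; it is not taken from the crawler itself.

```python
def to_absolute(hostname: str) -> str:
    """Return hostname as an absolute name with exactly one trailing dot."""
    name = hostname.strip()
    if not name:
        raise ValueError("empty hostname")
    # rstrip(".") also normalises names that already end in a dot.
    return name.rstrip(".") + "."


if __name__ == "__main__":
    for host in ("www.example.com", "www.example.com."):
        print(host, "->", to_absolute(host))
```

Applying this once, right before the lookup, keeps failed lookups for crawled
names from being repeated as queries inside the local zone.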

Here is my setup:

max-cache-size 1610612736;
recursive-clients 6000;
minimal-responses yes;

Mar 31 14:04:51 bardolino named[1506]: starting BIND 9.10.3-P2 <id:f9be8b2> -t /etc/namedroot -u named
Mar 31 14:04:51 bardolino named[1506]: built with '--prefix=/usr/local/bind' '--with-openssl=/usr/lib/ssl' '--enable-threads' '--with-tuning=large'
Mar 31 14:04:51 bardolino named[1506]: ----------------------------------------------------
Mar 31 14:04:51 bardolino named[1506]: BIND 9 is maintained by Internet Systems Consortium,
Mar 31 14:04:51 bardolino named[1506]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
Mar 31 14:04:51 bardolino named[1506]: corporation.  Support and training for BIND 9 are
Mar 31 14:04:51 bardolino named[1506]: available at https://www.isc.org/support
Mar 31 14:04:51 bardolino named[1506]: ----------------------------------------------------
Mar 31 14:04:51 bardolino named[1506]: adjusted limit on open files from 65536 to 1048576
Mar 31 14:04:51 bardolino named[1506]: found 4 CPUs, using 4 worker threads
Mar 31 14:04:51 bardolino named[1506]: using 2 UDP listeners per interface
Mar 31 14:04:51 bardolino named[1506]: using up to 21000 sockets

/etc/resolv.conf:

 domain netestate.de
 nameserver 127.0.0.1
 options timeout:10 attempts:1

The problem also occurs with the default options (timeout:5 attempts:2).

I can control the number of DNS threads of my crawling app and have tested it
with up to about 3500 recursing clients, which results in a number of
queries/s of the same magnitude. With that setup, lookup errors for the local
zone occur very often (the TTL for the local zone is 10 minutes).

I would be grateful for advice on where to search or what to adjust.

Here is a statistics dump taken while running with about 1000 recursing
clients. A high number of failing queries may be natural - we have a high
number of Chinese link farms in our database.

+++ Statistics Dump +++ (1459439461)
++ Incoming Requests ++
             7329332 QUERY
++ Incoming Queries ++
             7261964 A
                1357 NS
                   4 CNAME
                 635 PTR
                   7 MX
               65365 AAAA
++ Outgoing Queries ++
[View: default]
            15552970 A
                2022 NS
                  78 CNAME
                  30 PTR
                   7 MX
               28796 AAAA
[View: _bind]
++ Name Server Statistics ++
             7329332 IPv4 requests received
              192360 requests with EDNS(0) received
                   4 TCP requests received
                 605 auth queries rejected
                   1 recursive queries rejected
             7327981 responses sent
                   5 truncated responses sent
              192358 responses with EDNS(0) sent
             6063138 queries resulted in successful answer
             6386951 queries resulted in non authoritative answer
              115630 queries resulted in nxrrset
              940424 queries resulted in SERVFAIL
              208183 queries resulted in NXDOMAIN
             6756330 queries caused recursion
                   3 duplicate queries received
                 348 queries dropped
                 606 other query failures
                1000 recursing clients
             7328722 UDP queries received
                   4 TCP queries received
++ Zone Maintenance Statistics ++
++ Resolver Statistics ++
[Common]
                  33 mismatch responses received
                 999 UDP queries in progress
                   1 TCP queries in progress
[View: default]
            15583903 IPv4 queries sent
             6182728 IPv4 responses received
              201626 NXDOMAIN received
               14456 SERVFAIL received
                  46 FORMERR received
                 138 EDNS(0) query failures
               19648 truncated responses received
                 379 lame delegations received
             8550865 query retries
             9401889 query timeouts
               15859 IPv4 NS address fetches
                 581 IPv4 NS address fetch failed
              242332 queries with RTT < 10ms
              307416 queries with RTT 10-100ms
             5575709 queries with RTT 100-500ms
               46819 queries with RTT 500-800ms
                1560 queries with RTT 800-1600ms
                8729 queries with RTT > 1600ms
                1000 active fetches
                 523 bucket size
               39623 REFUSED received
[View: _bind]
                 523 bucket size
++ Cache Statistics ++
[View: default]
             7790005 cache hits
                  14 cache misses
              660920 cache hits (from query)
             6766805 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
             4272624 cache records deleted due to TTL expiration
             1665005 cache database nodes
             1064959 cache database hash buckets
           446003517 cache tree memory total
           365453290 cache tree memory in use
           365453458 cache tree highest memory in use
           360239104 cache heap memory total
             6836800 cache heap memory in use
             7262784 cache heap highest memory in use
[View: _bind (Cache: _bind)]
                   0 cache hits
                   0 cache misses
                   0 cache hits (from query)
                   0 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
                   0 cache records deleted due to TTL expiration
                   0 cache database nodes
                  64 cache database hash buckets
              284496 cache tree memory total
               25096 cache tree memory in use
               25096 cache tree highest memory in use
              262144 cache heap memory total
                 576 cache heap memory in use
                 576 cache heap highest memory in use
++ Cache DB RRsets ++
[View: default]
             1370430 A
              119502 NS
               12152 CNAME
                3024 AAAA
                 347 DS
               25980 RRSIG
               24925 NSEC
               28986 !A
                5444 !AAAA
              110806 NXDOMAIN
[View: _bind (Cache: _bind)]
++ ADB stats ++
[View: default]
                1021 Address hash table size
                6266 Addresses in hash table
                1021 Name hash table size
                7209 Names in hash table
[View: _bind]
                1021 Address hash table size
                1021 Name hash table size
++ Socket I/O Statistics ++
            16276052 UDP/IPv4 sockets opened
               19652 TCP/IPv4 sockets opened
                   1 Raw sockets opened
            16275047 UDP/IPv4 sockets closed
               26470 TCP/IPv4 sockets closed
              711791 UDP/IPv4 socket bind failures
            15564255 UDP/IPv4 connections established
               12444 TCP/IPv4 connections established
                6824 TCP/IPv4 connections accepted
                 163 UDP/IPv4 recv errors
                1005 UDP/IPv4 sockets active
                6829 TCP/IPv4 sockets active
                   1 Raw sockets active
++ Per Zone Query Statistics ++
--- Statistics Dump --- (1459439461)
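
One rough way to read a dump like the one above is to turn the counters into
ratios. The following is only a sketch: the parser and the names parse_stats,
ratio and SAMPLE are hypothetical helpers, and the parser assumes nothing
beyond the line layout visible in the dump (section lines "++ ... ++", view
lines in square brackets, and "<count> <name>" counter lines). The sample
values are copied from the dump.

```python
import re

SECTION_RE = re.compile(r"^\+\+ (.+?) \+\+$")
VIEW_RE = re.compile(r"^\[(.+)\]$")
COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_stats(text):
    """Map (section, view, counter name) to its integer value."""
    counters = {}
    section = view = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            # A new section starts without a view until one is announced.
            section, view = m.group(1), None
            continue
        m = VIEW_RE.match(line)
        if m:
            view = m.group(1)
            continue
        m = COUNTER_RE.match(line)
        if m:
            counters[(section, view, m.group(2))] = int(m.group(1))
    return counters


def ratio(counters, section, view, numerator, denominator):
    """Return numerator/denominator for two counters of one section and view."""
    den = counters[(section, view, denominator)]
    return counters[(section, view, numerator)] / den if den else 0.0


# Excerpt of the dump above; feed the full named.stats text in the same way.
SAMPLE = """\
++ Resolver Statistics ++
[View: default]
            15583903 IPv4 queries sent
             6182728 IPv4 responses received
             9401889 query timeouts
++ Socket I/O Statistics ++
            16276052 UDP/IPv4 sockets opened
              711791 UDP/IPv4 socket bind failures
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    timeouts = ratio(stats, "Resolver Statistics", "View: default",
                     "query timeouts", "IPv4 queries sent")
    bind_failures = ratio(stats, "Socket I/O Statistics", None,
                          "UDP/IPv4 socket bind failures",
                          "UDP/IPv4 sockets opened")
    print(f"timeouts per query sent:         {timeouts:.3f}")
    print(f"bind failures per socket opened: {bind_failures:.3f}")
```

Comparing these ratios between dumps taken at different load levels may show
whether outgoing timeouts or socket bind failures grow with the number of
recursing clients.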

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni at netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel