Request for review of performance advice
John Thurston
john.thurston at alaska.gov
Wed Jul 8 16:39:00 UTC 2020
On 7/7/2020 5:57 PM, Victoria Risk wrote:
> A while ago we created a KB article with tips on how to improve your
> performance with our Kea DHCP server. The tips were fairly obvious to
> our developers and this was pretty successful. We would like to do
> something similar for BIND, provide a dozen or so tips for how to
> maximize your throughput with BIND. However, as usual, everything is
> more complicated with BIND.
This is an excellent idea.
If it comes to fruition, I ask that there be some guidance on when
such optimizations are useful. I've seen places follow such a guide-sheet
when its guidelines were suited to a business with 10X or 100X the
traffic the customer actually sees.
That is, a configuration which benefits an organization seeing 100,000
qps may be excessively complex (or brittle) for one seeing 100 qps.
--
Do things because you should, not just because you can.
John Thurston 907-465-8591
John.Thurston at alaska.gov
Department of Administration
State of Alaska
>
> Can those of you who care about performance, and who have worked to
> improve it, share the suggestions that have had the most impact?
> Please also comment if you think any of the ideas below are stupid or
> dangerous. I have combined advice for resolvers and for authoritative
> servers; I hope it is clear which is which...
>
> The ideas we have fall into four general categories:
>
> System design
> 1a) Use a load balancer to specialize your resolvers and maximize your
> cache hit ratio. A load balancer is traditionally designed to spread
> traffic evenly across a pool of servers, but it can also be used to
> concentrate related queries on one server to keep its cache as hot as
> possible. For example, if all queries for names under .info are sent
> to one server in the pool, an answer for a .info name is more likely
> to be in that server's cache.
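One way a load balancer could implement the per-TLD policy in 1a is rendezvous (highest-random-weight) hashing on the query's TLD. This is a hypothetical sketch, not a feature of any particular load balancer: the function names and pool addresses are made up, and real products would express the same idea in their own rule syntax.

```python
import hashlib
from typing import Sequence


def query_tld(qname: str) -> str:
    """Return the lowercased top-level label of a query name ("." for the root)."""
    labels = [label for label in qname.rstrip(".").lower().split(".") if label]
    return labels[-1] if labels else "."


def pick_resolver(qname: str, resolvers: Sequence[str]) -> str:
    """Choose one resolver per TLD using rendezvous (highest-random-weight) hashing.

    Every name under the same TLD maps to the same resolver, and removing a
    resolver from the pool only moves the TLDs that were mapped to it.
    """
    if not resolvers:
        raise ValueError("resolver pool is empty")
    tld = query_tld(qname)

    def weight(server: str) -> int:
        # hashlib gives a stable hash; Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(f"{tld}|{server}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(resolvers, key=weight)


if __name__ == "__main__":
    pool = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]  # hypothetical resolver addresses
    for name in ("www.example.info", "mail.other.info.", "www.example.com"):
        print(name, "->", pick_resolver(name, pool))
```

Rendezvous hashing is used here instead of a plain hash modulo the pool size so that taking one resolver out of service only moves the TLDs it was serving; the other resolvers keep their hot caches.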
>
> 1b) If you have a large authoritative system with many servers, consider
> dedicating some machines to propagating transfers. These machines,
> called transfer servers, would not answer client queries; they would
> only send NOTIFY messages and answer IXFR requests.
>
> 1c) Deploy ghost secondaries. If you store copies of authoritative
> zones on your resolvers (resolvers as undelegated secondaries), they can
> answer from those copies instead of querying the zones' authoritative
> servers. The most obvious uses are mirroring the root zone locally and
> mirroring your own authoritative zones on your resolvers.
>
> We have other system design ideas that we suspect would help, but we are
> not sure, so I will wait to see if anyone suggests them.
>
> OS settings and the system environment
> 2a) Run on bare metal if possible, not on virtual machines or in the
> cloud. (Any idea how much difference this makes? The only reference we
> can cite is pretty out of date:
> https://indico.dns-oarc.net/event/19/contributions/234/attachments/217/411/DNS_perf_OARC_Apr_14.pdf
> )
>
> 2b) Consider building with the configure option --with-tuning=large
> (https://kb.isc.org/docs/aa-01314). This is a compile-time option, so
> it is not something you can switch on and off in production.
>
> 2c) Consider which read-write lock implementation you want to use:
> https://kb.isc.org/docs/choosing-a-read-write-lock-implementation-to-use-with-named
> For the highest tested query rates (> 100,000 queries per second),
> pthreads read-write locks with hyper-threading /enabled/ seem to be the
> best-performing choice by far.
>
> 2d) Pay attention to your choice of network interface cards (NICs). We
> have found wide variations in their performance. (Can anyone suggest
> what specifically to look for?)
>
> 2e) Make sure your socket send buffers are big enough. (Not sure whether
> this advice is obsolete. Do we need to tell people how to check whether
> their buffers are causing delays?)
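On Linux, one way to tell whether undersized socket buffers are costing you is to watch the kernel's UDP counters in /proc/net/snmp: RcvbufErrors and SndbufErrors count datagrams dropped or sends failed for lack of buffer space. A minimal sketch, assuming a Linux host; the function name is hypothetical. The counters are cumulative since boot, so compare two readings taken under load rather than trusting a single value.

```python
from pathlib import Path


def udp_buffer_errors(snmp_text: str) -> dict:
    """Extract UDP buffer-related error counters from Linux /proc/net/snmp content.

    The file has a "Udp:" header line naming the counters, followed by a
    "Udp:" line with their values (cumulative since boot).
    """
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp: header/value lines found")
    names, values = rows[0][1:], rows[1][1:]
    counters = dict(zip(names, (int(v) for v in values)))
    # RcvbufErrors: datagrams dropped because a socket's receive buffer was full.
    # SndbufErrors: send attempts that failed for lack of buffer space.
    return {key: counters.get(key, 0) for key in ("RcvbufErrors", "SndbufErrors")}


if __name__ == "__main__":
    snmp = Path("/proc/net/snmp")
    if snmp.exists():
        print(udp_buffer_errors(snmp.read_text()))
        # Kernel ceilings on the socket buffer sizes applications may request:
        for knob in ("rmem_max", "wmem_max"):
            limit = Path("/proc/sys/net/core") / knob
            if limit.exists():
                print(knob, limit.read_text().strip())
```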
>
> 2f) When the number of CPUs is very large (32 or more), the
> correspondingly larger number of UDP listeners may not improve
> performance and might actually reduce throughput slightly, due to the
> overhead of the additional structures and tasks. We suggest trying
> different values of named's -U option to find the best one for your
> production environment.
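A sketch of how the -U sweep suggested in 2f might be organized: a coarse grid of listener counts, one named command line per test run, and a helper that picks the winner once each run has been load-tested (for example with dnsperf). The grid, the config path and the helper names are assumptions for illustration; the throughput numbers must come from your own benchmarks.

```python
import os
from typing import Dict, List


def candidate_listener_counts(ncpus: int) -> List[int]:
    """Powers of two below ncpus, plus ncpus itself: a coarse grid of -U values."""
    counts, n = [], 1
    while n < ncpus:
        counts.append(n)
        n *= 2
    counts.append(max(ncpus, 1))
    return counts


def named_argv(listeners: int, conf: str = "/etc/named.conf") -> List[str]:
    """Command line for one test run (the config path is a placeholder).

    -f keeps named in the foreground, -c names the config file, and -U sets
    the number of UDP listener threads per address.
    """
    return ["named", "-f", "-c", conf, "-U", str(listeners)]


def best_listener_count(measured_qps: Dict[int, float]) -> int:
    """Return the -U value whose benchmark run achieved the highest throughput."""
    return max(measured_qps, key=measured_qps.get)


if __name__ == "__main__":
    for u in candidate_listener_counts(os.cpu_count() or 1):
        print(" ".join(named_argv(u)))
```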
>
>
> named Features
> 3a) Minimize logging. Query logging is expensive (it can cost you 20% or
> more of your throughput), so don’t do it unless you are using the logs
> for something. Logging with dnstap has a lower impact but is still
> fairly expensive. Don’t run in debug mode unless necessary.
>
> 3b) Use the named.conf option "minimal-responses yes;" to reduce the
> amount of work named needs to do to assemble each response, as well as
> the amount of outbound traffic.
>
> 3c) Disable synth-from-dnssec. While this seemed like a good idea, in
> practice it turns out not to improve performance.
>
> 3d) Tune your zone transfers. (https://kb.isc.org/docs/aa-00726)
>
> When tuning the behavior of the primary, there are several factors that
> you can control:
>
> - The rate of notifications of changes to secondary servers
> (serial-query-rate and notify-delay)
>
> - Limits on concurrent zone transfers (transfers-out, tcp-clients,
> tcp-listen-queue, reserved-sockets)
>
> - Efficiency/management options (max-transfer-time-out,
> max-transfer-idle-out, transfer-format)
>
> The most important options to focus on are transfers-out,
> serial-query-rate, tcp-clients and tcp-listen-queue.
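Some rough arithmetic shows why the notification rate limits matter on a large primary. The sketch assumes every NOTIFY counts against a single per-second limit (serial-query-rate defaults to 20) and ignores retries and refresh traffic; the function name and the zone counts are hypothetical.

```python
def notify_burst_seconds(zones: int, secondaries_per_zone: int, rate_per_second: float) -> float:
    """Approximate time to send one NOTIFY per zone to each secondary when
    the primary paces outgoing messages at rate_per_second."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return zones * secondaries_per_zone / rate_per_second


if __name__ == "__main__":
    # Hypothetical primary: 10,000 zones, 4 secondaries each, limit of 20 messages/s.
    print(notify_burst_seconds(10_000, 4, 20))
```

With 10,000 zones and four secondaries each, a limit of 20 messages per second means roughly 2,000 seconds (over half an hour) before the last secondary hears about a change made to every zone at once.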
>
> 3e) If you use RPZ, consider setting qname-wait-recurse no. We have had
> issues with RPZ transfers impacting query performance in resolvers. In
> general, many smaller RPZ zones will transfer faster than a few very
> large ones.
>
> 3f) Consider enabling prefetch on your resolver, unless you are running
> 9.10 (which is EOL): https://kb.isc.org/docs/aa-01122
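For the prefetch item above, a simplified model of the documented decision may help: "prefetch <trigger> <eligibility>;" (defaults 2 and 9) refreshes a cached record when a query finds its remaining TTL at or below the trigger, provided the record's original TTL was at least the eligibility value. This is a sketch of the documented semantics, not named's code; the function name is hypothetical and the exact inclusive boundary is an assumption.

```python
def should_prefetch(original_ttl: int, remaining_ttl: int,
                    trigger: int = 2, eligibility: int = 9) -> bool:
    """Simplified model of "prefetch <trigger> <eligibility>;" for one cached record."""
    if trigger <= 0:
        return False  # "prefetch 0;" disables prefetching
    trigger = min(trigger, 10)                   # trigger values above 10 are reduced to 10
    eligibility = max(eligibility, trigger + 6)  # eligibility must exceed trigger by at least 6
    # Only records cached with a long enough TTL qualify, and only when about to expire.
    return original_ttl >= eligibility and remaining_ttl <= trigger
```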
>
> Fix your transport network.
> Transport network issues cause BIND to keep retrying, which is a
> performance drain.
> 4a) Disable outbound firewalls/packet filters, particularly those that
> maintain connection state (in some cases, remove them completely to
> prevent ongoing interference). They are a frequent cause of DNS
> problems that make your DNS server do a lot of extra work.
>
> 4b) Set an appropriate MTU for your network. Ensure that your network
> infrastructure supports EDNS and large UDP responses of up to 4096
> bytes. Ensure that it also allows transit and reassembly of fragmented
> UDP packets (fragmented packets are typically large query responses,
> which are common if your zones are DNSSEC-signed).
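The arithmetic behind 4b: with a 1500-byte MTU, a DNS message larger than 1472 bytes over IPv4 (1452 over IPv6) no longer fits in one UDP packet, and a full 4096-byte EDNS response becomes three fragments over either protocol. A sketch, assuming no IPv4 options and no IPv6 extension headers other than the fragment header; the function names are hypothetical.

```python
import math

IPV4_HEADER = 20       # bytes, assuming no IPv4 options
IPV6_HEADER = 40       # bytes, assuming no extension headers
IPV6_FRAG_HEADER = 8   # fragment extension header carried by each IPv6 fragment
UDP_HEADER = 8


def max_unfragmented_dns_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest DNS message that fits in one UDP packet at this MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


def fragment_count(dns_payload: int, mtu: int, ipv6: bool = False) -> int:
    """Number of IP fragments needed to carry a DNS message of dns_payload bytes."""
    if dns_payload <= max_unfragmented_dns_payload(mtu, ipv6):
        return 1
    # Every fragment except the last carries a multiple of 8 bytes of data.
    if ipv6:
        per_fragment = (mtu - IPV6_HEADER - IPV6_FRAG_HEADER) // 8 * 8
    else:
        per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    # The UDP header is part of the data being fragmented.
    return math.ceil((dns_payload + UDP_HEADER) / per_fragment)
```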
>
> 4c) Ensure that your network infrastructure allows DNS over TCP.
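One way to confirm 4c end to end is to send a real query over TCP port 53. DNS over TCP prefixes each message with a two-byte length field (RFC 1035, section 4.2.2). A minimal sketch using only the Python standard library; the function names are hypothetical, and dig +tcp does the same job from the command line.

```python
import socket
import struct


def encode_query(qname: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (RFC 1035 wire format) with the RD bit set."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in qname.rstrip(".").split(".") if label]
    question = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    question += b"\x00"  # root label terminates the name
    return header + question + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def tcp_frame(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte big-endian length, as TCP requires."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        data += chunk
    return data


def tcp_query_rcode(server: str, qname: str = "example.com", timeout: float = 3.0) -> int:
    """Send one query to server over TCP port 53 and return the response RCODE."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(tcp_frame(encode_query(qname)))
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        response = _recv_exact(sock, length)
    return response[3] & 0x0F  # low four bits of the second flags byte
```

For example, tcp_query_rcode("192.0.2.53") against a (hypothetical) resolver address should return 0 (NOERROR); a timeout or connection error suggests something between you and the server is blocking TCP.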
>
> 4d) Check for and eliminate any incomplete IPv6 interface setup. (What
> can go wrong here is that BIND thinks it can use IPv6 authoritative
> servers, but the sends actually fail silently, leaving named waiting
> unnecessarily for responses.)
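A partial check for 4d: calling connect() on a UDP socket sends no packets but makes the kernel choose a route and source address, so it fails immediately when the host has no usable IPv6 route. It cannot detect packets dropped further along the path, which is the silent-failure case described above. A sketch with a hypothetical function name.

```python
import socket


def ipv6_route_exists(address: str, port: int = 53) -> bool:
    """Return True if the kernel can pick a route and source address for address.

    connect() on a UDP socket sends nothing on the wire, so this catches a host
    with no usable IPv6 route but not packets dropped later along the path.
    """
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
            sock.connect((address, port))
        return True
    except OSError:  # e.g. ENETUNREACH, EADDRNOTAVAIL, or no IPv6 support at all
        return False
```

For example, ipv6_route_exists("2001:503:ba3e::2:30") (a.root-servers.net) returning False on a host that nonetheless has IPv6 addresses configured points to the kind of incomplete setup described in 4d.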
>
> Any further suggestions, corrections or warnings are very welcome.
>
> Thank you!
> Vicky
>
> ---------
>
> Victoria Risk
> Product Manager
> Internet Systems Consortium
> vicky at isc.org
>
> _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users