resolv.conf question / timeout behaviour

Wed Mar 31 16:00:53 UTC 2021

Tom Preissler <tom at preissler.co.uk> wrote:
>
> At my workplace we have a three-resolver setup in /etc/resolv.conf.
>
> Occasionally, though rarely, we have seen DNS response times of around
> 14000 ms because the *first* listed resolver was down for maintenance.

Sadly the traditional unix stub resolver behaves REALLY BADLY if any of
its servers are unavailable. It does not keep enough information about
server performance, and it isn't really designed to be able to do that.
The resolv.conf tuning options (timeout, attempts, rotate) are too coarse
to help in any meaningful way.

Because of this, if it's important for you to avoid multi-second DNS
lookup times (and it usually is!), you need to design your system so that
the libc resolver never tries to talk to a DNS server that isn't
available.
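
To put rough numbers on that, here is a minimal Python sketch of a
deliberately simplified model of the stub resolver. It assumes servers are
tried strictly in list order, each unreachable one costs a full timeout,
and the list is walked `attempts` times. The function name
`delay_before_answer` is made up for illustration. The defaults of 5
seconds and 2 attempts are the ones documented in resolv.conf(5); real libc
implementations differ in the details.

```python
def delay_before_answer(servers_up, timeout=5.0, attempts=2):
    """Seconds wasted before a reachable server is finally queried.

    Simplified model (an assumption, not any libc's exact algorithm):
    servers are tried in list order, and every unreachable server costs
    one full timeout. No state is kept between lookups, so each lookup
    pays the same penalty again.

    servers_up: list of booleans, True if that nameserver is reachable.
    Returns None if no server answers within the allowed attempts.
    """
    waited = 0.0
    for _ in range(attempts):
        for up in servers_up:
            if up:
                return waited
            waited += timeout
    return None


# First of three resolvers down for maintenance, default options.
print(delay_before_answer([False, True, True]))
# Same outage with "options timeout:1" in resolv.conf.
print(delay_before_answer([False, True, True], timeout=1.0))
```

In this model, with the first of three servers down, every lookup waits 5
seconds before it gets anywhere. Setting `options timeout:1` shrinks the
penalty to 1 second, but every lookup still pays it.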

As Matus Uhlar said, one way is to run a resolver daemon (e.g. BIND
configured to forward to your recursive servers) on each machine. Resolver
daemons are better able to keep track of which server is up, and they are
less likely to be unavailable when the client software needs them since
they are on the same machine. Most operating systems have resolver daemons
now; it's basically only oldskool unix that needs extra setup.
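
As a rough illustration of the extra state a resolver daemon can keep, here
is a minimal Python sketch that tracks a smoothed response time per
upstream server and prefers the one that currently looks best. It is not
how BIND or any particular daemon implements server selection. The class
name `ServerTracker`, the 0.7 smoothing factor, the 5 second timeout
penalty, and the example addresses are all assumptions chosen for
illustration.

```python
class ServerTracker:
    """Remember how each upstream server has been behaving."""

    def __init__(self, servers, timeout_penalty=5.0, smoothing=0.7):
        # Every server starts with a smoothed RTT of zero, so each one
        # gets tried before any of them has been measured.
        self.srtt = {server: 0.0 for server in servers}
        self.timeout_penalty = timeout_penalty
        self.smoothing = smoothing

    def record_reply(self, server, rtt):
        # Exponentially weighted moving average of the round-trip time.
        old = self.srtt[server]
        self.srtt[server] = self.smoothing * old + (1 - self.smoothing) * rtt

    def record_timeout(self, server):
        # Treat a timeout as a very slow reply so the server drops
        # down the ranking instead of being retried first every time.
        self.record_reply(server, self.timeout_penalty)

    def best(self):
        # Lowest smoothed RTT wins; ties go to the earliest listed server.
        return min(self.srtt, key=self.srtt.get)


tracker = ServerTracker(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
tracker.record_timeout("192.0.2.1")   # first server is down
print(tracker.best())                 # no longer the first server
```

The point of the sketch is only that one timeout is remembered: later
queries go to a different server first, which is exactly what the stub
resolver cannot do because it starts from scratch on every lookup.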

Another way is a high availability setup for your recursive servers. I use
keepalived (my servers are on a resilient layer 2 network that spans
multiple locations); or you can use anycast if you need to do failover at
layer 3.

Of course, you can do both :-)

Tony.
-- 
f.anthony.n.finch  <dot at dotat.at>  https://dotat.at/
Faeroes: North backing west 5 or 6, decreasing 3 or 4 for a time.
Moderate or rough. Fair. Good.