Client DNS Cache

Kevin Darcy kcd at daimlerchrysler.com
Tue Nov 21 03:02:07 UTC 2000


Clients typically won't be querying your nameservers directly. Instead,
they will be querying local nameservers which will then be querying your
nameservers and caching the results of those queries. So you should be
worrying a lot more about the behavior of other (caching) nameservers than
of clients, which generally just use "stub" resolvers which don't cache at
all.

So how do other nameservers pick which nameserver to ask about a particular
name? BIND, and I believe other nameserver implementations, select
nameservers by *speed*, i.e. if a particular domain has nameservers A and
B, and nameserver A has been known to answer queries faster than nameserver
B, then preference will be given to A. There is no way to *force* all other
nameservers to use a particular nameserver out of many; the assumption is
that *every* nameserver listed in a domain's delegations is available to
ask about names in that domain, and will answer identically to any other
delegated nameserver for the domain. So this part of your scheme won't
work, unfortunately. You'll get some "leakage" to the backup nameserver --
and therefore the backup webserver -- even if you go to the drastic step of
artificially slowing it.
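
(For the curious, here's a rough Python sketch of the kind of bookkeeping a
caching nameserver does to prefer the faster of two delegated nameservers.
It is illustrative only -- it is not BIND's actual algorithm -- and the
addresses in it are made-up documentation addresses.)

    import random

    class NameserverStats:
        """Per-nameserver bookkeeping: a smoothed round-trip time estimate."""
        def __init__(self, address):
            self.address = address
            self.srtt = None                      # unknown until first measured

        def record_rtt(self, rtt_ms, alpha=0.3):
            # exponentially smooth the observed round-trip times
            if self.srtt is None:
                self.srtt = rtt_ms
            else:
                self.srtt = (1 - alpha) * self.srtt + alpha * rtt_ms

    def pick_nameserver(servers):
        # give untried servers a chance, otherwise prefer the lowest smoothed RTT
        untried = [s for s in servers if s.srtt is None]
        if untried:
            return random.choice(untried)
        return min(servers, key=lambda s: s.srtt)

    # e.g. two delegated nameservers, A consistently faster than B:
    a, b = NameserverStats("192.0.2.1"), NameserverStats("192.0.2.2")
    a.record_rtt(20); b.record_rtt(90)
    assert pick_nameserver([a, b]) is a           # A gets the preference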

As for a general discussion about implementing webserver redundancy using
DNS, here's my standard blurb:

To do this with DNS, there's no really *good* way until all clients know
about "SRV" records. These relatively-new resource records add a layer of
indirection to the resource-location process, and have "priority" values
associated with them, so clients using them know exactly in what order
servers should be tried. Unfortunately, SRV-aware web clients are probably
years away. [UPDATE: some of us are trying to lobby Mozilla to include SRV
support. See http://bugzilla.mozilla.org/show_bug.cgi?id=14328].
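
(As a hedged illustration of what an SRV-aware client would do, here's how
the lookup and ordering could look in Python with the dnspython 2.x
library. The "_http._tcp.example.com" name is a placeholder, not anything
from the original question, and the weight handling is simplified.)

    import dns.resolver    # dnspython 2.x

    # Fetch the SRV RRset and order it the way an SRV-aware client would:
    # lowest priority first; within a priority, prefer higher weight
    # (simplified -- RFC 2782 actually calls for weighted random selection).
    answers = dns.resolver.resolve("_http._tcp.example.com", "SRV")
    ordered = sorted(answers, key=lambda rr: (rr.priority, -rr.weight))
    for rr in ordered:
        print(rr.target, rr.port)    # try these servers in this order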

The simple-minded DNS-based approach is to just change the A record for the
website when the main box goes down (possibly using some sort of automated
script, possibly making the change via Dynamic Update). The main problem
with the simple-minded approach is caching: anyone who has the old record
cached in some intermediate nameserver will still go to the "dead" server
even if the A record has been changed on the master and the change has been
propagated to all of the slaves. Unlike NOTIFY in the master/slave context,
there's no practical way to tell every caching nameserver on the 'Net that
a particular A record has changed, and that they should all come and fetch
the new value.
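
(If you go the Dynamic Update route, the update itself is simple. A minimal
sketch with the dnspython library, assuming a zone "example.com", a master
at 192.0.2.53 and a backup webserver address of 20.20.20.20 -- all
placeholder values; a real setup would also want TSIG authentication.)

    import dns.update
    import dns.query    # dnspython

    # Replace the website's A record with the backup address when the main
    # box goes down. An unauthenticated update is shown for brevity; in
    # practice you'd sign it with a TSIG key.
    update = dns.update.Update("example.com")
    update.replace("www", 300, "A", "20.20.20.20")
    response = dns.query.tcp(update, "192.0.2.53")
    print(response.rcode())    # 0 (NOERROR) means the master accepted it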

A refinement of the simple-minded approach is to have the name of the
website resolve to *both* addresses, and arrange for them to always be
given out in the appropriate order, i.e. main website first, backup second.
This can be achieved by specifying a "fixed" rrset-order on the master and
all slaves for the zone (you need at least BIND 8.2 for this, and for
security reasons that means you'd be wanting to run BIND 8.2.2 patchlevel
7). Some applications are smart enough to automatically fail over, so for
those clients this means more availability (after a short failover delay)
during a failure of the main box, even if they get a "stale" RRset. Then
again, some clients are *not* smart enough to do this failover, so it's a
partial solution at best. Moreover, caching complicates things here as
well: for multi-valued A records, intermediate caching servers will tend to
randomize/round-robin the order of the answers they give out. This
re-ordering effect actually *helps* you in failure mode -- it means that
approximately 50% of the clients getting a stale RRset will still be able
to connect without any failover delay -- but the flip side is that under
normal circumstances, when the main box is up, it means that there will be
a certain amount of "leakage" to the backup webserver. If you don't want to
continually mirror the website contents, you can of course deal with this
leakage traffic straightforwardly via a web redirect on the backup
webserver, but that will add latency. When the main website fails, then
remove its A record from the RRset and turn off the redirect (if any).
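
(The "smart enough to fail over" behaviour amounts to something like the
following sketch: resolve the name to its full RRset, then walk the
addresses in the order they come back until one answers. The name
"www.example.com" is a placeholder.)

    import socket

    def connect_with_failover(host, port=80, timeout=5):
        """Try each address the resolver returns, in order, until one connects."""
        last_error = None
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, type=socket.SOCK_STREAM):
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)      # dead server -> timeout or refusal
                return sock                 # first reachable address wins
            except OSError as err:
                sock.close()
                last_error = err            # fall through to the next address
        raise last_error or OSError("no usable address for " + host)

    # sock = connect_with_failover("www.example.com")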

Note that with either approach, you can mitigate -- but not completely
eliminate -- the effects of intermediate caching servers by making the
address records volatile (by reducing their TTL values). But this will
greatly increase the traffic to your nameservers for that name, not to
mention the extra work you'll cause for all other nameservers on the 'Net
to constantly re-query the name. Overall, it's a Bad Thing, but many folks
resort to it nonetheless.

There are non-DNS solutions to this problem, of course. The obvious one is
to just have lots of redundant network paths so that access is "never"
(never say never) lost to the main webserver, and of course the webserver
itself should be clustered or high-availability so that it "never" goes
down, but this "brute force" approach can get expensive. And then there are
specialized hardware/software failover solutions, which also tend to be
expensive. Some of these make *both* servers look like the same IP address,
and the failover happens transparently. Some of them also integrate dynamic
load-balancing between webservers, which is probably something you want if
you outgrow your current webserver capacity.




- Kevin
smr at hotmail.com wrote:

> Hi,
>
> I have a question. The scenario is, I have a primary DNS server and a
> secondary. For redundancy purposes, they are located at different
> physical sites. Similarly, there is one web server at each site (on
> different networks) and there is no interconnectivity between them.
>
> The idea is, the backup site should not be used at all unless the
> primary site is down.
>
> Let's say the primary DNS server resolves the web server to 10.10.10.10
> and the secondary DNS server resolves it to 20.20.20.20 (each one
> resolving to the web server at its local site).
>
> As per the way DNS works, if the primary DNS server is not reachable,
> the client would try to reach the secondary DNS. The assumption is, if
> the primary site is down, the primary DNS would not be reachable. Hence
> the client will try to reach the secondary DNS, which will in turn
> resolve to a webserver residing at the backup location. Now, the worry
> is: what if the client is caching the WEBSERVER address? In this case,
> how will the DNS client behave? Will this behaviour differ depending
> upon the client? I mean, Win95, Solaris, Linux, NT, etc. If the web
> server IP address is picked up from the client cache, and if it fails,
> will it always return a "host unreachable" message, or will it try to
> reach the primary or the secondary DNS?



