What would happen if one of two DNS servers was down?
Kevin Darcy
kcd at chrysler.com
Wed Aug 13 03:37:45 UTC 2008
MontyRee wrote:
> Sorry for the non-text previous e-mail; sending again.
>
> Thanks for the kind and concrete answers.
>
> Additional questions:
>
> - Others may use different resolvers, such as Windows-based ones or other
> BIND versions. Does this work reliably without exception, as you said?
>
> - From the point of view of high availability, which is better: two
> authoritative DNS servers, or two master DNS servers behind an L4 switch?
>
> Thanks again.
>
> Regards.
>
>> Subject: RE: What would happen if one of two DNS servers was down?
>> From: chris_cox at stercomm.com
>> To: bind-users at isc.org
>> Date: Tue, 12 Aug 2008 10:44:02 -0500
>>
>> On Tue, 2008-08-12 at 06:42 +0000, MontyRee wrote:
>>
>>> Thanks for the kind answer.
>>>
>>>
>>> Additional questions below.
>>>
>>>
>>>
>>>>> Hello, all.
>>>>>
>>>>>
>>>>> I have been operating two DNS servers (primary and secondary) for one domain, like below.
>>>>>
>>>>>
>>>>> example.com IN NS ns1.example.com
>>>>> example.com IN NS ns2.example.com
>>>>>
>>>>>
>>>>> and there was an event where the ns1.example.com server went down.
>>>>> As I understand it, if ns1 is down, all requests should go to ns2.example.com.
>>>>>
>>>> Depending on what 'down' means, it could take some time before
>>>> the request is sent to ns2. So there will likely be a delay, even
>>>> if not much (it will feel like forever to some users).
>>>>
>>> My 'down' means the system was down, so the server couldn't even be pinged.
>>>
>>>
>>>
>>>>> But when ns1.example.com was down, some people actually couldn't look up the domain.
>>>>>
>>>> Sounds like a configuration issue. However, realize there is a zone
>>>> expiry timer: if ns2 is slaving zones from ns1 (the typical BIND
>>>> master/slave scenario) and the zone expires, then ns2 will refuse to
>>>> trust the slaved zone it had... and thus nothing works.
>>>>
>>> Sorry, I couldn't follow what you said.
>>> Actually, the master DNS server's downtime was just an hour, and the slave
>>> DNS worked without any problem. But during that time some people could
>>> connect, while others said they couldn't resolve the domain at all.
>>>
>> The slave will answer queries for the zone until the zone's expire timer
>> runs out, at which point, if it cannot contact the master, the zone
>> effectively goes dead.
>>
>> I think I used some bad "terms" in my explanation. Basically there is an
>> expiry interval during which a slave will consider its data to be good.
>> After that, it will need to reach the master.
>>
>> (I trip up on using the right words)
>>
>> The value is often set to 2 weeks or more. But if the master is
>> down for a LONG time... you'll lose it all eventually (the slave
>> won't answer for that zone anymore).
>>
>> If you're seeing this problem after a short period of time, that's
>> likely NOT the cause unless somebody set the expiry in the SOA
>> to something really small.
>>
>> Caching in DNS is a wonderful thing, but can cause scenarios where
>> resolution is working for one and not for another (because of
>> the various Time To Live values and the time of last query/cache).
>>
>> Can you give us a feel for the amount of time between the failure
>> and the problem? Was it almost immediate? If so, then it's some
>> other kind of configuration issue (unless, as I said, the zone was
>> just totally misconfigured). Can you post the SOA for the zone?
>>
>>
>>> Does it mean that DNS failover doesn't work well, and that some resolvers
>>> or some BIND versions insist on querying the downed DNS server?
>>>
>> Usually the client resolver is configured to query multiple nameservers: if
>> the first one is down, it moves on to the next, and so on. Failover works
>> fine in this style (normally). Of course, a client might NOT be aware
>> of more than one nameserver... in which case there is no failover (duh).
>>
>>
>> ...
>>
>>> So thanks for your help again..
>>>
>> Did I explain it better this time?
>>
>>
>>
Let me try to explain this from a high level:
1) The NS records that are published for a zone are for the consumption
of other nameservers (technically, "iterative resolvers"). If one of the
nameservers listed as an NS for a zone becomes unavailable, failover is
very quick to the other NS(es). So quick as to usually be unnoticeable
by ordinary users. Iterative resolvers also *remember* which nameservers
are down, or slow, so they are very adaptive to failures.
2) The nameservers that are configured for a "stub resolver", like your
typical end-user PC, are tried *in sequence*, so if the first one is
down, there may be a delay before the second one is tried, and if that
one is down an even longer delay before the third one is tried, and so
forth. The delay is often quite noticeable, and impatient applications
may actually time out before a working nameserver is found. Stub
resolvers typically don't *remember* that a particular nameserver is
down, either, so in case of a failure, all queries are likely to be slow
until the failure is corrected.
3) Between masters and slaves, there is a REFRESH interval defined for
each replicated zone, which governs how often the slave checks the
master for updates, and then an EXPIRE interval after which the slave
considers the zone "bad" and will no longer give useful answers for
names in the zone. As mentioned previously in the thread, while REFRESH
can be as low as an hour or more, EXPIRE is typically on the order of
weeks, if not months. If a slave can't talk to the master for weeks,
chances are it's a permanent condition and the right thing to do is
"expire" the zone so that clients aren't given stale information. In
enterprises with a large number of slave servers (like ours), for
redundancy it is common to have multiple tiers of slaves, and the slaves
at a given tier to list multiple "masters" (i.e. sources of zone data)
from higher tiers, so that even if a single intermediate "master" dies
or becomes unavailable, changes still propagate out to the edges
everywhere. Note that there is an inherent problem in having servers at
the *same* tier list each other as "masters" reciprocally or in a
circular fashion, because then slaved zones can become "immortal" (i.e.
even if they're deleted from the primary master, the slaves in that
particular tier keep refreshing them from each other indefinitely).
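To make the timers concrete, here is a sketch of the SOA record fields that govern this behavior (the values are illustrative, not a recommendation):

    example.com.  IN  SOA  ns1.example.com. hostmaster.example.com. (
            2008081301   ; serial
            3600         ; REFRESH: slave checks the master every hour
            900          ; RETRY: if a refresh fails, retry every 15 minutes
            1209600      ; EXPIRE: after 2 weeks without contact, the slave
                         ;   stops answering for the zone
            86400 )      ; negative-caching TTL

With numbers like these, an hour-long master outage is harmless to the slave; only an outage approaching the EXPIRE interval would make the slave go dead for the zone.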
So, your questions are:
a) "Others may use different resolvers, such as Windows-based ones or other
BIND versions."
Depends on what you mean by "resolver". If you mean the "resolver" part
of a nameserver implementation like BIND, configured for iterative
resolution (i.e. based on published NS records), then the failover is
very fast.
If, on the other hand, you mean a "stub resolver", like a typical
end-user PC client, then the failure of the first nameserver in the
resolver list can cause noticeable delays for every query. Note that on
some platforms it's possible to tune the delays (e.g. libresolv on some
Unix/Linux platforms understands some /etc/resolv.conf options which
govern timeouts and retries).
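For instance, with glibc's libresolv the retry behavior can be tuned in /etc/resolv.conf (the addresses and values below are illustrative):

    # Servers are tried in the order listed.
    nameserver 192.0.2.1
    nameserver 192.0.2.2
    # Wait 2 seconds per server (default is 5) and make 2 passes through
    # the list; "rotate" spreads queries across the listed servers.
    options timeout:2 attempts:2 rotate

Shorter timeouts reduce the stall when the first server is down, at the cost of giving a slow-but-healthy server less time to answer.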
In the case of a "forwarding resolver", such as, e.g. BIND configured
with a "forwarders" statement, it depends on the exact implementation.
Even in its forwarding mode, BIND, for instance, still maintains a
cache, so on that basis alone it can be expected to perform reasonably
well even in the case of failures (unless the TTLs of the records being
looked up are very low). Modern versions of BIND also keep track of the
up/down/slow status of their upstream forwarders, so they can adapt to
failures in the same way they do when resolving iteratively (older
versions of BIND are not as adaptive in forwarding mode, trying each
forwarder in sequence, so they degenerate to the performance level of
stub resolvers + caching). Other packages/implementations of forwarding
resolvers may cope well with failures, or not so well. It really depends.
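As a point of reference, a minimal BIND forwarding setup in named.conf looks something like this (the addresses are placeholders):

    options {
        // Send recursive queries to these upstream resolvers first;
        // with "forward first", BIND falls back to normal iterative
        // resolution if the forwarders don't answer ("forward only"
        // would suppress that fallback).
        forwarders { 192.0.2.10; 192.0.2.11; };
        forward first;
    };

"forward first" is the more failure-tolerant choice here, since the server can still resolve iteratively if both forwarders are unreachable.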
b) "From the point of view of high availability, which is better: two
authoritative DNS servers, or two master DNS servers using an L4 switch?"
I'm not 100% sure what you mean by "L4 switch". Do you mean a
load-balancer? The Internet standards mandate at least 2 nameservers for
each zone, so you don't technically have the option of putting 2 DNS
servers behind a single, load-balanced VIP. We have 2 VIPs defined for
our Internet-facing DNS zones and then each VIP has multiple nameservers
behind it. This conforms to standards, and not only gives us an
acceptable level of availability in the face of unplanned outages, but
also the flexibility to perform maintenance, upgrades, etc.
transparently to Internet DNS clients.
There's also the "anycast" approach, which is routing-layer-based, but
since we don't use that here, and I haven't researched it at all, and in
any case don't have a strong background in network routing, I'll defer
to others to explain how that works.
What, by the way, do you mean by two "master" DNS servers? The term
"master" is usually used in DNS in two different ways:
1) relationally, when talking about replication (as I do above), the
master is the provider of the zone data, and the slave is the consumer.
Within a multi-level replication hierarchy, a given server might be
"master" with respect to other servers in the hierarchy, and "slave" to
others.
2) When viewing the hierarchy as a whole, in the classic DNS replication
model (i.e. based on point-to-point AXFR/IXFR transfers), there is
really only 1 "master", i.e. the origin of the zone data, whether that
be from a flat file, a database backend, or whatever. All other
nameservers in the hierarchy are "slaves", in that they obtain the zone
data from other nameservers, rather than a source external to DNS
itself. Sometimes the term "primary master" is used for this kind of
"master", to distinguish it from "master", as used in the relational
sense in #1 above.
In neither sense of the term "master" do I understand how one could have
multiple "masters" behind a load-balancer, unless you're i) talking
about putting load-balancers between servers in the replication
hierarchy (in which case they're all "authoritative" anyway and there's
no difference between the options you presented), ii) deviating from the
classic DNS replication model (e.g. Microsoft's "multi-master"
architecture for Active Directory integrated DNS, where the backend is a
replicated LDAP database), or iii) simply using the term incorrectly.
- Kevin