off-site slave servers? advice?

Barry Finkel b19141 at achilles.ctd.anl.gov
Fri Jul 2 15:11:48 UTC 2004


loren jan wilson <loren at uchicago.edu> wrote:

>>>Right now, my domain has 5 on-site nameservers (1 master, 4 slaves) 
>>>and 1 off-site slave on a different campus. There has been talk of
>>>adding yet another off-site slave in a different location, which
>>>brings up this topic.
>>>
>>>We run bind 9.2.1, and we've noticed (since upgrading from bind 8)
>>>that many domains that used to resolve now do not resolve. When 
>>>looking for the reason why, it always turns out that one of the
>>>non-resolving domains has a nameserver that isn't responding or
>>>that responds incorrectly. It seems that bind 8 was much more lenient
>>>in this regard, is that true?
>>>
>>>So, here's the reason for my worry:
>>>We want another off-site nameserver because we want people to be
>>>able to reach our main website (which will be mirrored elsewhere)
>>>if we're down for an extended period of time. However, since bind 9
>>>doesn't seem to want to resolve domains where one of the nameservers
>>>doesn't respond, will having one off-site nameserver that is responding
>>>even help us if the other 5 are down?
>>>
>>>Can anybody explain in detail how this is supposed to work, and give
>>>their opinion about whether or not we should be looking into this?

Barry Margolin <barmar at alum.mit.edu> replied:

>> I can imagine problems that would occur if one of the servers is 
>> responding *incorrectly*, since this might not trigger failover.  
>> Different nameserver implementations do indeed have different criteria 
>> for when to try another nameserver if they get a failure response from 
>> the first one.
>> 
>> But if one of the servers simply stops responding, failover should 
>> always occur.  That's the whole point of listing multiple nameservers: 
>> to provide redundancy when nameservers or networks fail.

loren jan wilson <loren at uchicago.edu> replied:

>I understand, but since we moved to bind 9, that hasn't been the
>case. A good example is what we went through with the .cn (china)
>domain last month...we lost network connectivity to parts of 
>china, and users started complaining that chinese websites stopped
>coming up. We checked, and it turned out that our nameservers were
>having intermittent difficulties resolving hosts in china, even though
>we could get to some of the webservers by ip. This condition stayed the
>length of the outage, which was over a week.
>
>Can anybody else make a statement that would clarify matters?
>When we first upgraded (over a year ago), I asked the bind 9 list
>why we couldn't resolve a particular domain all of a sudden,
>and somebody answered by pointing out that one of the domain's
>nameservers wasn't responding to queries. I understand that it's
>supposed to break if the nameservers don't respond identically
>for a domain, but why does it seem to break when one of the nameservers
>goes down?
>
>Loren

Loren, please give us the domains/nodenames that you cannot resolve
so that we can check to see what the problems might be.

What exactly do you mean by the following?

     non-resolving domains has a nameserver that isn't responding or
     that responds incorrectly.

With an actual case, we may be able to look at the DNS packets and
determine what is happening.
----------------------------------------------------------------------
Barry S. Finkel
Computing and Instrumentation Solutions Division
Argonne National Laboratory          Phone:    +1 (630) 252-7277
9700 South Cass Avenue               Facsimile:+1 (630) 252-4601
Building 222, Room D209              Internet: BSFinkel at anl.gov
Argonne, IL   60439-4828             IBMMAIL:  I1004994


