off-site slave servers? advice?

loren jan wilson loren at uchicago.edu
Thu Jul 1 20:24:30 UTC 2004


On Thu, Jul 01, 2004 at 03:27:58PM -0400, Barry Margolin wrote:
> I can imagine problems that would occur if one of the servers is 
> responding *incorrectly*, since this might not trigger failover.  
> Different nameserver implementations do indeed have different criteria 
> for when to try another nameserver if they get a failure response from 
> the first one.
> 
> But if one of the servers simply stops responding, failover should 
> always occur.  That's the whole point of listing multiple nameservers: 
> to provide redundancy when nameservers or networks fail.

I understand, but since we moved to BIND 9, that hasn't been the
case for us. A good example is what we went through with the .cn
(China) domain last month. We lost network connectivity to parts of
China, and users started complaining that Chinese websites had stopped
coming up. We checked, and it turned out that our nameservers were
having intermittent difficulties resolving hosts in China, even though
we could get to some of the webservers by IP. This condition lasted
the length of the outage, which was over a week.

Can anybody else make a statement that would clarify matters?
When we first upgraded (over a year ago), I asked the BIND 9 list
why we suddenly couldn't resolve a particular domain, and somebody
answered by pointing out that one of the domain's nameservers wasn't
responding to queries. I understand that resolution can break if a
domain's nameservers don't respond identically, but why does it seem
to break when one of the nameservers simply goes down?
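
To be concrete about the failover behaviour being discussed, here is a
rough Python sketch of it: try each listed nameserver in turn and move
on to the next when one times out. It is only an illustration and makes
no claim about how BIND 9 does this internally. The function names
(build_query, udp_send, query_with_failover) and the 2-second timeout
are my own assumptions, and the query is sent with recursion off, as
you would when asking an authoritative server directly.

```python
import socket
import struct


def build_query(name, qid=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: id, flags (0 = standard query, no recursion desired),
    # 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def udp_send(server, packet, timeout):
    """Send one UDP query to server port 53 and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(4096)
        return reply


def query_with_failover(name, servers, send=udp_send, timeout=2.0):
    """Try each server in order, skipping any that time out or error.

    Returns (server_that_answered, raw_reply, servers_that_failed).
    The reply is not parsed, so a wrong answer is not detected here.
    """
    packet = build_query(name)
    failed = []
    for server in servers:
        try:
            reply = send(server, packet, timeout)
        except OSError:  # socket.timeout is a subclass of OSError
            failed.append(server)
            continue
        return server, reply, failed
    raise TimeoutError("no response from any of: " + ", ".join(failed))
```

In real use you would pass the IP addresses of the domain's listed
nameservers as the servers argument; the failed list then shows which
of them are not answering. Note that this only covers the "server stops
responding" case, not the "server answers incorrectly" case.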

Loren

