Automatically reconnect to failover peer?

David W. Hankins David_Hankins at isc.org
Wed May 31 15:32:24 UTC 2006


On Thu, Jun 01, 2006 at 12:47:32AM +1000, Glenn Satchell wrote:
> We ran a disaster recovery test the other day. This involved
> disconnecting the network between the two sites that the failover peers
> are in. The disconnect was noticed and they moved to
> communications-interrupted, but upon reconnecting the networks about 3
> hours later the two did not automatically detect each other and return
> to normal mode. We're running 3.0.3, but I am sure this was something
> that was fixed as it did work when we did this test about a year ago
> (3.0.2 perhaps?).

It's hard to say if this is the old bug infoblox sent me a patch for
or a new one...

It could just be you got lucky and exercised a different code path.

> Is this an old bug that has come back, or a different problem
> altogether? We did wait about 40 minutes or so to see if they
> reconnected. During this time we were snooping for traffic, but there
> was nothing on the failover ports.

The retry interval should be more like 90 seconds.


If you have the time, try rebuilding with "DEBUG_FAILOVER_TIMING"
defined, which will print out a message prior to every add_timeout()
call.  Then look at the syslogs.
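
For anyone unfamiliar with this kind of switch, here is a rough sketch of
the pattern: a compile-time define that makes the code log just before it
schedules a timer.  This is not the actual server source.  The names
schedule_reconnect(), sketch_add_timeout(), sketch_reconnect() and
RECONNECT_RETRY_SECS are stand-ins invented for illustration, and the
message format is made up.  Only the DEBUG_FAILOVER_TIMING name and the
90 second figure come from this message.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Uncomment this line, or pass -DDEBUG_FAILOVER_TIMING to the compiler,
   to turn the timing messages on. */
/* #define DEBUG_FAILOVER_TIMING */

/* Hypothetical constant; 90 seconds is the retry interval quoted above. */
#define RECONNECT_RETRY_SECS 90

typedef void (*timeout_fn)(void *);

/* Counts the debug messages printed; stays 0 when the define is off. */
int timing_messages_logged = 0;

/* Remembers when the most recently scheduled timeout is due. */
time_t last_when = 0;

/* Stand-in for the server's add_timeout(): it only records the request. */
void sketch_add_timeout(time_t when, timeout_fn fn, void *arg)
{
    assert(fn != NULL);
    (void)arg;
    last_when = when;
}

/* Stand-in for the routine that would retry the peer connection. */
void sketch_reconnect(void *arg)
{
    (void)arg;
}

/* Schedule the next reconnect attempt, logging first if debugging is on. */
time_t schedule_reconnect(time_t now)
{
    time_t when = now + RECONNECT_RETRY_SECS;

#if defined(DEBUG_FAILOVER_TIMING)
    /* The real server would write to syslog; stderr keeps this runnable. */
    fprintf(stderr, "add_timeout +%d %s\n",
            RECONNECT_RETRY_SECS, "sketch_reconnect");
    timing_messages_logged++;
#endif

    sketch_add_timeout(when, sketch_reconnect, NULL);
    return when;
}
```

With the define enabled, every scheduled timer leaves a line in the log,
so a retry that never gets scheduled shows up as a missing line.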

The last failover-related log lines before the failover timing debug
messages are hopefully where the problem lies.

-- 
David W. Hankins		"If you don't do it right the first time,
Software Engineer			you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins

