Automatically reconnect to failover peer?
David W. Hankins
David_Hankins at isc.org
Wed May 31 15:32:24 UTC 2006
On Thu, Jun 01, 2006 at 12:47:32AM +1000, Glenn Satchell wrote:
> We ran a disaster recovery test the other day. This involved
> disconnecting the network between the two sites that the failover peers
> are in. The disconnect was noticed and they moved to
> communications-interrupted, but upon reconnecting the networks about 3
> hours later the two did not automatically detect each other and return
> to normal mode. We're running 3.0.3, but I am sure this was something
> that was fixed as it did work when we did this test about a year ago
> (3.0.2 perhaps?).
It's hard to say if this is the old bug Infoblox sent me a patch for
or a new one...
It could just be you got lucky and exercised a different code path.
> Is this an old bug that has come back, or a different problem
> altogether? We did wait about 40 minutes or so to see if they
> reconnected. During this time we were snooping for traffic, but there
> was nothing on the failover ports.
The retry interval should be more like 90 seconds.
If you have the time, try defining "DEBUG_FAILOVER_TIMING", which will
print out a message prior to every add_timeout() call. Then look
at the syslogs.
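
For anyone unfamiliar with this kind of compile-time debug switch, here is
a minimal sketch of the general pattern. It is NOT the ISC DHCP source:
schedule_reconnect(), add_timeout_stub() and the message text are
hypothetical stand-ins, and only the macro name and the 90-second figure
come from this thread. In a real build the macro would have to be defined
before compiling the server (for example with -DDEBUG_FAILOVER_TIMING in
the compiler flags; check your source tree for the exact place).

```c
/* Illustrative sketch only; not taken from the ISC DHCP source tree. */
#include <assert.h>
#include <stdio.h>

/* Comment this out to compile the timing messages away entirely. */
#define DEBUG_FAILOVER_TIMING

/* Counts how many timers the sketch has "scheduled". */
int timeouts_scheduled = 0;

/* Hypothetical stand-in for the server's timer registration call. */
void add_timeout_stub(int delay_seconds, const char *what)
{
    (void)delay_seconds;
    (void)what;
    timeouts_scheduled++;
}

/* Hypothetical helper: log first (only when the macro is defined),
 * then register the timer. Returns the delay that was requested. */
int schedule_reconnect(int delay_seconds)
{
    assert(delay_seconds > 0);
#if defined(DEBUG_FAILOVER_TIMING)
    /* The message format here is assumed, not the server's real one. */
    printf("add_timeout +%d reconnect\n", delay_seconds);
#endif
    add_timeout_stub(delay_seconds, "reconnect");
    return delay_seconds;
}
```

Calling schedule_reconnect(90) in this sketch prints one line and then
registers the timer, so a log in which such lines stop appearing would
suggest that no further retries are being scheduled.
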
The last failover-related log lines before the failover timing debug
logs are hopefully where the problem lies.
--
David W. Hankins
Software Engineer
Internet Systems Consortium, Inc.

"If you don't do it right the first time,
 you'll just have to do it again."
    -- Jack T. Hankins