Failover. Reestablishing contact after communications-interrupted

Wed May 30 16:28:53 UTC 2007

On Wed, May 30, 2007 at 08:22:29AM +0200, Martin Ericsson wrote:
> 2007/5/29, Glenn Satchell <Glenn.Satchell at uniq.com.au>:
> > >May 23 16:16:22 nadrdir01 dhcpd: failover: listener: no matching state

This log line indicates that the failover peer {} stanza in
your config doesn't match how the remote peer presented itself.

In a CONNECT failover message, the peers indicate the name of the
session they are presiding over ('failover peer "(name)" { }').

The failover draft says to use this to match running failover
states.

In 3.0.x, we ignored this (for expeditious reasons, I think), and
instead matched the address the peer used to connect a TCP socket
against the "peer address" configuration parameter.  So, the remote
address on the TCP socket must equal what is evaluated from this
parameter.
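As a rough illustration of the 3.0.x behavior described above, here is a
minimal Python model (not dhcpd's actual C code). FailoverState and
match_by_address are hypothetical names, and the sketch assumes "peer
address" has already been evaluated to an IP literal rather than a hostname.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import List, Optional


@dataclass
class FailoverState:
    """Hypothetical model of one configured failover peer "(name)" { } stanza."""
    name: str          # relationship name from the stanza header
    peer_address: str  # "peer address" parameter, assumed already evaluated to an IP


def match_by_address(states: List[FailoverState],
                     remote_addr: str) -> Optional[FailoverState]:
    """3.0.x-style matching: compare the TCP socket's remote address with each
    state's "peer address"; the name sent in the CONNECT message is ignored."""
    remote = ip_address(remote_addr)
    for state in states:
        if ip_address(state.peer_address) == remote:
            return state
    return None  # nothing matched: the "no matching state" situation


states = [FailoverState("dhcp-failover", "192.0.2.2")]
print(match_by_address(states, "192.0.2.2"))  # the configured state
print(match_by_address(states, "192.0.2.3"))  # None
```

In this model a connection arriving from any other source address (for
example, from a second interface on a multi-homed peer) finds no state, even
if both sides use the same relationship name.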

In 3.1.x, the CONNECT message's name is matched to running failover
states.  So only the relationship name needs to match.
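The 3.1.x lookup can be sketched the same way. Again, this is a hypothetical
Python model rather than ISC code; index_by_name and match_by_name are
illustrative names, and the sketch assumes relationship names are compared
as exact, case-sensitive strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FailoverState:
    """Hypothetical model of one configured failover peer "(name)" { } stanza."""
    name: str
    peer_address: str


def index_by_name(states: List[FailoverState]) -> Dict[str, FailoverState]:
    """Build a lookup table keyed by relationship name."""
    return {state.name: state for state in states}


def match_by_name(table: Dict[str, FailoverState],
                  connect_name: str) -> Optional[FailoverState]:
    """3.1.x-style matching: look up the relationship name carried in the
    CONNECT message. The TCP remote address plays no part in the match."""
    return table.get(connect_name)  # exact, case-sensitive comparison (assumed)


table = index_by_name([FailoverState("dhcp-failover", "192.0.2.2")])
print(match_by_name(table, "dhcp-failover"))  # found, whatever address the peer used
print(match_by_name(table, "other-name"))     # None
```

Keying the table by relationship name means that, for matching purposes,
only the name in failover peer "(name)" { } has to agree on both sides.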

> > The "partner state normal" is only set when the clients first connect
> > and is then never updated, so you can't rely on it. I asked about this
> > once before.

I think the partner's STOS (start time of state: the time printed after
'normal') is never updated (or possibly that was only true of older
versions; I don't know if we changed that).  We don't use the partner's
STOS for anything, either: it doesn't enter into any failover calculation.

The actual peer state, however, should be updated, and reflects the
"last known state the partner was in."  I think this is used to
detect potential conflicts.

-- 
Ash bugud-gul durbatuluk agh burzum-ishi krimpatul.
				 https://secure.isc.org/store/t-shirt/
-- 
David W. Hankins	"If you don't do it right the first time,
Software Engineer		     you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins