"Peer holds all free leases"

Wed Sep 1 20:40:21 UTC 2010

On Thu, Aug 26, 2010 at 05:09:58AM -0600, Pooja Gada wrote:
> Now when I bring down the primary server, the secondary is not able to allocate all of the IP addresses that are free in the pool; after a certain time, it logs the message
> "Peer holds all free leases". 

An often overlooked, and intentional, aspect of the failover protocol's
design is that it separates the ideas of 'communications-interrupted'
and 'partner-down'.

Simply being disconnected from the partner server does not imply that
the partner is offline.  It only means the two servers can't
communicate.  Given the natural complexities of network design, it is
entirely possible for both servers to reach clients and give them
addresses even though they cannot reach each other.

To prevent address conflicts in this state, the failover protocol
divides available leases into two explicit states: 'free' (allocatable
by the primary) and 'backup' (allocatable by the secondary).
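A minimal sketch of the free/backup split, written as a hypothetical
Python model (LeaseState, Role and allocate are illustrative names, not
dhcpd code).  Each server draws only from its own pool while it cannot
reach its peer, so a secondary can run out of 'backup' leases even
though 'free' leases remain, which is the situation the "Peer holds all
free leases" message reports.

```python
from enum import Enum
from typing import Dict, Optional


class LeaseState(Enum):
    FREE = "free"      # available, owned by the primary
    BACKUP = "backup"  # available, owned by the secondary
    ACTIVE = "active"  # bound to a client


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


# The only available-lease state each role may draw from on its own.
OWN_POOL = {Role.PRIMARY: LeaseState.FREE, Role.SECONDARY: LeaseState.BACKUP}


def allocate(leases: Dict[str, LeaseState], role: Role) -> Optional[str]:
    """Activate one address from this server's own pool.

    Returns None when that pool is empty, even if the peer's pool is not.
    """
    own = OWN_POOL[role]
    for address in sorted(leases):  # plain string order is enough here
        if leases[address] is own:
            leases[address] = LeaseState.ACTIVE
            return address
    return None


leases = {
    "10.0.0.10": LeaseState.FREE,
    "10.0.0.11": LeaseState.FREE,
    "10.0.0.12": LeaseState.BACKUP,
}
print(allocate(leases, Role.SECONDARY))  # 10.0.0.12
print(allocate(leases, Role.SECONDARY))  # None, although two FREE leases remain
```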

Consequently, if a server is truly offline, the failover protocol
requires the operator to intervene manually and transition the
surviving server to partner-down.  After STOS+MCLT (Start Time Of
State plus Maximum Client Lead Time), the server will permit itself
to allocate from its peer's free pool.
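The rule can be sketched as a tiny hypothetical state model
(FailoverServer and its methods are illustrative, not dhcpd
internals), taking STOS to be the moment the server entered its
current state.  Communications-interrupted never opens the peer's
pool; partner-down opens it only once STOS+MCLT has passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class FailoverState(Enum):
    NORMAL = "normal"
    COMMUNICATIONS_INTERRUPTED = "communications-interrupted"
    PARTNER_DOWN = "partner-down"


@dataclass
class FailoverServer:
    mclt: timedelta
    state: FailoverState = FailoverState.NORMAL
    stos: Optional[datetime] = None  # Start Time Of (current) State

    def set_state(self, new_state: FailoverState, now: datetime) -> None:
        # Every transition, including the operator's, restarts STOS.
        self.state = new_state
        self.stos = now

    def may_allocate_from_peer_pool(self, now: datetime) -> bool:
        # Only partner-down ever allows it, and only after STOS+MCLT.
        return (self.state is FailoverState.PARTNER_DOWN
                and self.stos is not None
                and now >= self.stos + self.mclt)


t0 = datetime(2010, 9, 1, 9, 0)
s = FailoverServer(mclt=timedelta(hours=1))
s.set_state(FailoverState.COMMUNICATIONS_INTERRUPTED, t0)
print(s.may_allocate_from_peer_pool(t0 + timedelta(hours=5)))  # False
s.set_state(FailoverState.PARTNER_DOWN, t0 + timedelta(hours=2))  # operator acts
print(s.may_allocate_from_peer_pool(t0 + timedelta(hours=2, minutes=30)))  # False
print(s.may_allocate_from_peer_pool(t0 + timedelta(hours=3)))  # True
```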

During communications-interrupted, then, the server's own free pool
gives it "endurance": enough addresses, hopefully, to last through its
peer's outage, or at least until an operator can attend to it.  It
does not guarantee indefinite operation; in fact, the design accepts
that the surviving server has a limited time to live.
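As a back-of-the-envelope illustration of that limited time to live,
a hypothetical helper (endurance_hours is an illustrative name)
divides the size of the server's own pool by the rate at which new
clients arrive.  It assumes a constant arrival rate and ignores
anything that returns leases to the pool, so it is only a rough
estimate.

```python
def endurance_hours(own_pool_size: int, new_clients_per_hour: float) -> float:
    """Hours until a communications-interrupted server exhausts its own pool.

    Simplified model: renewals by clients that already hold a lease
    consume nothing, and no lease ever returns to the pool.
    """
    if new_clients_per_hour <= 0:
        return float("inf")  # no new clients, the pool never drains
    return own_pool_size / new_clients_per_hour


print(endurance_hours(50, 10))  # 5.0
```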

In some specific circumstances, failover servers can correctly
conclude that their peer is down rather than simply disconnected, for
example in 'HA' environments where the servers are connected
interface-to-interface by a heartbeat cable rather than through a
network or switch infrastructure.  There, being unable to communicate
while the peer's server process is still running would be truly
unusual.  For these deployments specifically, we added a configuration
option that automatically transitions a server to partner-down a set
time after it enters communications-interrupted.  Note that even in
this case the server must still wait STOS+MCLT in order to avoid
address assignment conflicts.
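The timeline for the automatic case can be sketched as follows.  The
delay parameter stands in for the configured timeout (expressed here
in seconds, an assumption of the sketch), and STOS is taken to be the
moment the server enters partner-down, so the peer's pool opens one
MCLT after that.  auto_partner_down_timeline is an illustrative name,
and the 300-second delay and 3600-second MCLT are example values only.

```python
from datetime import datetime, timedelta
from typing import Tuple


def auto_partner_down_timeline(interrupted_at: datetime,
                               auto_delay: timedelta,
                               mclt: timedelta) -> Tuple[datetime, datetime]:
    """Return (partner_down_at, peer_pool_at) for a server that enters
    communications-interrupted at interrupted_at and moves itself to
    partner-down after auto_delay."""
    partner_down_at = interrupted_at + auto_delay  # STOS of partner-down
    peer_pool_at = partner_down_at + mclt          # STOS + MCLT
    return partner_down_at, peer_pool_at


pd, pool = auto_partner_down_timeline(datetime(2010, 9, 1, 12, 0),
                                      timedelta(seconds=300),
                                      timedelta(seconds=3600))
print(pd.strftime("%H:%M"), pool.strftime("%H:%M"))  # 12:05 13:05
```

Even with a short automatic delay, the MCLT wait dominates how long
the survivor is limited to its own pool.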

We also added an "optimization" that is documented in the failover
protocol specification but that we had avoided implementing until our
implementation was more reliable.  The optimization permits a failover
server to rewind a lease to the previous state it advertised to its
peer.  This permits a server in communications-interrupted state to
take a free lease, activate it, and return it to the free state to
allocate to another client, or to expire an active lease and return
it to active, provided it is only reallocated to the same client.
This greatly enhances the server's endurance in
communications-interrupted, and I suspect that in some environments it
may even permit a server to operate in communications-interrupted
indefinitely.  If you can, it is far preferable to rely on this
feature than to enable auto-partner-down, as the latter carries the
risk of requiring conflict resolution.
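The rewind idea can be sketched as a hypothetical model in which each
lease remembers the last state and client its peer acknowledged (the
Lease fields and the functions below are illustrative names, not
dhcpd internals).  Binding a lease while the peer is unreachable
changes only the local fields, so rewinding restores exactly what the
peer already believes.  For simplicity, "free" stands for whichever
pool the server owns ('free' or 'backup').

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    state: str                        # local state: "free", "active", "expired"
    client: Optional[str]             # client bound locally, if any
    advertised_state: str             # last state the peer was told about
    advertised_client: Optional[str]  # client the peer believes holds it


def may_assign(lease: Lease, client: str) -> bool:
    """While communications-interrupted, may this lease go to client?"""
    if lease.state == "free":
        return True                    # the peer believes it is free
    if lease.state == "active":
        return lease.client == client  # only the client the peer knows about
    return False


def assign(lease: Lease, client: str) -> None:
    """Bind the lease locally; the peer is unreachable, so advertised_* stay put."""
    if not may_assign(lease, client):
        raise ValueError(f"cannot assign lease to {client}")
    lease.state, lease.client = "active", client


def rewind(lease: Lease) -> None:
    """Return the lease to the last state the peer acknowledged."""
    lease.state, lease.client = lease.advertised_state, lease.advertised_client


lease = Lease("free", None, "free", None)
assign(lease, "client-a")
rewind(lease)                         # after client-a releases the address
print(may_assign(lease, "client-b"))  # True

held = Lease("active", "client-c", "active", "client-c")
held.state, held.client = "expired", None  # expires locally
rewind(held)                               # back to active for client-c
print(may_assign(held, "client-c"), may_assign(held, "client-d"))  # True False
```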

Both of the features I mention were added in 4.2.0.  In earlier
versions, the only options are to intervene manually and set the
server to partner-down or, on HA-style networks, to script the
transition.
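One way to script the transition is to drive omshell over OMAPI.  The
sketch below assumes the server has OMAPI enabled on localhost port
7911 with no OMAPI key configured (add a key line otherwise), and
that the peer name matches the server's failover peer declaration;
"dhcp-failover" and the two function names are hypothetical.  It
follows the omshell sequence as we understand it from the ISC DHCP
man pages: open the failover-state object by name and set local-state
to 1, which denotes partner-down.  Check the man pages for your
version before relying on it.

```python
import subprocess


def partner_down_commands(peer_name: str, host: str = "localhost",
                          port: int = 7911) -> str:
    """Build an omshell script that puts the named failover peer
    relationship into partner-down (local-state 1)."""
    return "\n".join([
        f"server {host}",
        f"port {port}",
        "connect",
        "new failover-state",
        f'set name = "{peer_name}"',
        "open",
        "set local-state = 1",
        "update",
    ]) + "\n"


def set_partner_down(peer_name: str, host: str = "localhost",
                     port: int = 7911) -> str:
    """Feed the script to omshell on stdin and return what it printed."""
    result = subprocess.run(
        ["omshell"],
        input=partner_down_commands(peer_name, host, port),
        capture_output=True, text=True, check=True)
    return result.stdout
```

A heartbeat monitor could call set_partner_down("dhcp-failover") once
it has decided the peer is really gone.  Because omshell's exit status
may not reflect whether the update succeeded, inspect the returned
output as well.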

-- 
David W. Hankins	BIND 10 needs more DHCP voices.
Software Engineer		There just aren't enough in our heads.
Internet Systems Consortium, Inc.		http://bind10.isc.org/