DHCP Failover Complexity?

Tue May 19 17:18:19 UTC 2009

On Tue, May 19, 2009 at 09:13:19AM +0100, Matt Causey wrote:
> *  Server A or server B goes down for an extended period of time.  The
> remaining server is left in 'communications-interrupted'.  If the
> technician forgets to put the remaining server in partner-down, then
> the site experiences an outage, because eventually the server hands
> out all its leases.

It's not just that it hands out all its leases (if that is the case,
then your pools are insufficiently sized to survive a peer outage).
The situation is exacerbated because in failover, the server cannot
reallocate an expired lease until it has negotiated that expiration
with its peer.  This is because it cannot know what actions its peer
has taken while it was out of communication.  The failover
documentation mentions a safe optimization that approaches this
without reaching it (the ability to rewind a lease to the last state
the peer negotiated successfully), but we haven't implemented it yet.
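
To illustrate the constraint, here is a minimal Python sketch.  It is
not how dhcpd is implemented; the Lease class, its fields, and
can_reallocate() are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    # Hypothetical record, for illustration only.
    address: str
    expired: bool            # the lease time has run out locally
    peer_acked_expiry: bool  # the peer has agreed to the expiration


def can_reallocate(lease: Lease) -> bool:
    """An expired lease is reusable only once the peer has agreed to it."""
    return lease.expired and lease.peer_acked_expiry


# While communications are interrupted no acknowledgements arrive, so
# expired leases accumulate without becoming reusable.
leases = [
    Lease("192.0.2.10", expired=True, peer_acked_expiry=True),
    Lease("192.0.2.11", expired=True, peer_acked_expiry=False),
    Lease("192.0.2.12", expired=False, peer_acked_expiry=False),
]
reusable = [lease.address for lease in leases if can_reallocate(lease)]
```

Only the first address ends up in "reusable"; the second has expired
but is stuck until the peer can be reached again.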

> Of course, if the technician puts the system in partner_down, this is
> written in the leases file, and the state is persisted across bounces
> of the server.  Therefore, said technician must also remember to take
> the system -out- of partner_down, or else:

There should be no such need...

> * After a power event or something affecting the physical servers,
> server A is in recover_wait, server B in partner_down.  The server
> wants to wait MCLT before giving out addresses.  This means the
> technician engaged to recover the dhcp environment must manually place
> the remaining server in recover_done to get things moving along.

In the general case when a server is in partner_down and the peer
comes back online, they negotiate updated lease bindings and move
automatically to normal, rather rapidly.  This happens regularly
when one server is shut down "softly"; when the peer receives the
advertisement of the move to shutdown, it enters partner-down.

If you've got recover_wait, then you've got a lost lease database,
and in this case the recover_wait state is very simple:  Wait for
STOS+MCLT to expire, and then transition to recover_done.  This is
automatic, no manual intervention is required, and the wait is
necessary in order to avoid a condition where an addressing collision
may occur.  It is sometimes not necessary (such as on an initial
startup when no leases have been allocated anyway), but as it turns
out, there is no good algorithm to determine if you lost your lease
database, or if it was never there to start with...
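
As a rough sketch of that timer (in Python, not taken from the dhcpd
source), assuming STOS is available as a timestamp and MCLT as a
number of seconds; the function names and the 1800 second MCLT are
made up for the example:

```python
from datetime import datetime, timedelta, timezone


def recover_wait_ends(stos: datetime, mclt_seconds: int) -> datetime:
    """Time at which STOS+MCLT expires."""
    return stos + timedelta(seconds=mclt_seconds)


def state_at(now: datetime, stos: datetime, mclt_seconds: int) -> str:
    """State of a server that entered recover_wait, as of 'now'."""
    if now >= recover_wait_ends(stos, mclt_seconds):
        return "recover_done"
    return "recover_wait"


stos = datetime(2009, 5, 19, 9, 0, 0, tzinfo=timezone.utc)
mclt = 1800  # assumed value, for illustration only

before = state_at(stos + timedelta(minutes=10), stos, mclt)
after = state_at(stos + timedelta(minutes=30), stos, mclt)
```

With these inputs "before" is still recover_wait and "after" is
recover_done; nothing but the passage of time drives the transition.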

The other possibility of a partner-down healing event is for the
servers to enter potential-conflict (if for example one was in
partner-down and the other was in comms-interrupted).  In this case,
the servers (re)transmit lease bindings to each other one at a time,
entering conflict-done when they've sent the last of their own
bindings.  When both servers enter conflict-done, they automatically
transition to normal.  During the conflict resolution process,
neither server answers clients.  It is wise to avoid this, but it
also usually goes by pretty quickly.
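
The sequence above can be modelled with a toy simulation.  This is
only a sketch of the state changes as described here, not the wire
protocol: resolve_conflict(), answers_clients(), and the round-based
loop are all invented for the illustration, and treating conflict-done
as a non-answering state is an assumption.

```python
from collections import deque


def answers_clients(state: str) -> bool:
    """Assumption: of the states modelled here, only normal answers."""
    return state == "normal"


def resolve_conflict(bindings_a, bindings_b):
    """Return the (state_a, state_b) pairs seen during resolution."""
    pending = {"A": deque(bindings_a), "B": deque(bindings_b)}
    state = {"A": "potential-conflict", "B": "potential-conflict"}
    trace = [(state["A"], state["B"])]
    while state["A"] != "normal":
        for name in ("A", "B"):
            if state[name] == "potential-conflict":
                if pending[name]:
                    pending[name].popleft()  # send one binding to the peer
                if not pending[name]:
                    state[name] = "conflict-done"  # last binding sent
        if state["A"] == state["B"] == "conflict-done":
            trace.append((state["A"], state["B"]))
            # Both sides are done, so both move to normal together.
            state = {"A": "normal", "B": "normal"}
        trace.append((state["A"], state["B"]))
    return trace


trace = resolve_conflict(["a1", "a2"], ["b1"])
```

Here B finishes first and waits in conflict-done until A has sent its
last binding, after which both move to normal.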

So all these healing scenarios should be automatic.

> <snip>
[automatic partner-down script]
> </snip>
> 
> Is this a bad idea?  If so, why?  What other conditions should I be looking for?

It's impossible to say.  If your servers are connected e.g. by a
heartbeat cable, then there's no reason why you shouldn't enter
partner-down immediately (and in fact, a feature to do this is in
review for 4.2.0).  A server in partner-down won't allocate leases
in the partner's free state until STOS+MCLT expires anyway.  The
'risk' that there will be an addressing collision exists, but it
is next to zero (the heartbeat cable would have to fail), and you
have MCLT seconds to discover the condition and repair it anyway.
This sort of
contextual condition isn't something we can detect inside the failover
software.

If your servers are connected in such a way that a communications
failure is not a reliable indicator that the remote system is actually
offline or incapable of answering clients, then it is unsafe.  When
STOS+MCLT expires, each server will start using the other's free
lease pool, which has a good chance of resulting in conflicting
allocations.

Although some DHCP software deployments might rest at either end of
that scale (heartbeat clusters versus geographically diverse
systems), realistic deployments probably sit somewhere in between,
and it's up to you to decide which end of the scale you favor.

-- 
David W. Hankins	"If you don't do it right the first time,
Software Engineer		     you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins