problem with dhcp failover

Fri Dec 8 10:54:57 UTC 2017

Nicolas Ecarnot <nicolas at ecarnot.net> wrote:

> According to what I remember (I set up a complex bunch of failover peers years ago), when a failover pool is set up, both servers share the pool of IP addresses and each manages its own half whatever happens.
> When a peer is failing, the half of the pool it was managing is no longer available: it is not assignable. The remaining server has to serve new requests from its remaining half of the pool.
> Clients from the lost half will have to issue a new request that will be handled by the live server, but their IP will change.
> Knowing this, one has to provide a pool large enough to allow for such a case.

Well, it's a bit more complicated than that.
During normal operations, the servers will balance the FREE IPs between them - so it's quite possible for the two servers to have a very imbalanced number of active leases. But you are correct, when one peer "dies", the other will go into "communications interrupted" state because it has no way of knowing if the other server is "gone" or just "not reachable" (there are a number of topologies and failure modes which could allow both peers to respond to clients but not reach each other).
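
To illustrate the point about balancing free addresses rather than active leases, here is a toy model in Python. It is only a sketch: the rebalance() helper, the even split and all the numbers are made up for illustration, and the real balancing is governed by the server's configuration, not by this code.

```python
def rebalance(free_a, free_b):
    """Split the combined FREE addresses evenly between two peers (toy model)."""
    total = free_a + free_b
    half = total // 2
    # Server A keeps the odd address, if there is one.
    return total - half, half


# Hypothetical pool of 100 addresses shared by servers A and B.
active_a, active_b = 60, 10   # leases each server currently holds for clients
free_a, free_b = 5, 25        # free addresses each server owns before balancing

free_a, free_b = rebalance(free_a, free_b)

print(free_a, free_b)         # 15 15: the free addresses are now even
print(active_a, active_b)     # 60 10: the active leases are untouched
```

After balancing, both servers have the same number of free addresses to hand out, yet one still holds six times as many active leases as the other.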

So any server that cannot communicate with its peer will ONLY deal with addresses it "owns". You need to allow enough free IP space to allow for clients unable to renew with one server to get addresses from the other - initially.
The answer is to put the remaining partner into "partner down" state - at which point it will now behave (as far as clients are concerned) as if failover was not being used, handling all the IP space. As above, the admin has to do this manually because it's not possible to automatically determine the difference between down and unreachable - though AIUI there is now a config option to make this automatic for admins that like to live dangerously ;-)
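
The difference between the two states can be sketched as follows, again as a toy Python model and not as a description of how dhcpd is implemented. The State enum and the allocatable() function are hypothetical names, and the sketch ignores timing: a real server in "partner down" waits before reusing addresses that belonged to its peer.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    COMMUNICATIONS_INTERRUPTED = "communications interrupted"
    PARTNER_DOWN = "partner down"


def allocatable(state, own_free, peer_free):
    """Return the free addresses this server may offer to new clients."""
    if state is State.PARTNER_DOWN:
        # The admin has declared the peer dead: use the whole free space.
        return set(own_free) | set(peer_free)
    # Normal or communications interrupted: only addresses this server owns.
    return set(own_free)


# Hypothetical free addresses owned by each peer.
mine = {"192.0.2.10", "192.0.2.11"}
theirs = {"192.0.2.20", "192.0.2.21"}

print(len(allocatable(State.COMMUNICATIONS_INTERRUPTED, mine, theirs)))  # 2
print(len(allocatable(State.PARTNER_DOWN, mine, theirs)))                # 4
```

This is why the free space has to be sized for the time spent in "communications interrupted": until someone moves the survivor to "partner down", only its own share is usable.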

When the peer comes back up, the two servers will re-establish communications, sync their leases, and eventually get back to normal operations.