Failure of dhcp server failover

Fri May 6 16:33:35 UTC 2016

On May 4, 2016, at 7:40 AM, Simon Hobson <dhcp1 at thehobsons.co.uk> wrote:
> Eugene Grosbein <eugen at grosbein.net> wrote:
> 
>> I've switched my single address pools to non-failover mode and the problem has gone. Case closed.
> 
> Just to finish this up for the benefit of anyone searching the archives later ...
> 
> What I think was happening in this case is that with only one address in a pool, failover just "doesn't work properly". Only one server can hold that lease, and if it decides to load balance* the query to the other server then neither will reply to the client - one server doesn't reply because it's going to leave the other one to do it, but the other one doesn't reply because it hasn't got a free lease.

I believe DHCP failover was designed to give you redundancy, on the assumption that you can configure
sufficient extra addresses.  It’s designed to keep working even if the two servers lose contact with each
other and no longer know whether the other is up, even while clients are still talking to both of them;
and despite this, the servers must never give out the same address to two different clients.  To do this,
the design assigns one server to “own” any particular unused address at any given time.
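
As a rough illustration of the ownership idea, here is a toy model in Python.  It is not how ISC dhcpd
actually balances addresses between peers; the function names and the alternating split are assumptions
made up for illustration only.

```python
def split_free_addresses(free_addresses):
    """Toy model: give every free address exactly one owning server.

    The alternating split is an illustrative assumption, not the real
    balancing algorithm used by any DHCP server.
    """
    owners = {"primary": [], "secondary": []}
    for index, address in enumerate(free_addresses):
        owner = "primary" if index % 2 == 0 else "secondary"
        owners[owner].append(address)
    return owners


def can_offer(owners, server):
    """A server can only offer an address that it currently owns."""
    return len(owners[server]) > 0


# With a single-address pool, one server owns nothing it could offer.
single = split_free_addresses(["192.0.2.10"])
print(can_offer(single, "primary"), can_offer(single, "secondary"))
```

In this model a one-address pool always leaves one of the two servers with nothing to hand out, which
matches the symptom Simon describes above.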

You can’t have all this without IP address slack.  The designers could have prioritized other properties,
but they chose “never give out the same IP” plus “work even if the servers lose touch”.  The cost is that
the pools must hold more addresses than there are clients, so that service continues during partial outages.

Twice the maximum number of clients the pool will ever see should be sufficient, but is usually overkill.
A rough size estimate would be “double the pool’s maximum number of clients” minus “the minimum number of
clients the pool ever sees”.
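
To make the arithmetic concrete, here is a small Python sketch of that rule of thumb.  The function name
and the example client counts are hypothetical, and the result is only the rough guess described above,
not a guarantee.

```python
def rough_pool_size(max_clients, min_clients):
    """Rough failover pool size: 2 * max_clients - min_clients.

    This is only the rule of thumb from the text, not an exact requirement.
    """
    if min_clients < 0 or max_clients < min_clients:
        raise ValueError("need 0 <= min_clients <= max_clients")
    return 2 * max_clients - min_clients


# Hypothetical pool that sees between 60 and 100 clients.
print(rough_pool_size(100, 60))  # 140, versus 200 for simply doubling the maximum
```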

John Wobus
Cornell IT