Catastrophic failure and recovery

Mon Jun 25 17:48:18 UTC 2018

The way you describe it is how it would work if you didn't have failover set up at all. With failover set up, the "new" server, when it connects to the existing one, will get a list of all the current leases and such. It will then enter the "recover" period, during which it won't hand out any leases. The "recover" period lasts one MCLT (from the failover configuration). Once that period has passed, both servers will operate as normal.
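To make the timing concrete, here is a rough sketch that models only what's described above: the rebuilt peer first pulls the lease list from its partner, then waits one MCLT before handing out leases. ISC's real failover state machine has more states than this, so treat it as an illustration, not dhcpd's logic. `mclt` is the dhcpd.conf failover statement (in seconds); the 1800-second value and the function names are hypothetical examples.

```python
from datetime import datetime, timedelta
from typing import Optional


def resume_time(sync_done: datetime, mclt_seconds: int) -> datetime:
    """When the rebuilt peer starts handing out leases again.

    Simplified model: once the new peer has received the partner's
    lease list, it waits one MCLT before normal operation.
    """
    return sync_done + timedelta(seconds=mclt_seconds)


def can_allocate(now: datetime, sync_done: Optional[datetime], mclt_seconds: int) -> bool:
    """True if the rebuilt peer would hand out leases at `now`."""
    if sync_done is None:  # still receiving the lease list from the partner
        return False
    return now >= resume_time(sync_done, mclt_seconds)


# Example: lease sync finished at 18:00 with a hypothetical mclt of 1800 seconds.
done = datetime(2018, 6, 25, 18, 0, 0)
print(resume_time(done, 1800))  # 2018-06-25 18:30:00
```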

----- Original Message -----
> From: "Gregory Sloop" <gregs at sloop.net>
> To: "Users of ISC DHCP" <dhcp-users at lists.isc.org>
> Sent: Monday, June 25, 2018 1:29:59 PM
> Subject: Catastrophic failure and recovery

> So, in the case I'm interested in here, I've
> got a pair of peers [failover].
> [ISC/We really should pick a different name than failover, because it's
> essentially load-balancing with redundancy, but I digress :) ]

> Now, while I'm using two peers, I think the question I'm asking will be the
> same whether I'm running peers or a single server...

> So, let's assume the DHCP server [or a peer] dies. Assume we lost a disk.
> Assume I've got configs, but no leases file.

> What's the best recovery method?

> ---
> I assume we'll simply put the configuration back on a "new" server [or peer],
> turn it on and bring it up. [In the peer setup, let it communicate with the
> other peer.]

> Since it won't have a record of any leases [that the dead-peer/old-server
> actually leased] we'll have a bit of a mess.
> But, we'd hope that most machines would already have a lease, and would ask for
> renewal of that lease.
> The server, I think, would generally grant that lease renewal on the same IP.
> [Even though it has no record of it initially.]

> "New" machines that have just powered up may/will ask for new addresses, and
> may "steal" a lease from an active client ...BUT...
> if the DHCP server is able to ping the address [and is set to use ping-check],
> AND the station isn't firewalled or otherwise prevented from
> receiving/responding to the ping-check, then the DHCP server will realize
> there's an active client using the address and will avoid leasing that address.
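The ping-check idea can be sketched like this. In ISC dhcpd it's enabled with `ping-check true;`, and an address that answers the echo gets marked abandoned rather than offered. In the sketch, `is_alive` is a hypothetical stand-in for the ICMP echo, and the retry that a real client would trigger by re-sending its DISCOVER is collapsed into a single loop.

```python
from typing import Callable, Iterable, Optional, Set


def pick_address(free: Iterable[str],
                 is_alive: Callable[[str], bool],
                 abandoned: Set[str]) -> Optional[str]:
    """Return the first free address that does not answer a ping.

    An address that answers is assumed to be in use by a client the
    server has no lease record for, so it is marked abandoned and skipped.
    """
    for addr in free:
        if addr in abandoned:
            continue
        if is_alive(addr):       # someone is already using this address
            abandoned.add(addr)
            continue
        return addr
    return None                  # nothing safe to offer


live = {"10.0.0.10"}             # hypothetical station that still holds .10
abandoned: Set[str] = set()
print(pick_address(["10.0.0.10", "10.0.0.11"], lambda a: a in live, abandoned))
```

This only helps for stations that actually answer pings, which is exactly the firewall caveat above.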

> If the active lease is on a machine that's off and returns to the network
> [before the end of the lease], I'm not sure of the result. I *think* it will
> attempt to confirm the lease when it comes back on, get a NAK, and be forced
> to get a new lease.
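A simplified model of the behaviour guessed at above: the rebuilt server starts with an empty table, re-learns leases from renewals, and NAKs a returning client only if its old address has since been given to someone else. This is not dhcpd's actual decision logic. Real servers also check the subnet, honour the `authoritative` statement, and follow RFC 2131's rules about staying silent, all of which this sketch ignores. The function name and client IDs are hypothetical.

```python
from typing import Dict


def handle_request(leases: Dict[str, str], client_id: str, requested_ip: str) -> str:
    """Return "ACK" or "NAK" for a client asking to keep `requested_ip`.

    `leases` maps IP -> client id and starts empty on a rebuilt server.
    """
    holder = leases.get(requested_ip)
    if holder is None:
        leases[requested_ip] = client_id  # re-learn the lease from the renewal
        return "ACK"
    if holder == client_id:
        return "ACK"                      # ordinary renewal
    return "NAK"                          # address was re-issued while this client was off


leases: Dict[str, str] = {}
print(handle_request(leases, "laptop-a", "10.0.0.5"))  # ACK
print(handle_request(leases, "laptop-b", "10.0.0.5"))  # NAK
```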

> Thus, generally, using best practices, the result of a catastrophic loss of a
> DHCP server shouldn't be too disruptive.
> [Provided it can be replaced fairly quickly before too many machines lose their
> current lease.]
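To put a rough number on "fairly quickly" for the single-server case (with failover, the surviving peer keeps renewing), here's a back-of-the-envelope model. It assumes clients successfully renew at T1 = 50% of the lease (the RFC 2131 default), so just before the outage each client's remaining lease time is spread uniformly between half and all of the lease length. It ignores rebinding and machines that were already off.

```python
def fraction_expired(lease_seconds: float, outage_seconds: float) -> float:
    """Fraction of clients whose lease runs out during a server outage.

    Assumes renewal at T1 = 50% of the lease and a uniform spread of
    remaining lease time between lease/2 and lease.
    """
    half = lease_seconds / 2
    if outage_seconds <= half:
        return 0.0  # everyone still had at least half a lease left
    if outage_seconds >= lease_seconds:
        return 1.0  # even freshly renewed leases have expired
    return (outage_seconds - half) / half


HOUR = 3600
print(fraction_expired(8 * HOUR, 2 * HOUR))   # 0.0
print(fraction_expired(8 * HOUR, 6 * HOUR))   # 0.5
print(fraction_expired(8 * HOUR, 10 * HOUR))  # 1.0
```

Under this model, with an 8-hour lease an outage shorter than 4 hours costs nobody their lease.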
> The above setup will be a lot cleaner if there's little or no IP address churn,
> i.e., if for a particular pool there are enough addresses to give every machine
> an address simultaneously. If there's a lot of churn, it will be substantially
> messier, and machines will see far less stability in IP address assignment.
> [But there wasn't a lot of stability to start with, so we've probably only
> increased the churn rate some.]
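A quick way to check the "enough addresses for every machine at once" condition for a pool. It assumes a dhcpd-style `range first last;` declaration, which includes both endpoints. The client count is a hypothetical estimate you'd supply yourself.

```python
import ipaddress


def range_size(first: str, last: str) -> int:
    """Number of addresses in a `range first last;` pool, both ends included."""
    lo = int(ipaddress.IPv4Address(first))
    hi = int(ipaddress.IPv4Address(last))
    if hi < lo:
        raise ValueError("range end is before range start")
    return hi - lo + 1


def has_headroom(first: str, last: str, clients: int) -> bool:
    """True if every client could hold an address from the pool simultaneously."""
    return range_size(first, last) >= clients


print(range_size("10.0.0.100", "10.0.0.200"))         # 101
print(has_headroom("10.0.0.100", "10.0.0.200", 150))  # False
```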

> Does that sound about right?
> I'm sure there are use cases I'm not considering because I don't have those
> configurations, but am I missing anything serious?

> ---
> On a side note: is it worth capturing [backing up] the leases file, say at an
> interval of half the lease length? [The idea would be to have a reasonably
> current leases file that might be 80%+ right. Or is this likely to cause more
> problems than having no leases file at all?]
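A minimal sketch of the backup idea, taking the question's "half the lease length" interval at face value. The leases file location varies by distribution (`/var/lib/dhcp/dhcpd.leases` is common on Debian/Ubuntu, but that's an assumption here), and since dhcpd appends to and periodically rewrites that file, a copy taken mid-write might catch a partial last entry. Whether restoring a stale copy helps or hurts is the open question above; this only shows the mechanics.

```python
import shutil
import time
from pathlib import Path


def backup_interval(lease_seconds: int) -> float:
    """Half the lease length, as proposed in the question."""
    return lease_seconds * 0.5


def backup_leases(src: Path, dest_dir: Path) -> Path:
    """Copy the leases file to a UTC-timestamped file in dest_dir and return it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    dest = dest_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)  # copies contents and timestamps
    return dest


def run_forever(src: Path, dest_dir: Path, lease_seconds: int) -> None:
    """Hypothetical loop; a cron job or systemd timer would do the same job."""
    while True:
        backup_leases(src, dest_dir)
        time.sleep(backup_interval(lease_seconds))
```

For a one-day lease that's a copy every 12 hours; copying more often costs little and keeps the copy closer to current.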

> Pointers to FAQ/Docs etc gladly accepted!

> TIA
> -Greg.
> _______________________________________________
> dhcp-users mailing list
> dhcp-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/dhcp-users