DHCP peer failure and pool exhaustion...

Mon Sep 9 19:32:41 UTC 2013

On 9 September 2013 20:13, Simon Hobson <dhcp1 at thehobsons.co.uk> wrote:
> I believe part of the reason for the current state of affairs is from a viewpoint that there are network topologies that could mean the peers are unable to communicate with each other, but both of them can communicate with their clients. If you were to put both peers into partner down state - then chaos would ensue as they proceeded to issue duplicate leases.

That's precisely my reasoning for it being a "bad thing".

Putting a peer into partner-down when it's not actually down causes
chaos. If both systems are put into partner-down you can also end up
in the situation where neither peer issues leases for the duration of
MCLT (which I believe someone on the list has hit in the past, IIRC).
Your network then ends up in more sh*t than it was already in: before,
at least some clients could get online; now none can.

Unless you have a 100% guarantee that your script is flawless and can
only trigger partner-down when a peer is actually dead, the only
other method is human intervention.

And Greg, yes, it's a sucky answer, but that's only because it's the
answer you didn't want to hear. At some point you need to deal with
the legacy crap you've been left with and fix it; the tools can only
go so far to assist. DHCP failover isn't perfect, no-one said it was,
and it does have its gotchas. Sadly, you've run into this one.

As a temporary measure you could have your monitoring system alert you
when a peer goes down, when dhcpd isn't running, or when
"COMMUNICATIONS-INTERRUPTED" appears in syslog. You can then access
the systems, check whether the peer really is down, and apply a band
aid (set partner-down) before it has a detrimental impact on
production systems.
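
For the syslog part, something along these lines might do as a
starting point. It is only a rough sketch: the regex assumes dhcpd logs
state changes as "failover peer NAME: I move from OLD to NEW", so check
it against your own logs first. The find_alerts name and the
ALERT_STATES set are made up for illustration. Note that it only
reports; it deliberately never sets partner-down itself.

```python
import re

# Assumed log shape (verify against your own syslog), e.g.:
#   ... dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted
STATE_RE = re.compile(
    r"failover peer (?P<peer>\S+): I move from (?P<old>[\w-]+) to (?P<new>[\w-]+)",
    re.IGNORECASE,
)

# States worth waking a human up for (hypothetical choice for this sketch).
ALERT_STATES = {"communications-interrupted"}


def find_alerts(lines):
    """Return (peer, old_state, new_state) for each transition into an alert state."""
    alerts = []
    for line in lines:
        match = STATE_RE.search(line)
        if match and match.group("new").lower() in ALERT_STATES:
            alerts.append(
                (
                    match.group("peer"),
                    match.group("old").lower(),
                    match.group("new").lower(),
                )
            )
    return alerts
```

Feed it the lines of whatever file your syslog daemon writes dhcpd
messages to, and hand any results to your monitoring system. The
decision to set partner-down stays with whoever answers the alert.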

Steve