ISC-dhcp subnet limit?

Chuck Anderson cra at WPI.EDU
Fri Jan 29 00:18:57 UTC 2016


You said "I have inherited 2 dhcp servers" so I thought whoever you
inherited them from may have put a server into partner-down.  It /was/
in that state, so something or someone must have done it.  Are you
sure you didn't do it?  Are you sure you don't have
"auto-partner-down" set somewhere in your config files?  Does Ubuntu
do something silly and turn that on by default in their build of ISC
DHCP?  It looks like they may have:

https://bugs.launchpad.net/precise-backports/+bug/1072354
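
To rule out a stray "auto-partner-down" in your own config, something
like the following rough Python sketch could help.  It only looks for
uncommented occurrences of the statement in text you feed it.  The
sample config is made up for illustration, and the sketch does not
follow "include" statements or handle a "#" inside a quoted string.

```python
import re

def find_auto_partner_down(config_text):
    """Return (line_number, line) pairs for uncommented auto-partner-down statements."""
    hits = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        # dhcpd.conf comments start with '#'; drop everything after it.
        line = raw.split("#", 1)[0].strip()
        if re.search(r"\bauto-partner-down\b", line):
            hits.append((lineno, line))
    return hits

# Hypothetical config fragment, not taken from the poster's servers.
sample = """\
failover peer "dhcp-failover" {
  primary;
  # auto-partner-down 60;
  auto-partner-down 300;
}
"""
print(find_auto_partner_down(sample))
```

In practice you would read each config file from disk and pass its
contents in; an empty list means no active statement was found in
that file.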

This is so dangerous because you MUST be sure the partner is really
DOWN and unable to hear DHCP traffic from clients, and not just
isolated from the other failover peer, BEFORE putting a server into
"partner-down" state.  If both servers end up in a state where they
can both hear DHCP traffic from clients but cannot communicate with
each other over the failover TCP channel AND you force one or both of
them into "partner-down" (or rely on "auto-partner-down" to do this
for you), then each server believes it may allocate from the entire
pool, and you will have duplicate leases and other badness.

If this scenario had happened in the past, that could explain all the
duplicate lease messages you see.  It may also have caused the lease
files to get in a funny state that they cannot recover from
automatically, although this is just speculation on my part because
that code path probably isn't as well exercised, since no one should
really ever let that happen.

"dhcp-failover: ignored (recovering)" means the secondary is
reconciling lease data to/from the primary and will not answer clients
until this is completed.  With the more than 1,000,000 leases you
have, this may simply take a while.  Or it may be stuck due to lease
file corruption; I can't tell from here.  Try waiting a while to see
if it fully recovers, or capture the traffic on the failover TCP
stream to see what is actually going on.

A simple way to get out of this mess may be to cut your losses on the
lease data on dhcp-2 and start with a blank slate there, letting the
recovery process rebuild dhcp-2's lease file from the primary.  To do
that, stop the dhcp-2 dhcpd service, delete dhcp-2's lease file, then
restart.  Make sure dhcp-1 is still running and answering first.
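
As a rough sketch of that sequence, the Python below only prints the
commands; it does not run them.  The service name "isc-dhcp-server"
and the path /var/lib/dhcp/dhcpd.leases are my assumptions about a
stock Ubuntu install, so check them against your systems.  It also
moves the old file aside instead of deleting it, and creates an empty
lease file before starting, since dhcpd normally expects that file to
exist.  Both of those are my additions to the steps above.

```python
import shlex

def rebuild_plan(service="isc-dhcp-server",
                 lease_file="/var/lib/dhcp/dhcpd.leases"):
    """Return the commands, in order, for resetting the secondary's lease file.

    Both default arguments are assumptions about an Ubuntu install.
    """
    backup = lease_file + ".bad"  # hypothetical name for the set-aside copy
    return [
        ["service", service, "stop"],    # stop dhcpd on dhcp-2
        ["mv", lease_file, backup],      # keep the old leases instead of deleting
        ["touch", lease_file],           # dhcpd expects the lease file to exist
        ["service", service, "start"],   # restart and let failover resync
    ]

# Print the plan so it can be reviewed before anything is run by hand.
for cmd in rebuild_plan():
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Run these on dhcp-2 only, and check that the new empty lease file has
the same ownership and permissions as the old one.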

I still don't know how long you should expect 1,000,000 leases to take
to sync, so syncing from scratch may cause this process to start over
and take a long time...  Or maybe a million leases isn't really a
problem and it will be okay.  But it does seem like a lot to me:

Jan 27 14:45:03 dhcp-1 dhcpd: Wrote 1169142 leases to leases file.
Jan 27 15:29:21 dhcp-1 dhcpd: Wrote 1169401 leases to leases file.
Jan 27 16:17:35 dhcp-1 dhcpd: Wrote 1169721 leases to leases file.
Jan 27 15:50:25 dhcp-1 dhcpd: peer dhcp-failover: disconnected
Jan 27 16:19:38 dhcp-1 dhcpd: peer dhcp-failover: disconnected

Jan 27 16:16:39 dhcp-2 dhcpd: peer dhcp-failover: disconnected
Jan 27 16:18:55 dhcp-2 dhcpd: peer dhcp-failover: disconnected
Jan 27 14:15:51 dhcp-2 dhcpd: Wrote 0 leases to leases file.
Jan 27 15:28:38 dhcp-2 dhcpd: Wrote 29890 leases to leases file.
Jan 27 15:35:41 dhcp-2 dhcpd: Wrote 29920 leases to leases file.
Jan 27 15:50:28 dhcp-2 dhcpd: Wrote 29920 leases to leases file.
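
To watch whether dhcp-2 is catching up, you could compare the "Wrote
N leases" counts from both servers over time.  A minimal Python
sketch, assuming syslog lines in the format shown above (the three
sample lines are copied from this thread):

```python
import re

# Matches e.g. "... dhcp-1 dhcpd: Wrote 1169721 leases to leases file."
WROTE = re.compile(r"(\S+) dhcpd: Wrote (\d+) leases to leases file\.")

def latest_lease_counts(log_lines):
    """Map each host to the count from the last matching line seen for it.

    "Last" means last in input order, not latest by timestamp.
    """
    counts = {}
    for line in log_lines:
        m = WROTE.search(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

log = [
    "Jan 27 16:17:35 dhcp-1 dhcpd: Wrote 1169721 leases to leases file.",
    "Jan 27 16:19:38 dhcp-1 dhcpd: peer dhcp-failover: disconnected",
    "Jan 27 15:50:28 dhcp-2 dhcpd: Wrote 29920 leases to leases file.",
]
counts = latest_lease_counts(log)
print(counts)
print("gap:", counts["dhcp-1"] - counts["dhcp-2"])
```

If the gap keeps shrinking between runs, recovery is probably making
progress; if dhcp-2's count stays flat, it is more likely stuck.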


On Thu, Jan 28, 2016 at 07:46:17AM -0500, Rob Morin wrote:
> Hey Chuck, sorry for late reply, i fell asleep, lol
> 
> No body works on these servers other than myself, so no one put the
> peer in partner down mode...
> 
> i tested the 647 ports in both directions with a simple telnet
> command and both respond in both directions...
> 
> As of this morning(est time) the secondary shows the following still
> 
> Jan 27 23:50:37 dhcp-2 dhcpd: Wrote 661863 leases to leases file.
> Jan 28 00:03:53 dhcp-2 dhcpd: dhcp-failover: ignored (recovering)
> Jan 28 00:15:51 dhcp-2 dhcpd: dhcp-failover: ignored (recovering)
> Jan 28 00:16:37 dhcp-2 dhcpd: dhcp-failover: ignored (recovering)
> Jan 28 07:34:44 dhcp-2 dhcpd: DHCPREQUEST for 10.39.175.168
> (172.30.129.9) from f4:f1:e1:e5:14:1f via 10.39.175.1: not
> responding (recovering)
> 100's of the below
> Jan 28 07:43:08 dhcp-2 dhcpd: uid lease 10.35.166.59 for client
> a4:b8:05:8a:c3:82 is duplicate on 10.35.166.0/24
> 
> 
> Primary has this..
> i do not think this is a big issue for the moment as we do not care
> about resolution for the moment, should i explicitly indicate this
> in the conf file?
> 100's of the below...
> dhcp-1 dhcpd: bind update on 10.40.44.115 got ack from
> dhcp-failover: xid mismatch.
> Jan 28 07:41:35 dhcp-1 dhcpd: uid lease 10.38.115.215 for client
> 98:fe:94:85:75:3d is duplicate on 10.38.115.0/24
> 
> Thanks for your help s far...
> 
> 
> Rob
> Montreal Canada
> 
> On 2016-01-28 1:46 AM, Chuck Anderson wrote:
> >"partner-down" state must NEVER be entered unless the failover peer
> >server is really down.  Typically, partner-down is only entered
> >manually by server administrator action (via OMAPI or by carefully
> >editing the lease file while the server is stopped) or automatically
> >if you specifically enabled the dangerous "auto-partner-down" option
> >in the config (don't do that).
> >
> >Given that I don't see the "auto-partner-down" statement configured in
> >the bits you have posted, is it possible that someone at some point in
> >the past put the server into partner-down manually?  It should come out
> >of that state automatically once contact is re-established with the
> >failover peer.  Is there a firewall or iptables rules blocking port
> >647 communication between the two servers, preventing failover from
> >working correctly?
> >
> >>Jan 27 16:17:37 dhcp-1 dhcpd: failover peer dhcp-failover: I move from partner-down to startup
> >>Jan 27 16:17:46 dhcp-1 dhcpd: failover peer dhcp-failover: I move from startup to partner-down
> >>Jan 27 16:17:37 dhcp-1 dhcpd: failover peer dhcp-failover: I move from partner-down to startup
> >>Jan 27 16:17:46 dhcp-1 dhcpd: failover peer dhcp-failover: I move from startup to partner-down
> >>
> >>And now from dhcp-2
> >>Jan 27 16:17:19 dhcp-2 dhcpd: failover: link startup timeout
> >>Jan 27 16:17:56 dhcp-2 dhcpd: failover peer dhcp-failover: peer moves from partner-down to partner-down
> >>Jan 27 16:17:56 dhcp-2 dhcpd: failover peer dhcp-failover: peer moves from partner-down to partner-down
> >>Jan 27 16:28:41 dhcp-2 dhcpd: failover peer dhcp-failover: peer moves from partner-down to partner-down

