DHCP Failover Complexity?

Matt Causey matt.causey at gmail.com
Thu May 21 18:29:26 UTC 2009


Thanks for the detailed reply!  It's quite helpful.

> It's not just that it hands out all its leases (if that is the case,
> then your pools are insufficiently sized to survive a peer outage).

What's the recommendation, then, for pool allocation?  Our largest
network allocations are /22s, with very close to 1000 devices
attached.  We expect that a lease pool of most of that /22 should be
available and shared between the dhcp servers.  For broadcast domains
with this many hosts - should we be allocating more IP address space
to give dhcp failover some breathing room?
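
For a sense of scale, here is a rough sketch of the headroom arithmetic
for a /22 with about 1000 devices.  The 10.0.0.0/22 prefix is a
placeholder, and the even split of spare addresses between the two peers
is an assumption about how the free leases are balanced, not something
measured on our servers.

```python
import ipaddress

def pool_headroom(cidr, devices, reserved=0):
    """Return (usable, pool, spare) address counts for one subnet.

    usable: host addresses, excluding the network and broadcast addresses
    pool:   usable addresses left after static reservations (hypothetical)
    spare:  pool addresses left once every device holds a lease
    """
    net = ipaddress.ip_network(cidr)
    usable = net.num_addresses - 2
    pool = usable - reserved
    spare = pool - devices
    return usable, pool, spare

# Placeholder prefix; only the /22 size matters here.
usable, pool, spare = pool_headroom("10.0.0.0/22", 1000)

# Assumption: spare (free) leases are balanced evenly between the peers.
per_peer_spare = spare // 2

print(usable, pool, spare, per_peer_spare)  # 1022 1022 22 11
```

With only about 11 spare addresses per server under that assumption, a
peer outage leaves very little room for new clients.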

> The situation is exacerbated because in failover, the server cannot
> reallocate an expired lease until it has negotiated that expiration
> with its peer.  This is because it cannot know what actions its peer
> has taken while it was out of communication.

So, what does it mean if a single server in the pair crashes?  Is it
expected that a sysadmin must intervene to make sure that the
remaining server can serve leases for all clients at the site?

>> Of course, if the technician puts the system in partner_down, this is
>> written in the leases file, and the state is persisted across bounces
>> of the server.  Therefore, said technician must also remember to take
>> the system -out- of partner_down, or else:
>
> There should be no such need...

Well, perhaps we're doing something wrong.  But on our sites, if I
cleanly (kill -15) take down a dhcp server in the pair, the other
server goes to 'communications-interrupted'.  Are you saying that the
expected behavior is that the remaining node should move to
partner-down?

In any event, if a server in the pair goes down unexpectedly (i.e. the
remaining server has no hope of gracefully moving to partner-down), we
want the remaining server to service the leases owned by the other
server without the need for an engineer to intervene.

>> <snip>
> [automatic partner-down script]
>> </snip>
>>
>> Is this a bad idea?  If so, why?  What other conditions should I be looking for?
>
> It's impossible to say.  If your servers are connected e.g. by a
> heartbeat cable, then there's no reason why you shouldn't enter
> partner-down immediately (and in fact, a feature to do this is in
> review for 4.2.0).  A server in partner-down won't allocate leases
> in the partner's free state until STOS+MCLT expires anyway.  The
> 'risk' that there will be an addressing collision exists, but it
> is next to zero (heartbeat cable fail), and you have MCLT seconds
> to discover the condition and repair it anyway.

Can you elaborate on 'detect and repair' an address conflict?  Do you
mean within dhcpd?
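
As a rough illustration of the STOS+MCLT timing quoted above, the sketch
below computes the earliest moment a server in partner-down would start
allocating from the partner's free leases.  The function name, the
timestamp, and the 1800-second MCLT are all made-up values for
illustration, not taken from our configuration.

```python
from datetime import datetime, timedelta, timezone

def partner_free_available_at(stos, mclt_seconds):
    """Earliest time the partner's free leases may be allocated.

    stos:         start time of the partner-down state (aware datetime)
    mclt_seconds: maximum client lead time, in seconds
    """
    return stos + timedelta(seconds=mclt_seconds)

# Hypothetical values: partner-down entered at 18:00 UTC, MCLT of 30 minutes.
stos = datetime(2009, 5, 21, 18, 0, 0, tzinfo=timezone.utc)
mclt = 1800

print(partner_free_available_at(stos, mclt).isoformat())
# 2009-05-21T18:30:00+00:00
```

Until that time, the window is what is left to notice a false
partner-down and undo it.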

> This sort of
> contextual condition isn't something we can detect inside the failover
> software.

And I don't expect the software to do that, so we're wrapping some
additional site-specific logic around the software to prioritize dhcp
service availability over the integrity of existing leases.
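
The kind of site-specific decision described here could look roughly
like the sketch below.  It is not the snipped script: the function name,
the list of independent reachability checks, and the 60-second threshold
are hypothetical, and it only decides whether to act.  It does not talk
to dhcpd.

```python
def should_enter_partner_down(local_state, peer_checks, outage_seconds,
                              min_outage_seconds=60):
    """Decide whether to move the surviving server to partner-down.

    local_state:    failover state reported by the local server (string)
    peer_checks:    booleans, one per independent path to the peer
                    (for example a heartbeat link); True means reachable
    outage_seconds: how long the peer has been unreachable
    """
    # Only act when the local server has already lost contact.
    if local_state != "communications-interrupted":
        return False
    # With no evidence, or with any path still seeing the peer, do nothing:
    # the peer may be alive and still handing out leases.
    if not peer_checks or any(peer_checks):
        return False
    # Wait out short blips before giving up on the peer.
    return outage_seconds >= min_outage_seconds

print(should_enter_partner_down("communications-interrupted", [False, False], 120))  # True
print(should_enter_partner_down("communications-interrupted", [False, True], 120))   # False
```

Requiring every independent check to fail is the conservative choice: it
trades a slower reaction for a lower chance of both servers allocating
from the same leases.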


Thanks!

--
Matt


