DHCP Failover Complexity?
Matt Causey
matt.causey at gmail.com
Thu May 21 18:29:26 UTC 2009
Thanks for the detailed reply! It's quite helpful.
> It's not just that it hands out all its leases (if that is the case,
> then your pools are insufficiently sized to survive a peer outage).
What's the recommendation, then, for pool allocation? Our largest
network allocations are /22s, with very close to 1000 devices
attached. We expect most of each /22 to be available as a lease pool
shared between the dhcp servers. For broadcast domains with this many
hosts, should we be allocating more IP address space to give dhcp
failover some breathing room?
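
For a rough sense of how little slack that leaves, here is a minimal
sketch of the arithmetic in Python. It assumes the free addresses are
balanced about evenly between the two peers; the 10.0.0.0/22 prefix,
the function name and the 'reserved' parameter are placeholders for
illustration, not anything taken from our config.

```python
import ipaddress

def failover_headroom(cidr, devices, reserved=0):
    """Estimate free addresses per failover peer (illustrative only)."""
    net = ipaddress.ip_network(cidr)
    usable = net.num_addresses - 2      # drop network and broadcast addresses
    pool = usable - reserved            # reserved: routers, static hosts, etc.
    free = pool - devices               # addresses not currently leased
    # Assumption: the free addresses are split evenly between the peers.
    return {
        "usable": usable,
        "pool": pool,
        "free": free,
        "free_per_server": free // 2,
    }

if __name__ == "__main__":
    print(failover_headroom("10.0.0.0/22", 1000))
    print(failover_headroom("10.0.0.0/21", 1000))
```

Under that assumption, a /22 with 1000 devices leaves each server
only a handful of free addresses of its own, while a /21 would leave
each several hundred.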
> The situation is exacerbated because in failover, the server cannot
> reallocate an expired lease until it has negotiated that expiration
> with its peer. This is because it cannot know what actions its peer
> has taken while it was out of communication.
So, what does it mean if a single server in the pair crashes? Is a
sysadmin expected to intervene to make sure that the remaining server
can serve leases for all clients at the site?
>> Of course, if the technician puts the system in partner_down, this is
>> written in the leases file, and the state is persisted across bounces
>> of the server. Therefore, said technician must also remember to take
>> the system -out- of partner_down, or else:
>
> There should be no such need...
Well, perhaps we're doing something wrong. But on our sites, if I
cleanly (kill -15) take down a dhcp server in the pair, the other
server goes to 'communications-interrupted'. Are you saying that the
expected behavior is that the remaining node should move to
partner-down?
In any event, if a server in the pair goes down unexpectedly (i.e. the
remaining server has no hope of gracefully moving to partner-down), we
want the remaining server to service the leases owned by the other
server without the need for an engineer to intervene.
>> <snip>
> [automatic partner-down script]
>> </snip>
>>
>> Is this a bad idea? If so, why? What other conditions should I be looking for?
>
> It's impossible to say. If your servers are connected e.g. by a
> heartbeat cable, then there's no reason why you shouldn't enter
> partner-down immediately (and in fact, a feature to do this is in
> review for 4.2.0). A server in partner-down won't allocate leases
> in the partner's free state until STOS+MCLT expires anyway. The
> 'risk' that there will be an addressing collision exists, but it
> is next to zero (heartbeat cable fail), and you have MCLT seconds
> to discover the condition and repair it anyway.
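
The STOS+MCLT timing described above can be sketched as follows. This
is only an illustration in Python, reading STOS as the time the server
entered partner-down; the function names and the 3600-second MCLT are
assumptions made for the example, not values from any real
configuration.

```python
from datetime import datetime, timedelta, timezone

def peer_free_leases_usable_at(stos, mclt_seconds):
    """Earliest time the peer's free leases may be allocated."""
    return stos + timedelta(seconds=mclt_seconds)

def may_allocate_peer_free_lease(now, stos, mclt_seconds):
    """True once STOS + MCLT has passed."""
    return now >= peer_free_leases_usable_at(stos, mclt_seconds)

if __name__ == "__main__":
    # Hypothetical: partner-down entered at 18:00 UTC with an MCLT of one hour.
    stos = datetime(2009, 5, 21, 18, 0, 0, tzinfo=timezone.utc)
    now = datetime(2009, 5, 21, 18, 29, 26, tzinfo=timezone.utc)
    print(peer_free_leases_usable_at(stos, 3600))
    print(may_allocate_peer_free_lease(now, stos, 3600))
```

Until that time has passed, the surviving server can only hand out
addresses from its own free leases, which is the MCLT-long window
mentioned for discovering and repairing a mistaken partner-down.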
Can you elaborate on 'detect and repair' an address conflict? Do you
mean within dhcpd?
> This sort of
> contextual condition isn't something we can detect inside the failover
> software.
And I don't expect the software to do that, so we're wrapping some
additional site-specific logic around the software to prioritize dhcp
service availability over the integrity of existing leases.
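
To give an idea of the kind of decision such wrapper logic has to
make, here is a hedged sketch in Python. It is not the script snipped
above: every name in it is hypothetical, the 300-second grace period
is an arbitrary example, and it only decides whether to request
partner-down; it does not talk to dhcpd at all.

```python
from dataclasses import dataclass

@dataclass
class PeerObservation:
    local_state: str                  # e.g. "normal", "communications-interrupted"
    seconds_in_state: float           # how long the local server has been in that state
    peer_reachable_out_of_band: bool  # hypothetical independent check (e.g. heartbeat link)

def should_enter_partner_down(obs, grace_seconds=300):
    """Decide whether the surviving server should be moved to partner-down."""
    # Only act when failover communication with the peer has been lost.
    if obs.local_state != "communications-interrupted":
        return False
    # If the peer still answers on an independent path, it may still be
    # serving clients, so forcing partner-down risks address collisions.
    if obs.peer_reachable_out_of_band:
        return False
    # Wait out a grace period so a brief restart does not trigger the change.
    return obs.seconds_in_state >= grace_seconds

if __name__ == "__main__":
    obs = PeerObservation("communications-interrupted", 600, False)
    print(should_enter_partner_down(obs))
```

The out-of-band check is what stands in for the heartbeat cable case:
without some independent evidence that the peer is really down, the
sketch declines to act.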
Thanks!
--
Matt