Is there anything new on the DHCP Failover Horizon?

Sun Apr 17 14:38:04 UTC 2011

Something like this will put the server into partner-down. In a previous
life I used a cron job to run a ping every 30 minutes and run this script
if the ping failed. Partner down doesn't have to be immediate; it just
needs to be done before you run out of free leases (so the time will
depend on the average number of free leases, lease duration, and how
often new clients come along). "name" below is the failover peer name in
dhcpd.conf.

#!/bin/sh
# set local server into partner-down mode
omshell << EOF
server localhost
key itsomkey *<super-duper-secret key here>*
connect
new failover-state
set name = "name"
open
set local-state = 4
update
EOF
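
For the cron side, a rough sketch of the ping check could look like the
following. This is not the script I actually ran, just an illustration.
It counts consecutive failed pings and only fires the partner-down script
once the count reaches a threshold, which is one simple form of failsafe
against a single lost ping. PEER, STATE_FILE, THRESHOLD and
PARTNER_DOWN_SCRIPT are all made-up names and values, and "ping -c 3"
assumes a Linux/BSD style ping, so adjust for your platform.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: every name and default below is an assumption.
PEER="${PEER:-192.0.2.2}"                        # failover peer address (example)
STATE_FILE="${STATE_FILE:-/var/run/dhcp-peer-misses}"
THRESHOLD="${THRESHOLD:-3}"                      # consecutive misses before acting
PARTNER_DOWN_SCRIPT="${PARTNER_DOWN_SCRIPT:-/usr/local/sbin/dhcp-partner-down.sh}"

# next_miss_count PREVIOUS PING_STATUS
# Prints the new consecutive-miss count: 0 after a successful ping,
# otherwise PREVIOUS + 1.
next_miss_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# should_trigger COUNT THRESHOLD
# Succeeds only when COUNT equals THRESHOLD, so the partner-down script
# runs once rather than on every later miss.
should_trigger() {
    [ "$1" -eq "$2" ]
}

check_peer() {
    prev=0
    [ -r "$STATE_FILE" ] && prev=$(cat "$STATE_FILE")
    # Treat an empty or non-numeric state file as "no misses so far".
    case "$prev" in
        ''|*[!0-9]*) prev=0 ;;
    esac

    ping -c 3 "$PEER" >/dev/null 2>&1
    status=$?

    count=$(next_miss_count "$prev" "$status")
    echo "$count" > "$STATE_FILE"

    if should_trigger "$count" "$THRESHOLD"; then
        "$PARTNER_DOWN_SCRIPT"
    fi
}

# Only do real work when called with "run", e.g. from cron.
if [ "${1:-}" = "run" ]; then
    check_peer
fi
```

Called from cron with the argument "run" every 10 minutes and a threshold
of 3, this would wait roughly half an hour of continuous ping failures
before going partner-down. A ping failure only shows the peer host is
unreachable from here, not that its dhcpd has stopped serving clients, so
pick the interval and threshold with that in mind.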

-- 
regards,
-glenn
--
Glenn Satchell                            |  Miss 9: What do you
Uniq Advances Pty Ltd, Sydney Australia   |  do at work Dad?
mailto:glenn.satchell at uniq.com.au         |  Miss 6: He just
http://www.uniq.com.au tel:0409-458-580   |  types random stuff.

On 04/17/11 13:05, Chris Buxton wrote:
> When it happens, use omshell to put the remaining server into
> partner-down state, instead of just communications-interrupted.
>
> Write a script that polls dhcpd every few minutes (via omshell)
> looking for communications-interrupted, and reacts by putting it into
> partner-down. Probably want to put some extra failsafe logic in there,
> too, to make sure the problem isn't something else instead.
>
> Regards,
> Chris Buxton
> BlueCat Networks
>
>
> On 4/15/11, Martin McCormick<martin at dc.cis.okstate.edu>  wrote:
>> 	We have been using DHCP failover for several years and
>> like the fact that one server can die and no phones ring for a
>> while, at least.
>>
>> 	The problems have been when the real world that never
>> fails to prove the old saying that if anything can go wrong, it
>> will, springs one of its little surprises such as wireless
>> controllers that end up sending different data to both servers
>> or, as we had today, weather-induced power hits that appear to
>> have brought down one server while the other one stayed up but
>> in a "peer holds all free leases" lock-down state.
>>
>> 	That has caused some staff members to ask whether
>> failover really buys us any redundancy. I know it does when
>> things fail cleanly such as when one server's dhcpd process dies
>> or the power goes away cleanly from a box but I was asked to see
>> if there are any other failover strategies that might be in
>> consideration that self-heal a bit faster.
>>
>> 	The discussions almost always start when we discover
>> that one server in the pair has gone into "peer holds all free
>> leases" condition and people are not getting leases. Is there a
>> rapid way to clear it, when discovered?
>>
>> 	Of course, the real cure is to not send different data
>> to both servers. We serve around 10,000 clients here and can go
>> for months without a single "peer holds all" message, but when
>> something goes wrong, we get a situation that at least with one
>> DHCP server does not monkey-wrench the whole subnet.
>>
>> 	Telling everyone that it is only that one subnet, etc,
>> is a hard sell. The voices just get louder and the questions
>> more probing.
>>
>> 	Any ideas as to the best way to handle "no free leases"
>> are appreciated as I have never found anything that was really
>> clean since the condition that causes it is by nature an error
>> and dhcpd is simply trying to be as safe as possible.