Manual Updates Under Failover

Martin McCormick martin at dc.cis.okstate.edu
Wed Nov 22 15:03:53 UTC 2006


	Thanks to help from the list, I have a completely
different strategy which works somewhat better for manually
updating dhcpd.  I wrote an expect script called dhcpshutdown
which uses omshell and successfully kills a running dhcpd server.
The script, however, is deliberately designed to hang just after
committing the murder.  What I do is fire it off as a background
process and then read syslog output until I see the last peer
message about the other failover peer going into partner-down
mode.  At that point, my main script kills off the backgrounded
omshell process, because it has done its job, and then restarts
dhcpd since it now has a new configuration.
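
	The sequence above can be sketched roughly as follows.  This
is only an illustration and not the actual scripts: the function
name and its parameters are hypothetical, the helper is assumed to
behave like a subprocess.Popen object running dhcpshutdown, and
log_lines stands in for whatever is feeding syslog output to the
main script.

```python
from typing import Any, Callable, Iterable


def shutdown_then_restart(
    start_helper: Callable[[], Any],
    log_lines: Iterable[str],
    restart_dhcpd: Callable[[], None],
    marker: str,
) -> bool:
    """Start the hanging shutdown helper, wait for marker, then restart.

    All names here are hypothetical.  start_helper is assumed to return
    something with terminate() and wait(), for example
    subprocess.Popen(["./dhcpshutdown"]).
    """
    helper = start_helper()  # helper shuts dhcpd down, then hangs on purpose
    try:
        # Read log lines until the marker appears or the stream ends.
        seen = any(marker in line for line in log_lines)
    finally:
        # The helper has done its job (or the log ended); clean it up.
        helper.terminate()
        helper.wait()
    if seen:
        restart_dhcpd()  # dhcpd comes back up with the new configuration
    return seen
```

Note that this sketch has no timeout.  If the marker never shows
up and the log stream never ends, it waits forever, so a real
script would want some upper bound on the wait.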

	This worked perfectly on a pair of servers on one of our
remote campuses which has a much smaller lease database and a
much lower level of activity.

	When I tried it here, it worked fine on the secondary
dhcp server although, as one might expect, it took longer since
we have over 2,000 leases to write and things are very busy.  On
the primary server, however, the script shuts it down and restarts
it properly, but the server has come up in that dreaded
recover-wait state each time.  It does send the update all request
and does get an answer.

	I read some earlier archived discussion in which the
response was that this behavior is correct.  I have seen this not
to be the case on the smaller system and I have been told by a very
reliable source that the system shouldn't come up in recover-wait
mode if things were cleanly taken down earlier.

	When killing dhcpd, I look for the string
"I move from shutdown to recover"

	When restarting dhcpd, I look for
"I move from recover-done to normal"
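
	To see which states a server actually passes through, and
not just whether one of the two strings above ever appears, the
"I move from X to Y" messages can be pulled out of the log.  A
minimal sketch follows.  It assumes that every transition message
has the same shape as the two strings quoted above, and the
function name is made up for illustration.

```python
import re
from typing import Optional, Tuple

# Assumes each transition message looks like the two quoted strings,
# i.e. "I move from <old-state> to <new-state>".
_TRANSITION = re.compile(r"I move from (\S+) to (\S+)")


def parse_transition(line: str) -> Optional[Tuple[str, str]]:
    """Return (old_state, new_state) if the line reports a state change."""
    match = _TRANSITION.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Collecting these pairs for each restart on the primary and
comparing them with the same list from a server that recovers
properly would show where the two sequences part ways.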

Of the two pairs in question, three of the four servers appear to reset
properly each time.  The small pair does it in 15 seconds and the
secondary of the very busy pair does so in about 80 seconds.

	Should I be looking for anything in the pair whose
primary always seems to come up in recover-wait mode?  Both
servers in this pair do have an empty /var/db/dhcpd/dhcpd.leases
file owned by dhcpd and a very active /var/db/dhcpd.leases file.

