DHCP failover and restarting

Tue Feb 22 13:15:10 UTC 2011

A couple of (hopefully quick) questions:

 a)  Is there a "best practice" for restarting a pair of failover DHCP
     servers? Is it a bad thing to restart the second server immediately
     after the first? What's a good "grace period" between the restarts?

 b)  When doing a "simple" restart (e.g. to activate a new
     configuration), I had assumed the two servers would coordinate
     automatically, with no need to explicitly tell the remaining
     server about it via OMAPI. The restart is done via the CentOS 5
     init scripts, which use SIGTERM. According to syslog, the other
     server ends up in "communications-interrupted" instead of
     "partner-down", which doesn't seem right. Example snippet from
     the log:

Jan 31 10:29:20 dijkstra dhcpd: peer rmnet-failover: disconnected
Jan 31 10:29:20 dijkstra dhcpd: failover peer rmnet-failover: I move from normal to communications-interrupted
Jan 31 10:29:32 dijkstra dhcpd: failover peer rmnet-failover: peer moves from normal to recover
Jan 31 10:29:32 dijkstra dhcpd: failover peer rmnet-failover: I move from communications-interrupted to partner-down
Jan 31 10:29:32 dijkstra dhcpd: failover peer rmnet-failover: peer moves from recover to recover
Jan 31 10:29:32 dijkstra dhcpd: Update request all from rmnet-failover: sending update
...
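
To make the sequence of state changes easier to follow, here is a minimal
Python sketch that extracts the transitions from log lines like the ones
above. The regular expression is an assumption based only on the message
format in this snippet; other dhcpd versions may word these lines
differently. The names TRANSITION, parse_transitions and SAMPLE are
illustrative and not part of dhcpd.

```python
import re

# Assumed format, taken from the snippet above:
#   "failover peer <name>: I move from <old> to <new>"
#   "failover peer <name>: peer moves from <old> to <new>"
TRANSITION = re.compile(
    r"failover peer (?P<peer>\S+): (?P<who>I|peer) moves? "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def parse_transitions(lines):
    """Return a list of (who, old_state, new_state) tuples.

    who is "local" for "I move ..." lines and "peer" for
    "peer moves ..." lines. Lines that do not match are skipped.
    """
    result = []
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            who = "local" if match.group("who") == "I" else "peer"
            result.append((who, match.group("old"), match.group("new")))
    return result


# Lines copied from the log excerpt above.
SAMPLE = """\
Jan 31 10:29:20 dijkstra dhcpd: peer rmnet-failover: disconnected
Jan 31 10:29:20 dijkstra dhcpd: failover peer rmnet-failover: I move from normal to communications-interrupted
Jan 31 10:29:32 dijkstra dhcpd: failover peer rmnet-failover: peer moves from normal to recover
Jan 31 10:29:32 dijkstra dhcpd: failover peer rmnet-failover: I move from communications-interrupted to partner-down
Jan 31 10:29:32 dijkstra dhcpd: failover peer rmnet-failover: peer moves from recover to recover
"""

if __name__ == "__main__":
    for who, old, new in parse_transitions(SAMPLE.splitlines()):
        print(f"{who:5s} {old} -> {new}")
```

Run against the excerpt, this lists the local server going from normal to
communications-interrupted and then to partner-down, with the peer's own
moves in between.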

 c)  Is there anywhere other than draft-ietf-dhc-failover-12 to learn
     how failover works from an operator's perspective?

Thank you in advance.

-- 
Peter