Secondary server in failover fails to come out of recover state

Wed Apr 24 21:40:27 UTC 2013

We have two servers in a failover relationship, both running
4.1-ESV-R7. After a reload of dhcpd on the secondary, it has still not
left the recover state almost an hour later. We saw the same behavior
with 3.1.3 and only recently upgraded to this version. The only
workaround we've found is to stop both instances of dhcpd and remove
the "my state" and "partner state" lines from dhcpd.leases.
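
For reference, the manual workaround above can be sketched in Python
roughly as follows. This is only an illustration of the edit we make by
hand, not a recommended fix: the function name, the regular expression,
and the sample leases text (including its timestamps) are assumptions
made up for the example, and both dhcpd instances have to be stopped
before the real file is touched.

```python
import re

# Assumed layout: each state entry sits on its own line and ends with ';',
# e.g. "  my state recover at 3 2013/04/24 20:44:54;"
STATE_LINE = re.compile(r"^\s*(my|partner) state\b.*;\s*$")


def strip_failover_state(leases_text):
    """Return leases_text without the 'my state' / 'partner state' lines."""
    kept = [
        line
        for line in leases_text.splitlines(keepends=True)
        if not STATE_LINE.match(line)
    ]
    return "".join(kept)


# Illustrative sample only; the timestamps are invented for the example.
SAMPLE_LEASES = '''failover peer "dhcp" state {
  my state recover at 3 2013/04/24 20:44:54;
  partner state partner-down at 3 2013/04/24 20:44:53;
}
'''

if __name__ == "__main__":
    print(strip_failover_state(SAMPLE_LEASES), end="")
```

The sketch works on a string rather than on the file itself, so it can
be tried against a copy of dhcpd.leases before anything is changed.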

Here's the timeline of what happened.

1.  A change was made to the configuration of the primary, and dhcpd
was reloaded at 15:39:14.
2.  The primary moved back to the "normal" state at 15:43:42.

Apr 24 15:39:14 primary-dhcp dhcpd: failover peer dhcp: I move from normal to shutdown
Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal to partner-down
Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover
Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from recover to startup
Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from startup to recover
Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer update completed.
Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from recover to recover-done
Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer moves from partner-down to normal
Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from recover-done to normal
Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal to shutdown
Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: I move from normal to partner-down
Apr 24 15:44:54 primary-dhcp dhcpd: peer dhcp: disconnected
Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from shutdown to recover
Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from recover to recover

3.  The corresponding change was made on the secondary, and dhcpd was
reloaded at 15:44:53.

4.  At 15:44:54 the secondary came back up in recover, moved from
recover to startup at 15:45:56, and from startup back to recover at
15:45:59. It has been in recover ever since.

Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to shutdown
Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: peer moves from normal to partner-down
Apr 24 15:44:54 secondary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover
Apr 24 15:45:56 secondary-dhcp dhcpd: failover peer dhcp: I move from recover to startup
Apr 24 15:45:59 secondary-dhcp dhcpd: failover peer dhcp: I move from startup to recover
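
In case it helps anyone line up the two logs, here is a minimal Python
sketch that pulls the failover state transitions out of syslog lines
like the ones above and reports the last state each side logged. It
assumes each log entry is on a single unwrapped line and that the
timestamps fall in 2013; the function names and the regular expression
are my own and not part of dhcpd.

```python
import re
from datetime import datetime

# Matches entries such as:
#   Apr 24 15:44:54 secondary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover
TRANSITION = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+dhcpd:\s+"
    r"failover peer (?P<peer>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)$"
)


def parse_transitions(lines, year=2013):
    """Return (time, host, 'self' or 'peer', old_state, new_state) tuples."""
    events = []
    for line in lines:
        m = TRANSITION.match(line.strip())
        if not m:
            continue  # e.g. "peer update completed." or "disconnected"
        # syslog omits the year, so it is supplied by the caller (assumption).
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        who = "self" if m["who"] == "I move" else "peer"
        events.append((when, m["host"], who, m["old"], m["new"]))
    return events


def last_state(events, host, who="self"):
    """Last state that `host` logged for itself or for its peer, or None."""
    states = [new for _, h, w, _, new in events if h == host and w == who]
    return states[-1] if states else None


# The secondary's log entries from this post, one entry per line.
SAMPLE = [
    "Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to shutdown",
    "Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: peer moves from normal to partner-down",
    "Apr 24 15:44:54 secondary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover",
    "Apr 24 15:45:56 secondary-dhcp dhcpd: failover peer dhcp: I move from recover to startup",
    "Apr 24 15:45:59 secondary-dhcp dhcpd: failover peer dhcp: I move from startup to recover",
]

if __name__ == "__main__":
    events = parse_transitions(SAMPLE)
    print("secondary:", last_state(events, "secondary-dhcp"))
    print("secondary's view of peer:", last_state(events, "secondary-dhcp", "peer"))
```

Run against the secondary's entries, the sketch shows the secondary
sitting in recover while its last recorded view of the primary is
partner-down, which is the stuck condition described in step 4.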

Here's dhcpd.conf for the primary:

option domain-name-servers 192.168.50.41, 192.168.50.40 ;
option ntp-servers 192.168.50.40, 192.168.50.41;
default-lease-time 86400;
max-lease-time 86400;
one-lease-per-client true;
ddns-update-style ad-hoc;
ddns-updates off;
authoritative;
if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {
        deny booting;
}
option voip-tftp-server-address code 150 = array of ip-address ;
set vendor-string = option vendor-class-identifier;
failover peer "dhcp" {
        primary;
        address 192.168.100.2;
        port 520;
        peer address 192.168.101.2;
        peer port 520;
        max-response-delay 60;
        max-unacked-updates 10;
        mclt 120;
        split 255;
        load balance max seconds 5;
}
subnet 192.168.100.0 netmask 255.255.255.224 {
}
include "/dhcpd/dhcpd.network.conf";

The /dhcpd/dhcpd.network.conf file holds the scope definitions.
Both servers sync their clocks through NTP and agree on the time.

Any information would be appreciated.