One server of a failover pair is stuck in its state

Oscar Ricardo Silva oscars at mail.utexas.edu
Tue Aug 25 05:15:14 UTC 2009


We have a pair of dhcpd 3.1.1 servers running in failover mode.  The 
configuration of the primary is:




	authoritative;

	#failover definition
	failover peer "pub-dhcp" {
		primary;
		address 192.168.200.34;
		port 520;
		peer address 192.168.201.34;
		peer port 520;
		max-response-delay 60;
		max-unacked-updates 10;
		mclt 120;
		split 255;
		load balance max seconds 5;
	}

	include "/dhcpd/network-definitions.conf";



We recently added 8,000 addresses (15 different network definitions).  
After restarting the daemon, we noticed that the secondary server stayed 
in the "recovering" state and never left it.  The primary showed its 
local-state as partner-down and its partner-state as recovering.  The 
problem is that even when the dhcpd process was stopped on the peer, the 
primary still showed the same states.  Using omshell, I tried to 
manually change the partner-state to down, but the change never took 
effect.
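
For reference, a minimal sketch of the kind of omshell session involved 
in inspecting the failover state on the primary.  It assumes OMAPI is 
enabled on the local server on port 7911 with no key configured; both 
are assumptions for illustration and neither appears in the 
configuration above.  The peer name "pub-dhcp" is the one from the 
failover declaration.

	omshell
	server 127.0.0.1
	port 7911
	connect
	new failover-state
	set name = "pub-dhcp"
	open

After "open", omshell prints the object's attributes, including 
local-state and partner-state, as numeric values.  As far as I can tell 
from dhcpd(8), 4 means partner-down and 6 means recover, and 
local-state is the attribute described there as one that can be 
changed (with "set" followed by "update").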

Is this a known bug or behavior?  Is there a way of un-sticking the 
primary?  Do I just need to restart the dhcpd process?  I'm wary of 
doing this since the failover peer is still "recovering" and restarting 
the primary will cause an outage.


