DHCP failover problems - still

Robert Blayzor rblayzor.bulk at inoc.net
Wed May 6 18:25:43 UTC 2009


FreeBSD 6.4 - amd64
ISC DHCPD 3.1.2

We've been having a problem for quite some time with failover between
our two DHCPD servers.  For weeks at a time the servers run absolutely
great... then suddenly they just "lose connection" to each other and
NEVER try to reconnect.

The servers sit right next to each other on the same Cisco Gig-E
switch.  Both run identical software, diskless via NFS.  There are no
other network service problems, no errors, nothing.

Suddenly, one day all of our leases are consumed and the servers stop  
handing out new leases.

After more research we found that the failover connection between the
two servers had been "interrupted".  Even though the logs claim that
the connection was interrupted, both servers are running perfectly
well, independently of each other, on the same LAN.

Question #1: I'm not sure why the connections are interrupted in the
first place.  The LAN never lost carrier, and the servers sit on a
private, low-traffic network.  According to syslog:

May  4 01:37:10 dhcp1 dhcpd: timeout waiting for failover peer dhcp-failover
May  4 01:37:10 dhcp1 dhcpd: peer dhcp-failover: disconnected
May  4 01:37:10 dhcp1 dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted
May  4 01:37:30 dhcp0 dhcpd: timeout waiting for failover peer dhcp-failover
May  4 01:37:30 dhcp0 dhcpd: peer dhcp-failover: disconnected
May  4 01:37:30 dhcp0 dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted


Then nothing in the logs at all about failover until we stopped the  
servers on May 6th.
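
Since the state change went unnoticed for two days, something like the
following could watch syslog for it.  This is only a rough sketch, not
anything we run today: it assumes each syslog entry is on one line (the
excerpt above is wrapped by the mail client), the function names and the
300-second threshold are made up for illustration, and the year has to
be supplied because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Matches state-change lines such as:
# "May  4 01:37:10 dhcp1 dhcpd: failover peer dhcp-failover: I move from
#  normal to communications-interrupted" (all on one line in real syslog).
TRANSITION = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+dhcpd:\s+"
    r"failover peer (?P<peer>\S+): I move from (?P<old>\S+) to (?P<new>\S+)\s*$"
)


def last_failover_states(lines, year=2009):
    """Return {host: (timestamp, state)} for the latest transition per host."""
    states = {}
    for line in lines:
        m = TRANSITION.match(line)
        if not m:
            continue
        # Collapse the padded day field ("May  4") before parsing.
        stamp = " ".join(m["ts"].split())
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        states[m["host"]] = (ts, m["new"])
    return states


def stuck_hosts(lines, now, max_age_seconds=300, year=2009):
    """List (host, state, since) where the latest state is not 'normal'
    and has lasted longer than max_age_seconds (hypothetical threshold)."""
    stuck = []
    for host, (ts, state) in sorted(last_failover_states(lines, year).items()):
        if state != "normal" and (now - ts).total_seconds() > max_age_seconds:
            stuck.append((host, state, ts))
    return stuck
```

Fed the log lines above with "now" set to the time of this post, it
would report both dhcp0 and dhcp1 as sitting in
communications-interrupted since May 4.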

Question #2: why don't they ever attempt to reconnect?


Ideas?

TIA!

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor at inoc.net
http://www.inoc.net/~rblayzor/





