DHCP failover problems - still
Robert Blayzor
rblayzor.bulk at inoc.net
Wed May 6 18:25:43 UTC 2009
FreeBSD 6.4 - amd64
ISC DHCPD 3.1.2
We've been having a problem for quite some time with failover on our
DHCPD servers. For weeks at a time the servers run absolutely
great... then suddenly they just "lose connection" to each other and
NEVER try to reconnect.
The servers sit right next to each other on the same Cisco Gig-E
switch. Both run identical software, diskless via NFS... no other
network service problems, no errors, nothing.
Suddenly, one day all of our leases are consumed and the servers stop
handing out new leases.
After more research we found that the failover connection between the
two servers has been "interrupted". Even though the logs claim that
the connection was interrupted, both servers are running perfectly
well, independently of each other, on the same LAN.
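
One way to test the "interrupted" claim from the outside is to check whether the peer's failover TCP port still accepts connections. The following is a minimal sketch, not part of the original report: the function name, the host name and the port number in the usage note are placeholders, and the real port is whatever the "peer port" statement in the failover peer declaration of dhcpd.conf says. A successful connect only shows that the port is reachable; it says nothing about the state of the failover protocol itself.

```python
import socket


def peer_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    `host` and `port` are supplied by the caller; nothing here is read from
    dhcpd.conf, so the values must match the local failover configuration.
    """
    try:
        # create_connection resolves the name, connects, and raises OSError
        # (refused, unreachable, timed out) if the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from dhcp1, a call such as peer_port_reachable("dhcp0", 647) would be the check, where 647 is only a placeholder for the configured peer port.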
So question #1: why are the connections interrupted in the first
place? The LAN never lost carrier, and the servers sit on a private,
low-traffic network. According to the syslog:
May 4 01:37:10 dhcp1 dhcpd: timeout waiting for failover peer dhcp-failover
May 4 01:37:10 dhcp1 dhcpd: peer dhcp-failover: disconnected
May 4 01:37:10 dhcp1 dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted
May 4 01:37:30 dhcp0 dhcpd: timeout waiting for failover peer dhcp-failover
May 4 01:37:30 dhcp0 dhcpd: peer dhcp-failover: disconnected
May 4 01:37:30 dhcp0 dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted
Then nothing in the logs at all about failover until we stopped the
servers on May 6th.
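
Because the servers go quiet after the state change, the problem is easy to miss until the leases run out. The following is a minimal sketch of a log check that reports the last failover transition seen for each host, assuming syslog lines in the format quoted above (unwrapped, one entry per line). The function name, the regular expression and the SAMPLE list are illustrative, not part of ISC DHCPD.

```python
import re

# Matches dhcpd state-change lines such as:
#   May 4 01:37:10 dhcp1 dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted
# The optional "[pid]" after "dhcpd" is an assumption about other syslog setups.
TRANSITION = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+dhcpd(?:\[\d+\])?:\s+"
    r"failover peer (?P<peer>\S+): I move from (?P<old>\S+) to (?P<new>\S+)\s*$"
)


def last_failover_states(lines):
    """Return {host: (timestamp, old_state, new_state)} for the last transition per host."""
    states = {}
    for line in lines:
        m = TRANSITION.match(line.strip())
        if m:
            # Later lines overwrite earlier ones, so the final entry is the latest.
            states[m["host"]] = (m["stamp"], m["old"], m["new"])
    return states


# The six log lines from this report, unwrapped.
SAMPLE = [
    "May 4 01:37:10 dhcp1 dhcpd: timeout waiting for failover peer dhcp-failover",
    "May 4 01:37:10 dhcp1 dhcpd: peer dhcp-failover: disconnected",
    "May 4 01:37:10 dhcp1 dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted",
    "May 4 01:37:30 dhcp0 dhcpd: timeout waiting for failover peer dhcp-failover",
    "May 4 01:37:30 dhcp0 dhcpd: peer dhcp-failover: disconnected",
    "May 4 01:37:30 dhcp0 dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted",
]

if __name__ == "__main__":
    for host, (stamp, old, new) in sorted(last_failover_states(SAMPLE).items()):
        flag = "" if new == "normal" else "  <-- not in normal state"
        print(f"{host}: {old} -> {new} at {stamp}{flag}")
```

Run periodically against the live syslog file instead of SAMPLE, a check like this would flag a server that has been out of the normal state well before the pool is exhausted.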
The second question is, why don't they attempt to "reconnect"?
Ideas?
TIA!
--
Robert Blayzor, BOFH
INOC, LLC
rblayzor at inoc.net
http://www.inoc.net/~rblayzor/