Failover dhcpd pair stuck in partner-down/shutdown state

Eugene Grosbein eugen at grosbein.net
Tue Dec 25 09:23:52 UTC 2018


Hi!

I run two ISC DHCP servers, version 4.3.5, in failover mode.

They had been running just fine for several years, being upgraded from time
to time, until I recently found that the first one was in the "partner-down"
state and the second in the "shutdown" state, even though the tcp/647 control
connection was working perfectly and, according to tcpdump, data was flowing over it.

They had been running in this state for a very long time (over a year), and
I have no older logs to check because of log rotation. At the moment, the
second server appends "not responding (shut down)" to the DHCPDISCOVER/DHCPREQUEST
lines it writes to its log.

I tried to resolve the issue by stopping the second dhcpd completely
and starting it again. At startup, it wrote to the log:

dhcpd: failover peer default: I move from shutdown to startup

Then it opened its tcp/647 control connection to its partner,
exchanged some data over the connection, and appended this to the dhcpd.leases file:

        failover peer "default" state {
          my state shutdown at 4 2017/03/30 02:17:13;
          partner state partner-down at 4 2017/03/30 02:17:13;
          mclt 60;
        }
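The failover state record above is also easy to read by machine, which could help catch a pair like this long before a year goes by. The timestamps in such records appear to mark when each state was entered (the record above, written during this restart, still carries a 2017/03/30 date). Below is a minimal Python sketch of a parser that could feed a monitoring check. It assumes the record layout quoted above and dhcpd's default timestamp format (weekday digit, then a UTC date and time); a "db-time-format local" setting would not match. The names FailoverStateRecord and parse_failover_records are hypothetical, not part of ISC DHCP.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FailoverStateRecord:
    peer: str
    my_state: str
    my_since: datetime        # when the local state was entered, as written
    partner_state: str
    partner_since: datetime
    mclt: Optional[int]


# Whole record: failover peer "NAME" state { ... } (assumes no nested braces).
_BLOCK_RE = re.compile(r'failover peer "([^"]+)" state \{(.*?)\}', re.S)
# Lines like "my state shutdown at 4 2017/03/30 02:17:13;".
# The single digit after "at" is the weekday (0 = Sunday) and is skipped.
_STATE_RE = re.compile(
    r'\b(my|partner) state ([a-z-]+) at \d (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2});')
_MCLT_RE = re.compile(r'\bmclt (\d+);')
_TS_FMT = "%Y/%m/%d %H:%M:%S"


def parse_failover_records(text: str) -> List[FailoverStateRecord]:
    """Return every failover state record found in dhcpd.leases text, in file order."""
    records = []
    for peer, body in _BLOCK_RE.findall(text):
        states = {who: (state, datetime.strptime(ts, _TS_FMT))
                  for who, state, ts in _STATE_RE.findall(body)}
        if "my" not in states or "partner" not in states:
            continue  # incomplete record: skip it rather than guess
        mclt = _MCLT_RE.search(body)
        records.append(FailoverStateRecord(
            peer=peer,
            my_state=states["my"][0], my_since=states["my"][1],
            partner_state=states["partner"][0], partner_since=states["partner"][1],
            mclt=int(mclt.group(1)) if mclt else None,
        ))
    return records
```

Because dhcpd appends these records, the last one returned for a given peer should be the most recently written, so a check could alert when records[-1].my_state is not "normal" for longer than some threshold.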

Then it wrote to the log:

dhcpd: failover peer default: I move from startup to shutdown

After that, things settled back into the same state.
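The local server logs its own state changes as "I move from X to Y" lines, like the two quoted above, so a pair that keeps falling back into a bad state can be spotted from the log alone. This minimal Python sketch collects those transitions per failover peer and reports peers whose latest transition ended in shutdown or partner-down. It assumes the message format shown above; the function names and the default set of "bad" states are illustrative assumptions, and partner-down is, of course, legitimate when the partner really is down.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches e.g. "dhcpd: failover peer default: I move from startup to shutdown".
# re.search ignores any syslog prefix (date, host, PID) before the message.
_MOVE_RE = re.compile(r'failover peer (\S+): I move from ([a-z-]+) to ([a-z-]+)')

Transitions = Dict[str, List[Tuple[str, str]]]


def track_transitions(lines: Iterable[str]) -> Transitions:
    """Collect the local server's (from, to) state transitions for each peer."""
    history: Transitions = {}
    for line in lines:
        m = _MOVE_RE.search(line)
        if m:
            peer, old, new = m.groups()
            history.setdefault(peer, []).append((old, new))
    return history


def stuck_peers(history: Transitions,
                bad_states: Tuple[str, ...] = ("shutdown", "partner-down")) -> Dict[str, str]:
    """Map each peer whose most recent transition ended in a bad state to that state."""
    return {peer: moves[-1][1] for peer, moves in history.items()
            if moves and moves[-1][1] in bad_states}
```

For example, iterating over an open log file (path of your choosing) with stuck_peers(track_transitions(f)) would report {"default": "shutdown"} for the restart sequence described here.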

Restarting the first server did not help either.

I was forced to stop both servers for a short time, manually delete all
of the "failover" records quoted above from both dhcpd.leases files,
and start the servers again. Only then did both servers reach the "normal" state
(editing only one of the dhcpd.leases files did not help).
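The manual workaround described above (stop both servers, delete the failover state records from both dhcpd.leases files, start them again) can be scripted. This Python sketch is not an ISC-provided tool: the function names and script name are hypothetical, it assumes the record layout quoted earlier with no nested braces inside the block, and it should only be run while dhcpd is stopped on both servers. It keeps a .bak copy of the original file.

```python
import re
import shutil
import sys
from typing import Tuple

# One whole failover state record plus its trailing newline, e.g.
#   failover peer "default" state { ... }
# Assumes the block contains no nested braces, as in the record quoted above.
_FAILOVER_BLOCK_RE = re.compile(
    r'^[ \t]*failover peer "[^"]*" state \{[^}]*\}[ \t]*\n?', re.M)


def strip_failover_state(text: str) -> Tuple[str, int]:
    """Remove every failover state record; return the new text and how many were removed."""
    return _FAILOVER_BLOCK_RE.subn("", text)


def main(path: str) -> None:
    # Run only while dhcpd is stopped on BOTH servers, and on both lease files.
    shutil.copy2(path, path + ".bak")  # keep a backup of the original file
    # surrogateescape round-trips any non-UTF-8 bytes unchanged.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        text = f.read()
    new_text, removed = strip_failover_state(text)
    with open(path, "w", encoding="utf-8", errors="surrogateescape") as f:
        f.write(new_text)
    print(f"removed {removed} failover state record(s) from {path}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved as, say, strip_failover.py, it would be invoked as "python3 strip_failover.py /path/to/dhcpd.leases" on each server in turn before either dhcpd is started again.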

My question: why did the servers get stuck in the partner-down/shutdown states
"forever", unable to leave them without manual intervention, despite a perfectly
working TCP control connection? Is this problem fixed in recent versions?

Here is the dhcpd.conf of the first server:

# default ports tcp/647

failover peer "default" {
        primary;
        address 62.231.191.161;
        peer address 62.231.191.174;
        max-response-delay 60;
        max-unacked-updates 10;
        mclt 60;
        split 128;
        auto-partner-down 60;
        load balance max seconds 3;
}

subnet 62.231.191.160 netmask 255.255.255.252 {}
include "/usr/local/etc/dhcpd.master";

The second server uses the same configuration except for the IP addresses,
and it uses an identical dhcpd.master file containing the rest of the configuration.

