Failover dhcpd pair stuck in partner-down/shutdown state

Wed Jan 2 20:08:38 UTC 2019

Hello:

I suspect that at some point in the past one of the servers was put into
the shutdown state by setting its state to shutdown (8) via omshell.
This caused the other server to transition to partner-down (4). The
servers will stay that way until you take them through recovery by
setting the partner-down peer's state to recover (6). When a server is
set to shutdown state, it remains in that state until you intervene. This is
intended to let you perform maintenance and similar work with minimal disruption.
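As a rough sketch of that recovery step, the omshell input below would be run against the server that is currently in partner-down. It assumes that server's dhcpd.conf contains "omapi-port 7911;" and that no OMAPI key is required. If a key is configured, a matching "key" line would be needed before "connect". "default" is the failover peer name from the configuration quoted below. The commands are typed one per line at the omshell prompt:

```
server localhost
port 7911
connect
new failover-state
set name = "default"
open
set local-state = 6
update
```

"open" attaches to the existing failover-state object with that name, and "update" writes the new local-state (6, recover) back to the running server. Check the state transitions in the log afterwards.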

Regards,

Thomas Markwalder
ISC Software Engineering

On 12/25/18 4:23 AM, Eugene Grosbein wrote:
> Hi!
>
> I run two ISC DHCP Servers version 4.3.5 in failover mode.
>
> They have been running just fine for several years, being upgraded from time to time,
> until recently I found that the first one runs in "partner-down" state
> and the second in "shutdown" state, despite the tcp/647 control connection
> being in a perfectly working state with data flowing over it according to tcpdump.
>
> They had been running in this state for a very long time (over a year), and
> I have no old logs to check due to log rotation. At the moment,
> the second server appends "not responding (shut down)" to the DHCPDISCOVER/DHCPREQUEST
> lines written to its log.
>
> I tried to resolve the issue by stopping the second dhcpd completely
> and starting it again. At startup, it wrote to the log:
>
> dhcpd: failover peer default: I move from shutdown to startup
>
> Then it opened its tcp/647 control connection to its partner,
> exchanged some data over the connection, and appended to the dhcpd.leases file:
>
>          failover peer "default" state {
>            my state shutdown at 4 2017/03/30 02:17:13;
>            partner state partner-down at 4 2017/03/30 02:17:13;
>            mclt 60;
>          }
>
> Then it wrote to the log:
>
> dhcpd: failover peer default: I move from startup to shutdown
>
> And things settled again into the same state.
>
> Restarting the first server did not help either.
>
> I was forced to stop both servers for a short time, manually delete all
> "failover" records quoted above from both dhcpd.leases files,
> and start the servers again. Only then did both servers reach "normal" state
> (editing only one of the dhcpd.leases files did not help).
>
> My question: why did the servers get stuck in partner-down/shutdown state "forever",
> unable to leave it without manual intervention despite a perfectly working
> control TCP connection? Is this problem fixed in recent versions?
>
> Here is the dhcpd.conf of the first server:
>
> # default ports tcp/647
>
> failover peer "default" {
>          primary;
>          address 62.231.191.161;
>          peer address 62.231.191.174;
>          max-response-delay 60;
>          max-unacked-updates 10;
>          mclt 60;
>          split 128;
>          auto-partner-down 60;
>          load balance max seconds 3;
> }
>
> subnet 62.231.191.160 netmask 255.255.255.252 {}
> include "/usr/local/etc/dhcpd.master";
>
> The second server uses the same configuration except for the IP addresses,
> and it uses an identical dhcpd.master file containing the rest of the configuration.
> _______________________________________________
> dhcp-users mailing list
> dhcp-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/dhcp-users