Secondary appears to not recover after restart

Oscar Ricardo Silva oscars at mail.utexas.edu
Wed Sep 19 19:29:43 UTC 2012


It was one of the first things I checked.  Both servers have exactly
the same time.


To the list as a whole:  am I missing something, or is there a way to
decode the "failover" messages being sent between the servers?  As far
as I know, the communication between the two servers is encrypted, so
I can't just look at the packets.  While I can look at the logs and see
that a peer changed state, I have no idea what exactly one server is
"telling" the other.
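
For what it's worth, the state changes that do reach syslog can at least
be lined up per server.  Here is a minimal sketch in Python, assuming the
log lines have been unwrapped onto one line each and look like the ones
quoted further down.  The names parse_transitions and last_states are my
own for illustration, not anything shipped with ISC DHCP, and this only
reads the logs; it does not decode the failover traffic itself.

```python
import re

# Matches lines such as:
# 11:01:59 primary-dhcp dhcpd: failover peer dhcp: I move from normal to partner-down
TRANSITION = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+dhcpd:\s+"
    r"failover peer (?P<name>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>[\w-]+) to (?P<new>[\w-]+)$"
)


def parse_transitions(lines):
    """Return (time, host, who, old_state, new_state) for each state change.

    'who' is "self" for "I move" lines and "peer" for "peer moves" lines.
    Lines that are not state transitions are skipped.
    """
    events = []
    for line in lines:
        match = TRANSITION.match(line.strip())
        if match is None:
            continue
        who = "self" if match.group("who") == "I move" else "peer"
        events.append(
            (match.group("time"), match.group("host"), who,
             match.group("old"), match.group("new"))
        )
    return events


def last_states(events):
    """Map (host, who) to the most recent state seen in the log."""
    states = {}
    for _time, host, who, _old, new in events:
        states[(host, who)] = new
    return states
```

Feeding both servers' logs through last_states() shows at a glance what
each server thinks of itself and of its partner, which is the comparison
I have been doing by eye.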




Oscar



On 09/18/2012 12:08 PM, dhcp-users-request at lists.isc.org wrote:
> Message: 5
> Date: Tue, 18 Sep 2012 15:32:36 +0000
> From: Randall C Grimshaw <rgrimsha at syr.edu>
> To: Users of ISC DHCP <dhcp-users at lists.isc.org>
> Subject: RE: Secondary appears to not recover after restart
> Message-ID:
> 	<E026853FAE2E5E47BE78B287F89DAF9E3C225809 at SUEX10-mbx-03.ad.syr.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Not specifically this, and offered as a slightly educated guess, but if the clocks are out of sync the servers may appear not to be communicating.
>
> Randall Grimshaw rgrimsha at syr.edu
> ________________________________________
> From: dhcp-users-bounces+rgrimsha=syr.edu at lists.isc.org [dhcp-users-bounces+rgrimsha=syr.edu at lists.isc.org] on behalf of Oscar Ricardo Silva [oscars at mail.utexas.edu]
> Sent: Tuesday, September 18, 2012 11:18 AM
> To: dhcp-users at lists.isc.org
> Subject: Secondary appears to not recover after restart
>
> We've recently had three incidents where the secondary appears to not
> recover after it's been restarted.  After the third time I took a closer
> look at the state of the primary after the secondary was completely
> shut down and saw it didn't change.  Even with its peer completely down,
> the primary still said its partner was in a recover state.
>
> Here are the logs, and I'm including the configs at the end of the
> message.  The servers were OK: no high CPU, nothing out of the ordinary
> with network traffic, etc.  We're running V3.1.3.  I'm aware the
> statement "split 255;" in the config is not standard, but we've been
> running this way for years and it is legal according to the
> documentation.
>
> Anyone seen this before?  Any ideas on why the primary is "stuck" in a
> particular state when its peer is definitely down?
>
>
>
> A configuration change is made and the primary is restarted at 11:00:52.
> It comes back, communicates with the secondary, syncs and then moves
> to a normal state at 11:01:01.  The secondary is then restarted at
> 11:01:59 and then stays in recover mode.
>
>
> Primary server:
> 11:00:52 primary-dhcp dhcpd: failover peer dhcp: I move from normal to
> shutdown
> 11:00:52 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal
> to partner-down
> 11:00:53 primary-dhcp dhcpd: failover peer dhcp: I move from shutdown to
> recover
> 11:00:55 primary-dhcp dhcpd: failover peer dhcp: I move from recover to
> startup
> 11:00:55 primary-dhcp dhcpd: failover peer dhcp: I move from startup to
> recover
> 11:01:01 primary-dhcp dhcpd: failover peer dhcp: peer update completed.
> 11:01:01 primary-dhcp dhcpd: failover peer dhcp: I move from recover to
> recover-done
> 11:01:01 primary-dhcp dhcpd: failover peer dhcp: peer moves from
> partner-down to normal
> 11:01:01 primary-dhcp dhcpd: failover peer dhcp: I move from
> recover-done to normal
>
> 11:01:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal
> to shutdown
> 11:01:59 primary-dhcp dhcpd: failover peer dhcp: I move from normal to
> partner-down
> 11:02:00 primary-dhcp dhcpd: peer dhcp: disconnected
> 11:02:10 primary-dhcp dhcpd: failover peer dhcp: peer moves from
> shutdown to recover
> 11:02:10 primary-dhcp dhcpd: failover peer dhcp: peer moves from recover
> to recover
>
>
>
> Secondary server:
> 11:00:52 secondary-dhcp dhcpd: failover peer dhcp: peer moves from
> normal to shutdown
> 11:00:52 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to
> partner-down
> 11:00:53 secondary-dhcp dhcpd: peer dhcp: disconnected
> 11:00:55 secondary-dhcp dhcpd: failover peer dhcp: peer moves from
> shutdown to recover
> 11:00:55 secondary-dhcp dhcpd: failover peer dhcp: peer moves from
> recover to recover
> 11:01:01 secondary-dhcp dhcpd: failover peer dhcp: peer moves from
> recover to recover-done
> 11:01:01 secondary-dhcp dhcpd: failover peer dhcp: I move from
> partner-down to normal
> 11:01:01 secondary-dhcp dhcpd: failover peer dhcp: peer moves from
> recover-done to normal
>
> 11:01:59 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to
> shutdown
> 11:01:59 secondary-dhcp dhcpd: failover peer dhcp: peer moves from
> normal to partner-down
> 11:02:00 secondary-dhcp dhcpd: failover peer dhcp: I move from shutdown
> to recover
> 11:02:05 secondary-dhcp dhcpd: failover peer dhcp: I move from recover
> to startup
> 11:02:10 secondary-dhcp dhcpd: failover peer dhcp: I move from startup
> to recover
>
>
>
>
>
> Primary dhcpd.conf:
>
> option domain-name-servers 1.2.3.11, 1.2.3.10 ;
> option ntp-servers 1.2.3.10, 1.2.3.11;
> default-lease-time 172800;
> max-lease-time 172800;
> one-lease-per-client true;
> ddns-update-style ad-hoc;
> ddns-updates off;
> authoritative;
> failover peer "dhcp" {
>     primary;
>     address 192.168.10.2;
>     port 520;
>     peer address 192.168.11.2;
>     peer port 520;
>     max-response-delay 60;
>     max-unacked-updates 10;
>     mclt 120;
>     split 255;
>     load balance max seconds 5;
> }
> subnet 192.168.10.0 netmask 255.255.255.0 {
> }
> ... network definitions
>
>
>
>
>
> Secondary dhcpd.conf:
>
> option domain-name-servers 1.2.3.11, 1.2.3.10 ;
> option ntp-servers 1.2.3.10, 1.2.3.11;
> default-lease-time 172800;
> max-lease-time 172800;
> one-lease-per-client true;
> ddns-update-style ad-hoc;
> ddns-updates off;
> authoritative;
> failover peer "dhcp" {
>     secondary;
>     address 192.168.11.2;
>     port 520;
>     peer address 192.168.10.2;
>     peer port 520;
>     max-response-delay 60;
>     max-unacked-updates 10;
>     load balance max seconds 5;
> }
> subnet 192.168.11.0 netmask 255.255.255.0 {
> }
> ... network definitions