Secondary server in failover fails to come out of recover state

Steven Carr sjcarr at gmail.com
Sat May 11 11:51:35 UTC 2013


What happens in the logs at 15:20, when the servers should have passed
the MCLT and come out of recovery?
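The primary's own log shows recover-done at 15:14:13, exactly 300 seconds (the configured `mclt 300`) after it entered recover at 15:09:13. A minimal Python sketch of that arithmetic gives the times worth checking in the secondary's log. It assumes the wait is counted from the moment a server first enters recover. That rule is inferred from these timestamps, not taken from the dhcpd source.

```python
from datetime import datetime, timedelta

MCLT_SECONDS = 300  # "mclt 300;" from the primary's failover peer block


def expected_recover_done(entered_recover: str, mclt: int = MCLT_SECONDS) -> str:
    """Return the HH:MM:SS at which recover-done would be expected,
    ASSUMING the wait runs for `mclt` seconds from entry into recover."""
    start = datetime.strptime(entered_recover, "%H:%M:%S")
    return (start + timedelta(seconds=mclt)).strftime("%H:%M:%S")


print(expected_recover_done("15:09:13"))  # 15:14:13 (matches the primary's log)
print(expected_recover_done("15:14:45"))  # 15:19:45 (secondary, first entry into recover)
print(expected_recover_done("15:15:47"))  # 15:20:47 (secondary, re-entry after startup)
```

The primary also logged "peer update completed" (15:13:31) before it moved to recover-wait. If the secondary never completes its update, it may stay in recover past these times regardless of the MCLT.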

Why put the system into partner-down at 15:14? What was the reasoning
behind this, given that both servers were online?
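One way to see which side initiated each transition is to pull the "I move from X to Y" lines out of both logs and merge them by time. Below is a rough Python sketch that does this. It assumes each log message sits on a single line, so lines wrapped by the archive would need re-joining first. The sample input is a handful of lines copied from the logs quoted below. In this sample, the primary's move to partner-down at 15:14:44 falls in the same second as the secondary's own move to shutdown.

```python
import re

# Matches lines such as:
# "15:14:44 primary-dhcp dhcpd: failover peer dhcp: I move from normal to partner-down"
TRANSITION = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) (?P<host>\S+) dhcpd: failover peer \S+: "
    r"(?P<who>I|peer) moves? from (?P<old>\S+) to (?P<new>\S+)$"
)


def own_transitions(lines):
    """Return (time, host, old, new) for each state change a server reports about itself."""
    result = []
    for line in lines:
        m = TRANSITION.match(line.strip())
        if m and m.group("who") == "I":
            result.append((m.group("time"), m.group("host"), m.group("old"), m.group("new")))
    # HH:MM:SS strings sort chronologically as long as all lines are from one day.
    return sorted(result)


log = [
    "15:14:44 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal to shutdown",
    "15:14:44 primary-dhcp dhcpd: failover peer dhcp: I move from normal to partner-down",
    "15:14:44 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to shutdown",
    "15:14:45 secondary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover",
    "15:15:47 secondary-dhcp dhcpd: failover peer dhcp: I move from startup to recover",
]

for event in own_transitions(log):
    print(*event)
```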

On 10 May 2013 22:00, Oscar Ricardo Silva <osilva at scuff.cc.utexas.edu> wrote:
> I have changed the split value to 128 and raised the MCLT to 300.  After the
> change, both servers were reloaded and came up normally.  Twenty minutes
> later, someone on staff made a change; the primary returned to a normal
> state, but the secondary then stayed in recover mode, as we've seen before.
>
> Here are the logs and the configuration files (including some of the pool
> statements).  The primary is taken down at 15:09:13 and returns to normal at
> 15:14:13.  The secondary is then taken down at 15:14:44, but the last
> message was received at 15:15:47 (the logs were copied at 15:33:00).
>
> Logs from primary:
>
> 15:09:13 primary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover
> 15:10:13 primary-dhcp dhcpd: failover peer dhcp: I move from recover to startup
> 15:10:13 primary-dhcp dhcpd: failover peer dhcp: I move from startup to recover
> 15:13:31 primary-dhcp dhcpd: failover peer dhcp: peer update completed.
> 15:13:31 primary-dhcp dhcpd: failover peer dhcp: I move from recover to recover-wait
> 15:14:13 primary-dhcp dhcpd: failover peer dhcp: I move from recover-wait to recover-done
> 15:14:13 primary-dhcp dhcpd: failover peer dhcp: peer moves from partner-down to normal
> 15:14:13 primary-dhcp dhcpd: failover peer dhcp: I move from recover-done to normal
> 15:14:44 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal to shutdown
> 15:14:44 primary-dhcp dhcpd: failover peer dhcp: I move from normal to partner-down
> 15:14:45 primary-dhcp dhcpd: peer dhcp: disconnected
> 15:15:47 primary-dhcp dhcpd: failover peer dhcp: peer moves from shutdown to recover
> 15:15:47 primary-dhcp dhcpd: failover peer dhcp: peer moves from recover to recover
>
> Logs from secondary:
>
> 15:09:12 secondary-dhcp dhcpd: failover peer dhcp: peer moves from normal to shutdown
> 15:09:12 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to partner-down
> 15:09:13 secondary-dhcp dhcpd: peer dhcp: disconnected
> 15:10:13 secondary-dhcp dhcpd: failover peer dhcp: peer moves from shutdown to recover
> 15:10:13 secondary-dhcp dhcpd: failover peer dhcp: peer moves from recover to recover
> 15:13:31 secondary-dhcp dhcpd: failover peer dhcp: peer moves from recover to recover-wait
> 15:14:13 secondary-dhcp dhcpd: failover peer dhcp: peer moves from recover-wait to recover-done
> 15:14:13 secondary-dhcp dhcpd: failover peer dhcp: I move from partner-down to normal
> 15:14:13 secondary-dhcp dhcpd: failover peer dhcp: peer moves from recover-done to normal
> 15:14:44 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to shutdown
> 15:14:44 secondary-dhcp dhcpd: failover peer dhcp: peer moves from normal to partner-down
> 15:14:45 secondary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover
> 15:15:47 secondary-dhcp dhcpd: failover peer dhcp: I move from recover to startup
> 15:15:47 secondary-dhcp dhcpd: failover peer dhcp: I move from startup to recover
>
> Primary:
>
>
> option domain-name-servers 192.168.50.41, 192.168.50.40 ;
> option ntp-servers 192.168.50.40, 192.168.50.41;
> default-lease-time 172800;
> max-lease-time 172800;
> one-lease-per-client true;
> ddns-update-style ad-hoc;
> ddns-updates off;
> authoritative;
> key-off-mac-address true;
> if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {
>         deny booting;
> }
> option voip-tftp-server-address code 150 = array of ip-address ;
> set vendor-string = option vendor-class-identifier;
> failover peer "dhcp" {
>          primary;
>          address 192.168.100.2;
>          port 520;
>          peer address 192.168.101.2;
>          peer port 520;
>          max-response-delay 60;
>          max-unacked-updates 10;
>          mclt 300;
>          split 128;
>
>          load balance max seconds 5;
>        }
> subnet 192.168.100.0 netmask 255.255.255.224 {
>         }
>
>
> subnet 192.168.75.128 netmask 255.255.255.128 {
>                 pool {
>                         range 192.168.75.130 192.168.75.254;
>                         deny dynamic bootp clients ;
>                         failover peer "dhcp" ;
>                 }
>         option subnet-mask 255.255.255.128;
>         option broadcast-address 255.255.255.255;
>         option routers 192.168.75.129;
> }
>
> subnet 192.168.235.0 netmask 255.255.255.128 {
>                 pool {
>                         range 192.168.235.13 192.168.235.126;
>
>                         deny dynamic bootp clients ;
>                         failover peer "dhcp" ;
>                 }
>         option subnet-mask 255.255.255.128;
>         option broadcast-address 255.255.255.255;
>         option routers 192.168.235.1;
> }
>
> Secondary:
>
> option domain-name-servers 192.168.50.40, 192.168.50.41 ;
>
> option ntp-servers 192.168.50.40, 192.168.50.41;
> default-lease-time 172800;
> max-lease-time 172800;
> one-lease-per-client true;
> ddns-update-style ad-hoc;
> ddns-updates off;
> authoritative;
> key-off-mac-address true;
> if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {
>         deny booting;
> }
> option voip-tftp-server-address code 150 = array of ip-address ;
> set vendor-string = option vendor-class-identifier;
> failover peer "dhcp" {
>          secondary;
>          address 192.168.101.2;
>          port 520;
>          peer address 192.168.100.2;
>          peer port 520;
>          max-response-delay 60;
>          max-unacked-updates 10;
>          load balance max seconds 5;
>        }
> subnet 192.168.101.0 netmask 255.255.255.224 {
>         }
>
> subnet 192.168.75.128 netmask 255.255.255.128 {
>                 pool {
>                         range 192.168.75.130 192.168.75.254;
>                         deny dynamic bootp clients ;
>                         failover peer "dhcp" ;
>                 }
>         option subnet-mask 255.255.255.128;
>         option broadcast-address 255.255.255.255;
>         option routers 192.168.75.129;
> }
>
> subnet 192.168.235.0 netmask 255.255.255.128 {
>                 pool {
>                         range 192.168.235.13 192.168.235.126;
>
>                         deny dynamic bootp clients ;
>                         failover peer "dhcp" ;
>                 }
>         option subnet-mask 255.255.255.128;
>         option broadcast-address 255.255.255.255;
>         option routers 192.168.235.1;
>
> }
>
> On 04/30/2013 03:37 PM, Steven Carr wrote:
>>
>> I can't see anything suspect in the config, to be honest.
>>
>> I assume you have a 'failover peer "dhcp";' statement inside each pool
>> statement? (That's why I asked for the full config.)
>>
>> Personally, I would change mclt to 3600 and split to 128. There are
>> only a handful of situations where I would expect to see split set to
>> 0 or 255, the main one being branch networks with a local DHCP server
>> that need a centralised "backup" DHCP server in case the branch fails.
>>
>> You could also try changing the port and peer port numbers (maybe to
>> something >1024?) on the off chance that the connection is being
>> blocked or terminated by something else. It would also be worth running
>> packet captures on each system to see exactly what traffic passes
>> between the two during startup.
>>
>> The only other thought I have is that it could be something to do with
>> the patch you have written. I'm not sure what impact it has had on the
>> data written out to the leases file or synchronised between the peers
>> (you might see this in a packet capture), but dhcpd could be choking on
>> something in that data that wasn't originally meant to be there.
>>
>> If you do change the split value, I would also flip the order of
>> domain-name-servers on the secondary server to load balance across the
>> two DNS servers, rather than sending all queries to the first DNS
>> server.
>>
>> Steve
>> _______________________________________________
>> dhcp-users mailing list
>> dhcp-users at lists.isc.org
>> https://lists.isc.org/mailman/listinfo/dhcp-users
>>
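Steve suggested split 128 in place of 0 or 255. The dhcpd.conf manual page describes the split roughly as follows (paraphrased): each client's identification is hashed to a value from 0 to 255. The primary is responsible for the leading `split` buckets and the secondary for the rest. `load balance max seconds` disables load balancing once the client's secs field exceeds the cutoff, so either peer may answer.

A simplified Python sketch of that decision follows. It takes the bucket as an input rather than implementing the actual hash, and it assumes a strict "greater than" comparison on secs. `answering_servers` is a hypothetical helper for illustration, not part of dhcpd.

```python
from typing import Set


def answering_servers(bucket: int, split: int, secs: int, lb_max_secs: int) -> Set[str]:
    """Which failover peer(s) would answer a client under this simplified model.

    bucket      -- client's hash bucket, 0..255 (the real hash is NOT implemented here)
    split       -- the 'split' value from the primary's failover peer block
    secs        -- the 'secs' field from the client's DHCP message
    lb_max_secs -- the 'load balance max seconds' value
    """
    if not 0 <= bucket <= 255:
        raise ValueError("bucket must be in 0..255")
    if secs > lb_max_secs:
        # The client has been retrying past the cutoff: load balancing is bypassed.
        return {"primary", "secondary"}
    return {"primary"} if bucket < split else {"secondary"}


# With split 128, the 256 buckets divide evenly between the two peers.
share = sum(answering_servers(b, 128, 0, 5) == {"primary"} for b in range(256))
print(f"primary owns {share} of 256 buckets")
```

Under this model a split of 0 hands every bucket to the secondary. That is why an extreme split mainly suits the branch-backup case Steve describes, where one server is meant to carry nearly all clients.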

