Secondary server in failover fails to come out of recover state
Oscar Ricardo Silva
oscars at mail.utexas.edu
Tue Apr 30 19:41:35 UTC 2013
I should mention that we've written a patch to use the MAC address as
the identifier instead of the client identifier. We've done this so
that a device will have the same identifier no matter what operating
system it boots into. This is an issue for multi-boot devices or if a
device boots into a PXE-boot environment, and that's why you'll see this
line in the configuration:
key-off-mac-address true;
Operating system versions:
primary: RHEL 6.2, kernel 2.6.32-220.7.1.el6.i686
secondary: RHEL 6.3, kernel 2.6.32-279.19.1.el6.i686
Primary server:
option domain-name-servers 192.168.50.41, 192.168.50.40 ;
option ntp-servers 192.168.50.40, 192.168.50.41;
default-lease-time 172800;
max-lease-time 172800;
one-lease-per-client true;
ddns-update-style ad-hoc;
ddns-updates off;
authoritative;
key-off-mac-address true;
if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {
    deny booting;
}
option voip-tftp-server-address code 150 = array of ip-address ;
set vendor-string = option vendor-class-identifier;
failover peer "dhcp" {
    primary;
    address 192.168.100.2;
    port 520;
    peer address 192.168.101.2;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 300;
    split 255;
    load balance max seconds 5;
}
subnet 192.168.100.0 netmask 255.255.255.224 {
}
Secondary:
option domain-name-servers 192.168.50.41, 192.168.50.40 ;
option ntp-servers 192.168.50.40, 192.168.50.41;
default-lease-time 172800;
max-lease-time 172800;
one-lease-per-client true;
ddns-update-style ad-hoc;
ddns-updates off;
authoritative;
key-off-mac-address true;
if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {
    deny booting;
}
option voip-tftp-server-address code 150 = array of ip-address ;
set vendor-string = option vendor-class-identifier;
failover peer "dhcp" {
    secondary;
    address 192.168.101.2;
    port 520;
    peer address 192.168.100.2;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    load balance max seconds 5;
}
subnet 192.168.101.0 netmask 255.255.255.224 {
}
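
For anyone comparing the two failover declarations above, a quick
consistency check can rule out a simple mismatch between the peers. This
is only a sketch: the parser is a naive regex over the failover peer
block, not the real dhcpd.conf grammar, and the names parse_failover and
check_mirror are made up for illustration.

```python
import re

PRIMARY = '''
failover peer "dhcp" {
primary;
address 192.168.100.2;
port 520;
peer address 192.168.101.2;
peer port 520;
max-response-delay 60;
max-unacked-updates 10;
mclt 300;
split 255;
load balance max seconds 5;
}
'''

SECONDARY = '''
failover peer "dhcp" {
secondary;
address 192.168.101.2;
port 520;
peer address 192.168.100.2;
peer port 520;
max-response-delay 60;
max-unacked-updates 10;
load balance max seconds 5;
}
'''


def parse_failover(text):
    """Naively turn the body of a failover peer block into a dict."""
    body = re.search(r'failover peer "[^"]+"\s*\{(.*?)\}', text, re.S).group(1)
    settings = {}
    for stmt in body.split(";"):
        words = stmt.split()
        if not words:
            continue
        if len(words) == 1:
            # A bare keyword is the role: "primary" or "secondary".
            settings["role"] = words[0]
        else:
            # Everything but the last word is the setting name.
            settings[" ".join(words[:-1])] = words[-1]
    return settings


def check_mirror(primary, secondary):
    """Return a list of human-readable mismatches (empty if none found)."""
    problems = []
    # Each side's own address/port should be the other side's peer address/port.
    for mine, theirs in (("address", "peer address"), ("port", "peer port")):
        if primary.get(mine) != secondary.get(theirs):
            problems.append(f"primary {mine} != secondary {theirs}")
        if secondary.get(mine) != primary.get(theirs):
            problems.append(f"secondary {mine} != primary {theirs}")
    # These settings are normally kept identical on both peers.
    for key in ("max-response-delay", "max-unacked-updates",
                "load balance max seconds"):
        if primary.get(key) != secondary.get(key):
            problems.append(f"{key} differs")
    # mclt and split belong on the primary only.
    for key in ("mclt", "split"):
        if key in secondary:
            problems.append(f"{key} set on secondary")
    return problems


print(check_mirror(parse_failover(PRIMARY), parse_failover(SECONDARY)))
```

For the two declarations above this returns an empty list, so the
addresses and ports mirror each other and the shared settings agree.
mclt and split appear only in the primary's declaration, which is what
dhcpd.conf(5) calls for, so the sketch flags them only if they show up
on the secondary.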
> Date: Tue, 30 Apr 2013 19:58:06 +0100
> From: Steven Carr <sjcarr at gmail.com>
> To: Users of ISC DHCP <dhcp-users at lists.isc.org>
> Subject: Re: Secondary server in failover fails to come out of recover
> state
>
> Can you post the two full configs somewhere (or as near full as you can
> without removing the main bulk of the config)? Or feel free to email them
> to me directly. Also, so I can try to reproduce in our lab, what OS are
> you running?
>
On 04/30/2013 01:34 PM, Oscar Ricardo Silva wrote:
> OK, I've tried running the server in debug mode but I don't see any
> additional information available. This happened again today. Also, as
> previously suggested, I have raised the mclt from 120 to 300.
>
>
> At 11am, a configuration change was made on the primary and it was
> restarted. Here are the logs from the secondary; you'll see that at
> 11:06:55 both servers moved to a "normal" state.
>
> Apr 30 11:00:23 secondary-dhcp dhcpd: failover peer dhcp: peer moves
> from normal to shutdown
> Apr 30 11:00:23 secondary-dhcp dhcpd: failover peer dhcp: I move from
> normal to partner-down
> Apr 30 11:00:24 secondary-dhcp dhcpd: peer dhcp: disconnected
> Apr 30 11:03:36 secondary-dhcp dhcpd: failover peer dhcp: peer moves
> from shutdown to recover
> Apr 30 11:03:36 secondary-dhcp dhcpd: failover peer dhcp: peer moves
> from recover to recover
> Apr 30 11:06:55 secondary-dhcp dhcpd: failover peer dhcp: peer moves
> from recover to recover-done
> Apr 30 11:06:55 secondary-dhcp dhcpd: failover peer dhcp: I move from
> partner-down to normal
> Apr 30 11:06:55 secondary-dhcp dhcpd: failover peer dhcp: peer moves
> from recover-done to normal
>
>
>
> At 11:07:42, the secondary was restarted and these are the only entries
> in the log:
>
> Apr 30 11:07:42 secondary-dhcp dhcpd: failover peer dhcp: I move from
> normal to shutdown
> Apr 30 11:07:42 secondary-dhcp dhcpd: failover peer dhcp: peer moves
> from normal to partner-down
> Apr 30 11:07:43 secondary-dhcp dhcpd: failover peer dhcp: I move from
> shutdown to recover
> Apr 30 11:08:45 secondary-dhcp dhcpd: failover peer dhcp: I move from
> recover to startup
> Apr 30 11:08:45 secondary-dhcp dhcpd: failover peer dhcp: I move from
> startup to recover
>
> Two hours later, the secondary server is still recovering.
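
The transitions in excerpts like the one above can be pulled out
mechanically, which helps when comparing the two servers' logs side by
side. A rough sketch, assuming the syslog line format shown in this
thread (the regex and the function names are illustrative only, and the
whitespace flattening is there because mail clients wrap the log lines):

```python
import re

LOG = """\
Apr 30 11:07:42 secondary-dhcp dhcpd: failover peer dhcp: I move from
normal to shutdown
Apr 30 11:07:42 secondary-dhcp dhcpd: failover peer dhcp: peer moves
from normal to partner-down
Apr 30 11:07:43 secondary-dhcp dhcpd: failover peer dhcp: I move from
shutdown to recover
Apr 30 11:08:45 secondary-dhcp dhcpd: failover peer dhcp: I move from
recover to startup
Apr 30 11:08:45 secondary-dhcp dhcpd: failover peer dhcp: I move from
startup to recover
"""

# timestamp, hostname, then "I move" or "peer moves" with old and new state
TRANSITION = re.compile(
    r"(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ dhcpd: failover peer \S+: "
    r"(I move|peer moves) from (\S+) to (\S+)"
)


def transitions(text):
    """Return (timestamp, 'local' or 'peer', old_state, new_state) tuples."""
    # Collapse all whitespace so wrapped log lines become one long string.
    flat = " ".join(text.split())
    return [
        (ts, "local" if who == "I move" else "peer", old, new)
        for ts, who, old, new in TRANSITION.findall(flat)
    ]


def last_local_state(text):
    """State the logging server itself most recently moved to, or None."""
    local = [t for t in transitions(text) if t[1] == "local"]
    return local[-1][3] if local else None


for entry in transitions(LOG):
    print(entry)
print("last local state:", last_local_state(LOG))
```

On the excerpt above this reports the secondary's last local state as
recover, with no later transition out of it.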
>
>
>
> Again, here's the strangest part of this issue: when I take down the
> secondary server (dhcpd not running at all), the primary still reports
> that the secondary is in recover mode. dhcpd was stopped on the
> secondary at 13:07:08 and here's what the primary reports:
>
> Apr 30 13:04:44 primary-dhcp dhcpd: peer dhcp: disconnected
>
>
> $Tue Apr 30 13:14:38 CDT 2013
>
> partner-state = 00:00:00:06
> local-state = 00:00:00:04
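
The two state values above are easier to read as names. A minimal sketch
for decoding them, assuming they are the failover state numbers listed
in the OMAPI section of dhcpd(8). The table below is written from memory
of that man page, so check it against the installed version; decode_state
is just an illustrative helper.

```python
# Failover state numbers as listed in dhcpd(8); verify against your version.
FAILOVER_STATES = {
    1: "startup",
    2: "normal",
    3: "communications-interrupted",
    4: "partner-down",
    5: "potential-conflict",
    6: "recover",
    7: "paused",
    8: "shutdown",
    9: "recover-done",
    10: "resolution-interrupted",
    11: "conflict-done",
    254: "recover-wait",
}


def decode_state(value):
    """Map a colon-separated hex value such as '00:00:00:06' to a state name."""
    number = int(value.replace(":", ""), 16)
    return FAILOVER_STATES.get(number, f"unknown ({number})")


print("partner-state:", decode_state("00:00:00:06"))
print("local-state:  ", decode_state("00:00:00:04"))
```

Read that way, the primary is reporting itself as partner-down (4) and
its peer as recover (6), which matches the log entries.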
>
>
>
> There are router ACLs on interfaces between the two servers, but the
> networks on which each server resides are completely allowed without
> restriction. iptables is running on each server but again, there are no
> restrictions on communications between the two. If there were a firewall
> issue then the servers would never have returned to a "normal" state
> after the primary was restarted.
>
> Time is perfectly sync'ed between the two servers.
>
>
>
>> Date: Thu, 25 Apr 2013 00:01:45 +0100
>> From: Steven Carr <sjcarr at gmail.com>
>> To: Users of ISC DHCP <dhcp-users at lists.isc.org>
>> Subject: Re: Secondary server in failover fails to come out of recover
>> state
>>
>> Can you crank up the logging level to debug (IIRC this needs to be done
>> via syslog) so it details exactly what it is doing when it goes into
>> RECOVER state? It may give some extra pointers.
>>
>>
>> On 24 April 2013 23:50, Oscar Ricardo Silva <oscars at mail.utexas.edu>
>> wrote:
>>
>>> I should note that while it was recovering, the primary reported:
>>>
>>> partner-state = 00:00:00:06
>>> local-state = 00:00:00:04
>>>
>>>
>>> and the secondary reported:
>>>
>>> partner-state = 00:00:00:04
>>> local-state = 00:00:00:06
>>>
>>>
>>>
>>> Following another suggestion (recreate an empty dhcpd.leases file), I
>>> shut down the secondary but the primary still reported:
>>>
>>> partner-state = 00:00:00:06
>>> local-state = 00:00:00:04
>>>
>>>
>>>
>>>
>>> The change that was made was the addition of these two scopes:
>>>
>>>
>>> subnet 192.168.75.128 netmask 255.255.255.128 {
>>>     pool {
>>>         range 192.168.75.130 192.168.75.254;
>>>         deny dynamic bootp clients ;
>>>         failover peer "dhcp" ;
>>>     }
>>>     option domain-name "dept.utexas.edu";
>>>     option subnet-mask 255.255.255.128;
>>>     option broadcast-address 255.255.255.255;
>>>     option routers 192.168.75.129;
>>> }
>>>
>>>
>>> subnet 192.168.228.32 netmask 255.255.255.224 {
>>>     pool {
>>>         range 192.168.228.34 192.168.228.62;
>>>         deny dynamic bootp clients ;
>>>         failover peer "dhcp" ;
>>>     }
>>>     default-lease-time 7200;
>>>     max-lease-time 7200;
>>>     option domain-name "dept.utexas.edu";
>>>     option subnet-mask 255.255.255.224;
>>>     option broadcast-address 255.255.255.255;
>>>     option routers 192.168.228.33;
>>> }
>>>
>>>
>>> The new scopes were first added to the primary, which was then reloaded.
>>> After both servers were in a "normal" state, the corresponding change
>>> was made on the secondary and it was reloaded.
>>>
>>> Per Steven Carr's suggestion, I have increased the MCLT to 300 and both
>>> servers are still in the same state.
>>>
>>>
>>>
>>>
>>>
>>> On 04/24/2013 04:40 PM, Oscar Ricardo Silva wrote:
>>>
>>>> We have two servers in a failover relationship, both running
>>>> 4.1-ESV-R7. After a reload of dhcpd on the secondary, it has not come
>>>> out of the recover state after almost an hour. We had this happen
>>>> with 3.1.3 as well and recently upgraded to this version. The only
>>>> thing we've been able to do is stop both instances of dhcpd and remove
>>>> "my state" and "partner state" from dhcpd.leases.
>>>>
>>>>
>>>> Here's the timeline of what happened.
>>>>
>>>> 1. A change was made to the configuration of the primary and dhcpd
>>>> reloaded at 15:39:14.
>>>> 2. The primary moved back to a "normal" state at 15:43:42
>>>>
>>>> Apr 24 15:39:14 primary-dhcp dhcpd: failover peer dhcp: I move from
>>>> normal to shutdown
>>>> Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: peer moves from
>>>> normal to partner-down
>>>> Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: I move from
>>>> shutdown to recover
>>>> Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from
>>>> recover to startup
>>>> Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from
>>>> startup to recover
>>>> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer update
>>>> completed.
>>>> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from
>>>> recover to recover-done
>>>> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer moves from
>>>> partner-down to normal
>>>> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from
>>>> recover-done to normal
>>>> Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: peer moves from
>>>> normal to shutdown
>>>> Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: I move from
>>>> normal to partner-down
>>>> Apr 24 15:44:54 primary-dhcp dhcpd: peer dhcp: disconnected
>>>> Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from
>>>> shutdown to recover
>>>> Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from
>>>> recover to recover
>>>>
>>>>
>>>>
>>>> 3. The corresponding change was made on the secondary and it was
>>>> reloaded at 15:44:53
>>>>
>>>> 4. At 15:44:54 it came back up into recover, then moved from recover to
>>>> startup, then from startup to recover. That's where it's been ever
>>>> since.
>>>>
>>>> Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: I move from
>>>> normal to shutdown
>>>> Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: peer moves
>>>> from normal to partner-down
>>>> Apr 24 15:44:54 secondary-dhcp dhcpd: failover peer dhcp: I move from
>>>> shutdown to recover
>>>> Apr 24 15:45:56 secondary-dhcp dhcpd: failover peer dhcp: I move from
>>>> recover to startup
>>>> Apr 24 15:45:59 secondary-dhcp dhcpd: failover peer dhcp: I move from
>>>> startup to recover
>>>>
>>>>
>>>>
>>>> Here's dhcpd.conf for the primary:
>>>>
>>>> option domain-name-servers 192.168.50.41, 192.168.50.40 ;
>>>> option ntp-servers 192.168.50.40, 192.168.50.41;
>>>> default-lease-time 86400;
>>>> max-lease-time 86400;
>>>> one-lease-per-client true;
>>>> ddns-update-style ad-hoc;
>>>> ddns-updates off;
>>>> authoritative;
>>>> if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {
>>>>     deny booting;
>>>> }
>>>> option voip-tftp-server-address code 150 = array of ip-address ;
>>>> set vendor-string = option vendor-class-identifier;
>>>> failover peer "dhcp" {
>>>>     primary;
>>>>     address 192.168.100.2;
>>>>     port 520;
>>>>     peer address 192.168.101.2;
>>>>     peer port 520;
>>>>     max-response-delay 60;
>>>>     max-unacked-updates 10;
>>>>     mclt 120;
>>>>     split 255;
>>>>     load balance max seconds 5;
>>>> }
>>>> subnet 192.168.100.0 netmask 255.255.255.224 {
>>>> }
>>>> include "/dhcpd/dhcpd.network.conf";
>>>>
>>>>
>>>> and the /dhcpd/dhcpd.network.conf file holds the scope definitions.
>>>> Both servers sync time through ntp and have the same exact time.
>>>>
>>>>
>>>> Any information would be appreciated.
>>>>
>>>>
>>>>
>>>>