General questions about failover, config changes and restarting

James Dore james.dore at new.ox.ac.uk
Wed Mar 2 16:52:26 UTC 2016


Hi Glenn,

Thanks for that - what am I looking for in the dhcpd.log that tells me synchronisation has finished on the first server?

I ask because we’ve had occasions in the past where I’ve restarted the first server but left the second for a couple of hours, and new clients stopped being issued addresses. This is the kind of log message we get in that situation:

dhcpd.log-20151123:2015-11-19T11:45:33.497093+00:00 garibaldi dhcpd: DHCPDISCOVER from 58:7f:57:17:00:1f (Keiths-iPhone-2) via 163.1.173.254: not responding (recover wait)

and they don’t clear until both the peers have moved back to ‘normal’. 

I could see if there’s more log detail I can turn on, I suppose. 
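As a rough check, the Python sketch below scans dhcpd log lines for failover state transitions and reports the most recent state of this server and of its partner. It assumes ISC dhcpd logs transitions in the form "failover peer NAME: I move from X to Y" and "failover peer NAME: peer moves from X to Y", so verify that against your own dhcpd.log first. Treating "both sides report normal" as "synchronisation has finished" is also an assumption, consistent with the recover-wait messages above only clearing once both peers are back to normal. The function names are hypothetical.

```python
import re

# Assumed ISC dhcpd log formats (check against your own dhcpd.log):
#   ... dhcpd: failover peer newc-dhcp: I move from recover to recover-done
#   ... dhcpd: failover peer newc-dhcp: peer moves from recover-wait to normal
TRANSITION = re.compile(
    r"failover peer (?P<peer>[^:\s]+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def latest_states(lines, peer_name):
    """Return (my_state, partner_state) from the last transitions seen.

    Only lines for the failover peer called peer_name are considered.
    Either value is None if no matching transition was found.
    """
    mine = partner = None
    for line in lines:
        m = TRANSITION.search(line)
        if not m or m.group("peer") != peer_name:
            continue  # not a failover state change for this peer
        if m.group("who") == "I move":
            mine = m.group("new")
        else:
            partner = m.group("new")
    return mine, partner


def safe_to_restart_partner(lines, peer_name):
    """True only once both this server and its partner last reported 'normal'."""
    return latest_states(lines, peer_name) == ("normal", "normal")
```

Because latest_states accepts any iterable of strings, an open file works directly, e.g. opening a (hypothetical) /var/log/dhcpd.log and passing it with the peer name "newc-dhcp". Under the assumption above, the second server would only be restarted once safe_to_restart_partner returns True on the first.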

Cheers,
James


> On 2 Mar 2016, at 15:34, Glenn Satchell <glenn.satchell at uniq.com.au> wrote:
> 
> Hi James
> 
> The configurations for the subnets and everything except the failover (and
> possibly the keys) should be exactly the same, so editing one and copying the
> file to the other server with scp is exactly the right thing to do.
> 
> It doesn't matter too much which server is restarted first, but you should
> not restart the second until the first has finished synchronising lease
> information. This may take a little while if there are many thousands of
> leases - I see you have a /22 and /16, so maybe up to 17000 or so leases.
> It could take a few minutes depending on network speed and latency between
> the servers.
> 
> Once the first server has finished synchronising, then it's ok to restart
> the other server, and this should synchronise much quicker.
> 
> regards,
> -glenn
> 
> On Wed, March 2, 2016 11:36 pm, James Dore wrote:
>> Hi all,
>> 
>> I've had a pair of DHCP servers running in a load balance/failover
>> cluster for about 9 months, but haven't really got my head round what
>> happens when I make a change to the configuration.
>> 
>> I have a bunch of config files called from the main config file thus:
>> 
>> ##########################
>> #                        #
>> # Failover configuration #
>> #                        #
>> ##########################
>> failover peer "newc-dhcp" {
>>    primary;
>>    address 129.67.111.199; # address of this server
>>    port 519;
>>    peer address 129.67.111.243; # address of the secondary dhcpd
>>    peer port 519;
>>    max-response-delay 60;
>>    max-unacked-updates 10;
>>    mclt 600;
>>    split 128;
>>    load balance max seconds 3;
>> }
>> 
>> key primaryhost {
>>    algorithm hmac-md5;
>>    secret <ssshhh!>
>> };
>> 
>> omapi-key primaryhost;
>> omapi-port 7911;
>> 
>> 
>> ###########################
>> #                         #
>> # Load the global options #
>> #                         #
>> ###########################
>> 
>> include "/etc/dhcpd.d/master.conf"; # (Rarely!) Edit this file to set global options
>> 
>> ########################
>> #                      #
>> # Subnet config files  #
>> #                      #
>> ########################
>> 
>> include "/etc/dhcpd.d/vlan1.conf"; # 129.67.108.0/22 Main subnet and static assignments
>> include "/etc/dhcpd.d/vlan3.conf"; # 10.30.0.0/22 Devices subnet config and static assignments
>> include "/etc/dhcpd.d/vlan4.conf"; # 10.4.0.0/16 NAT Vlan4 Subnet config and static assignments
>> include "/etc/dhcpd.d/annexe.conf"; # 163.1.173.0/24 Annexe subnet config and static assignments
>> 
>> Both peers have pretty similar config files, the only difference being the
>> secret and the address/peer address settings. Everything else is the same.
>> (Should it be?)
>> 
>> The things I'm curious about are what happens when I make a change to
>> one of the Subnet config files, for instance to add a new static
>> assignment. My usual method has been to edit the file on one peer, and then
>> scp it over to the other peer. After that, it seems like I need to do a
>> number of restarts of each peer before they both return to Normal status.
>> They seem to get stuck in Partner-down, Recover, or Recover Wait status
>> for a while.
>> 
>> If I can get them both in Recover Wait, then they will synchronise, but it
>> seems to be difficult to get them there.
>> 
>> Is there anything I can do to smooth the process?
>> 
>> I can't find much info about troubleshooting failover or load balancing;
>> all my googling has turned up is instructions on initial setup. Does
>> anyone have some useful pointers or links?
>> 
>> Cheers,
>> James
>> 
>> 
>> _______________________________________________
>> dhcp-users mailing list
>> dhcp-users at lists.isc.org
>> https://lists.isc.org/mailman/listinfo/dhcp-users
>> 
>> 


