Performance issue ( maybe )

Tue Sep 7 13:52:33 UTC 2010

On 09/07/10 17:58, Bjarne Blichfeldt wrote:
> ok the plot thickens..
>
>> -----Original Message-----
>> From: Glenn Satchell
>> Sent: 6. september 2010 15:09
>> To: Users of ISC DHCP
>> Subject: Re: Performance issue ( maybe )
>>
>> Ok, so this looks like some sort of networking issue, perhaps NIC,
>> cables or switch port? Check you have up to date drivers for your NICs.
>>
>> Run ifconfig and look for any errors or collisions. Check the speed and
>> duplex settings for the NIC and ask the network guys to check the same
>> settings on the switch port. Check cables are well seated in the server,
>> and if you can, on the switch.
>>
>> Try an ftp of a largish (few 10s of megabytes) file between each of the
>> servers and a third one to see if one works well and the other has some
>> problems? This will help isolate the system and give you a nice test case.
>>
>> Good luck, but at least a problem has been found, now to fix it!
>
> Agreed on checking the network, but so far everything seems to be in order, spanning tree, half/full duplex,
> no errors on the interfaces, no one else having issues, ftp from dhcp1 to dhcp2 runs close to 100Mb. No abnormal traffic.
>
> However, as Tom brought to my attention, there is an awful lot of pool balancing going on. So here is a thought:
> what if the pool balancing creates so much load on the DHCP service that the failover connection is lost? That would create a
> runaway situation.
>
> The situation last week seems to have escalated after a configuration change. We do configuration changes by
> 1. pushing a new config to the primary server, then restarting dhcpd.
> 2. pushing a new config to the secondary server, then restarting dhcpd.
>
> In both cases, we get communications-interrupted.

communications-interrupted is expected when you stop one of the servers. 
You may need to wait for a few minutes after restarting the first dhcpd 
before restarting the second one. This is to allow time for it to 
synchronise the leases.

Also, I always restart the secondary first. The reason behind this is 
that if you add a new subnet and restart the primary, the running 
secondary will complain about an unknown subnet when the primary tries 
to sync the leases. Doing the secondary first avoids this issue.

> My failover clause is :
>
> failover peer "ipc-dhcp1-ipc-dhcp2" {
>          primary;
>          address 10.11.90.73;
>          port 647;
>          peer address 10.11.90.74;
>          peer port 647;
>          max-response-delay 90;
>          max-unacked-updates 20;
>          mclt 1800;
>          split 128;
>          load balance max seconds 5;
>     }
>
> That means defaults for :
> min-balance 60;
> max-balance 3600;
>
> Tom Schmitt mentioned 2 hours, I assume min-balance time.
>
> My initial thought is to increase our values to:
> min-balance 1800;
> max-balance 7200;
>
Those settings should at least confirm whether or not the frequent 
balancing is causing a problem. The numbers seem reasonable. I guess 
min-balance needs to be big enough to allow a full resync of the 
leases with the other server to complete.

If you see too many addresses being balanced after half an hour you 
might need to make it a bit smaller. I guess it depends on your lease 
length, but ISTR you said these were quite large.
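
As a rough sketch of what that change might look like, here is the 
failover declaration quoted above with the two proposed values added. 
Only the min-balance and max-balance lines are new; everything else is 
copied unchanged from your config. The values are the ones you 
proposed, not tested recommendations, and I am assuming the matching 
declaration on the secondary would get the same two lines.

```
failover peer "ipc-dhcp1-ipc-dhcp2" {
         primary;
         address 10.11.90.73;
         port 647;
         peer address 10.11.90.74;
         peer port 647;
         max-response-delay 90;
         max-unacked-updates 20;
         mclt 1800;
         split 128;
         load balance max seconds 5;
         # Proposed values from this thread, to be tuned after watching the logs
         min-balance 1800;    # default quoted above: 60
         max-balance 7200;    # default quoted above: 3600
    }
```

After restarting with these values, the pool balancing messages in the 
dhcpd log should show whether the balancing has become less frequent.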

-- 
regards,
-glenn
--
Glenn Satchell                            |  Miss 9: What do you
Uniq Advances Pty Ltd, Sydney Australia   |  do at work Dad?
mailto:glenn.satchell at uniq.com.au         |  Miss 6: He just
http://www.uniq.com.au tel:0409-458-580   |  types random stuff.