pool misbalance issues on large, busy pool with frequent restarts
Petersen, Kirsten J - NET
Kirsten.Petersen at oregonstate.edu
Wed Jun 2 20:32:46 UTC 2010
We are running ISC DHCP 3.1.1-6 on Debian Lenny.
We have two DHCP servers configured for load balancing and failover. We do a dhcpd-restart every 5 minutes on both servers at the same time to pick up the latest updates to the DHCP config. The max-lease-time is set very short on our wireless networks (1800 seconds) because we were running out of leases and wanted them to cycle quickly.
We are seeing an issue where the two servers become out of sync with respect to pool balances. Sometimes the situation gets bad enough that neither server is handing out leases even though (we think) there should still be leases available. To work around that problem, we down the secondary and whack its leases file, then bring it back up and let it recover. This is obviously less than ideal.
Based on what we are seeing in the logs, it looks like one server is sometimes trying to rebalance when its partner is restarting, and it doesn't know that its partner is unavailable. Is that possible?
Logs illustrating brokenness:
May 26 01:02:07 ns2 dhcpd: balancing pool aa8c8a8 128.193.136/21 total 2038 free 895 backup 594 lts -150 max-own (+/-)45 (requesting peer rebalance!)
May 26 01:02:07 ns2 dhcpd: balanced pool aa8c8a8 128.193.136/21 total 2038 free 895 backup 594 lts -150 max-misbal 74
May 26 01:02:07 ns1 dhcpd: balancing pool a8258d8 128.193.136/21 total 2038 free 699 backup 790 lts -45 max-own (+/-)45
May 26 01:03:07 ns2 dhcpd: balancing pool aa8c8a8 128.193.136/21 total 2038 free 896 backup 596 lts -150 max-own (+/-)45
May 26 01:03:07 ns1 dhcpd: balancing pool a8258d8 128.193.136/21 total 2038 free 701 backup 791 lts -45 max-own (+/-)45
May 26 01:03:07 ns1 dhcpd: balanced pool a8258d8 128.193.136/21 total 2038 free 701 backup 791 lts -45 max-misbal 75
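To make the disagreement in the log lines above easier to see, here is a minimal Python sketch that parses the counters out of two of those lines and compares the two servers' views of the same pool. The regular expression and the helper names (`parse_pool_line`, `percent_of_idle`) are made up for this illustration. The rounding in `percent_of_idle` is only an observation that happens to reproduce the logged max-own and max-misbal values from the configured 3% and 5%; it is not a statement of how dhcpd computes them internally.

```python
import re

# Matches the "balancing pool" / "balanced pool" lines quoted above.
# The pattern is an assumption based only on those sample lines.
LINE_RE = re.compile(
    r"(?P<host>\S+) dhcpd: (?P<event>balancing|balanced) pool \S+ \S+\s+"
    r"total (?P<total>\d+)\s+free (?P<free>\d+)\s+backup (?P<backup>\d+)\s+"
    r"lts (?P<lts>-?\d+)"
)


def parse_pool_line(line):
    """Return a dict of the counters in one pool log line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = {"host": m.group("host"), "event": m.group("event")}
    for key in ("total", "free", "backup", "lts"):
        rec[key] = int(m.group(key))
    return rec


def percent_of_idle(rec, percent):
    """Return `percent` of (free + backup), rounded to the nearest integer."""
    return ((rec["free"] + rec["backup"]) * percent + 50) // 100


LOG = [
    "May 26 01:02:07 ns2 dhcpd: balancing pool aa8c8a8 128.193.136/21 "
    "total 2038 free 895 backup 594 lts -150 max-own (+/-)45 "
    "(requesting peer rebalance!)",
    "May 26 01:02:07 ns1 dhcpd: balancing pool a8258d8 128.193.136/21 "
    "total 2038 free 699 backup 790 lts -45 max-own (+/-)45",
]

ns2, ns1 = (parse_pool_line(line) for line in LOG)

# How far apart are the two servers' counts for the same pool?
print("free disagreement:  ", ns2["free"] - ns1["free"])
print("backup disagreement:", ns1["backup"] - ns2["backup"])

# Compare with the max-own / max-misbal figures in the log lines.
print("3% of free+backup (ns2):", percent_of_idle(ns2, 3))
print("5% of free+backup (ns2):", percent_of_idle(ns2, 5))
```

For the two 01:02:07 lines, the servers disagree by 196 leases in both the free and backup counts, which is well outside the 45 and 74 lease thresholds they each report.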
A few questions:
* Is there something horribly wrong with our config or the way we are doing restarts?
* Is there a best practice for restarting load-balanced dhcp servers? One idea we had was to write a script that would do something like the following:
  - shut down primary
  - tell secondary that its partner is down
  - start up primary
  - check status of primary
  - tell secondary that its partner is up
  - wait 10 seconds or so
  - repeat the above for secondary
Does the above look reasonable, or is there a better way?
* When we shut down a server, should we set its status to "recover-wait" or something else before shutdown, so that it comes up in a state where it is not trying to hand out leases? Or does the software do this by default? It looks like status information is read from the leases file...
* Or would it be better to shut down the primary, then shut down the secondary, then bring up the primary, and then bring up the secondary? That way, both servers are down at the same time, and neither one tries to rebalance when its partner is unavailable. (Obviously this would result in a brief outage.)
* Is there a way to tell the servers at what times or what interval they should rebalance pools? If not, how do the servers decide when to check for rebalance? If there is a fixed frequency, what is it?
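The one-server-at-a-time restart idea from the second question could be sketched roughly as follows in Python. This is only an outline of the ordering, under the assumption that the caller supplies the actual actions: `stop`, `start`, `is_healthy`, and `tell_peer` are hypothetical hooks, and the strings "partner is down" and "partner is up" are placeholders rather than real dhcpd state names. How to actually tell a peer about its partner (presumably something via OMAPI) is not shown here.

```python
import time


def rolling_restart(servers, stop, start, is_healthy, tell_peer,
                    settle_seconds=10, sleep=time.sleep):
    """Restart each server in turn so both are never restarting at once.

    servers: list of (name, peer_name) pairs.
    stop, start, is_healthy, tell_peer: caller-supplied callables
    (hypothetical hooks; nothing here talks to dhcpd itself).
    """
    for name, peer in servers:
        stop(name)                             # shut down this server
        tell_peer(peer, "partner is down")     # placeholder message
        start(name)                            # start it back up
        if not is_healthy(name):               # check its status
            raise RuntimeError(f"{name} did not come back up; stopping here")
        tell_peer(peer, "partner is up")       # placeholder message
        sleep(settle_seconds)                  # wait before the next server


# Dry run with stand-in actions that only record what would happen.
steps = []
rolling_restart(
    [("ns1", "ns2"), ("ns2", "ns1")],
    stop=lambda n: steps.append(f"stop {n}"),
    start=lambda n: steps.append(f"start {n}"),
    is_healthy=lambda n: True,
    tell_peer=lambda p, msg: steps.append(f"tell {p}: {msg}"),
    sleep=lambda s: steps.append(f"wait {s}s"),
)
print("\n".join(steps))
```

Passing the actions in as callables keeps the ordering testable without touching a live server. The sketch aborts if a restarted server fails its health check, so the second server is never taken down while the first is still unavailable.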
The configs follow.
# primary config
failover peer "dhcp" {
    primary;
    address x.y.z.10;
    port 520;
    # our peer is ns2
    peer address x.y.z.20;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 600;
    split 128;
    load balance max seconds 3;
    max-lease-misbalance 5;
    max-lease-ownership 3;
}
# secondary config
failover peer "dhcp" {
    secondary;
    address x.y.z.20;
    port 520;
    # our peer is ns1
    peer address x.y.z.10;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    load balance max seconds 3;
    max-lease-misbalance 5;
    max-lease-ownership 3;
}
# Example wireless network definition
subnet x.y.136.0 netmask 255.255.248.0 {
    max-lease-time 1800;
    option subnet-mask 255.255.248.0;
    option netbios-name-servers x.y.z.39;
    option routers x.y.136.1;
    use-host-decl-names false;
    default-lease-time 1800;
    option netbios-node-type 8;
    pool {
        failover peer "dhcp";
        deny dynamic bootp clients;
        range x.y.136.8 x.y.143.253;
    }
}
________________
Kirsten Petersen
Network Services * Oregon State University
http://oregonstate.edu/net * irc.oregonstate.edu #osu-is
"Aging is bad for your health." - Bent Petersen