Problems with pool balancing.
Erling Paulsen
erling.paulsen at uit.no
Tue May 7 12:19:50 UTC 2013
Update:
Can restarting our servers, typically 8-10 times a day, cause
balancing problems? The restarts are triggered by auto-updates from our
network-admin tool. We do, however, restart the secondary first and wait
for it to come back up before restarting the primary.
But I have come to suspect that our isc-dhcrelay (4.2.5-P1 on FreeBSD
8.3-RELEASE-p7) service is to blame. It receives packets forwarded by the
relay agents (ip-helpers on VLANs) of about 100 Cisco routers, and then
forwards them to all parties that need a say in setting up client
parameters, including our primary and secondary DHCP servers.
For example, when counting DISCOVER packets over a busy day, I see that the
secondary server receives about 13K fewer than the primary, and that does
not sound right to me.
The dhcrelay (when running in debug mode) also complains about bad UDP
checksums. This is worrying, and I have no idea what it is about.
Also, it probably doesn't help that it's running on a VMware host in a
pretty busy cluster!
- Erling
On 05/06/2013 01:44 PM, Erling Paulsen wrote:
> Hello,
>
> We have two servers in a failover relationship, both running
> isc-dhcp42-server-4.2.4_2 on FreeBSD 8.3-RELEASE-p7, and we are having
> problems balancing the pools when pool usage climbs into the
> higher figures.
>
> Example (numbers from logfiles at about the same timestamp):
>
> Master: total 3905 free 1 backup 512 lts -255
> "lts" is correct according to the documentation:
> (free - backup)/2 = (1 - 512)/2 = -255 (integer division)
> Since "leases to share" is negative, the master expects the secondary
> server to hand over leases!
>
> Secondary: total 3905 free 513 backup 0 lts -256
> "lts" is correct according to the documentation, which on the
> secondary uses (backup - free)/2 = (0 - 513)/2 = -256 (integer division)
>
> They do seem to have the same understanding of the current lease
> situation. But "lts" on the secondary is also negative, so it also
> expects the master to hand over leases!
>
> This cannot possibly end well?
>
> This is what's in the source code, and it seems to match the
> description:
>
> if (p->failover_peer->i_am == primary) {
>     lts = (p->free_leases - p->backup_leases) / 2;
>     peer_lease_state = FTS_BACKUP;
>     /* my_lease_state = FTS_FREE; */
>     lq = &p->free;
> } else {
>     lts = (p->backup_leases - p->free_leases) / 2;
>     peer_lease_state = FTS_FREE;
>     /* my_lease_state = FTS_BACKUP; */
>     lq = &p->backup;
> }
>
> I don't understand what could cause lts to be negative on both
> sides! And, by the way, can someone shed some light on how the 'lq'
> pointer affects the balancing?
>
> Does anyone have thoughts on what might be the culprit here?
> Any information would be appreciated.
>
>
> - Erling Paulsen
>
--
Erling Paulsen, Section for Infrastructure/Network
TEO 2.402, Universitetet i Tromsø, 9037 TROMSØ
Office (+47) 77 64 64 80, Mobile (+47) 91 17 64 01