Problems with pool balancing.

Erling Paulsen erling.paulsen at uit.no
Tue May 7 12:19:50 UTC 2013


Update:

Can restarting our servers, typically 8-10 times a day, cause 
balancing problems? The restarts are triggered by auto-updates from our 
network-admin tool. We do, however, restart the secondary first and 
wait for it to complete before restarting the primary.

However, I have come to suspect that our isc-dhcrelay service (4.2.5-P1 
on FreeBSD 8.3-RELEASE-p7) is to blame. It receives forwarded packets 
from the relay agents (ip-helpers on VLANs) of about 100 Cisco routers 
and then forwards them to all parties that need a say in setting up 
client parameters, including our primary and secondary DHCP servers.

For example, when counting DISCOVER packets for a busy day, I see that 
the secondary server is about 13K packets short of the primary, and that 
does not sound right to me.
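
For anyone who wants to repeat the comparison, this is roughly how the 
per-server count can be done. It is only a sketch: the function name is 
made up for illustration, and it assumes each DISCOVER shows up in the 
dhcpd log as one line containing the string "DHCPDISCOVER".

```c
#include <stdio.h>
#include <string.h>

/* Count the lines of an already-open log stream that contain `marker`,
 * e.g. "DHCPDISCOVER" (assumed log keyword). Lines longer than the
 * buffer are read in pieces, so a marker that straddles two pieces
 * would be missed. */
static long count_marker_lines(FILE *log, const char *marker)
{
    char line[4096];
    long count = 0;

    while (fgets(line, sizeof line, log) != NULL) {
        if (strstr(line, marker) != NULL)
            count++;
    }
    return count;
}
```

Running it over the same time window on both servers and subtracting the 
two results gives the shortfall.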

The dhcrelay (when running in debug mode) also complains about bad UDP 
checksums. This is worrying, and I have no idea what causes them.
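
As a reference for what a "bad UDP checksum" means, here is a minimal 
sketch of the generic IPv4 UDP checksum (one's-complement sum over a 
pseudo-header plus the UDP header and payload, as in RFC 768/RFC 1071). 
The function names are hypothetical, and this is not taken from the 
dhcrelay source; I have not checked how dhcrelay does the verification.

```c
#include <stddef.h>
#include <stdint.h>

/* Add `len` bytes to a running sum as 16-bit big-endian words.
 * An odd trailing byte is padded with a zero low byte. */
static uint32_t sum16(const uint8_t *data, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(data[0] << 8);
    return sum;
}

/* One's-complement checksum over the IPv4 pseudo-header (source,
 * destination, zero, protocol 17, UDP length) and the UDP datagram.
 * `udp` points at the UDP header; `udp_len` covers header and payload.
 * With the checksum field zeroed, the result is the value to store
 * (a result of 0 is sent as 0xFFFF on the wire). With the field filled
 * in, a result of 0 means the datagram verifies. */
static uint16_t udp_checksum(const uint8_t src[4], const uint8_t dst[4],
                             const uint8_t *udp, uint16_t udp_len)
{
    uint8_t pseudo[4] = { 0, 17, (uint8_t)(udp_len >> 8),
                          (uint8_t)(udp_len & 0xff) };
    uint32_t sum = 0;

    sum = sum16(src, 4, sum);
    sum = sum16(dst, 4, sum);
    sum = sum16(pseudo, 4, sum);
    sum = sum16(udp, udp_len, sum);

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A receiver that gets a non-zero result when running this over a datagram 
with its checksum field in place would report the checksum as bad.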

Also, it probably doesn't help that the relay is running on a VMware 
host in a pretty busy cluster!

- Erling


On 05/06/2013 01:44 PM, Erling Paulsen wrote:
> Hello,
>
> We have two servers in a failover relationship, both running 
> isc-dhcp42-server-4.2.4_2 on FreeBSD 8.3-RELEASE-p7, and we are having 
> problems with balancing the pools when pool usage climbs into the 
> higher figures.
>
> Example (numbers from logfiles at about the same timestamp):
>
> Master: total 3905  free 1  backup 512  lts -255
> "lts" is correct according to the documentation: (free - backup)/2 = 
> (1 - 512)/2 = -255 (integer division)
> Since "leases to share" is negative, the master expects the secondary 
> server to hand over leases!
>
> Secondary: total 3905  free 513  backup 0  lts -256
> "lts" is correct according to the documentation, with the roles 
> swapped on the secondary: (backup - free)/2 = (0 - 513)/2 = -256
>
> They do seem to have the same understanding of the current lease 
> situation. But! "lts" on the secondary is also negative, so it is also 
> expecting the master to hand over leases!
>
> This cannot possibly end well?
>
> This is what's in the source-code and it seems to comply with the 
> description:
>
>                 if (p->failover_peer->i_am == primary) {
>                         lts = (p->free_leases - p->backup_leases) / 2;
>                         peer_lease_state = FTS_BACKUP;
>                         /* my_lease_state = FTS_FREE; */
>                         lq = &p->free;
>                 } else {
>                         lts = (p->backup_leases - p->free_leases) / 2;
>                         peer_lease_state = FTS_FREE;
>                         /* my_lease_state = FTS_BACKUP; */
>                         lq = &p->backup;
>                 }
>
> I don't understand what can be the cause of the double-trouble 
> negative lts on both sides! And, btw, can someone shed some light on 
> how the 'lq' pointer affects the balancing?
>
> Does anyone have thoughts on what might be the culprit here?
> Any information would be appreciated.
>
>
> - Erling Paulsen
>
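
To double-check the arithmetic in the quoted message, here is the lts 
calculation from the snippet as a standalone function. This is only a 
sketch: the function name and parameters are hypothetical, and only the 
two formulas come from the quoted source.

```c
#include <stdbool.h>

/* Leases-to-share as in the quoted snippet: the primary computes
 * (free - backup) / 2 and the secondary (backup - free) / 2.
 * C integer division truncates toward zero, so -511 / 2 is -255. */
static int leases_to_share(bool i_am_primary, int free_leases,
                           int backup_leases)
{
    if (i_am_primary)
        return (free_leases - backup_leases) / 2;
    return (backup_leases - free_leases) / 2;
}
```

With the logged figures, leases_to_share(true, 1, 512) gives -255 and 
leases_to_share(false, 513, 0) gives -256. Fed the same free/backup 
counts, the two branches always return values of opposite sign (or 
zero), so a negative lts on both sides can only come from the two 
servers counting free and backup leases differently, which may be worth 
comparing against the two log lines.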


-- 
---------------------------------|sent-av|-----
Erling Paulsen, Seksjon for Infrastruktur/Nett
TEO 2.402, Universitetet i Tromsø, 9037 TROMSØ  .
Office (+47) 77 64 64 80 Mob (+47) 91 17 64 01 ..:


