'MAC affinity' doing the exact opposite, leading to 'pool churn'?

Bart Van den Broeck bart at kuleuven.net
Tue Sep 25 13:19:10 UTC 2007


> This client hashes to the primary (or I assume s1 is primary) - mac
> affinity assigns leases to the secondary.  That is, 'free' is the
> default case, the whole affinity thing is an extra step that moves
> free leases to backup if they reverse-hash.  So it would be unusual,
> but I suppose it's not impossible as far as bugs go.

Yes, that's my assumption so far.  Could you (or anyone else knowledgeable
on this subject) explain how 'MAC address affinity' decides what to do?
Does it execute the exact same algorithm as load balancing when a lease
expires, setting the lease state to 'BACKUP' iff requests by the client
would be load balanced to the secondary?  Does it run on each lease
expiry independently of pool balancing, disregarding the schedules and
thresholds that govern it?  So, as a high-level result of all
this, if a client's request gets load balanced to a certain peer,
'MAC address affinity' tries to keep freed leases belonging to that client
with that peer, right?
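
To make this concrete, here's a minimal sketch of my current
understanding, assuming dhcpd reuses its RFC 3074 load-balancing hash
for the affinity decision.  The function names, the toy hash, and the
HBA bit layout are my guesses, not the ISC source:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder hash -- NOT the RFC 3074 permutation table, just a
 * stand-in so the sketch compiles; it maps a client identifier (or
 * the chaddr when there is no client-id) to a bucket 0..255. */
static uint8_t loadb_hash(const uint8_t *id, unsigned len)
{
    uint8_t h = 0;
    while (len--)
        h = (uint8_t)(h * 31 + *id++);
    return h;
}

/* hba: the 256-bit Hash Bucket Assignment bitmap from the failover
 * config; a set bit is assumed here to mean "primary serves this
 * bucket". */
static bool secondary_serves(const uint8_t *id, unsigned len,
                             const uint8_t hba[32])
{
    uint8_t bucket = loadb_hash(id, len);
    return !(hba[bucket / 8] & (1 << (bucket % 8)));
}

int main(void)
{
    uint8_t hba[32] = {0};
    for (int i = 0; i < 16; i++)       /* 50/50 split: primary owns */
        hba[i] = 0xff;                 /* buckets 0..127            */

    const uint8_t mac[6] = {0x00, 0x16, 0x3e, 0x12, 0x34, 0x56};
    printf("secondary serves this client: %s\n",
           secondary_serves(mac, sizeof mac, hba) ? "yes" : "no");
    return 0;
}

On expiry, affinity would then conceptually amount to

    next_state = secondary_serves(id, len, hba) ? FTS_BACKUP : FTS_FREE;

Is that roughly what happens?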


> However, I would tend to suspect pool balance to be the culprit here,
> since your primary appears to be pushing the upper limit on what it's
> allowed to own for the purposes of affinity (20 for this pool).  This
> suggests that the primary is having to send some number of the leases
> it would like to keep for itself - but must give over in order to keep
> the pool reasonably well balanced.

There's only ever one client using this pool, so "pushing the upper
limit" isn't really appropriate here, I guess.  Everything that happens
in this pool is in the logs.  In fact, the primary is even handing out
leases, so it's actually moving away from the upper limit mentioned.

If pool balancing were involved, shouldn't the primary be able to find
*other* leases, leases that it hasn't used before, to send to the
secondary in order to balance the pool?
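
To check my mental model of that selection step, building on your
earlier description ("moves free leases to backup if they
reverse-hash"), I'd picture it roughly like this.  This is my own
sketch, all names hypothetical:

#include <stdbool.h>
#include <stddef.h>

/* Heavily simplified; none of these names are ISC's. */
enum { FREE, BACKUP };

struct lease {
    struct lease *next;
    unsigned char mac[6];
    int state;
};

/* Toy stand-in for "would the secondary serve this client?"; see the
 * affinity sketch above for the idea behind the real test. */
static bool secondary_serves_mac(const unsigned char mac[6])
{
    return mac[5] & 1;               /* placeholder decision only */
}

/* Move up to `to_move` leases from FREE to BACKUP: the first pass
 * takes only leases that reverse-hash to the peer, and a second pass
 * (if that doesn't suffice) takes arbitrary free leases. */
static void give_leases(struct lease *head, int to_move)
{
    for (int pass = 0; pass < 2 && to_move > 0; pass++) {
        for (struct lease *l = head; l != NULL && to_move > 0;
             l = l->next) {
            if (l->state != FREE)
                continue;
            if (pass == 0 && !secondary_serves_mac(l->mac))
                continue;
            l->state = BACKUP;
            to_move--;
        }
    }
}

If something like that second pass exists, it would explain the primary
giving up leases it "would like to keep" -- but then I'd still expect it
to prefer leases it hasn't handed to this client before.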


> If that were the case, however, I'd also expect to see more 'pool' log
> lines, since that's the only way such leases would get moved over to
> the secondary (and then only if the primary's "lts" calculates out to
> greater than max-own).  This kind of balance is not effected on the
> 'client affinity' steps.

Well, the only other 'pool' log lines I see are the ones that get logged
right after starting the servers.  These are also the log lines that
directly precede the log extracts I posted before (I left out the
'balanc{ing,ed} pool' lines for the other, unused pool):
### s1
16:34:41 s1 dhcpd: balancing pool 80de718 134.58.217/24  total 201  free 201  backup 0  lts 100  max-own (+/-)20
16:34:41 s1 dhcpd: balanced pool 80de718 134.58.217/24  total 201  free 121  backup 80  lts 20  max-misbal 30
16:34:41 s1 dhcpd: Sending updates to dhcp-failover.
16:34:41 s1 dhcpd: failover peer dhcp-failover: peer moves from recover-done to normal
16:34:41 s1 dhcpd: balancing pool 80de718 134.58.217/24  total 201  free 121  backup 80  lts 20  max-own (+/-)20
16:34:41 s1 dhcpd: balanced pool 80de718 134.58.217/24  total 201  free 121  backup 80  lts 20  max-misbal 30
16:34:41 s1 dhcpd: peer dhcp-failover: Got POOLREQ, answering negatively!  Peer may be out of leases or database inconsistent.
### s2
16:34:40 s2 dhcpd: balancing pool 80de6f0 134.58.217/24  total 201  free 201  backup 0  lts -100  max-own (+/-)20
16:34:40 s2 dhcpd: balanced pool 80de6f0 134.58.217/24  total 201  free 201  backup 0  lts -100  max-misbal 30
16:34:40 s2 dhcpd: pool response: 0 leases
16:35:42 s2 dhcpd: balancing pool 80de6f0 134.58.217/24  total 201  free 121  backup 80  lts -20  max-own (+/-)20
16:35:42 s2 dhcpd: balanced pool 80de6f0 134.58.217/24  total 201  free 121  backup 80  lts -20  max-misbal 30
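
For what it's worth, the numbers in that first run are internally
consistent with a simple reading (my reconstruction, not the actual
code; I'm assuming max-own and max-misbal are 10% and 15% of the pool
total, per the max-lease-ownership and max-lease-misbalance defaults):

#include <stdio.h>

int main(void)
{
    /* Numbers from the 16:34:41 "balancing pool" line on s1. */
    int total = 201, nfree = 201, backup = 0;

    int lts        = (nfree - backup) / 2;  /* "leases to send": 100     */
    int max_own    = total * 10 / 100;      /* matches "max-own (+/-)20" */
    int max_misbal = total * 15 / 100;      /* matches "max-misbal 30"   */

    if (lts > max_own) {                    /* only then do leases move  */
        int moved = lts - max_own;          /* 80 go free -> backup      */
        nfree  -= moved;
        backup += moved;
        lts     = max_own;
    }

    printf("free %d  backup %d  lts %d  max-own %d  max-misbal %d\n",
           nfree, backup, lts, max_own, max_misbal);
    return 0;
}

This reproduces the first 'balanced' line exactly (free 121, backup 80,
lts 20), which makes the drift in the later lines below all the more
puzzling.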

(As a side note, it also surprised me that the pool has a different ID
on the two peers, although its config is identical on both.)

And these are the 'pool' log lines I posted before.  As you can see,
`lts` has decreased and `backup` has increased without any intervening
pool balancing!  Also note that the two peers disagree on the counts.
Could this be an effect of the 'xid mismatch' errors on bind updates?
### s1
17:34:41 s1 dhcpd: balancing pool 80de718 134.58.217/24  total 201  free 118  backup 83  lts 17  max-own (+/-)20
17:34:41 s1 dhcpd: balanced pool 80de718 134.58.217/24  total 201  free 117  backup 84  lts 16  max-misbal 30
### s2
17:35:43 s2 dhcpd: balancing pool 80de6f0 134.58.217/24  total 201  free 117  backup 84  lts -16  max-own (+/-)20
17:35:43 s2 dhcpd: balanced pool 80de6f0 134.58.217/24  total 201  free 121  backup 80  lts -20  max-misbal 30


I haven't yet repeated the test with a pair of servers patched as
suggested, but I intend to.


Kind regards
Bart Van den Broeck
---------------------- ICT-Infrastructuur - Netwerken aka KULeuvenNet --
---------------------- LUDIT - ICTS - K.U.Leuven --

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


