'MAC affinity' doing the exact opposite, leading to 'pool churn'?

David W. Hankins David_Hankins at isc.org
Tue Sep 25 16:48:19 UTC 2007


On Tue, Sep 25, 2007 at 03:19:10PM +0200, Bart Van den Broeck wrote:
> Yes, that's my assumption so far.  Could you (or anyone else knowledgeable
> on this subject) explain how 'MAC address affinity' decides what to do?

It's the same hash algorithm, but the result is inverted.

There are two pieces.  When leases exit 'transitional states' (expired,
released) they enter FREE, and at that point the server may
conditionally queue the lease to BACKUP.  This piece is effectively
event-driven.

The second piece is the scheduled pool rebalance event, which is what
produces those pool log lines.  This runs in two passes; the first moves leases
that hash to the peer so long as 'lts' hasn't reached max-own in the
peer's favor ("let the peer reach the limit by sending it my leases").
The second pass moves leases unconditionally until the local ownership
limit is reached.

Both passes are done in oldest->newest order.
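
To make the two passes concrete, here is a rough sketch in Python.  It
is not the dhcpd source; it only follows the description above.  The
function name, the (end_time, lease_id) tuples and the hashes_to_peer
predicate are made up for illustration, and the 'lts' formula (half the
free/backup difference) is an assumption read off the log lines in this
thread.  It only covers the case where the local server holds the
surplus.

```python
def rebalance(free, backup_count, max_own, hashes_to_peer):
    """Two-pass rebalance sketch for a server with surplus FREE leases.

    free           -- list of (end_time, lease_id) tuples in FREE state
    backup_count   -- number of leases currently in BACKUP state
    max_own        -- allowed ownership imbalance, in leases
    hashes_to_peer -- hypothetical predicate: does this lease hash to the peer?
    """
    # Assumption: lts is half the free/backup difference, as the logs suggest.
    lts = (len(free) - backup_count) // 2
    queue = sorted(free)  # oldest end time first
    moved, remaining, kept = [], [], []

    # Pass 1: give away leases that hash to the peer, until the peer
    # would hold max_own more than we do.
    for lease in queue:
        if hashes_to_peer(lease) and lts > -max_own:
            moved.append(lease)
            lts -= 1
        else:
            remaining.append(lease)

    # Pass 2: give away leases unconditionally until the local
    # ownership limit is reached.
    for lease in remaining:
        if lts > max_own:
            moved.append(lease)
            lts -= 1
        else:
            kept.append(lease)

    return kept, moved, lts


# Fresh install: 201 never-used leases (end time zero), none hash to the peer.
virgin = [(0, n) for n in range(201)]
kept, moved, lts = rebalance(virgin, 0, 20, lambda lease: False)
print(len(kept), len(moved), lts)  # 121 80 20
```

With the numbers from the first s1 log lines (total 201, max-own 20)
this ends at free 121, backup 80, lts 20, which is what the 'balanced
pool' line reports.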

> There's only the one client ever using this pool.  So "pushing the upper
> limit" isn't really appropriate here, I guess.  Everything that happens

It is actually, which is also surprising.  More later.

> If pool balancing would be involved, shouldn't the primary be able to find
> *other* leases, leases that it hasn't used before, to send to the
> secondary in order to balance the pool?

Yes, there's an obvious problem here.  More later.

> ### s1
> 16:34:41 s1 dhcpd: balancing pool 80de718 134.58.217/24  total 201  free 201  backup 0  lts 100  max-own (+/-)20
> 16:34:41 s1 dhcpd: balanced pool 80de718 134.58.217/24  total 201  free 121  backup 80  lts 20  max-misbal 30
> 16:34:41 s1 dhcpd: Sending updates to dhcp-failover.
> 16:34:41 s1 dhcpd: failover peer dhcp-failover: peer moves from recover-done to normal

This is a fresh install!  That makes much more sense.

> 16:34:41 s1 dhcpd: balancing pool 80de718 134.58.217/24  total 201  free 121  backup 80  lts 20  max-own (+/-)20
> 16:34:41 s1 dhcpd: balanced pool 80de718 134.58.217/24  total 201  free 121  backup 80  lts 20  max-misbal 30
> 16:34:41 s1 dhcpd: peer dhcp-failover: Got POOLREQ, answering negatively!  Peer may be out of leases or database inconsistent.

What happened here is s2 entered normal, did its own pool check, and
found that it was severely lacking in leases (it had not yet received
the 80 BNDUPD's above).

> (On a sidenote, what surprised me too was that the pool has a different ID
> on the peers, although it's config is identical on both peers.)

The 'id' is literally the location in memory of the pool structure.
It's pretty bogus.

> And these are the 'pool' log lines posted before.  As you can see, `lts`
> has decreased and `backup` has increased without prior pool balancing!
> Also note the disagreement between the two peers.  Could this be an effect
> of the 'xid mismatch' errors on bind updates?

No, those are an artefact of the client behaviour in your log, which
for some reason DHCPREQUESTs twice in a row very rapidly.

> ### s1
> 17:34:41 s1 dhcpd: balancing pool 80de718 134.58.217/24  total 201  free 118  backup 83  lts 17  max-own (+/-)20
> 17:34:41 s1 dhcpd: balanced pool 80de718 134.58.217/24  total 201  free 117  backup 84  lts 16  max-misbal 30
> ### s2
> 17:35:43 s2 dhcpd: balancing pool 80de6f0 134.58.217/24  total 201  free 117  backup 84  lts -16  max-own (+/-)20
> 17:35:43 s2 dhcpd: balanced pool 80de6f0 134.58.217/24  total 201  free 121  backup 80  lts -20  max-misbal 30

There's no disagreement here.  The two sets of log lines are about a
minute apart (17:34:41 vs. 17:35:43), and both are correct.  The primary
left 'lts' at 16 in the primary's favor; the secondary then moved 4
'virgin' leases, thinking they hash to the primary.
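
One way to check that the two peers agree is to parse the pool lines
and compare the counters.  The sketch below assumes 'lts' is half the
free/backup difference, truncated toward zero, with the sign flipped on
the secondary.  That relation is inferred from the log lines quoted in
this thread, not taken from the dhcpd source, and the helper names are
made up.

```python
import re

# Matches the counters in a "balancing pool" / "balanced pool" log line.
LINE = re.compile(r"total (\d+)\s+free (\d+)\s+backup (\d+)\s+lts (-?\d+)")


def parse(line):
    """Return the pool counters from one log line as a dict of ints."""
    total, free, backup, lts = map(int, LINE.search(line).groups())
    return {"total": total, "free": free, "backup": backup, "lts": lts}


def expected_lts(free, backup, is_primary):
    """Assumed relation: half the difference, truncated toward zero."""
    diff = free - backup if is_primary else backup - free
    half = abs(diff) // 2
    return half if diff >= 0 else -half


s1 = parse("balanced pool 80de718 134.58.217/24  total 201  free 117"
           "  backup 84  lts 16  max-misbal 30")
s2 = parse("balancing pool 80de6f0 134.58.217/24  total 201  free 117"
           "  backup 84  lts -16  max-own (+/-)20")

# Same free/backup counts on both sides; lts differs only in sign.
print(s1["free"] == s2["free"], s1["backup"] == s2["backup"])
print(s1["lts"] == expected_lts(s1["free"], s1["backup"], True))
print(s2["lts"] == expected_lts(s2["free"], s2["backup"], False))
```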


OK, a fresh install.  That explains something.  The load balance check
examines the lease's previous binding, but it doesn't appear that it
makes any attempt to 'tie break' the case where the lease has had no
previous binding.

So all 'virgin' leases hash to the primary (hash to zero), and that
makes the primary try and 'own' an unfair share of the leases right
off the bat.
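
A small sketch of why an empty previous binding lands on the primary.
It assumes a Pearson-style byte hash of the kind used for load
balancing.  The mixing table here is a placeholder permutation, not the
real one, and the split of 128 and the owner() helper are hypothetical.
The point is only that with no bytes to hash, the result stays at its
initial value of zero whatever the table is.

```python
# Placeholder permutation of 0..255; NOT the real load-balancing table.
TABLE = [(i * 167 + 13) % 256 for i in range(256)]


def bucket(key: bytes) -> int:
    """Pearson-style hash: fold each byte of the key through the table."""
    h = 0
    for byte in key:
        h = TABLE[h ^ byte]
    return h


def owner(key: bytes, split: int = 128) -> str:
    """Assumed rule: buckets below the split belong to the primary."""
    return "primary" if bucket(key) < split else "secondary"


# A lease with no previous binding has nothing to hash.
print(bucket(b""), owner(b""))  # 0 primary
```

A tie-break for this case would need some other input than the
previous binding, since the previous binding is empty.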

This doesn't explain how or why a freshly allocated and expired lease
would be given away - the 'virgin' leases have zero end times so they
should be first on the list to go to the secondary.

-- 
Ash bugud-gul durbatuluk agh burzum-ishi krimpatul.
Why settle for the lesser evil?	 https://secure.isc.org/store/t-shirt/
-- 
David W. Hankins	"If you don't do it right the first time,
Software Engineer		     you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins

