[Kea-users] Load-Balancing Network issue between Relay and Kea

Frey, Rick E Rick.Frey at windstream.com
Thu Jan 5 17:16:46 UTC 2023


Prior to the discussion on this thread, I was under the impression that Kea HA would fail over (reach the partner-down state) any time max-unacked-clients was exceeded. As others pointed out, and as I found in testing, this does not happen if the servers can communicate with each other but clients cannot reach one of them. This scenario can occur any time there is a network disruption between the clients and one of the Kea servers (or the primary server, in a hot-standby configuration).

The problem exists in both load-balancing and hot-standby HA configurations. In a load-balancing configuration, only about half of the clients are serviced if clients can connect to only one server in the HA pair. In a hot-standby configuration, none of the clients are serviced by the standby server if they cannot reach the primary server.
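To make the impact concrete, here is a small Python simulation, assuming clients are split between the two servers by a hash of the client identifier. The SHA-256 hash is a stand-in chosen only for illustration (Kea's actual split is based on the RFC 3074 load-balancing algorithm, which is not reproduced here), and the function names are hypothetical.

```python
import hashlib

SERVERS = ("server1", "server2")


def responsible_server(client_id: bytes) -> str:
    """Return the server that answers this client in load-balancing mode.

    Stand-in hash (first byte of SHA-256); Kea itself uses the RFC 3074 algorithm.
    """
    return SERVERS[hashlib.sha256(client_id).digest()[0] % 2]


def served_fraction(client_ids, reachable, mode):
    """Fraction of clients answered when clients can only reach the `reachable` servers."""
    served = 0
    for cid in client_ids:
        if mode == "hot-standby":
            # While the HA pair is healthy, only the primary answers.
            target = "server1"
        else:
            target = responsible_server(cid)
        if target in reachable:
            served += 1
    return served / len(client_ids)


clients = [f"client-{i}".encode() for i in range(1000)]
# Clients can reach only server2, while server1 and server2 still see each other.
print(served_fraction(clients, {"server2"}, "load-balancing"))  # about half
print(served_fraction(clients, {"server2"}, "hot-standby"))     # 0.0
```

Because the servers still exchange heartbeats, neither side changes state, so these fractions persist for as long as the client-side outage lasts.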

While testing this scenario, I found that the server status shows the servers do not appear to check or track unacked clients unless communication with the partner has failed (“communication-interrupted” is true). The unacked-clients and unacked-clients-left counters are always 0, regardless of the “secs” field in the DHCP request, and are not used unless and until communication is interrupted.
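The behaviour described above can be modeled roughly as follows. This is a simplified Python sketch, not Kea's code: the field names mirror Kea's max-ack-delay and max-unacked-clients settings and the default values are illustrative, but the class, its methods, and the seconds-to-milliseconds comparison are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class UnackedTracker:
    """Simplified model of partner-failure detection as observed in this thread.

    Field names mirror Kea's HA parameters; the logic is an approximation.
    """
    max_ack_delay_ms: int = 10000
    max_unacked_clients: int = 10
    communication_interrupted: bool = False
    unacked: set = field(default_factory=set)

    def observe(self, client_id: str, secs: int, for_partner: bool) -> None:
        # Requests are scored only while the heartbeat to the partner is failing
        # and only if the request belongs to the partner; otherwise nothing is counted.
        if not (self.communication_interrupted and for_partner):
            return
        # Assumption: "secs" (seconds) is compared against max-ack-delay (milliseconds).
        if secs * 1000 > self.max_ack_delay_ms:
            self.unacked.add(client_id)

    @property
    def unacked_clients(self) -> int:
        return len(self.unacked)

    def partner_down(self) -> bool:
        return (self.communication_interrupted
                and self.unacked_clients > self.max_unacked_clients)


# Heartbeats still succeed, so even very late retries are never counted.
tracker = UnackedTracker(max_unacked_clients=2)
for cid in ("a", "b", "c"):
    tracker.observe(cid, secs=30, for_partner=True)
print(tracker.unacked_clients, tracker.partner_down())  # 0 False
```

The early return in `observe` is the crux of the issue: as long as heartbeats succeed, the counter never moves, no matter how long clients have been retrying.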

I can appreciate the difficulty behind the current behavior, since detecting partner-down without causing a split-brain situation is challenging. I wonder whether Kea's “partner-down” logic could be improved by assessing the data sent in heartbeat/sync messages. For example, if a server ignores DHCP requests (with “secs” exceeded) because it assumes its partner is servicing them, but never sees updates/leases from the partner for those same requests, the state of one or both servers could be changed.
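A minimal sketch of the proposed cross-check, assuming server2 knows which ignored requests belong to server1 and can see the lease updates server1 sends over the HA channel. `PartnerServiceCheck`, `max_unserved_clients`, and the method names are hypothetical; as far as I know nothing like this exists in Kea today.

```python
from dataclasses import dataclass, field


@dataclass
class PartnerServiceCheck:
    """Hypothetical cross-check: requests left to the partner vs. leases it reports."""
    max_ack_delay_ms: int = 10000
    max_unserved_clients: int = 10                       # hypothetical threshold
    ignored: set = field(default_factory=set)            # partner's clients, secs exceeded
    partner_leases: set = field(default_factory=set)     # clients the partner sent leases for

    def on_ignored_request(self, client_id: str, secs: int) -> None:
        # Only count clients that have been retrying longer than the ack delay.
        if secs * 1000 > self.max_ack_delay_ms:
            self.ignored.add(client_id)

    def on_partner_lease_update(self, client_id: str) -> None:
        self.partner_leases.add(client_id)

    def unserved(self) -> set:
        # Clients we deferred to the partner that the partner never answered.
        return self.ignored - self.partner_leases

    def partner_not_serving(self) -> bool:
        return len(self.unserved()) > self.max_unserved_clients
```

Using a set difference means it does not matter whether the ignored request or the partner's lease update arrives first. A real implementation would also need a time window, so that old leases from server1 do not mask a current outage.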

Example:
server1 is not receiving DHCP requests from clients but is communicating with server2. server2 is receiving DHCP requests from all clients but ignoring some of them because those clients should be serviced by server1 (per the internal load-balancing algorithm based on client ID).
If server2 sees that server1 is not sending any updates/leases for those client requests, server1 is put into a state that allows server2 to service the requests. The tricky part would be determining if and when server1 should automatically change state once it begins to see client requests again. Perhaps there could be an option to put the node into a maintenance mode that requires manually re-enabling server1?
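The state handling from the example could look something like the hypothetical sketch below. `PeerState`, `Server2View`, and the `SUSPENDED` state are invented names; the `auto_resume` flag stands for the automatic option, while the default models the manual re-enable suggested above.

```python
from enum import Enum


class PeerState(Enum):
    NORMAL = "normal"        # load balancing as configured
    SUSPENDED = "suspended"  # hypothetical: server2 also serves server1's clients


class Server2View:
    """Hypothetical view from server2 of whether server1 may keep its share of clients."""

    def __init__(self, auto_resume: bool = False):
        self.state = PeerState.NORMAL
        self.auto_resume = auto_resume  # False models the manual re-enable option

    def on_check(self, partner_not_serving: bool) -> None:
        if partner_not_serving and self.state is PeerState.NORMAL:
            self.state = PeerState.SUSPENDED
        elif not partner_not_serving and self.auto_resume:
            self.state = PeerState.NORMAL

    def operator_enable_partner(self) -> None:
        # Manual step: an operator confirms server1 is reachable by clients again.
        self.state = PeerState.NORMAL

    def should_answer(self, owner: str) -> bool:
        # `owner` is the server the load-balancing hash assigns to this client.
        return owner == "server2" or self.state is PeerState.SUSPENDED
```

The manual default sidesteps a flapping problem: while server2 answers every client it ignores nothing, so the cross-check has no evidence either way, and an automatic resume would fire on the absence of data rather than on server1 actually being reachable.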



