[Kea-users] Load-Balancing Network issue between Relay and Kea

Kevin P. Fleming lists.kea-users at kevin.km6g.us
Wed Jan 4 21:59:21 UTC 2023


On Wed, Jan 4, 2023, at 15:54, Simon wrote:

> Kevin P. Fleming <lists.kea-users at kevin.km6g.us> wrote:
>
>> If 'max-unacked-clients' isn't sufficient to address this, then this leaves a fairly large opening in the Kea high-availability story, as any network disruption which causes a server to no longer receive discovery packets from clients, but otherwise receives all expected network traffic, won't be noticed except by the clients! This concerns me, as (like other users here) my Kea servers receive all client traffic via DHCP relays, and misconfiguration of the relay such that it only relays to one server and not both will result in half of my clients not getting DHCP service at all.
>
> Surely, if you misconfigure a relay agent in that way, around half your 
> clients will initially be unable to renew their leases, but eventually 
> will get serviced by the available server once their active lease has 
> expired ? That would mean the clients would drop their network config 
> momentarily before setting up a new one - meaning that active 
> connections would drop, but new ones would connect just fine once the 
> new settings are in place.

That's why I posted; I don't really know!

If the server receiving the client requests is not in partner-down state then, based on my understanding of the HA section of the Kea ARM, it will not respond to requests from clients that it believes its partner is responsible for. That certainly seems to be the case while the lease is still active; once the lease has expired I'm not sure what will happen.

In my network with Kea in load-balancing mode, there seems to be some sort of algorithm involved even for DHCP DISCOVER, where only one of the two servers responds with DHCP OFFER even though they are both running in a normal state. It has been my assumption (untested) up to this point that Kea is using the client's identifier (MAC address, DUID, etc.) to choose one or the other of the active servers to respond to that DISCOVER. If that's true, and both servers are in normal operation (neither is in partner-down), then that algorithm would continue telling the second server to *not* respond to requests from that client because it believes the other server will do so... even if the other server is not receiving the client's requests.
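As a minimal sketch of the behavior I'm assuming above (not Kea's actual code): each server hashes the client identifier and answers only if the result points at itself, unless it is in partner-down state. The hash here is a stand-in (SHA-256), and the names owning_server and should_respond are hypothetical, chosen only for illustration.

```python
import hashlib


def owning_server(client_id: bytes, servers: list[str]) -> str:
    """Pick one server per client from a hash of the client identifier.

    Stand-in hash for illustration; NOT the algorithm Kea really uses.
    """
    digest = hashlib.sha256(client_id).digest()
    return servers[digest[0] % len(servers)]


def should_respond(me: str, client_id: bytes, servers: list[str],
                   partner_down: bool = False) -> bool:
    """Assumed rule: answer only 'my' clients unless the partner is down."""
    if partner_down:
        return True
    return owning_server(client_id, servers) == me


servers = ["server1", "server2"]
mac = bytes.fromhex("02005e100001")
for name in servers:
    print(name, should_respond(name, mac, servers))
```

If this model is right, a client whose identifier hashes to server1 gets no answer from server2 while both are in normal state, regardless of whether server1 ever sees that client's packets.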

To summarize, that's what I assumed (again, untested) 'max-unacked-clients' is for: if the second server expects the first server to respond to those clients, but it does not (no leases are offered to them), the second server could notice the situation, decide that the first server is unhealthy or partitioned, and force it into a 'down' state.
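Here is a rough sketch of the detection I have in mind, again an assumption rather than a description of what Kea does. The parameter names mirror the HA settings 'max-unacked-clients' and 'max-ack-delay'; the UnackedTracker class and its methods are hypothetical. Whether Kea performs this kind of check while heartbeats between the servers are still succeeding is exactly the part I don't know.

```python
class UnackedTracker:
    """Counts partner-owned clients that keep retrying without an answer."""

    def __init__(self, max_unacked_clients: int, max_ack_delay_ms: int):
        self.max_unacked_clients = max_unacked_clients
        self.max_ack_delay_ms = max_ack_delay_ms
        self.unacked: set[bytes] = set()

    def observe(self, client_id: bytes, waiting_ms: int) -> None:
        # waiting_ms: how long the client says it has been trying
        # (hypothetical input; e.g. derived from the packet's 'secs' field).
        if waiting_ms > self.max_ack_delay_ms:
            self.unacked.add(client_id)

    def partner_looks_down(self) -> bool:
        # Assumed threshold rule: too many distinct unanswered clients.
        return len(self.unacked) > self.max_unacked_clients


tracker = UnackedTracker(max_unacked_clients=2, max_ack_delay_ms=10_000)
for n in range(3):
    tracker.observe(bytes([n]), waiting_ms=15_000)
print(tracker.partner_looks_down())
```

With a check like this, the second server would stop deferring to the first once enough of the first server's clients had gone unanswered for longer than the configured delay.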

