[Kea-users] Load-Balancing Network issue between Relay and Kea

Wed Jan 4 22:14:14 UTC 2023

CCing the list.. sorry.

Eric Graham
DevOps Specialist
Direct: 605.990.1859
Eric.Graham at vantagepnt.com
________________________________
From: Eric Graham <eric.graham at vantagepnt.com>
Sent: Wednesday, January 4, 2023 4:13 PM
To: Kevin P. Fleming <lists.kea-users at kevin.km6g.us>
Subject: Re: [Kea-users] Load-Balancing Network issue between Relay and Kea

You're right. The client identifier (the DUID, if IPv6) is hashed against a fixed table of values. The result modulo the number of servers is used as an index pointing to the server that will process the packet.

https://gitlab.isc.org/isc-projects/kea/-/blob/46dc8d276efda1a240f0c05580bdcba62ae5a6c7/src/hooks/dhcp/high_availability/query_filter.cc#L416-L446

Even though the Kea load balancing algorithm (as well as the DHCPd load balancing algorithm) is not exactly RFC compliant, this part seems to be. See RFC 3074 § 6.
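As a rough illustration of the hash-then-modulo selection described above, here is a minimal Python sketch. It assumes a Pearson-style hash in the spirit of RFC 3074 § 6. MIX_TABLE is a placeholder permutation and NOT the table from the RFC or from Kea, and lb_hash and pick_server are hypothetical names, so the server chosen for a given client will not match a real deployment.

```python
# Placeholder mixing table: a permutation of 0..255 (167 is odd, so the
# mapping is one-to-one mod 256). This is NOT the RFC 3074 table; it only
# stands in for it so the sketch is runnable.
MIX_TABLE = [(i * 167 + 13) % 256 for i in range(256)]


def lb_hash(key: bytes) -> int:
    """Pearson-style hash: seed with the key length, then fold in each
    byte from last to first through the mixing table."""
    h = len(key) % 256
    for byte in reversed(key):
        h = MIX_TABLE[h ^ byte]
    return h


def pick_server(client_id: bytes, servers):
    """Use the hash modulo the number of servers as an index."""
    return servers[lb_hash(client_id) % len(servers)]


if __name__ == "__main__":
    # Example DUID bytes, made up for illustration.
    duid = bytes.fromhex("000100012a3b4c5d001122334455")
    print(pick_server(duid, ["server1", "server2"]))
```

The property that matters for this thread is that the choice depends only on the client identifier and the server count, so both servers reach the same decision independently without exchanging any messages.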

I have encountered this same issue when one server cannot communicate. For me, it was partially caused by my socket type being wrong. However, I found the load balancing behavior to be sufficiently finicky that I have standardized on hot-standby. With the size of deployments I deal with, load balancing provides marginal performance improvement at the cost of issues like this and more complicated configuration.

Additionally, having a RADIUS backend made this issue even worse. Load balancing + RADIUS = a bad time.

Eric Graham
DevOps Specialist
Direct: 605.990.1859
Eric.Graham at vantagepnt.com
________________________________
From: Kea-users <kea-users-bounces at lists.isc.org> on behalf of Kevin P. Fleming <lists.kea-users at kevin.km6g.us>
Sent: Wednesday, January 4, 2023 3:59 PM
To: kea-users at lists.isc.org <kea-users at lists.isc.org>
Subject: Re: [Kea-users] Load-Balancing Network issue between Relay and Kea

On Wed, Jan 4, 2023, at 15:54, Simon wrote:

> Kevin P. Fleming <lists.kea-users at kevin.km6g.us> wrote:
>
>> If 'max-unacked-clients' isn't sufficient to address this, then this leaves a fairly large opening in the Kea high-availability story, as any network disruption which causes a server to no longer receive discovery packets from clients, but otherwise receives all expected network traffic, won't be noticed except by the clients! This concerns me, as (like other users here) my Kea servers receive all client traffic via DHCP relays, and misconfiguration of the relay such that it only relays to one server and not both will result in half of my clients not getting DHCP service at all.
>
> Surely, if you misconfigure a relay agent in that way, around half your
> clients will initially be unable to renew their leases, but eventually
> will get serviced by the available server once their active lease has
> expired ? That would mean the clients would drop their network config
> momentarily before setting up a new one - meaning that active
> connections would drop, but new ones would connect just fine once the
> new settings are in place.

That's why I posted; I don't really know!

If the server receiving the client requests is not in partner-down state, based on my understanding of the Kea ARM section on HA it will not respond to those requests. That certainly seems to be the case while the lease is still active; once the lease has expired I'm not sure what will happen.

In my network with Kea in load-balancing mode, there seems to be some sort of algorithm involved even for DHCP DISCOVER, where only one of the two servers responds with DHCP OFFER even though they are both running in a normal state. It has been my assumption (untested) up to this point that Kea is using the client's identifier (MAC address, DUID, etc.) to choose one or the other of the active servers to respond to that DISCOVER. If that's true, and both servers are in normal operation (neither is in partner-down), then that algorithm would continue telling the second server to *not* respond to requests from that client because it believes the other server will do so... even if the other server is not receiving the client's requests.
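The scenario above can be sketched as a small Python simulation. It rests on the same untested assumption: each server answers only the clients that a deterministic function of the client identifier assigns to it. The owner function is a hypothetical stand-in for the real hash, and the client identifiers are made up.

```python
def owner(client_id: bytes, n_servers: int = 2) -> int:
    """Hypothetical stand-in for the load-balancing hash: maps a client
    identifier to the index of the server expected to answer it."""
    return sum(client_id) % n_servers


def offers(client_id: bytes, reachable) -> list:
    """Indexes of servers that would send an OFFER, given the set of
    server indexes the relay actually forwards to. A server that
    receives the packet stays silent unless it owns the client."""
    chosen = owner(client_id)
    return [s for s in sorted(reachable) if s == chosen]


# Six made-up client identifiers differing only in the last byte.
clients = [bytes([0x00, 0x11, 0x22, 0x33, 0x44, i]) for i in range(6)]

# Relay misconfigured to forward only to server 0.
unserved = [c.hex() for c in clients if not offers(c, {0})]

if __name__ == "__main__":
    print("clients with no OFFER:", unserved)
```

With both servers reachable, every client gets exactly one OFFER. With only server 0 reachable, the clients assigned to server 1 get none, even though both servers consider themselves to be in a normal state.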

To summarize, that's what I assumed (again, untested) 'max-unacked-clients' is for: if the second server assumes the first server will respond to those clients, but it does not (no leases are offered to them), it could notice the situation and decide that the first server is unhealthy or partitioned and force it into a 'down' state.
--
ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.

To unsubscribe visit https://lists.isc.org/mailman/listinfo/kea-users.

Kea-users mailing list
Kea-users at lists.isc.org
https://lists.isc.org/mailman/listinfo/kea-users