[Kea-users] Planning some experimentation with HA using ECMP

Victoria Risk vicky at isc.org
Tue Apr 30 15:44:45 UTC 2024



> On Apr 30, 2024, at 9:18 AM, Dan Geist <dan at polter.net> wrote:
> 
> Thanks Vicky.
> 
> Facebook's dhcplb is an option, but it also solves problems that we don't have (imbalance of DHCP messaging and need for staged deployments). It also requires more infra (either physical or virtual) and a slightly greater network complexity which differs from what we already support in other services. I'm trying to stick with the "just because you have a hammer, you should still try to use a screwdriver on screws" philosophy :)

Well, for sure, not many users have Facebook’s scale, so their solution might not fit either.

Recently we have had a couple of requests to write a new DHCP relay, since ISC DHCP is now EOL and Kea doesn’t include a relay. Very few people ask for this, which makes it difficult to justify the investment. However, if there is enough interest in a load balancer, maybe we could make a relay that is also a load balancer? I would encourage anyone who has requirements like this to add them to https://gitlab.isc.org/isc-projects/kea/-/issues/1869, which is a ticket about requirements for a relay. One big question is whether a relay should be a version of Kea or a completely separate (presumably simpler) implementation; any requirements that help decide that are useful.

> 
> I suppose the WAY in which the traffic is balanced is ultimately a wash, though. Either way, we'd need Kea instances in a horizontal N-number farm with mostly-identical behaviors (regardless of if they listen for the virtual IP or if it's housed one hop upstream). Ideally, having as little state as possible (or as little state that DIFFERS between hosts) is an important aspect.

Certainly, this sounds ideal, and I thought everyone would be doing this with a shared lease backend. The problem is that the DHCP protocol specifically requires the lease to be written to stable storage before it is confirmed to the client. This is inevitably time-consuming, and more so if you are writing to remote shared storage with possible contention. That would make a local database with replication the best way to go. We are not really database experts at ISC, so the expertise in this is going to have to come from other users. I know we have users doing this, but I am under the impression that database replication is complicated to set up and administer, and therefore not for everyone.
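
A minimal sketch of the "store first, confirm second" ordering described above, using Python's built-in sqlite3 as a stand-in for a shared lease backend. This is not how Kea implements it; the table layout and the names open_lease_db and grant_lease are hypothetical and exist only for illustration.

```python
import sqlite3
import time


def open_lease_db(path=":memory:"):
    """Open a toy lease store (hypothetical schema, not Kea's)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lease ("
        " address TEXT PRIMARY KEY,"
        " client_id TEXT NOT NULL,"
        " expires REAL NOT NULL)"
    )
    conn.commit()
    return conn


def grant_lease(conn, address, client_id, lifetime=3600, now=None):
    """Return "ACK" only after the lease row has been committed."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT client_id, expires FROM lease WHERE address = ?",
        (address,),
    ).fetchone()
    # Refuse if a different client still holds an unexpired lease.
    if row is not None and row[0] != client_id and row[1] > now:
        return "NAK"
    conn.execute(
        "INSERT OR REPLACE INTO lease (address, client_id, expires)"
        " VALUES (?, ?, ?)",
        (address, client_id, now + lifetime),
    )
    # The commit happens before the reply is produced; with a remote or
    # contended database, this is the step that adds latency.
    conn.commit()
    return "ACK"


conn = open_lease_db()
print(grant_lease(conn, "192.0.2.10", "client-a", now=1000.0))
print(grant_lease(conn, "192.0.2.10", "client-b", now=1001.0))
```

Because every node would consult the same table, a renewal handled by a different node still sees the existing lease; the cost is that each reply waits on the commit.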

> 
> Performance tuning, strategy for maintaining the database backend (monolithic vs multiple replicating instances) and so forth will be important, but is there anything inherent about Kea itself that will break this conceptually (unique metadata payload in messaging that will break on DHCP refresh to a different node or something along those lines)?

Conceptually it should work, and I am pretty sure there are users doing this now. (Here is an old thread on this topic: https://lists.isc.org/mailman/htdig/kea-users/2017-November/001471.html )
One thing that comes to mind is options added by the relay(s). In Kea 2.5.8 we just added a frequently requested feature from ISC DHCP to 'stash' options provided by the relay on the original assignment, for use on refresh. I am thinking this is cached in the Kea server that originally created the lease, and *might* not work if the client renews with a different server. I am not sure where lease 'extended info' is stored, and whether it would be included in a separate lease backend. In any case, relay options would be path-specific, so if you depend on them, that could cause an issue with renewals.
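
A rough illustration of why a renewal may reach a different node than the original assignment. In DHCPv4, the initial exchange usually arrives from the relay's address, while a client in the RENEWING state unicasts its request from its own address, so the two packets are different flows as far as ECMP is concerned. Real routers use vendor-specific hash functions; the SHA-256 flow hash, the function name pick_next_hop, and the addresses and node names below are assumptions made only for this sketch.

```python
import hashlib


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, next_hops):
    """Choose a next hop from a flow hash (a stand-in for router ECMP)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/udp".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


nodes = ["kea-a", "kea-b", "kea-c"]  # hypothetical Kea nodes
vip = "192.0.2.53"                   # hypothetical virtual service IP

# Relayed traffic: the source is the relay, so every packet from this
# relay follows the same path while the set of next hops is unchanged.
via_relay = pick_next_hop("10.0.0.1", 67, vip, 67, nodes)

# Unicast renewal: the source is the client itself, so the flow hash is
# computed over a different tuple and may select another node.
renewal = pick_next_hop("10.0.0.57", 68, vip, 67, nodes)

print(via_relay, renewal)
```

If a node is withdrawn or added, the modulo changes and existing flows can be remapped as well, which is another reason to avoid per-node state that differs between hosts.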

Let us know what you end up doing and how it works!

> 
> Thanks
> Dan
> 
> ----- On Apr 30, 2024, at 8:39 AM, Victoria Risk <vicky at isc.org> wrote:
> 
> 
> On Apr 29, 2024, at 6:16 PM, Dan Geist <dan at polter.net> wrote:
> 
> Hi. I have an environment where many of the network services (DNS, NTP, ToD, etc.) provide scaling, fault tolerance, and load sharing via ECMP (in front of the service) and BGP. Each (of the 2 or more) service node(s) monitors the status of that service and announces/pulls BGP announcements from the upstream router pair. This works really well for protocols with simple request/response transactions.
> 
> I'd like to try doing this same thing with Kea dhcpv(4|6). In that setup, the same "virtual service IP" would be configured on each of several Kea nodes (in addition to the real link IPs) and they would announce these to the next hop (as above). My thinking is that if there is a common configuration and lease backend to these multiple nodes, then this can be a way to provide HA services (and scaling) to a very large number of devices. My only concern is how the multi-step transaction will be handled.
> 
> Before I spend the time to mock this up, has anyone else tried ECMP load distribution with DHCP, specifically on Kea, and are there any "gotchas" to be aware of?
> 
> You might want to check out the DHCP Load Balancer from Facebook: https://github.com/facebookincubator/dhcplb
> 
> 
> Thanks.
> Dan
> 
> --
> Dan Geist dan(@)polter.net
> 
