[Kea-users] HA use-cases for large enterprises?

Geist, Dan (CCI-Atlanta) Dan.Geist at cox.com
Tue Aug 22 15:28:46 UTC 2023


Greetings. We’re scoping what a Kea-based solution might look like for our enterprise. It’s a large one, with several million endpoints (we are not running Kea today).

Our network is currently divided into multiple service POPs, each with several server pairs in an active/standby configuration (large POPs have many pairs; small POPs have only a few). Each pair is currently sized to handle about 250k client devices of mixed types. The largest POPs handle somewhere between the high hundreds and the low thousands of complete DORA/SARR transactions per second.

After reading through the HA options described in the admin manual, I have a question about scaling and HA setup: what happens to the model when a common lease database backend (MySQL or PostgreSQL) is used? If most of the transactional latency in the HA models comes from propagating lease updates to the backups, how does that change when a shared backend is in place?
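
To make the shared-backend question concrete, here is a minimal sketch of how a common lease database could arbitrate between servers. Each server tries to insert a lease row, and a uniqueness constraint on the address lets only one insert succeed. The sketch makes several assumptions. It uses an in-memory SQLite database as a stand-in for MySQL or PostgreSQL. The `leases` table is hypothetical and is not Kea's actual schema. The helpers `make_lease_db`, `try_claim`, and `allocate` are also hypothetical. Renewals, expiry, and reservations are ignored.

```python
import sqlite3
from typing import Optional, Sequence


def make_lease_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for a shared MySQL/PostgreSQL backend (assumption).
    # The table layout is hypothetical, not Kea's real lease schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leases ("
        " address TEXT PRIMARY KEY,"  # uniqueness is what prevents double allocation
        " hwaddr TEXT NOT NULL,"
        " server TEXT NOT NULL)"
    )
    return conn


def try_claim(conn: sqlite3.Connection, address: str, hwaddr: str, server: str) -> bool:
    """Try to commit a lease row; the PRIMARY KEY rejects a second claimant."""
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute(
                "INSERT INTO leases (address, hwaddr, server) VALUES (?, ?, ?)",
                (address, hwaddr, server),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another server already holds this address


def allocate(conn: sqlite3.Connection, pool: Sequence[str], hwaddr: str, server: str) -> Optional[str]:
    """Walk the pool and return the first address this server manages to claim."""
    for address in pool:
        if try_claim(conn, address, hwaddr, server):
            return address
    return None  # pool exhausted


if __name__ == "__main__":
    db = make_lease_db()
    pool = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
    print(allocate(db, pool, "aa:bb:cc:00:00:01", "node-a"))
    print(allocate(db, pool, "aa:bb:cc:00:00:02", "node-b"))
```

Because the database rejects the duplicate key, two nodes racing for the same free address cannot both commit it. The loser moves on to the next address, at the cost of an extra round trip to the backend for each collision.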

Further, suppose we used one of the new features in 2.4 (Early allocation, per RFC 2131) to help with possible lease collisions. Would it then be possible to build an n-node, horizontally scaled cluster of servers with no native HA scheme, but WITH a high-performance lease and configuration backend? We would likely run ECMP and anycast on the cluster nodes for the listening IP(s), so that for any given client/server transaction only one service node would communicate with that client. This avoids the drawback of HA blocking transactions. The service nodes would be completely unaware of each other and would rely on normal DHCP service behavior and the current lease data to avoid conflicts.
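
The ECMP/anycast idea depends on the router hashing each flow to a single next hop. The sketch below is a rough illustration that assumes a per-flow hash over the UDP 5-tuple. Real routers use vendor-specific hash functions, and `Flow` and `pick_node` are hypothetical names. While the node set stays stable, the same flow always lands on the same node.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    # The 5-tuple a router might hash on (assumption; hash inputs vary by vendor).
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def pick_node(flow: Flow, nodes: Sequence[str]) -> str:
    """Map a flow to one node via a stable hash of its 5-tuple (illustrative only)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Modulo over the current next-hop count: changing len(nodes) can remap flows.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    nodes = ["kea-1", "kea-2", "kea-3"]
    # A relay unicasting to the anycast service address (documentation-range IPs).
    relayed = Flow("192.0.2.1", 67, "198.51.100.53", 67, "udp")
    print(pick_node(relayed, nodes))
    print(pick_node(relayed, nodes[:2]))  # may differ after a node is withdrawn
```

The sketch's assumptions have two consequences. First, if a relay always sends from the same source address and port, every client behind that relay lands on the same node. Second, adding or withdrawing a node changes the modulus, so a DORA exchange that is already in progress could be remapped to a different node partway through.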

Eventually, we would also add bidirectional, non-blocking database replication to one or more “disaster sites”. These would be kept (almost) in sync with the data in the POPs, but would only receive client traffic in the event of a catastrophic site failure. Absolute synchronization matters less there than the simple ability to serve the subnets in question.

Thoughts?

Thanks

Dan