Kea 2.0 - Performance, Stability and Security

We are very proud to announce that we have just posted a new stable branch of Kea, Kea 2.0. Kea 2.0 shows the effect of wider deployment in service provider networks, with a number of improvements to promote stability and performance. We have also started to secure Kea’s management interfaces. We would like to thank those users who reported issues and worked with us to troubleshoot, fix, and otherwise improve Kea.

Performance and Stability

Better Performance with High Availability (HA)

Kea 1.8.0 introduced multi-threading support, which significantly increased packet-processing performance. However, many deployments were constrained by the relatively slow communication between HA partners.

In Kea 2.0, the HA component has undergone a substantial architectural change. When HA+MT is enabled, the DHCPv4 and DHCPv6 daemons are now able to open HTTP sockets on their own and connect directly to each other, bypassing the Control Agent (CA). This eliminates the bottleneck of sequential UNIX socket connections and the need to translate between HTTP and UNIX socket connections. With multiple connections handled in multiple threads, performance improves substantially, in some cases by an order of magnitude.

High Traffic Environments

With help from our users, we were able to pinpoint and resolve some issues that were only observed in high-traffic environments.

  • When Kea is unable to keep up with the incoming traffic, it parks some packets for later processing. The length of this parking lot queue is now configurable; a default value of 256 is used. Smaller values tend to provide more responsive service with a higher drop rate when overloaded, while larger values do the opposite.

  • Reclamation of leases stored in some older versions of MySQL was inefficient in earlier Kea versions, causing the periodic lease reclamation process to take an increasing amount of time. The issue is now fixed, which should result in much better long-term performance.

  • A new statistic, packet-queue-size, has been added that reports packet-queue utilization. It reports an average for the last 10, 100, and 1000 packets. This uses an approach similar to the Unix top tool, which reports load averages for the last 1, 5, and 15 minutes. This may be useful for fine-tuning Kea performance and its queue length.
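The windowed averages described for packet-queue-size can be sketched as follows. This is only an illustration of the idea: the class name QueueSizeStat and the use of a simple moving average are assumptions, not a description of how Kea computes the statistic internally.

```python
from collections import deque


class QueueSizeStat:
    """Hypothetical tracker: average queue length over the last 10, 100, and 1000 samples."""

    WINDOWS = (10, 100, 1000)

    def __init__(self):
        # One bounded deque per window; the oldest sample falls off automatically.
        self._samples = {w: deque(maxlen=w) for w in self.WINDOWS}

    def record(self, queue_length):
        """Record the queue length observed when one packet is processed."""
        for window in self._samples.values():
            window.append(queue_length)

    def averages(self):
        """Return {window: mean}; a window with no samples yet reports 0.0."""
        return {
            w: (sum(s) / len(s) if s else 0.0)
            for w, s in self._samples.items()
        }


stat = QueueSizeStat()
# 90 packets with an empty queue, then a burst of 10 packets with 20 queued.
for length in [0] * 90 + [20] * 10:
    stat.record(length)
print(stat.averages())
```

In this run the 10-packet window reflects only the burst, while the longer windows smooth it out, which is why comparing the three values helps distinguish a short spike from sustained overload.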

More Resilient to Communication Failures, Overloading

Kea relies on multiple components to provide service. We have added more resilience to preserve service availability despite some degraded connections between components.

  • A new parameter, on-fail, gives the operator more control over what happens when a database connection is lost. It governs whether the DHCP service is disabled while Kea tries to reconnect, and whether Kea shuts down once all the configured retries have been exhausted. It has three possible values: stop-retry-exit, which stops DHCP service, attempts to reconnect, and terminates if unable to reconnect; serve-retry-exit, which continues serving DHCP traffic, attempts to reconnect, and terminates if unable to reconnect; and serve-retry-continue, which continues serving DHCP traffic, attempts to reconnect, and keeps serving even if reconnection fails. This setting is particularly useful for connections to forensic logging and configuration backend services.
  • HA is more responsive when recovering from communication failure. We introduced a new communication-recovery state. In this state, the load balancing servers remain responsive to DHCP queries when the communication between them is interrupted. The new feature is controlled using the delayed-updates-limit configuration parameter.
  • The DHCP service can be independently enabled or disabled by a user command, by the database connection mechanics, or by the HA library. The DHCP service is disabled when any of those originators disables it, and it is enabled again only when every originator that previously disabled it has enabled it. The servers can now recover from situations where both entered the partner-down state and the communication was broken in one direction, but worked in the other.
  • Synchronization of the standby server is now more robust.
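The rule for independent originators in the list above (disabled if any originator has disabled the service, enabled only when all of them have re-enabled it) amounts to tracking a set of current disablers. A minimal sketch, in which the class name DhcpServiceState and the originator labels are hypothetical and not Kea identifiers:

```python
class DhcpServiceState:
    """Hypothetical model of the DHCP service enable/disable rule."""

    # Illustrative labels for the three originators named in the text.
    ORIGINATORS = frozenset({"user-command", "db-connection", "ha"})

    def __init__(self):
        # Originators that currently hold the service disabled.
        self._disabled_by = set()

    def disable(self, originator):
        self._check(originator)
        self._disabled_by.add(originator)

    def enable(self, originator):
        self._check(originator)
        # Enabling only withdraws this originator's own request.
        self._disabled_by.discard(originator)

    @property
    def enabled(self):
        # The service runs only when nobody is holding it disabled.
        return not self._disabled_by

    def _check(self, originator):
        if originator not in self.ORIGINATORS:
            raise ValueError(f"unknown originator: {originator}")


state = DhcpServiceState()
state.disable("db-connection")   # database connection lost
state.disable("user-command")    # operator also disables the service
state.enable("db-connection")    # database reconnects
print(state.enabled)             # still disabled by the operator
state.enable("user-command")
print(state.enabled)
```

Keeping the requests separate means that, for example, a database reconnect cannot accidentally override an operator's explicit decision to keep the service off.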

Cache Threshold Protects Kea from Broken Clients

This popular ISC DHCP feature has now been implemented in Kea.

Some clients renew their leases earlier than specified, either because they ignore the renewal timer or because they are broken. Frequent early renewals put an extra burden on the server, which has to write updated leases even though they may have been renewed only seconds earlier. The cache-threshold (expressed as a percentage) and cache-max-age (expressed in seconds) parameters help reduce that extra burden on Kea. Kea still responds to the client but merely resends the existing lease lifetime, thus eliminating the need to update the lease database.
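The decision the two parameters control can be sketched roughly as below. The function can_reuse_lease is hypothetical, and several details are assumptions to be checked against the Kea ARM: that the threshold is written as a fraction (0.25 for 25% of the valid lifetime), that a lease must pass both checks when both parameters are set, and that the feature is off when neither is configured.

```python
def can_reuse_lease(lease_age, valid_lifetime,
                    cache_threshold=None, cache_max_age=None):
    """Return True if the existing lease may be re-sent without a database write.

    lease_age and valid_lifetime are in seconds; cache_threshold is a
    fraction of valid_lifetime (assumption); cache_max_age is in seconds.
    """
    if cache_threshold is None and cache_max_age is None:
        return False  # feature not configured (assumption)

    # Absolute limit: leases older than cache-max-age are not reused.
    if cache_max_age is not None and lease_age > cache_max_age:
        return False

    # Relative limit: leases past the given share of their lifetime are not reused.
    if cache_threshold is not None:
        if not 0.0 < cache_threshold <= 1.0:
            return False  # out-of-range threshold treated as disabled (assumption)
        if lease_age > cache_threshold * valid_lifetime:
            return False

    return True


# A client renewing 30 seconds into a one-hour lease, with a 25% threshold:
print(can_reuse_lease(30, 3600, cache_threshold=0.25))
# The same client renewing after 1000 seconds (past 25% of 3600 s = 900 s):
print(can_reuse_lease(1000, 3600, cache_threshold=0.25))
```

When the function returns True, the server would answer from the stored lease and skip the write; otherwise it would renew and update the lease database as usual.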


Security Improvements

We have made a start on transforming Kea into an application that does not require a trusted ‘bastion’ host in a protected area of the network. This is a big adjustment for a core network application, so it is going to take a number of releases to implement the new security features required for this. We have focused initially on providing authentication and encryption for remote management connections to Kea.

Kea now supports basic HTTP authentication, as defined in RFC 7617. It is possible to configure a list of credentials (pairs of user identifiers and passwords) that the user or script must provide to use Kea’s REST API. Authentication information is logged on a dedicated logger, making it easier to implement security policies, such as logging to dedicated secure storage. Kea also obscures passwords in debug logs when the whole configuration is printed.
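From the client side, supplying credentials means adding an Authorization header to each REST API request. The sketch below builds such a request with the Python standard library but does not send it. The URL, user name, password, and the config-get command with a dhcp4 service are placeholders chosen for illustration; check the Kea ARM for the commands and address that apply to a given deployment.

```python
import base64
import json
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_command_request(url, user, password, command, services):
    """Prepare (but do not send) a JSON command request with credentials."""
    body = json.dumps({"command": command, "service": services}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
        method="POST",
    )


# Placeholder endpoint and credentials, for illustration only.
request = build_command_request(
    "http://127.0.0.1:8000/", "admin", "example-password",
    "config-get", ["dhcp4"],
)
print(request.get_header("Authorization"))
# To actually send it: urllib.request.urlopen(request)
```

Basic authentication only encodes the credentials and does not encrypt them, so it is best combined with TLS on the same connection.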

Kea’s Control Agent (CA) now supports TLS. Three modes of operation are available.

  1. The first is plain HTTP with TLS completely disabled; this was the only mode in earlier releases.
  2. The second mode is encryption, where the CA accepts TLS connections. This is the typical mode when securing a website, where clients and servers are not under the control of the same organization.
  3. The third mode (and the default when TLS support is enabled) is mutual authentication between connecting clients and the CA server. In this mode, clients are required to identify themselves using TLS certificates: the clients verify the server’s certificate and the server verifies the client’s. See Section 23.1 of the Kea ARM for details.
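The three modes above can be summarized as a small decision function over the Control Agent's TLS settings. The parameter names used here (trust-anchor, cert-file, key-file, and cert-required) are assumptions about the CA configuration and should be verified against the Kea ARM; the function tls_mode itself is purely illustrative.

```python
def tls_mode(ca_config):
    """Classify a Control Agent configuration into one of the three modes.

    ca_config is a dict of CA settings; the key names are assumed,
    not confirmed by the text above.
    """
    tls_keys = ("trust-anchor", "cert-file", "key-file")
    present = [key for key in tls_keys if ca_config.get(key)]

    if not present:
        return "plain-http"  # mode 1: TLS completely disabled

    if len(present) != len(tls_keys):
        missing = sorted(set(tls_keys) - set(present))
        raise ValueError(f"incomplete TLS configuration, missing: {missing}")

    # Client certificates are required by default once TLS is enabled.
    if ca_config.get("cert-required", True):
        return "mutual-authentication"  # mode 3
    return "encryption"  # mode 2: server certificate only


tls_files = {
    "trust-anchor": "/path/to/ca.pem",
    "cert-file": "/path/to/agent-cert.pem",
    "key-file": "/path/to/agent-key.pem",
}
print(tls_mode({}))
print(tls_mode(tls_files))
print(tls_mode({**tls_files, "cert-required": False}))
```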

Configuration Flexibility

While most of our focus was on performance, stability, and security, we have also added new configuration options.

  • Global reservations can now be used in conjunction with subnet-level reservations. Earlier Kea versions had a single configuration parameter called reservation-mode that governed whether host reservations were global (out-of-pool) or subnet-level (in pool), and it was not possible to use different reservation types at the same time. The reservation-mode parameter is now deprecated and replaced by three separate boolean parameters: reservations-global, reservations-in-subnet, and reservations-out-of-pool, each of which can be controlled independently.

  • It is now possible to configure preferred and valid lease lifetimes based on the client classification.

  • The Configuration Backend has been extended to include client classes. A number of new commands have been added to the cb_cmds subscriber hook library.

  • Kea is now able to drop packets coming from devices that have matching host reservations with class set to DROP (DROP class listed in the client-classes field in the reservations). This effectively allows the operator to selectively drop incoming packets from some devices, such as customers who have overdue payments, and misbehaving or unwanted devices.

  • The forensic logging hooks library is now able to log custom expressions. The expressions can include any option (such as relay option 82) or sub-option (such as circuit-id, remote-id, or any other sub-option), packet fields, network interface names, local or remote IP address, and more. It uses the same expressions engine as when defining client classification or flexible identifiers. Evaluating expressions is a relatively expensive operation, so customized logs will have greater performance impact than the default log. The forensic logging hook library also supports flexible rotation intervals (e.g. using seconds or days) and “pre-rotate” and “post-rotate” actions which can be used to call an external script, e.g. to move or compress respective files whenever the rotate action is performed.
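For the reservation-mode change in the first item of this list, a migration helper might translate the old single value into the three new booleans. The four legacy values and the mapping below are a best-effort reading of the Kea documentation, not something stated in this announcement, so treat them as assumptions and confirm them in the ARM before converting a production configuration.

```python
# Assumed mapping from the deprecated reservation-mode values to the
# three new boolean parameters.
LEGACY_RESERVATION_MODES = {
    "disabled": {
        "reservations-global": False,
        "reservations-in-subnet": False,
        "reservations-out-of-pool": False,
    },
    "global": {
        "reservations-global": True,
        "reservations-in-subnet": False,
        "reservations-out-of-pool": False,
    },
    "out-of-pool": {
        "reservations-global": False,
        "reservations-in-subnet": True,
        "reservations-out-of-pool": True,
    },
    "all": {
        "reservations-global": False,
        "reservations-in-subnet": True,
        "reservations-out-of-pool": False,
    },
}


def migrate_reservation_mode(scope):
    """Return a copy of a config scope with reservation-mode replaced.

    scope is a dict such as a global, shared-network, or subnet section.
    Scopes without reservation-mode are returned unchanged.
    """
    migrated = dict(scope)
    mode = migrated.pop("reservation-mode", None)
    if mode is None:
        return migrated
    if mode not in LEGACY_RESERVATION_MODES:
        raise ValueError(f"unknown reservation-mode: {mode}")
    migrated.update(LEGACY_RESERVATION_MODES[mode])
    return migrated


print(migrate_reservation_mode({"subnet": "192.0.2.0/24", "reservation-mode": "global"}))
```

With the separate flags, a combination that the old parameter could not express, such as enabling global and subnet-level reservations together, is simply two booleans set to true.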

New Hook Runs Arbitrary Scripts

Due to popular demand, a new hook that calls an arbitrary external script has been added. This can be any script that initiates an external process, such as updating routing and firewall rules for provisioned devices. The script is called asynchronously, i.e. Kea starts the script, does not wait for its completion, and continues processing the packet. Asynchronous processing greatly decreases the performance impact this hook might otherwise cause.
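The asynchronous call pattern described here, start the script and carry on without waiting, looks roughly like this in Python. It illustrates the pattern only and is not the hook's implementation; the function name run_script_async is hypothetical, and the child process is a stand-in for a real provisioning script.

```python
import subprocess
import sys


def run_script_async(argv, env=None):
    """Start an external script and return immediately.

    The caller gets a handle to the child process but does not block
    on it, so packet processing can continue while the script runs.
    """
    return subprocess.Popen(
        argv,
        env=env,                       # None inherits the current environment
        stdout=subprocess.DEVNULL,     # output is not read by the caller
        stderr=subprocess.DEVNULL,
    )


# Stand-in for an external script: a child that takes a moment to finish.
child = run_script_async([sys.executable, "-c", "import time; time.sleep(0.2)"])
print("script started, continuing without waiting")

# A long-running caller should eventually reap finished children,
# for example by polling; here we simply wait at the end of the demo.
child.wait()
```

The trade-off of this design is that the caller cannot base its reply on the script's result, since the script may still be running when the packet processing completes.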

New GSS-TSIG Subscriber-Only Hook

We have added an early, experimental version of a new GSS-TSIG hook. DNS updates can be protected with dynamic GSS-TSIG keys that were previously retrieved by Kea using TKEY exchange. This is typically required for updates to Active Directory.

Cassandra Backend Deprecated

The Cassandra lease backend is now deprecated, which means that the feature will be removed in our next stable branch.

Please update if you are running any version prior to Kea 1.8.2

With this new stable branch, we plan to end our support for Kea 1.6, per our published release model. We will continue to maintain the Kea 1.8 branch, as well as the Kea 2.0 branch, until we produce Kea 2.2. Our next development version will be Kea 2.1.0, and our first maintenance release of Kea 2.0 will be Kea 2.0.1.

