2021 was, sadly, a lot like 2020 for a lot of people: a year of private disappointments and losses for many of us as individuals. It was, however, an extremely successful year for ISC.
We completed work on major new stable versions of BIND 9, Kea DHCP, and Stork: BIND 9.18, Kea 2.0, and Stork 1.0. We had some rough patches this year with BIND 9.16, which suffered because an extensive refactoring project was simply not quite finished when we created the branch. Thankfully we have successfully completed that now, and its successor, BIND 9.18, is looking very good. The Kea user base is growing and both Kea and Stork are maturing. We continue to balance long-term ambitious changes with smaller bug fixes and new features. All of our projects maintained a fast cadence of mostly monthly releases, while improving their QA processes.
We made continued progress in updating and adding F-Root nodes. We also increased participation and leadership from ISC in policy discussions about Root Server Operator (RSO) governance, with the active engagement of ISC’s new general counsel, Rob Carolina.
It is always reassuring to see that ISC can still attract talented people who are excited to work on open source. We added new staff on the BIND and DHCP development teams, and new management and new staff for our support team.
Finally, ISC’s financial picture is very solid right now. We will probably always be anxious about our financial security, given that we are trying to fund the development of free software, but we are starting 2022 in a good place. We ended 2021 with nearly 150 support subscribers - over 90% of them renewals from 2020 - with new customers more than making up for the few we lost. More of our customers are subscribing for both BIND and Kea support, and we are gaining more traction with larger enterprises. We are at what we feel is an optimal size, and are feeling cautiously optimistic about 2022.
Despite the Covid-19 pandemic lasting far longer than anyone expected, 2021 was a good year for F-Root, especially since node deployment depends on people actually being physically present in data centers to rack up and connect the hardware. We deployed F-single nodes at nine new locations, and upgraded “classic” installations to F-single configuration at a further eight locations.
As of the end of 2021, F-Root consists of:
The global sites host 2x F-Root servers as well as F-Root management infrastructure. An F-single is our current base configuration, which relies on a single 1U server to provide both the F-Root service and BGP Anycast. A classic site comprises 2x F-Root servers, a console server, routers, and switches. The classic sites are all expected to be phased out and upgraded to the F-single configuration by the end of 2022, and we also have a strong pipeline of new installations and upgrades due.
(To see a list of all the current F-Root nodes, visit https://www.root-servers.org/ and select F from the Root Servers list.)
Our provisioning systems continue to evolve, with much of the effort going into ensuring the consistency of deployments and improving our monitoring capabilities. Improving our system monitoring will continue to be the main focus during 2022 (aside from actual node deployments).
|ABQ1||Albuquerque, NM||US NMIX|
|DAD1||Da Nang, VN||VNNIC|
|DEN2||Denver, CO, US||Peaktera|
|PAH1||Paducah, KY, US||PIE|
|RIC1||Richmond, VA, US||Ninja-IX|
|GRU1||São Paulo, BR||NIC.BR|
|LIS1||Lisbon, PT||DNS PT|
|PRG1||Prague, CZ||ISC / Peering.cz|
Even with the pandemic raging outside, 2021 was a productive year for the BIND 9 team. Notably, we stabilized BIND 9.16 to the point that it has been designated as an Extended Support Version (ESV).
At the beginning of 2021, Petr Špaček joined the BIND 9 team and began improving our recursive performance testing. Later in the year, Aram Sargsyan jumped right into the fray, working on OpenSSL improvements. In another major personal achievement, Mark Andrews hit 20 years at the company, and he has been named Distinguished Engineer to reflect his long-time commitment to BIND 9.
At the beginning of 2021, we decided to modify our BIND release model again, to lengthen the time between major branches and provide extended support for every stable branch. You can find out more in our blog post on the topic. This leaves us with a 4-year support cycle for each Stable/ESV release and a 2-year overlap to allow a graceful migration period for our users.
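The relationship between these numbers can be checked with a small sketch. It assumes, as the figures above imply, that a new stable branch appears every two years (four years of support minus a two-year overlap). Years are counted relative to the first branch rather than tied to real release dates, and the function names are hypothetical.

```python
SUPPORT_YEARS = 4   # support cycle for each Stable/ESV release
OVERLAP_YEARS = 2   # migration overlap between consecutive branches

# Assumption: the interval between stable branches is what remains
# once the overlap is subtracted from the support cycle.
RELEASE_INTERVAL = SUPPORT_YEARS - OVERLAP_YEARS


def support_windows(branch_count):
    """Return (start_year, end_year) pairs, relative to the first branch."""
    return [
        (index * RELEASE_INTERVAL, index * RELEASE_INTERVAL + SUPPORT_YEARS)
        for index in range(branch_count)
    ]


def overlap(first, second):
    """Number of years during which both branches are supported."""
    return max(0, min(first[1], second[1]) - max(first[0], second[0]))
```

Under these assumptions, consecutive branches share a two-year window, while a branch and the one after its successor do not overlap at all.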
We have reduced the changes backported to the BIND 9.11 branch to a bare minimum, keeping the promise that only security and high-impact issues will be fixed in BIND 9.11 as it nears end-of-life. Code changes in BIND 9.16 were unfortunately greater in the first half of 2021, but since we marked BIND 9.16 as ESV in mid-year, we are gradually reducing the amount of work backported from the development branch to BIND 9.16.
The BIND 9.16 network manager code was stabilized in the first half of 2021. The TCP-handling parts of the network manager code had to be rewritten from scratch; this was a difficult choice to make during a stable cycle, but the code has proved reliable since it was merged.
BIND 9.16 authoritative performance was already in the “good enough” ballpark, achieving around 1Mqps; during the year, we significantly improved the recursive performance testing, building on Petr’s experience. The new recursive testing framework allowed us to identify and fix bottlenecks in the existing code, which reduced the number of threads by one-third by moving internal tasks to run in the network manager worker loops. This at least doubles BIND 9.16’s performance compared to BIND 9.11. We published this blog post where recursive performance is discussed in more detail: https://www.isc.org/blogs/bind-resolver-performance-july-2021/. We continued to work in this area, refactoring the dispatch (DNS client) code in the upcoming BIND 9.18 release, bringing even higher performance to the recursive function of BIND 9 while reducing memory consumption at the same time.
In BIND 9.18, the BIND 9 native memory allocator was removed and replaced by jemalloc, which can optionally be compiled in and is recommended for all workloads. BIND 9.18 uses the same or less memory than BIND 9.11, while performance has more than doubled.
Users upgrading from 9.11 to 9.16 reported significantly higher memory usage. We fixed this by backporting some of the improvements made for 9.18 to 9.16. This is discussed at length in this KB article: https://kb.isc.org/docs/bind-memory-consumption-explained.
The upcoming BIND 9.18 will include support for DoT (DNS over TLS), XoT (XFR over TLS), and DoH (DNS over HTTPS), privacy-focused protocols that add an encryption layer to DNS traffic. This is important for users who want to enhance the privacy of their client-resolver connections without relying on the “big tech” providers. We will continue our work in this area, so BIND 9 will keep being a great choice for people who want additional privacy.
ISC’s engineers participate in IETF protocol development. We have collaborated with NLnet Labs on interoperability changes to DNS Cookies - both writing a standard RFC and implementing the new SipHash-based algorithm for DNS Cookies. We continued to collaborate on the new iteration of Catalog Zones that should make all the open source versions interoperable, allowing for more diversity in the secondary servers. BIND 9 now includes support for HTTPS and SVCB records, finally solving the long-standing “CNAME-flattening/ANAME” problem.
The BIND Administrative Reference Manual (ARM) has been converted from DocBook (XML-based) to Sphinx (RST-based) and is now regularly published at https://bind9.readthedocs.io/.
The BIND source code, documentation, and configuration options have been changed to follow RFC 8499, DNS Terminology, and to stop using terminology that is considered offensive. The new language is the default in the configuration but the old configuration syntax still works, to prevent breakage of existing deployments. The main git branch is now called
In 2021, BIND 9.11 had four CVEs and BIND 9.16 had five CVEs. Details can be found in the BIND 9 Security Vulnerability Matrix in our Knowledgebase.
During the investigation of a GSS-TSIG vulnerability in the BIND 9 implementation of SPNEGO, BIND 9 engineers found (and reported, of course) a serious vulnerability in the Heimdal Kerberos implementation used in FreeBSD and other BSDs.
After at least one customer reported performance degradation during zone transfers when deploying BIND 9.16, we analyzed the code and implemented a new approach to these transfers. Previously, BIND 9 used a technique where it would “quantize” long-running jobs into smaller chunks and intermingle this small-chunk processing into regular query-response processing. Under the new approach, long-running jobs are offloaded to a separate thread pool that runs the jobs independently, leaving scheduling to the operating system kernel. Instead of quantizing the job into arbitrarily sized chunks, the long-running job blocks the other threads for the least possible time - locking and unlocking the shared resources only for a short period of time.
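To illustrate the difference in approach, here is a minimal Python sketch of offloading a long-running job to a separate thread pool. It is not BIND 9 code (BIND is written in C): the names zone_db, transfer_zone, and answer_query are hypothetical, and a plain dictionary stands in for the zone database. The point is only that the slow preparation happens without holding the lock, and the shared data is locked just long enough to publish the result.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared state standing in for the zone database.
zone_db = {}
zone_db_lock = threading.Lock()


def transfer_zone(zone_name, incoming_records):
    """Long-running job: prepare the data first, then publish it quickly."""
    # The expensive part runs without the lock, so queries are not blocked.
    prepared = dict(incoming_records)
    # The shared resource is locked only for the brief swap.
    with zone_db_lock:
        zone_db[zone_name] = prepared
    return len(prepared)


def answer_query(zone_name, record_name):
    """Regular query path: takes the same lock, but only for a lookup."""
    with zone_db_lock:
        return zone_db.get(zone_name, {}).get(record_name)


def run_demo():
    records = {"www": "192.0.2.1", "mail": "192.0.2.25"}
    # Separate pool for long-running jobs; thread scheduling is left to the OS.
    with ThreadPoolExecutor(max_workers=2) as offload_pool:
        future = offload_pool.submit(transfer_zone, "example.", records)
        # Queries for other zones keep being answered in the meantime.
        before = answer_query("unrelated.", "www")
        count = future.result()
    after = answer_query("example.", "www")
    return before, count, after
```

Calling run_demo() submits one transfer to the pool, answers an unrelated query while it runs, and then looks up a record from the newly published zone.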
The build system used by BIND 9 has been completely rewritten to use libtool, making it more modern and easier to understand and modify.
Support for building BIND 9 on Windows has been removed in the development release; the last major release supporting Windows is 9.16.
During the development cycle, it became clear that supporting different implementations of PKCS#11 natively in BIND was inefficient and expensive. Therefore, we contracted with an external party to improve the engine_pkcs11 for OpenSSL. This has been done: performance and stability were improved to the point where we were able to drop BIND’s native PKCS#11 implementation in favor of the OpenSSL-based PKCS#11 implemented in engine_pkcs11. In 2022, we plan to implement the PKCS#11 engine for OpenSSL 1.x and the PKCS#11 driver for OpenSSL 3.x, without using
In 2022 we plan to continue fixing bugs, adding new features requested by users and customers, and refactoring old code. A major project planned for the next development cycle is the refactoring of the venerable red-black tree database (RBTDB) implementation used to store authoritative zone records and cached RRsets.
We recently published an interview with the BIND QA Manager.
Despite all the craziness in the world and in the personal lives of some team members, 2021 was a very good year for the project. We managed to release Kea 2.0.0, which was a major milestone in the project history. Bumping the major number to 2 (we’ve been releasing 1.x since 2015) reflects the overall feeling that the code base has matured significantly and is usable even in the largest and most demanding ISP deployments.
This confidence is backed up by the growing number of existing and new users. We ended 2021 with 51 Kea support customers, and during the year we closed 521 engineering tickets, including new features, bug fixes, documentation improvements, and more. We shipped 12 releases, several code drops, and quite a few patches to customers who couldn’t wait for the next monthly release. We’re also seeing more support customers who are not ISPs; the expansion into enterprise and university markets is both exciting and scary. The trend is also visible in the new features being requested and implemented.
Kea is now fully multi-threaded, including the tricky High Availability scenario with two servers communicating over multiple connections and multiple threads. Being able to process lease updates in parallel (with the bottlenecks of a shared single UNIX socket and single TCP connection eliminated) really gave impressive results. In the most extreme scenarios, Kea performance improved tenfold over prior versions. More typical scenarios are less dramatic, but Kea 2.0 is still several times more performant than Kea 1.8. In the most efficient scenario, Kea is able to assign 38K new leases per second; we’re approaching levels where our testing tools are not able to keep up. It’s a good problem to have.
We completed a major sponsored development project: GSS-TSIG. This was by far the biggest custom feature in the history of Kea. GSS-TSIG provides integration with Microsoft Windows environments, using Active Directory that in turn uses Kerberos. The team worked closely with the customer and provided many engineering checkpoints, so the customer could oversee the technical details and perform early integration.
For much of its early life Kea was focused on delivering protocol features, but since it is more or less complete in this regard, our development efforts now shift towards improving management. This is of vital interest for many existing and prospective users. Kea’s REST API interface can now be protected with TLS, including the mutual mode where both server and connecting client certificates are validated. We also developed support for MySQL connections over TLS, and there’s a new section in the Kea ARM about security.
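For readers unfamiliar with the mutual mode, the following Python sketch shows the general idea using the standard ssl module. It is not Kea's implementation or its configuration syntax: the function names are hypothetical, and the certificate paths are optional parameters only so that the sketch can be exercised without real certificate files. In a real deployment, both sides would load their own certificate and the CA used to validate the peer.

```python
import ssl


def make_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Server side of mutual TLS: present a certificate and demand one back."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED on a server context makes the handshake fail
    # unless the client presents a certificate that validates.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        context.load_verify_locations(cafile=client_ca_file)
    return context


def make_client_context(ca_file=None, cert_file=None, key_file=None):
    """Client side: validate the server and offer our own certificate."""
    # The default client context already requires a valid server
    # certificate and checks the hostname.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

The two contexts mirror each other: each side validates the other's certificate, which is what distinguishes the mutual mode from ordinary server-only TLS.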
2021 was the second year in a row when we did not publish any security advisories for Kea. However, our internal lodge of scoffers fears that due to the new TLS and Kerberos code in Kea, our good fortune with security incidents may come to an end. Time will tell!
We migrated our automated Kea build and testing farm to Amazon Web Services (AWS), and took this opportunity to review and significantly update our testing procedures. Previously, we had a fixed number of virtual machines (VMs) that were always on. With AWS, we migrated to an on-demand mode, where the VMs are created to run a specific set of tests and then are destroyed afterward. This allows us to run tests on more systems. At the end of the year, we had 3789 system tests, running tests on 14 operating systems (various versions of Alpine, CentOS, Debian, Fedora, FreeBSD, and Ubuntu). With over 8,400 unit-tests in Kea code, that gives us over 118,000 test/OS combinations that are being run on every commit to master.
The Kea sources are now 1,008,159 LOC (that’s over a million lines of code). That code is now scrutinized using multiple automated tools, including but not limited to Coverity Scan, TSAN (thread sanitizer), ASAN (address sanitizer), UBSAN (Undefined Behavior Sanitizer), gcov (coverage report), and more. We also run extensive performance tests that check a variety of simple and complex scenarios, including simulating millions of DHCP clients, thousands of subnets, millions of reservations, and more. We now provide native packages for many OSes that greatly simplify the installation effort.
2021 was also a good year for Stork. The team put out nine releases in total, including 1.0 in December. While the project is still minuscule compared to Kea, the user base is growing rapidly, reporting bugs and requesting new features. The project has gotten many new features: TLS support; lease inspection; a configuration review module (Stork can now make suggestions about items to improve in Kea configs); better Prometheus and Grafana integration with new statistics; a full configuration viewer for Kea; a service configuration dump tool with the ability to get all the debugging information typically needed by ISC’s support team, such as config files, log files, a database dump, the OS/Kea/Stork versions, etc., in one tarball; and many more. We now have a more-or-less complete dashboard for Kea and are now shifting towards making Stork capable of configuring Kea.
Stork now has a bit over 123,000 lines of code. The team closed 161 GitLab issues in 2021.
Unfortunately, 2021 was not a productive year for ISC DHCP. With our DHCP engineering resources fully dedicated to general Kea work, GSS-TSIG, and rapidly increasing user requests for Stork, we decided to focus on these newer projects rather than the legacy ISC DHCP. Nevertheless, we’re now getting ready to release ISC DHCP 4.4.3, which will be the last release with both the client and relay components. While the release will be finished in 2022, much of the preparation for this updated release was done in late 2021.
We ended the year with just under 150 customers, including 17 that were new for 2021, having lost only six customers during the year. Over 90% of our 2020 support customers renewed their agreements with us.
The support team changed significantly in 2021, with a new manager and a doubling of the number of support engineers, from three to six.
Here is a sample of the things the support team accomplished in 2021:
Here are some things we’re looking to accomplish in 2022:
ISC’s status as a root service operator (RSO) spawned some significant work related to the ongoing discussions of a new root server system governance structure (RSS GS). In mid-2021, the community of 12 RSOs began to review and discuss a draft proposal for a new RSS GS put forward by the Root Server System Governance Working Group (RSS GWG), chartered by ICANN.
This resulted in the adoption and publication on 17 November 2021 of “RSSAC058: Success Criteria for RSS Governance Structure” and “RSSAC059: RSSAC Advisory on Success Criteria for the Root Server System Governance Structure.” These documents were warmly received by the ICANN Board of Directors and they will now form part of a revised RSS GWG work plan.
The European Union Proposal for a new NIS2 Directive became a significant focus for ISC in March 2021. NIS2 (a proposed law building on the original NIS Directive to strengthen European approaches to cyber security) was drafted in a manner that appeared to call for EU member states to regulate the cyber security arrangements of all 12 global RSOs. While ISC applauds the desire to strengthen cyber defenses, ISC submitted public comments on the proposed Directive specifically calling out the danger of any sovereign state attempting to directly regulate the world’s root server system (RSS). We pointed out that, far from enhancing the resilience and security of the Internet, sovereign intervention in RSS operations could destabilize the RSS and DNS. We explained that regulatory intervention in the RSS by one sovereign state could prompt (potentially conflicting) regulatory intervention by other sovereign states. We suggested that this would, in turn, risk fragmenting the Internet as we know it. Some of our fellow RSOs made similar observations. The European Parliament subsequently amended the draft Directive in late 2021 to take root servers out of its scope, but the matter is not finally resolved. We now wait to see whether the European Commission will attempt to negotiate the reintroduction of the RSS into the law.
ISC joined the Open Invention Network (https://openinventionnetwork.com), the patent non-aggression community.
We held and recorded nine technical webinars, providing ongoing training to our users. In addition to posting the recordings on ISC’s YouTube channel, we also created a BrightTalk channel to promote our recordings to a wider audience.
ISC staff gave six conference talks, archived on the ISC website at https://www.isc.org/presentations/. This is fewer than in prior years, because - of course - the pandemic has ruined everything.
We launched the ISC swag store, using Shopify and Printful to sell and create some fun ISC-branded items.
ISC donated to several non-profit NOGs, as well as to Kea conservation in New Zealand, showing our support for some important organizations.
In some past years, we have simply been unable to keep up with the number of externally created issues. In 2021, we did some impressive catching-up on that front.
We don’t always fix the issues reported, if the software version is old or if we disagree with the user’s interpretation, or if the reward doesn’t seem worth the effort. Last year though, we fixed more than 80 issues reported by open source users through our GitLab.
We are especially grateful to reporters who:
Here are some of these technical contributors from the user community whose reports improved BIND in 2021:
We also offer our thanks to:
Our stalwart ISC DHCP community experts, Simon Hobson, Sten Carlsen, Bill Shirley, Bob Harold, Niall O’Reilly, Glen Satchell, and Gregory Sloop, who are helping a whole new generation of users with their ISC DHCP issues via the dhcp-users mailing list.
Numerous other BIND users, Kea users, and ISC DHCP users, who provided expert advice to others on our user mailing lists. ISC staff could not possibly answer all these questions ourselves, not only because of the number of questions, but because we don’t have the depth and variety of operational experience our users have. We are grateful for these contributions of technical expertise.
The many sponsors of our F-Root nodes. They donate rackspace, purchase servers, help support our operating costs, and generally make it possible to provide free root services to the Internet.