2020 is finally over, and we can assess the damage. We at ISC are thankful that we have emerged remarkably unscathed by either the pandemic or the related economic disruption. We will provide more details when we publish our 2020 Annual Report later in 2021, but for now please enjoy this brief slide show of the highlights and the more in-depth discussion below.
When the pandemic broke out in the spring of 2020, we had no idea what the likely economic adjustment might mean for ISC. Because we are relatively small and don’t have many options for increasing revenue if our support business drops off, we are vulnerable in a recession, so we had to be conservative. Worried about the prospect of a worldwide economic slowdown, we cancelled all travel, instituted a hiring freeze, applied for a special small-business loan, and postponed staff raises.
Luckily for ISC, Internet usage increased significantly during the pandemic and our support business has been strong. We sadly lost a couple of customers due to the impacts of the pandemic on their industries, but overall we have not seen any significant disruption in our support business. ISC gained a net of 12 new customers in 2020, with a significant increase in demand for Kea support.
Our biggest single change in 2020 was closing our headquarters at 950 Charter Street in Redwood City, California. This was already going to be a big job, and it was further complicated by COVID-19. Moving meant shutting down ISC’s datacenter in the back of the old warehouse we had occupied for over 20 years.
The project required sunsetting the last remaining hosting and secondary name services we provided for impecunious countries and non-profits, and relocating ISC’s own equipment to professionally managed data centers at the Palo Alto Internet Exchange (PAIX) and Hurricane Electric in Fremont. With some sadness we realized we could no longer maintain a Stratum-0 GPS clock and so we decommissioned clock.isc.org, referring our users to nwtime.org. We returned the massive trailer with our backup power supply that had sat in our parking lot for years. Our wonderful facilities manager, Rory Doolin, who had been with ISC since the beginning, found new homes for old equipment, and he recycled and donated lightly-used furniture.
We had hoped to have an open house to mourn the passing of this gathering place and landmark, but that too was a casualty of the pandemic. Although giving up our historic building was emotional for many of us, we realized our resources could be better allocated elsewhere, and moving out has allowed us to direct funds to more effectively pursue our software development and deployment objectives.
All remaining paper documents were packed and shipped to the new headquarters and business office in Newmarket, NH, and our “offices” became 100% virtual, although that wasn’t a huge pivot for us since the majority of staff already worked from home.
Closing our datacenter accelerated some of our projects to migrate to modern cloud-based applications for some of our non-core functions, including sales and finance. We migrated our Customer Relationship Management application from SugarCRM to Zendesk; for self-hosted applications, we continue to rely on open source solutions, including GitLab, Mattermost, Jenkins, etc.
2020 brought more staff changes than we have had in years. Seven ISC team members moved on, including one who retired after ten years with ISC and two whose jobs were tied to our building. We hired seven new full-time staff members: three developers for the BIND team, one for the DHCP team, a new Support Engineer, a Director of Finance and Accounting, and an Accounting Manager. We also brought on a part-time General Counsel. We tried to reach out to a diverse pool of applicants as part of our search process; we placed recruitment ads on Indeed, GlassDoor, LinkedIn, Flexjobs, Fossjobs.net, Twitter, Facebook, GitHub, Ada’s List, Women in Technology, and BlackJobs.com, and considered a total of 188 applicants.
We gave 33 public webinars and conference talks, all of which are archived on our website. We offered a series of training talks on BIND and DNSSEC in the spring, and in the fall did a series on Using Kea. Between these two sets of programs, we held separate presentations on Kea performance with multithreading, troubleshooting with dig, DNS encryption, BIND and RPZ, DNS support in OpenStack, EDNS Client Subnet Identifier in BIND, and Comcast’s VinylDNS provisioning system. We had a record number of non-ISC guest presenters: Andreas Taudte, Joe Crowe and Paul Cleary, Stephan Lagerholm and Graham Hayes, Matt Stith, and Carsten Strotmann.
The F-Root operations team made significant upgrades in 2020.
In 2020 we added these new nodes to the F-Root system:
With the help and support of our local partners, we were able to bring refreshed hardware online in Port-au-Prince, Haiti (ISC-funded); Podgorica, Montenegro (ISC-funded); Philipsburg, St. Maarten (ISC-funded); Hong Kong, China (sponsored by HKIX); Osaka, Japan (sponsored by NTT); Kuala Lumpur, Malaysia (thanks to PPIM); Turin, Italy (ISC-funded); and Suva, Fiji (thanks to APNIC).
As of the end of 2020, F-Root consists of three “global” sites, 16 “classic” sites, and 42 “F-single” sites in service, in addition to over 200 F-Root instances hosted by Cloudflare. The global sites host 2x F-Root servers and F-Root management infrastructure. A classic site comprises 2x F-Root servers, a console server, routers, and switches. An F-single is our current configuration, which relies on a single 1U server to provide both the F-Root service and BGP Anycast. The classic sites are all expected to be phased out and upgraded to the F-single configuration by the end of 2022.
In 2019 we embarked on our most ambitious refactoring project ever, replacing the ancient native socket layer in BIND with a new open source component, libuv. This is a project we had put off for several years because of the complexity and risk of changing such a fundamental and, frankly, ancient part of BIND. Unlike the OSI-layer slideware we have all seen, the socket “layer” in BIND is not a neatly defined layer in the named daemon. Instead, it is - or was - fairly closely integrated into functions that manage tasks which have to be parked while waiting for responses expected over the network.
The 2019 work on the new network manager was not complete when we created the new 9.16 stable branch, and some users discovered problems. For a while, TCP performance on FreeBSD was much worse than on Linux, and performance on platforms without load-balanced sockets was abysmal. Additional refactoring, and rewriting TCPDNS support, improved both stability and performance, but we still haven’t completed the transition to the new network manager. Some operations still rely on the old BIND sockets.
Partly as a result of this, we have decided to modify our BIND release model again, to lengthen the time between major branches and provide extended support for every stable branch. You can find out more in our blog post on the topic. This longer release cycle will enable us to continue to tackle complex refactoring projects. The next such project we have in mind is the rbtdb: BIND’s Red-Black tree, a critical data structure that is ripe for an overhaul.
The new network manager was a precondition for adding native support for the new encrypted transports DoH and DoT to BIND, so delays in completing the network manager caused us to miss our goal of shipping DoH support in 2020. However, we were pleased to release DoT in December; we see potential applications for DoT, particularly in the Enterprise. We plan to follow this up with further work on XoT, and of course finish and release DoH.
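For readers unfamiliar with DoT, the following is a minimal sketch of what goes over the wire, not a description of how BIND implements it. DNS over TLS (RFC 7858) carries ordinary DNS messages inside a TLS session, conventionally on port 853, using the same two-octet length prefix as DNS over TCP. The function names, the fixed query ID, and the example name are illustrative assumptions.

```python
import struct


def build_dns_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query in RFC 1035 wire format (hypothetical helper)."""
    # Header: ID, flags (only the RD bit set), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero octet.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A) followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_stream(message: bytes) -> bytes:
    """Add the two-octet length prefix used by DNS over TCP and DoT."""
    return struct.pack("!H", len(message)) + message


query = build_dns_query("isc.org")
framed = frame_for_stream(query)
print(len(query), framed[:2].hex())
```

To send the framed message, a client would open a TLS connection to the server (for example with Python's standard `ssl` module) and write `framed` to it; the response comes back with the same length prefix.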
Our technical support team encountered some issues related to BIND 9’s cache management and memory usage, reported by a few large and observant operators. We think we addressed some of these with improvements in our serve-stale implementation, but others may remain. It can be complicated to discover the root causes of cache problems. In particular, it is difficult to capture and recreate the conditions that triggered the problem in the lab. One of our major initiatives for 2021 is building better test capabilities to realistically simulate a heavily loaded resolver.
We made a lot of improvements in BIND quality assurance in 2020, which enabled us to maintain a predictable monthly release cadence while adding more test tools and more platform coverage to our GitLab continuous integration system. Automating the testing of the RPMs we have been producing also helped us keep up with the monthly updates. We published the gitlab-runner scripts we are using for BIND as open source, and our QA Manager even found time to contribute a feature to git (https://git.kernel.org/pub/scm/git/git.git/commit/?id=296d4a94e7231a1d57356889f51bff57a1a3c5a1).
We issued nine BIND CVEs in 2020, CVE-2020-8616 through CVE-2020-8624. Details can be found in the BIND 9 Security Vulnerability Matrix in our Knowledgebase. This is the same number of CVEs we had in 2019.
Researchers also announced SAD DNS (CVE-2020-25705), a way of exploiting ICMP implementations that makes it feasible to mount a DNS poisoning attack on vulnerable resolvers. The fix is in the operating system, but this CVE prompted us to “tighten up” the application of DNS cookies to help prevent more spoofing-type attacks in the future.
We welcomed two new developers to the BIND team in 2020, with another joining on January 1, 2021. One of these new team members implemented the improvements to serve-stale and another has already taken over work on DNS over HTTPS (DoH), so we are expecting a strong 2021.
The DHCP team had two major accomplishments in 2020: we released the first multithreaded version of Kea, and we developed a new graphical dashboard called Stork. We also added another software engineer to the team, specifically focused on assisting with customer-support issues and questions.
Transforming Kea into a multithreaded application took nearly a year of focused effort, as we had to modify both Kea itself and a number of hooks. We also put effort into various benchmarking tests, to ensure that the result was much faster than the single-threaded version. Our motivation for this project was to push Kea to a significantly higher throughput, something our service provider customers were asking for. We achieved dramatic performance gains, which we shared in a webinar and KB article. Our High Availability hook is now the limiting factor in Kea throughput in most scenarios.
The second big accomplishment was the development of our first project with a graphical user interface, Stork. While we plan to make Stork a management dashboard for both BIND and Kea, Kea was the clear priority. A number of greenfield carriers, mostly providing local fiber access networks, are using Kea. They have no incumbent management tools, and they wanted something like the Anterius dashboard we built as a Google Summer of Code project two years ago. We set out to build a monitoring dashboard that could also eventually serve as a platform for configuration management. The result was Stork, which was built with resources from the DHCP development team using Go and Angular, both new technologies for ISC. We are not “reinventing the wheel” but rather integrating with powerful open source management utilities, such as Prometheus and Grafana. Stork is still considered experimental, but it is already quite useful, and we expect it to reach version 1.0 in mid-2021. Before we declare it to be production-ready, we would like to finish implementing automated testing for the UI, and add the ability to view and search lease files.
Stork provides a quick view of Kea server status, and is particularly useful for monitoring High Availability status and pool utilization. Stork consists of an agent running on the application server, which discovers which ISC daemons are running, and a web server which displays current status of the machines and applications being monitored. This enables us to provide an integrated view of system resource usage and application activity, to support troubleshooting.
In addition to these two main activities, the DHCP team has added many new features and tweaks to accommodate the rapidly growing number of enterprise ISC DHCP users migrating to Kea. Several of these fall into the category of features we have made more flexible to accommodate additional operational requirements: we added support for bootp and leasequery; expanded our DDNS, client classification, and host reservations support; added support for multiple IP addresses per reservation; and made improvements to our High Availability hook. We are also continuing to track IETF work on DHCP, including draft-ietf-dhc-v6only, although standards work on DHCP has dropped off considerably vs. prior years. We maintained a regular schedule of monthly Kea maintenance releases while supporting a growing support customer base. As we did with BIND, we reformatted our Kea Administrative Reference Manual to reStructuredText format and posted it on Read the Docs. Finally, we sponsored a series of free technical webinars to train new Kea users, developed and delivered by Carsten Strotmann.
Towards the end of 2020 we started work on making Kea more secure, adding basic HTTP authentication and access controls on the remote management interface. Kea was designed to be run on a protected internal management network, so previously we recommended running a local HTTPS proxy in front of Kea to secure access, but we are seeing more users deploying large numbers of Kea servers, where this becomes impractical. We are planning to continue working on Kea security and on more granular access controls in 2021, as our support customer base expands from mostly service providers to include an increasing proportion of enterprise users, many of whom have enterprise-wide application security standards.
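As a rough illustration of what basic HTTP authentication on the management interface means for a client, here is a minimal sketch that builds (but does not send) an authenticated management request. It assumes the JSON-over-HTTP command format used by the Kea Control Agent; the URL, port, user name, and password are placeholders, and the helper function is hypothetical rather than part of any Kea tooling.

```python
import base64
import json
import urllib.request


def build_kea_command_request(url, user, password, command, services):
    """Build a POST request carrying a Kea command and HTTP Basic credentials.

    Hypothetical helper: every argument value is supplied by the caller.
    """
    # Command body: the command name plus the list of target services.
    payload = json.dumps({"command": command, "service": services}).encode("utf-8")
    # HTTP Basic authentication: base64 of "user:password" (RFC 7617).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder endpoint and credentials, for illustration only.
request = build_kea_command_request(
    "http://127.0.0.1:8000/", "admin", "example-password", "version-get", ["dhcp4"]
)
# urllib.request.urlopen(request) would send it to a listening server.
```

Basic authentication only encodes the credentials; it does not encrypt them, so it still needs to be paired with TLS or a protected management network.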
We released versions 4.4.2 and 4.1-ESV-R16 of ISC DHCP. The software is now in minimal-maintenance mode, but we are still receiving and responding to pull requests and issues from open source users for ISC DHCP, particularly the client and relay. We did add one new feature: support for the new IETF draft, https://tools.ietf.org/html/draft-ietf-dhc-v6only-08. We have also addressed a number of issues submitted by users, with patches that are published in our GitLab repo.
ISC as a whole made a significant investment in producing pre-compiled images for BIND and Kea in 2020. There seems to be a general trend towards ready-to-use software, and we received a number of requests to add packages. We added new Kea packages to support ARM architectures and Alpine Linux, and we expect Stork users to rely primarily on packages rather than building from source. We also added a package for the Kea Migration utility, which is based on ISC DHCP. The BIND team published an official Docker image for BIND, and we have requests for the same for Kea. While many of our traditional BIND support customers prefer the control of building their own software, we are seeing new customers - Kea users in particular - who are sold on containerization.
An ISC developer wrote and contributed the BSD kernel implementation of ILNP, the Identifier Locator Network Protocol. This is an IETF draft that has significant potential in an IPv6-only environment, such as a datacenter, to simplify the creation of overlay networks.
In 2020 we implemented a Code of Conduct for communications in ISC-sponsored fora, including our user mailing lists and GitLab discussions. We would like to welcome new users, and encourage users who might feel intimidated to participate actively in the open source and Internet communities. We have struggled to find the resources to support Outreachy interns, but we welcome any ideas and suggestions about how we could engage with new or discouraged open source users. We believe enabling wide engagement in and “control” over the Internet is an important part of our mission.
We missed getting to travel and see each other and the rest of the Internet community this year. To make up for that loss, we produced and shared two ISC Pandemic Cookbooks (Spring, and Holiday), which were fun projects to maintain connections among our team and social media followers.
We would like to recognize our significant technical contributors from the user community, including:
We also offer our thanks to: