The first release of BIND 9 was in September 2000. In the intervening 16 years, we have issued 225 more releases, give or take a few. We have continuously added new features and support for new RFCs.
BIND 9 is a big project: at last count there were 691,554 lines of code* in BIND. That is 3 times the size of PowerDNS, 5 times the size of the Unbound resolver, and 6 times the size of the Knot authoritative server. According to the COCOMO model, the BIND 9 codebase represents an estimated 138 person-years of software development effort.
As we have added to the original BIND 9 over the years, the code has gotten increasingly complex. This complexity has made it difficult and error-prone to modify. Since we cannot test all the code paths in some of the more complex areas of the code, we may introduce new bugs inadvertently. External developers tend to be limited to working on only the less-complex areas of the code, and even the core team is reluctant to modify some logic.
We already tried, and failed, to do a complete rewrite of BIND through the BIND 10 project. Recently, inspired by the alarming experience of spending several weeks trying to pinpoint the source of a severe bug in a particularly complex part of BIND, we decided to start gradually refactoring BIND 9. The goal is to rationalize and simplify some of the most complex functions to improve maintainability. In the process, we also hope to improve quality and, in some areas, remove performance bottlenecks.
The McCabe Cyclomatic Complexity Index is one well-known measure of the complexity of a function. It counts the number of linearly independent paths through the code, so every additional branch or loop raises the score. As a general rule, if C is the complexity, then:
|number of functions in BIND 9 with C > 20||number of functions in BIND 9 with C > 50||number of functions in BIND 9 with C > 100|
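As a rough illustration of how a cyclomatic complexity score and the threshold counts above can be obtained, here is a minimal sketch. It is not the tool ISC used, and it analyzes Python source rather than BIND's C code, because Python's standard `ast` module makes the idea easy to show. The names `cyclomatic_complexity`, `count_above`, and `SAMPLE` are hypothetical, and the count is an approximation: one plus the number of decision points.

```python
import ast

# Node types that each add one decision point (an approximation of McCabe's count).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Return {function_name: approximate complexity} for each function in source."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1  # a function with no branches has exactly one path
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b and c" adds two extra short-circuit branches
                    complexity += len(child.values) - 1
            result[node.name] = complexity
    return result


def count_above(complexities, thresholds=(20, 50, 100)):
    """Count how many functions exceed each complexity threshold."""
    return {t: sum(1 for c in complexities.values() if c > t) for t in thresholds}


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in (2, 3):
        if n % d == 0 and n > d:
            return "composite"
    return "other"
'''

if __name__ == "__main__":
    scores = cyclomatic_complexity(SAMPLE)
    print(scores)               # {'classify': 5}
    print(count_above(scores))  # {20: 0, 50: 0, 100: 0}
```

In the sample, the score of 5 comes from the base path plus two `if` statements, one `for` loop, and one `and` operator. A function scoring above 20 would need far more branching than this, which gives a feel for what the thresholds mean in practice.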
Witold Kręcicki, the BIND developer who proposed this refactoring project, devised an index to measure the overall complexity of a software system, based on how many complex functions it has.
WPK Maintainability Index
This index measures how many functions need refactoring, and indicates how deep that refactoring needs to be.
He looked at the complexity of several other software systems, including two from ISC, and two newer DNS systems. In the chart below, lower numbers indicate a need for more extensive refactoring. Overall, BIND is more complex than Kea, Knot, or PowerDNS and less complex than ISC DHCP. This correlates with what we know about the maintainability of those projects.
We also checked to see where our worst bugs, the critical defects that are published as CVEs, are located. We expected to see some correlation with high complexity code, and we found it.
In the past 2 years, ISC has published 21 CVEs in BIND 9. 18 of those were in current public versions of named (2016-2848 was in an unsupported version that was shipped in some operating system distributions, 2016-2775 was in lwresd, and 2016-1284 was in our subscription branch only).
Out of these 18 bugs, 13 were in overly complex functions (cyclomatic complexity > 20). 10 were in very complex functions (> 60) (see table below for more details).
Our current plan is to refactor three major functions and files in our next major release, BIND 9.12. We estimate this will consume around 25% of our BIND development resources. This means one engineer will be dedicated to refactoring while the remainder of the team focuses on fixing bugs, supporting users, and adding new features.
We will target the most complex functions which we know are frequently exercised code paths, where we have a lot of demand for new development. The goals for each function will vary slightly, but overall, the objectives are:
Our initial targets for refactoring include:
We hope to complete these three and release them by the end of 2017 in BIND 9.12. After we finish that, we hope to be able to continue with refactoring. A few top candidates for 2018 are:
named has one thread, running on one core, that receives all the requests and hands each job to a worker. This causes a lot of context switches, and moving work from one core to another is costly. The idea is to have multiple listeners, one on each core. This will require a redesign of the handling of incoming connections.
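To make the multiple-listener idea concrete, here is a minimal sketch of one possible mechanism: several UDP sockets bound to the same port with `SO_REUSEPORT`, so the kernel spreads incoming datagrams across them and no single thread has to receive everything. This is an assumption for illustration only. The post does not say which mechanism BIND will use, the function names are hypothetical, `SO_REUSEPORT` load balancing as shown is Linux behavior, and pinning each listener to a core is left out.

```python
import socket
import threading


def make_listener(host, port):
    """Create a UDP socket that may share its port with other listeners."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Assumption: the platform supports SO_REUSEPORT (Linux does).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


def serve(sock, worker_id, stop):
    """Receive and answer datagrams on this listener until asked to stop."""
    sock.settimeout(0.2)
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(512)
        except socket.timeout:
            continue
        # The same thread that received the request also answers it,
        # so no hand-off to another worker is needed.
        sock.sendto(b"worker %d: " % worker_id + data, addr)


def start_listeners(host, port, count, stop):
    """Start `count` listeners sharing one port; port 0 picks a free port."""
    first = make_listener(host, port)
    shared_port = first.getsockname()[1]
    socks = [first] + [make_listener(host, shared_port) for _ in range(count - 1)]
    threads = []
    for worker_id, sock in enumerate(socks):
        thread = threading.Thread(target=serve, args=(sock, worker_id, stop))
        thread.start()
        threads.append(thread)
    return socks, threads


if __name__ == "__main__":
    stop = threading.Event()
    socks, threads = start_listeners("127.0.0.1", 0, 2, stop)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2)
    client.sendto(b"ping", ("127.0.0.1", socks[0].getsockname()[1]))
    print(client.recvfrom(512)[0])
    stop.set()
    for thread in threads:
        thread.join()
    for sock in socks + [client]:
        sock.close()
```

Python threads do not run in parallel on multiple cores the way C threads would, so this only shows the socket layout, not the performance benefit. The point of the design is that the receiving thread handles the request itself, which avoids the context switch and cross-core hand-off described above.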
We are limited in what we can do by our funding model: we are funded primarily by support revenues from users who subscribe to annual support contracts. These users are paying for priority action on bugs that impact them, attention to feature requests, and troubleshooting and diagnostic help. While they will certainly welcome improved code quality, the reality is that everyone would like someone else to fund that. In addition, refactoring is going to mean putting new feature development in some areas on hold, while we are re-writing functions those features will use.
So, while we plan to dedicate 25% of our BIND resources to refactoring, we may have to modify that plan if we can’t find funding for refactoring.
Our first-year goals are to refactor or redesign a small percentage of what needs to be refactored. We hope to be able to continue the refactoring, and deepen it to include removing obsolete features and associated code, in coming years. If we can do this, we can rejuvenate BIND and prolong its relevance for another decade.
If your organization would like to support this BIND refactoring effort, please contact email@example.com to discuss making a donation. Individual donors, consider making a donation to ISC and mentioning “refactoring” in your comments.
* All figures provided for Lines of Code include blanks and comments.
|CVE||Most complex fn in bugfix||Function complexity|