With the recent spate of patch releases of BIND due to security issues, I thought that it was worth putting fingers to keyboard to shed some light on the sources of these problems and what ISC is doing about them.
ISC has a formal process for handling reports of security bugs. If we think the reported issue is serious enough, we will issue a release of the software containing the fix, and a security advisory explaining the problem. Although underlying reasons can be different, many of these advisories report the cause as the issue “triggering an assertion in BIND, after which BIND exits.” So what are assertions, and why do they cause BIND to crash?
When writing BIND 9, the authors were very mindful of security. They considered a security hole that allows a compromise of the machine on which the name server is running (for example, by allowing remote code execution) to be worse than one that causes the program to exit. A flaw that allows an attacker to control the information returned in response to queries opens the way for fraudulent and illegal activities; process termination, although a denial of service, was judged to be less harmful.
To catch programming errors that could lead to such a compromise, BIND was created using a “design by contract” paradigm. In this approach, assertions are made throughout the code as to the state of variables at certain points. If one of these is violated, BIND performs a controlled termination rather than continuing with possibly corrupted data. There are thousands of such assertions scattered throughout the BIND source code. Note that the assertions are not concerned with the data received by BIND (either in the form of DNS requests or responses), since BIND has no control over that. Instead, assertions are made about the way the program is operating.
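As an illustration only, a contract check of this kind can be sketched in a few lines of C. The REQUIRE macro, the buffer structure and the buffer_putbyte function below are hypothetical names invented for this example and are not taken from the BIND source; the sketch just shows the general shape of "check the program's own state, and exit in a controlled way if it is wrong".

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical contract-checking macro (an assumption for this sketch,
 * not BIND's definition): report where the contract was broken, then
 * terminate instead of continuing with suspect state. */
#define REQUIRE(cond)                                                  \
    do {                                                               \
        if (!(cond)) {                                                 \
            fprintf(stderr, "%s:%d: REQUIRE(%s) failed, exiting\n",    \
                    __FILE__, __LINE__, #cond);                        \
            abort();                                                   \
        }                                                              \
    } while (0)

/* Hypothetical fixed-size buffer used only for this illustration. */
struct buffer {
    unsigned char data[64];
    size_t used;
};

/* Append one byte to the buffer.
 * Contract: the caller passes a valid buffer that still has room.
 * The checks are about how the program calls this function, not about
 * the content of the byte itself. */
void buffer_putbyte(struct buffer *b, unsigned char byte) {
    REQUIRE(b != NULL);
    REQUIRE(b->used < sizeof(b->data));
    b->data[b->used++] = byte;
}
```

A caller that respects the contract never notices the checks. A caller that tries to write a 65th byte stops the program with a message naming the file and line, rather than silently overwriting adjacent memory.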
A good example of this is the problem (CVE-2015-5477) that resulted in the release of BIND versions 9.9.7-P2 and 9.10.2-P3 in July 2015. Here, a function in BIND required that the variable into which it was going to place a pointer to data should be NULL (i.e. empty), and included an assertion to that effect. The reasoning for this was that in order to use the function, the programmer should be aware that the contents of the variable will be modified, and should signal that awareness by ensuring that it was NULL before calling the function. Passing a non-NULL variable to the function might indicate that the programmer had made a mistake and that the variable was already pointing to valid data. Should this be the case, overwriting the variable might lead to invalid data being used and a possible compromise. For this reason, BIND was written to terminate should that state of affairs occur.
In this particular case the variable had been set to NULL and the function called. However, a corner case (which, amazingly, seems never to have been reached in 15 years of use) required the returned data to be discarded and the function called a second time. The error lay in the fact that the variable was not reset to NULL before the second call. As there was no way for BIND to tell whether the variable was pointing to discarded data or to valid data, BIND took the safer course and terminated.
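The pattern can be sketched in simplified form as follows. This is not the BIND code involved in CVE-2015-5477; the REQUIRE macro, the record structure and the find_record and find_usable functions are all hypothetical names made up to show the shape of the contract and of the mistake.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical contract check (an assumption for this sketch). */
#define REQUIRE(cond)                                                  \
    do {                                                               \
        if (!(cond)) {                                                 \
            fprintf(stderr, "%s:%d: REQUIRE(%s) failed, exiting\n",    \
                    __FILE__, __LINE__, #cond);                        \
            abort();                                                   \
        }                                                              \
    } while (0)

/* Hypothetical record type and table, for illustration only. */
struct record {
    int id;
    int usable;
};

static struct record table[] = { { 1, 0 }, { 2, 1 } };

/* Look up a record by id and store a pointer to it in *target.
 * Contract: *target must be NULL on entry, so that the caller shows
 * it knows the variable is about to be overwritten. */
void find_record(int id, struct record **target) {
    REQUIRE(target != NULL);
    REQUIRE(*target == NULL);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].id == id) {
            *target = &table[i];
            return;
        }
    }
}

/* Look up `id`; if the record found is not usable, discard it and
 * look up `fallback_id` instead. */
struct record *find_usable(int id, int fallback_id) {
    struct record *rec = NULL;
    find_record(id, &rec);
    if (rec != NULL && !rec->usable) {
        /* Discard the first result. Leaving out this reset is the
         * class of mistake described above: the second call would
         * see a non-NULL variable and the program would exit. */
        rec = NULL;
        find_record(fallback_id, &rec);
    }
    return rec;
}
```

In this sketch the retry branch is only taken when the first record found is marked unusable. If that rarely happens in practice, a missing reset can sit unnoticed for a long time, which mirrors the corner case described above.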
The next question, of course, is why testing didn’t catch that bug, and this is something we have asked ourselves. Actually, since the bug was in all versions of BIND 9 (which was released in 2000), a good question is why it wasn’t seen in everyday use. It is believed that BIND is the most widely used name server on the Internet and so all instances combined handle a vast number of queries every day. Multiply that by fifteen years…
The answer is that for most of the time, all these instances of BIND handle queries and responses whose format conforms to the DNS standards. So the code paths executed by BIND in normal use are a subset of those possible. Any errors in those paths will be quickly discovered. Typically the errors resulting in security alerts lie in little-used paths that handle ill-formed queries and responses, and the bug lay in one such path.
All ISC code is reviewed before being included in a release. It is also run through static analysis tools that help identify issues such as these. Unit tests and system tests are also run on the code. ISC testing (and testing by others) does include the sending of malformed queries and responses to BIND, but the number of possibilities is vast and testing did not explore all cases.
It is at this point that “fuzzers” come in and, in particular, the “American fuzzy lop” (AFL) program. Fuzzers generate incorrect packets and send them to the program under test (in this case, BIND) to try to cause it to fail. AFL is effective in doing this because it monitors the code coverage in the program under test and varies the packets so as to maximize the amount of code executed. The vulnerability described above was discovered using AFL, as was one of the vulnerabilities (CVE-2015-5722) that triggered the September 2015 release of BIND 9.9.7-P3 and 9.10.2-P4.
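The idea of coverage-guided fuzzing can be shown with a toy example. The sketch below is not how AFL is implemented (AFL instruments the compiled program, keeps a queue of interesting inputs and uses many mutation strategies); toy_parse, mutate and fuzz are hypothetical names, and the "coverage" is simply a bitmask that the toy parser reports about itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INPUT_LEN 4

/* Toy stand-in for the program under test. Returns a bitmask with
 * one bit set for each branch the input reached. */
unsigned toy_parse(const uint8_t in[INPUT_LEN]) {
    unsigned cov = 1u;              /* function entry */
    if (in[0] == 'D') {
        cov |= 2u;
        if (in[1] == 'N') {
            cov |= 4u;
            if (in[2] == 'S') {
                cov |= 8u;          /* deepest, rarely reached path */
            }
        }
    }
    return cov;
}

/* Replace one randomly chosen byte with a random value. */
void mutate(uint8_t buf[INPUT_LEN]) {
    buf[rand() % INPUT_LEN] = (uint8_t)(rand() & 0xff);
}

/* Coverage-guided loop: start from an all-zero input, mutate it, and
 * keep a mutated input only if it reaches a branch not seen before.
 * Returns the union of all coverage bits observed. */
unsigned fuzz(unsigned iterations, unsigned seed) {
    uint8_t best[INPUT_LEN] = { 0 };
    unsigned seen = toy_parse(best);

    srand(seed);
    for (unsigned i = 0; i < iterations; i++) {
        uint8_t trial[INPUT_LEN];
        memcpy(trial, best, INPUT_LEN);
        mutate(trial);

        unsigned cov = toy_parse(trial);
        if (cov & ~seen) {          /* new branch reached */
            seen |= cov;
            memcpy(best, trial, INPUT_LEN);
        }
    }
    return seen;
}
```

Because an input that reaches a new branch becomes the starting point for further mutation, the loop can work its way into the nested conditions one byte at a time. Purely random inputs would have to get all three bytes right at once.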
Discovery of that latter vulnerability was enabled by a recent enhancement to AFL that significantly increased the rate at which fuzzed packets were tested. Following that report, ISC ran exhaustive tests with AFL, picking up several non-critical bugs (all corrected in the 9.9.8 and 9.10.3 releases of BIND). ISC staff discovered the second security vulnerability addressed in the September release during a painstaking visual examination of code similar to that in which the bugs were found; it was subsequently picked up by AFL during additional testing with the fuzzer. In view of its effectiveness, ISC has now incorporated AFL into its BIND test suite.
Dr. Stephen Morris, Sr. Director of Software Engineering