Surprise bugs and release schedules

I know this won’t be a shock to anyone, but software has bugs.

Sometimes they are discovered and have little real impact: perhaps the fix changes only a few lines of code and is easily tested. Ideally they turn up early in a release cycle, so they don’t really affect much. Most of the time these bugs are minor, and the fixes are easily put into a release at any point.

It is a very different thing when a big one comes along just before a release, or affects many parts of the code. Often, due to the scope of the change, fixes for these bugs require extensive review and testing before release. They may even be architectural flaws.

At ISC, we strive to release the best code we can. We change how we do things when something isn’t working. We have people whose job is to think about and decide when to stop a release cycle for a newly found bug, which is a fairly new thing for us.

It is always a tricky decision when a major bug comes along late in a release cycle and could potentially delay a release for more than a month. One such issue came along recently, and it might be informative to know how and why we decided what to do.

This particular issue deals with what happens when a trusted-keys statement has a bad key in it. This can result from a mistake or from starting with an old key. BIND 9 will work very, very hard to try all possible paths to reach something that might work. We thought BIND 9 tried too hard at times, and now we have proof. We have analyzed this problem and are now working on a fix for it.

Although this bug will cause BIND 9 to generate more traffic than it should, it is triggered by misconfiguration. Until the root is signed, many people are using interim methods such as Trust Anchor Repositories (TARs) and ISC’s DNSSEC Look-aside Validation service (DLV). Each of these requires a trusted-keys statement to be placed in named.conf. When these keys are bad, BIND 9 responds badly. Keeping these keys up to date is critical to keeping a resolver working, whether by updating from the TARs or by using DLV.
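
For readers who have not configured one, here is a rough sketch of what such a statement looks like in named.conf. The zone name example.net. and the key data are placeholders made up for illustration, not a real trust anchor; the real values have to come from the zone operator, a TAR, or DLV.

```
// Illustrative sketch only. "example.net." and the key data below are
// placeholders (assumptions), not a real trust anchor.
trusted-keys {
    // owner name   flags  protocol  algorithm  "base64 public key"
    // 257 = key-signing key, 3 = DNSSEC protocol, 5 = RSA/SHA-1
    example.net.    257    3         5          "PLACEHOLDER-BASE64-KEY-DATA";
};
```

If the key in an entry like this is mistyped, or the zone has since rolled to a new key, validation against that anchor fails, and that is the situation in which BIND 9 tries too hard.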

Meanwhile, we have at least three different release versions at or nearing completion. Do we hold up the releases for this? Do we change what this release does to mitigate the issue?

This bug has existed for many years, perhaps as long as BIND 9 has had DNSSEC support. That would be BIND 9.0.0, released over 10 years ago. This is not a new bug, but with more people using DNSSEC, it has come to light now. Every supported version of BIND 9 has this bug.

Rather than delay other very useful features and fixes, we decided not to delay our releases. Our plan is to release a well-tested fix in 4-6 weeks. As this is a very involved problem, we want to get it right rather than make things worse with a quick patch.

For this specific bug, where an administrator will lose the ability to resolve domains and so should discover and correct the misconfiguration fairly quickly, we chose not to derail our release plans. If this were a different bug, we might have made a different choice.



Last modified: June 17, 2013 at 6:34 pm