- By Adib Behjat on February 9, 2010
I know this won’t be a shock to anyone, but software has bugs.
Sometimes they are discovered and have little real impact: perhaps a few lines of code change, and the fix is easily tested. Ideally they occur early in a release cycle so they don’t affect much. Most of the time these are minor and can be put into a release at any point.
It is a very different thing when a big one comes along just before a release, or when the fix touches many parts of the code. Often, due to the scope of the change, such fixes require extensive review and testing before release. The underlying problems may even be architectural flaws.
At ISC, we strive to release the best code we can, and we change how we do things when something isn’t working. We now have people whose job is to think about, and decide, when to stop a release cycle for a newly found bug, a fairly new practice for us.
It is always a tricky decision when a major bug comes along late in a release cycle and could potentially delay a release for more than a month. One such issue came along recently, and it might be informative to know how and why we decided what to do.
This particular issue deals with what happens when a trusted-key statement contains a bad key, whether from a mistake or from continuing to use an old key. BIND 9 will work very, very hard to try all possible paths that might lead to a working answer. We thought BIND 9 tried too hard at times, and now we have proof. We have analyzed this problem and are now working on a fix for it.
Although this bug causes BIND 9 to generate more traffic than it should, it is triggered by misconfiguration. Until the root is signed, many people are using interim methods such as Trust Anchor Repositories (TARs) and ISC’s DNSSEC Look-aside Validation service (DLV). Each of these requires a trusted-key statement to be placed in named.conf, and when those keys go bad, BIND 9 responds badly. Keeping these keys up to date, either by updating from the TARs or by using DLV, is critical to keeping a resolver working.
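For context, such an interim trust anchor lives in a trusted-keys block in named.conf. A sketch of the shape follows; the key data here is a placeholder for illustration only, not a real key, and current key material must always be fetched from the TAR or DLV operator:

```
trusted-keys {
    // zone-name  flags protocol algorithm  key-data
    // (the key-data string below is a placeholder, not a real key)
    dlv.isc.org. 257 3 5 "AQPlaceholderKeyMaterialOnly...";
};
```

If the key configured here drifts out of sync with what the zone actually uses, validation fails, which is exactly the situation this bug aggravates.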
Meanwhile, we have at least three different release versions at or nearing completion. Do we hold up the releases for this? Do we change what this release does to mitigate the issue?
This bug has existed for many years, perhaps as long as BIND 9 has had DNSSEC support. That would be BIND 9.0.0, released nearly 10 years ago. This is not a new bug, but with more people using DNSSEC, it has come to light now. Every supported version of BIND 9 has this bug.
Rather than delay other very useful features and fixes, we decided not to delay our releases. Our plan is to release a well-tested fix in 4-6 weeks. As this is a very involved problem, we want to get it right rather than make things worse with a quick patch.
For this specific bug, where an administrator will lose the ability to resolve domains and the misconfiguration should be discovered and corrected fairly quickly, we chose to not derail our release plans. If this were a different bug we might have made a different choice.
- By Adib Behjat on January 29, 2010
In the fall of 2009, the organizations responsible for generating the root zone (ICANN, Verisign, and the US Department of Commerce) announced that they had come to an agreement on how to sign the root zone with DNSSEC (DNS Security Extensions). ICANN and Verisign have created a website to provide information about the change and a rollout timeline.
First, a signed root zone that cannot be used for validation purposes, known as the “DURZ” (Deliberately Unvalidatable Root Zone), will be deployed across all of the root servers in a phased rollout from January to May 2010. If all goes well, the fully validatable root zone will be put into production on the 1st of July, 2010.
As one of the twelve Root Server Operators, ISC has created this blog post to answer some common questions regarding a signed root zone and what the community can do to prepare for the change. In addition, ISC will be describing what we are doing to prepare F.ROOT-SERVERS.NET to handle a signed root.
Why do we need to sign the root zone?
DNSSEC was developed by the Internet Engineering Task Force (IETF) so that a digital signature (RRSIG) can be applied to a DNS Resource Record Set (RRset), allowing a client to verify that those records are authentic. Since DNS is a hierarchical naming system, a signed root zone means that a DNSSEC-aware client can look up a domain in, say, the .ORG namespace (one of many TLDs that have already signed their zones with DNSSEC) and follow a completely signed, and therefore verifiable, delegation path.
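The value of a signed root shows up in even a toy model of chain-of-trust validation: the resolver trusts its configured root key, and each child zone’s key must be vouched for by the DS data its parent publishes. The sketch below is deliberately simplified (DS matching is reduced to string equality; real validation hashes the key and verifies RRSIG signatures):

```python
def chain_is_valid(chain, trust_anchor):
    """Toy DNSSEC chain-of-trust check.  'chain' is a list of
    (zone, key, ds_in_parent) tuples ordered from the root down;
    ds_in_parent is the DS data the parent publishes for that child
    (None for the root, which is checked against the trust anchor)."""
    _root_zone, root_key, _ = chain[0]
    if root_key != trust_anchor:          # root key must match the trust anchor
        return False
    for _parent, (zone, key, ds_in_parent) in zip(chain, chain[1:]):
        if key != ds_in_parent:           # parent must vouch for the child's key
            return False
    return True
```

With a signed root, a chain like this can start at ‘.’; without it, each signed TLD island needs its own separately configured trust anchor.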
What are some possible side effects of a signed root zone?
The big change is that DNS responses from the root servers for ‘.’ will become larger, as they will contain both the answer in the form of an RRset and its signature (RRSIG).
Before DNSSEC, most DNS traffic was carried over UDP in packets smaller than 512 bytes, a limit enshrined in the early DNS-related RFCs. Since most DNS responses with signed RRsets (containing the paired RRset and RRSIG) will exceed that 512-byte limit, the IETF developed the EDNS0 extension (RFC 2671) to allow a client to request a response larger than 512 bytes (up to 4096 bytes) over UDP, delivered via IP fragments. EDNS0 is now widely supported by device and appliance vendors, as well as by DNS server software like BIND.
However, there are many devices and appliances in the wild that are still configured by default to accept only DNS packets smaller than 512 bytes, or that do not allow IP fragments through. In some cases a client will retry with a smaller buffer size until a response gets through; in other cases, it will simply fall back to TCP.
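The client behavior described above can be sketched as a simple fallback loop. This is a simplified model with hypothetical `send_udp`/`send_tcp` callbacks, not the actual logic of any particular resolver:

```python
def resolve_with_fallback(send_udp, send_tcp, bufsizes=(4096, 512)):
    """EDNS0 fallback sketch: advertise progressively smaller UDP buffer
    sizes; if every UDP attempt comes back truncated, retry over TCP.
    send_udp(bufsize) returns (answer, truncated); send_tcp() returns answer."""
    for size in bufsizes:
        answer, truncated = send_udp(size)
        if not truncated:
            return answer, "udp", size
    # TCP has no 512-byte ceiling, so it is the transport of last resort.
    return send_tcp(), "tcp", None
```

Each fallback step costs extra round trips, which is why broken middleboxes tend to show up as slower resolution rather than outright failure.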
Is there any way I can test to see if my nameserver supports EDNS0?
You can use the Reply Size Test Server developed by DNS-OARC to check whether your resolver can accept EDNS0 packets and whether your firewall (if there is one) is accepting IP fragments.
Note that if the results are smaller than you expect and you are running a modern DNS software package (like BIND 9.5.0 or later), then the problem may lie with an intermediary firewall, NAT device, or router between your name server and the test server. Please fix any issues you see now, or you will likely experience degraded performance from the root servers once the signed root is fully rolled out.
What is ISC doing to prepare F.ROOT-SERVERS.NET (F-root) to support a signed root?
ISC has been a long-standing supporter of DNSSEC, including extensive contributions to the protocol by our engineers, and has been operating F-root in a DNSSEC-ready state for several years now by running a DNSSEC aware DNS server, etc.
F-root is scheduled to load the signed root zone (“DURZ”) the week of the 12th of April, 2010.
By then ISC will have standardized all F-root servers on BIND 9.6.2, the first BIND release with full support for the SHA-2 DNSSEC algorithm that will be used to sign the root zone. Note that this step is not strictly needed, since the root servers serve the content and do not perform validation.
ISC will also be adjusting its monitoring across F-root so that we track both UDP and TCP queries, as we expect an increase in TCP traffic to the root servers once the root zone is signed. We will upload data from any significant events during the six-month rollout of the signed root zone to the DNS-OARC pcap repository so that it is available to researchers for further analysis.
- By Paul Vixie on January 22, 2010
I was asked recently, “why is ISC a not-for-profit?” Apparently we walk
like a for-profit and we quack like a for-profit but we are in fact not
for-profit. Most companies with a strong brand like ours have
shareholders. Why not ISC?
Primarily because the infrastructure we’re responsible for — BIND, F-root,
our network — has to be kept in the public interest. If the current
staff and board got killed by a freak meteor shower, it’s nice to know
that our successors couldn’t take ISC’s assets out of the public’s service.
There’s also some real freedom in being non-profitable. If we had
shareholders our goal would be to reward them, and we wouldn’t have built
the resources we need for our public benefit mission and thus could not
give away services.
To be a non-profit means we have no shareholders, no stock options, and no
dividends; it means that if our assets were liquidated then the state of
Delaware would get the proceeds; it means our employee compensation has to
fit strict audit guidelines; and it means no person or company can be the
sole or primary beneficiary of our operations.
About 25% of the money ISC spends in a year comes from unrestricted grants,
and the rest we receive as restricted grants (like BIND10 sponsorship, BIND
Forum membership, and F-root sponsorship) and commercial revenues (like
BIND support, BIND development, and BIND consulting).
It’s largely those commercial revenues that make ISC seem to walk like a
for-profit or even quack like a for-profit. But let’s break it down even
further. I can see four ways that ISC walks or quacks like a for-profit.
1. Our business operations people — finance, sales, marketing, business
development — are extremely good at what they do, and they got that way
by working for $bigcorp for decades before coming to ISC.
2. Continuity is a necessary side goal, and that means recurring revenue,
which is more reliable as commercial contracts and restricted grants than
as unrestricted grants. (Is the world ready for an ISC walk-a-thon?)
3. Is it profit or just rational exuberance? That we’ve got this great
talent pool and we’re this focused without stock options and without
dot.com level incentive compensation, astounds me no end. Success and high
spirits are the reward we expect from hard work toward relevant goals –
which is all the same whether there are shareholders or not.
4. Ambition doesn’t care about shareholders. When we make more money we
get to do more cool stuff, and we have a really long list of cool new stuff
we would like to be doing. High among my personal goals is to fix up the
company headquarters, buy the staff better furniture and computers, hire
some people so that less vacation time goes unused, offer some competitive
benefits like continuing education, and maybe upgrade some of our I. T.
plant which is in some cases five or even ten years old. (None of that
will sound very sexy unless you work at ISC, but trust me, it’s cool stuff.)
I would never want a shareholder anywhere near what we do here, because
then if the board fired me they could sell the whole thing to $bigcorp.
I’d like to be personally wealthy, but if I decided to focus on that I’d
first start a new enterprise and work on things that ISC has no interest
in, so that when it’s all IPO’d and controlled by bankers, none of my eggs
or my kids’ eggs are still in the basket.
So, ISC is a not-for-profit because this work is what we wanted to do and
we didn’t want anybody able to own it. Remembering what’s happened to the
employees and customers of $bigcorp over the years, you should be able to
imagine my discomfort at having the Internet’s core infrastructure in the
hands of capital asset managers.
- By Adib Behjat on January 3, 2010
There is nothing more sensational than the unexpected, and when the NANOG (North American Network Operators Group) community was recently informed that an ASN collision had occurred, it caused a lot of people to sit up and take notice. The event was also very interesting in that researching it takes us back to a time before ARIN and RIPE existed, adding a historical twist.
One of the groups to take notice was Renesys, an “Internet Intelligence” company, as they had one of the prime data sets for researching this particular problem. As part of their business they collect BGP data from many sources and already have many analysis tools for that data. After crunching the data, they concluded there were two more ASNs of interest. Indeed, one of those ASNs was in use by the ISC node in Fiji, which is one of our F-Root local nodes. This added a new twist: the problem now seemed to affect one of the root server operators, elevating it to a much higher level.
Renesys contacted ISC directly, which triggered an internal investigation. Initially it looked like a similar problem, based on the dates on which ISC had been issued an ASN that was already in use by another party. In order to report this to the RIR, we located the initial e-mail assigning the resource to ISC; this would provide the original ticket number and should help speed our query to the RIR. After this e-mail was forwarded to the team and several sets of eyes took a fresh look, we realized an important error had occurred.
ISC had been issued ASN 38568 by APNIC for our Fiji node. When the ASN was entered into our internal databases, it was entered as ASN 35868: the 5 and 8 in the second and third positions were transposed. Once the data had been entered wrong, the error spread to other internal systems.
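A swap like 38568 becoming 35868 is an adjacent-digit transposition, one of the most common classes of data-entry error. A sanity check comparing internal records against the RIR’s published data could catch such errors mechanically; the sketch below is hypothetical, not ISC’s actual tooling:

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b differs from a only by one swap of adjacent characters."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    # Exactly two differing positions, side by side, with swapped values.
    return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]]
            and a[diffs[1]] == b[diffs[0]])
```

Flagging near-miss ASNs at data-entry time, before the value propagates into other systems, is exactly the kind of check this incident argues for.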
Fast forward to a few months ago. ISC wanted to keep its routing registry objects more up to date and started a project to generate routing registry updates via script. These scripts generated objects from the internal data stores, which contained the transposed ASN entry. Indeed, it is this routing registry object that Renesys found in the RIPE database. Note that the object has since been removed, as it was in error.
This allows us to answer some of the questions asked by Renesys in their blog entry.
“Despite the fact that verification services are readily available, neither the RIRs, the companies who received the duplicated ASNs, nor their providers seem to have checked if the ASN was assigned before making and accepting the ASN assignment.”
Based on the timelines involved, ASN 35868 was assigned to “Logix3” several years prior to ISC asking for an ASN from APNIC. As a result, there would have been no duplicate entry for Logix3 to find when they received that ASN from ARIN. ISC was later assigned 38568 by APNIC. The RIR properly assigned the ASN, and it is entirely likely (although there are no direct records) that an ISC engineer looked up that ASN in the APNIC database and saw the proper entry. Indeed, Renesys’s question seems predicated on the idea that a duplicate was assigned by the RIR, which did not happen in this particular instance.
Not asked, but perhaps an even better question, is: why didn’t either party notice a routing issue?
There are actually many cases on the Internet where duplicate ASNs are used on purpose. Networks may use a single ASN in multiple locations for a number of reasons, and the pitfalls are well documented. Indeed, the primary problem is that, due to BGP’s loop detection, each ASN island throws away the routes from the other ASN islands, as those routes trigger loop detection. In short, when ISC used Logix3’s ASN by mistake, it created a situation where Logix3 couldn’t see the route originated by the Fiji node, and the Fiji node couldn’t see the routes originated by Logix3. Surely someone would notice?
Well, it turns out, probably not. The ISC node in Fiji is configured with a default route, which sends traffic to its upstream ISP no matter what, so the node is able to reach all of Logix3. Logix3 may or may not be configured the same way (ISC has no way to know), but the way we route F-Root prevents a problem. First, Fiji is one of more than 50 local instances of F-Root, so it is extremely likely Logix3 would prefer one of our other instances. Second, ISC announces the F-Root prefix in two parts. Our local nodes, like Fiji, announce 220.127.116.11/24. Even if that route were rejected, ISC also announces 18.104.22.168/23, a covering aggregate, from our global nodes only. That route would have been passed on and accepted by Logix3 (assuming they receive full routes from their upstream). Thus, as far as ISC can tell, there is no situation where this mistake would have led to a loss of connectivity for anyone involved.
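Two mechanisms in the story above can be sketched concretely: BGP’s loop detection, which made the two ASN islands invisible to each other, and longest-prefix matching, which is why a covering aggregate keeps traffic flowing even when a more-specific route is dropped. This is a toy model using private-range ASNs and RFC 5737 documentation prefixes, not the real numbers involved:

```python
import ipaddress

def accepts(my_asn, as_path):
    """BGP loop detection: a speaker rejects any route whose AS_PATH
    already contains its own ASN, treating it as a routing loop."""
    return my_asn not in as_path

def best_route(dest, table):
    """Longest-prefix match: of the routes covering dest, pick the most
    specific one (largest prefix length)."""
    addr = ipaddress.ip_address(dest)
    matches = [ipaddress.ip_network(p) for p in table
               if addr in ipaddress.ip_network(p)]
    return str(max(matches, key=lambda n: n.prefixlen)) if matches else None

# Each duplicate-ASN island sees its own ASN in the other's path and drops it:
looped_path = [64513, 64512]                     # 64512 stands in for the shared ASN
# Even with the more-specific /24 dropped, the covering /23 still matches:
aggregate_only = ["198.51.100.0/23"]
```

With both the /24 and the /23 present, the more-specific /24 wins; with only the aggregate, traffic is still delivered, just to a different (global) node.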
ISC quickly removed the incorrect records from the RIPE routing database to reduce confusion. The process of renumbering the Fiji node was not quite as quick, but was completed after a couple of weeks. ISC considered shutting the node off immediately, but because we do not believe this situation is causing any problem for either party, we decided to leave it in place until an orderly transition could be arranged.
This was an embarrassing situation for ISC. We wanted to come clean with the full details to help everyone understand what happened, so that proper corrective action can be taken going forward. We are glad that Renesys and other smart folks are looking at the data and trying to find these sorts of problems, and also that these sorts of mistakes appear to be few and far between.
- By Francis Dupont on December 11, 2009
I’d like to share an idea I implemented for AFTR (so I am describing it in the AFTR context), which is part of its debug primer and which could be integrated into BIND 10.
AFTR is managed through control channels (over TCP or a stream Unix socket), like BIND 9’s rndc but in a connected mode (so on the AFTR side they are named “sessions”). Four commands are of interest here:
The first command is named ‘noop’ and simply polls the liveness of the process by returning an answer (or not returning one, when the process is frozen).
The second command is named ‘fork‘ and, as you might expect, calls the fork() Unix system call (which does lazy, copy-on-write memory copying on all modern systems). Suppose this command arrives on a control channel C:
- the parent just closes C, so C remains attached only to the child
- the child closes everything with the exception of C; in particular it closes the socket used for the service, and it reopens syslog. Afterwards, it waits for commands on C, acting as a live image of the process at the instant the fork() was performed.
The next command is ‘abort’ and calls the Unix abort(). It is meant to be used after a ‘fork‘ to get a core file, so you can run postmortem analysis tools on an image of a live process.
The last command is ‘reboot‘ and restarts the process from the very beginning. It is implemented (at Rob’s suggestion) by closing everything and calling execv() with a saved copy of the arguments to main().
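The fork/abort pair is the heart of the trick: the forked child is a copy-on-write snapshot of the parent, so aborting the child yields a core file of the live process while the parent keeps serving. A minimal Python sketch of the mechanism follows (the real AFTR implements this in C around its control channel):

```python
import os
import signal

def snapshot_core():
    """Fork a copy-on-write snapshot of this process and abort() the child,
    leaving a core file for postmortem analysis (if the core ulimit allows)
    while the parent continues running."""
    pid = os.fork()
    if pid == 0:
        # Child: its memory is an image of the parent at the instant of fork().
        os.abort()                       # die with SIGABRT, dumping core
    _, status = os.waitpid(pid, 0)       # parent reaps the child, carries on
    return os.WTERMSIG(status) == signal.SIGABRT
```

The ‘reboot’ command completes the set: close everything and execv() the saved argument vector, so the process restarts from a clean state without operator intervention.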
So the AFTR debug primer says:
“Summary for the busy operator:
- noop -> nothing: go to the shell to kill and relaunch it
- noop -> expected message: open another session, send fork, wait for the child pid message, and send abort on this new session. On the previous session (where you sent noop), send reboot.”
In the context of BIND 10 we can implement the same set of commands:
- I believe we already have the equivalent of ‘noop‘
- a way to address commands to an image of a module is needed (but is not difficult)
- reboot by itself is not an interesting command, as the control framework already provides a way to relaunch a module, but it makes sense if the abort() on a critical inconsistency condition is replaced by the fork/abort/reboot sequence described above. (I suggest naming this ‘phenix mode‘.)