- By Adib Behjat on July 6, 2010
ISC has announced that there were some backwards compatibility problems in the 9.7.1 release. Here is a bit more information on the topic. These problems were also in 9.7.0.
The first issue was a problem in how those versions of BIND 9 processed certain formats of negative responses. In particular, BIND 9's internal logic expected certain records to be present, because that is what BIND 9 itself generates. Some other types of servers (many of them custom-built, it appears) did not include everything we expected to find, and sometimes those missing records had to be queried for separately.
When these records were not found in the message, we would “fall off the end of the list” and return a “not found” error, which was treated as a hard failure by the upper-layer code, and SERVFAIL was returned to the client.
The second issue was one of protocol strictness. The versions of BIND mentioned above require a much stricter message format in responses from servers. Specifically, 9.6 and earlier would accept messages without the AA bit (authoritative answer) set as answers if the rest of the message appeared to be an answer. This was done as a work-around for some TLD servers which mis-handled the AA bit.
When the TLD servers were fixed to correctly set or clear the AA bit, BIND 9 was changed to start paying attention to it as the protocol specifies. However, it appears other servers (once again, many custom servers as well as load balancers) also do not properly set the AA bit. This caused queries for those domains to fail, and SERVFAIL was returned to the client.
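As a rough illustration of what this strictness check looks at: the AA bit lives in the flags word of the 12-byte DNS message header. The following Python sketch (the function name and the fabricated header bytes are mine, not BIND's code) tests that bit in a raw response:

```python
import struct

def aa_bit_set(response: bytes) -> bool:
    """Return True if the AA (authoritative answer) bit is set in the
    flags word of a raw DNS response message."""
    if len(response) < 12:
        raise ValueError("a DNS header is 12 bytes; message too short")
    (flags,) = struct.unpack_from("!H", response, 2)  # bytes 2-3 are the flags
    return bool(flags & 0x0400)  # AA is the 0x0400 bit

# A minimal fabricated header: ID=0x1234, QR and AA set, all counts zero.
header = struct.pack("!HHHHHH", 0x1234, 0x8400, 0, 0, 0, 0)
print(aa_bit_set(header))  # True
```

A resolver enforcing strictness would refuse to treat a response as an authoritative answer when this check fails, which is exactly what tripped up servers that never set the bit.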
The BIND 9.7.1-P1 release addresses these two issues. There is no configuration knob for the “missing records” issue as it is a bug in BIND. We are not including a tunable option for the strictness check in the 9.7.1-P1 release.
We may re-introduce the "AA bit strictness" check in a future BIND 9 release. However, should we do so, it will be done with more notice, more testing, and as a configurable option. Because relaxing the check follows the "be gracious in what you receive" principle, we may instead choose not to require strict protocol compliance at all.
If people see the need (not desire, but need) for the “AA bit strictness check” to be an option sooner, we could consider it for 9.7.2, which is scheduled for release in September. However, at this time, with 9.7.1-P1, we feel reverting to previous behavior is easier to test and less disruptive for a patch release.
- By Adib Behjat on June 17, 2010
To mix metaphors, my e-mail has been ringing off the hook since my previous article ("Perspectives on a DNS-CERT"), and I've had to think deep and difficult thoughts about what we really mean by DNSCERT, and whether DNS-OARC really has the capability, or really can grow the capability, to operate such a thing. I've had some discussions with ICANN and with members of the DNS-OARC board and staff, and it's time I checkpointed the current state of my thinking about all this.
First, DNS-OARC was convened as an operational and technical body, and they’ve stuck to that vision, and they’re likely to continue to stick to it. This means that the technical and operational functions associated with a DNSCERT seem natural and necessary to the DNS-OARC folks, and, subject to clearing it with their membership and having a viable funding model, they’re ready to march forward.
Second, ICANN has heard the community’s reaction loud and clear, that the world wants them to remain a technical coordinating body, and to not become an infrastructure operator over and above what they already do for their “L Root”. They’ve also heard my arguments about how easy it is to find seed funding for possibly unsustainable activities and that the proof of a proposal’s viability comes in its fourth year not its first year. ICANN can be of great help to a DNSCERT both in doing the “gap analysis” as they’ve already done, and in socializing and publicizing the idea to their GTLD and CCTLD holders who would have to join and sponsor a DNSCERT activity if it’s ever going to amount to anything.
Third, DNSCERT as envisaged by the ICANN SSR "gap analysis" has a different goal set than DNS-OARC's. Some things DNSCERT would do are outside the scope of DNS-OARC, and some things DNS-OARC is doing and/or will someday do are beyond the scope of DNSCERT. There's substantial overlap, but I was wrong earlier when I said that DNS-OARC should do it all.
I think what’s needed is a new nonprofit corporation (“The DNSCERT Foundation” or similar; let’s call it TDF here) whose members are other international nonprofit corporations representing DNS stakeholders — such as ICANN, DNS-OARC, various CERTs, CENTR, MAAWG, APWG, and a few dozen others. Current and future members of DNS-OARC will join and sponsor the DNSCERT activity through their DNS-OARC membership and additional restricted grants of money and of “like kind” resources including personnel and equipment.
DNSCERT should be a joint venture across the entire DNS industry, and the 24×7 “watch floor” should be distributed across the globe. Much of the technical and operations work should be outsourced to the participants, who by running a tool set in common and doing training in common including sending personnel to DNSCERT HQ on a quarterly or annual rotation, will form an extremely robust and redundant asset base for the DNSCERT function.
TDF’s main purpose would be to define a DNSCERT Functions Contract and then enter into a joint venture with DNS-OARC Inc to execute that contract. TDF’s role in the JV would be governance and oversight. DNS-OARC’s role would be execution. TDF’s governance activities would include research above the raw technology level, such as system level risk assessment and contingency planning. For example, perhaps ICANN’s ill-fated “DNS Root System Scalability Study” could be retried in this broader framework since ICANN’s track record for hiring consultants to write reports and recommendations isn’t working.
I've socialized and refined the above proposal by talking to a lot of people, most of whom did not give me permission to thank them publicly. I do have permission to mention that Ondrej Filip (.CZ), Leslie Cowley (.UK), Frederico Neves (.BR), Jay Daley (.NZ), and Jeff Moss (DefCon) think that something like this is worth investigating further. My first order of business is to expand that list — if you and/or your company would like to weigh in positively on this proposal, please send me e-mail and I'll add you to the list, or you can add a comment to this article.
Importantly, neither ICANN nor DNS-OARC wants to take the next step of making a formal public statement of support of this approach unless the community has first given the nod. Therefore I’m asking ICANN to schedule a BOF session in Brussels, and I hope it’s early in the week like Monday or Tuesday, where we can get a whole bunch of DNS stakeholders (including many DNS-OARC members) in a room and find out whether the community has a will and if so what it is.
- By Adib Behjat on June 4, 2010
BIND 9.7.0 introduced automatic in-server signature refreshing and automatic key rollover. This allows BIND 9.7, if provided with the DNSSEC private key files, to sign records as they are added to the zone, or as their signatures need to be refreshed. Refreshing happens gradually to spread out the load on the server and to even out zone transfer load.
However, BIND 9.7.0 and 9.7.1 do not perform this slow signing step in one case. When the zone is being rolled to a new Zone Signing Key (ZSK), BIND very quickly re-signs the entire zone with the new key and removes the old key from use.
This was not an accident. We believed that this is what operators would want, since getting an old key out of the zone as quickly as possible means it can be removed from the DNSKEY record set as quickly as possible.
For operators of large zones, this caused problems. A small zone which can quite quickly be re-signed is unlikely to notice the effect our current method causes. Large zones may have sudden and large CPU increases as signatures are created, and large zone transfers which may interfere with publication requirements.
I will go into some technical detail about how the key timers work, how they are used by the command-line tools, and how BIND 9.7.0 and 9.7.1 use them. I will also explain the functional change we plan to include in BIND 9.7.2 to ease these operational problems.
Description of Key Timers
Before we can dig too deeply in how a gradual key roll may occur, I need to describe the internal state BIND 9 maintains on a particular key. This is a description of how a Zone Signing Key (ZSK) is tracked within BIND 9.7. A similar method is used for Key Signing Keys but is not documented here.
In BIND 9.6 external tools were used to re-sign the zone, namely dnssec-signzone. The keys were managed externally to the server process, usually manually. Key management was performed by controlling which private keys the command-line tool had access to when signing was performed. In BIND 9.7, management of keys has been moved into the server.
A ZSK changes state at defined points in time. These states are: Created, Published, Active, Inactive, and Removed. The private key file itself carries these timers; they are set upon key creation or through a command-line tool. Once specified, the timers trigger the necessary state changes. The server uses the key states to select which keys to sign with and which keys to include in the zone's DNSKEY record set.
Figure 1: ZSK Timer Relationships
In Figure 1, the arrows represent events (blue) or times (yellow) when the key state changes. The periods between these events that we are most concerned with inside BIND 9 are marked with the letters A, B, C, and D.
- A: The key is created and on disk, but not yet included in the zone file and not used to sign any records. Keys can be pre-created as early as desired.
- B: The key is published in the zone, but is not yet used to sign any records. The new key must be pre-published for at least as long as the TTL on the DNSKEY record plus any master-to-secondary transfer delays.
- C: The key is published in the zone, and is active for use in signing records. How long a key is used is a local policy decision. Recommendations range from rolling very frequently to five years or more. Each has its merits, and they are not discussed here.
- D: The key is published in the zone, but is inactive and is no longer used to sign any records. How long a key remains in this state is dependent both on when it was last used and the TTL values in the zone.
- The last state (not shown in the figure) is Removed: the key is no longer used by BIND in any way. This state is terminal.
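The state progression above can be sketched as a simple classification over a key's timing metadata. This is an illustrative Python model only (the function and parameter names are mine; the real timers live in the private key file and are interpreted by the server and tools):

```python
from datetime import datetime, timedelta

def zsk_state(now, created, publish, activate, inactive, delete):
    """Classify a ZSK into the states described above from its timers.
    An illustrative model, not BIND's implementation."""
    if now < created:
        raise ValueError("key does not exist yet")
    if now >= delete:
        return "Removed"    # terminal: no longer used in any way
    if now >= inactive:
        return "Inactive"   # period D: published, no longer signing
    if now >= activate:
        return "Active"     # period C: published and signing
    if now >= publish:
        return "Published"  # period B: in the DNSKEY RRset, not signing
    return "Created"        # period A: on disk only

# Hypothetical timeline: publish at day 2, activate at day 4,
# go inactive at day 34, delete at day 36.
t0 = datetime(2010, 6, 1)
day = timedelta(days=1)
timers = (t0, t0 + 2 * day, t0 + 4 * day, t0 + 34 * day, t0 + 36 * day)
print(zsk_state(t0 + 5 * day, *timers))   # Active
print(zsk_state(t0 + 35 * day, *timers))  # Inactive
```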
Section D is somewhat interesting because it has two components. At some point during this interval, BIND 9.6 tools would have removed all signatures made with this key as signatures were refreshed over time. This is the first part of this box. In BIND 9.7.0 and 9.7.1 the duration is 0, causing an immediate spike in all types of resource requirements.
The second part of box D represents the maximum TTL value in use in the zone. The key cannot be removed from the zone until all records signed with this key have expired from caches. This is based on zone contents and is unchanged between server versions.
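The minimum safe lengths of intervals B and D follow directly from the TTLs involved. Here is a back-of-the-envelope Python sketch of that arithmetic; all concrete values are hypothetical examples, not recommendations:

```python
from datetime import timedelta

def min_prepublish(dnskey_ttl_s, transfer_delay_s):
    """Minimum length of period B: the new DNSKEY must have been visible
    everywhere (its TTL plus master-to-secondary delay) before it signs."""
    return timedelta(seconds=dnskey_ttl_s + transfer_delay_s)

def min_retirement(refresh_window_s, max_zone_ttl_s):
    """Minimum length of period D: old signatures age out over the
    refresh cycle (first part), then cached records signed with the key
    must expire (second part, the zone's maximum TTL)."""
    return timedelta(seconds=refresh_window_s + max_zone_ttl_s)

# Hypothetical numbers: 1-day DNSKEY TTL, 1-hour transfer lag,
# 7-day signature refresh window, 1-day maximum zone TTL.
print(min_prepublish(86400, 3600))       # 1 day, 1:00:00
print(min_retirement(7 * 86400, 86400))  # 8 days, 0:00:00
```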
Figure 2 shows a typical ZSK roll. Exactly one ZSK is active at all times. This is the form of key rolling expected to be used in production, as it puts the least strain on resources by never having more than one signature on a particular record.
The rolling period dnssec-signzone uses is shown as R. It is directly related to the zone's signature validity period: if signatures last 30 days, they must be refreshed before those 30 days have passed or they will expire and the zone will fail to validate. Refreshing typically happens sooner than strictly necessary, but (using 30 days as an example) it must occur no later than (30 days – MaxTTL) after signing to avoid problems.
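That deadline can be computed as the validity period minus the zone's maximum TTL: once less than MaxTTL of validity remains, a cached copy of a record could outlive its signature. A small sketch using the article's 30-day example (the 1-day MaxTTL is a hypothetical value):

```python
from datetime import datetime, timedelta

def latest_refresh(signed_at, validity=timedelta(days=30),
                   max_ttl=timedelta(days=1)):
    """Latest safe moment to refresh a signature: the point at which
    remaining validity equals the zone's maximum TTL."""
    return signed_at + validity - max_ttl

print(latest_refresh(datetime(2010, 6, 1)))  # 2010-06-30 00:00:00
```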
With the command-line tools, when a ZSK becomes Inactive, that key is simply no longer used to sign records. The new key is used for all new signatures, and the changeover happens gradually as signatures are refreshed. This does not mean that no signatures made with the old key exist, only that no new ones will be created using it.
BIND 9.7.0 and BIND 9.7.1 treat this transition from old to new key as an immediate trigger to re-sign the entire zone with the new key and remove the old key's signatures. This causes a huge delta change in zone contents, increases server CPU load, and increases the resources needed to hold and transfer the zone data.
This change was not an accident. The purpose in this immediate re-signing was to remove the old key as quickly as possible. This is a good idea in an emergency rolling, but the behavior change was unexpected and has caused operational problems.
In Figure 2, one key stops signing records at the exact moment another key begins signing. This is what is expected to be done with ZSKs in practice as it minimizes all overhead of storing and transmitting two signatures.
Nothing currently in BIND 9.7 or proposed here disallows overlapping active regions. A record may be signed many times by many keys, and the overhead may be necessary at times for particular types of key rolling. However, it is critical that while active regions may overlap, they must never be disjoint. If at any time there is a gap between keys BIND 9 cannot correctly maintain the zone and the zone will appear broken to validating resolvers.
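The rule that active regions may overlap but must never be disjoint can be expressed as a small check over the keys' active intervals. A sketch in Python (the interval representation is mine, purely for illustration):

```python
def active_coverage_ok(periods):
    """Check that the (start, end) active intervals of successive ZSKs
    leave no gap: each key must activate no later than its predecessor
    goes inactive. Overlap is allowed; a gap breaks validation."""
    ordered = sorted(periods)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            return False
    return True

# Days since some epoch, purely illustrative.
print(active_coverage_ok([(0, 30), (30, 60)]))  # True  (exact handoff)
print(active_coverage_ok([(0, 30), (25, 60)]))  # True  (overlap is fine)
print(active_coverage_ok([(0, 30), (31, 60)]))  # False (a gap)
```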
Proposed change in 9.7.2
The proposed change for 9.7.2 is to make the command-line tool behavior the default visible behavior. Signatures will transition from the old key to the new key as each record's re-signing timer expires. Additionally, a key will not be removed from the zone until BIND 9 knows that all signatures made with that key are gone from the zone and the TTLs indicate it is safe to remove the key.
It is important that keys have sane timer values set or the zone may become broken when rolling to a new ZSK. BIND 9 may need to alter the administrator-supplied values for pre- and post-use publish in order to ensure the zone does not break. It should also be possible for BIND 9 to be provided only key start/stop times and have a reasonable pre- and post-use publish time calculated based on zone TTL values and last use of a key.
It may also be necessary for some keys to be used past their end date. An example of this would be if a key is added but no following key is provided. Rather than break the zone, the older key may continue to be used, with sufficient notification in the log files to indicate this is happening.
The impact of this change is expected to be minimal. Those expecting the same gradual key rollover when migrating from the command-line tools to BIND 9.7's autosign feature will see the behavior they expect. Those who were unaware of the difference, and for whom it was operationally insignificant, will remain unaware. There will be a short-term difference in zone size as a result of the DNSKEY record set being larger for a longer period of time. A safety control may need to be added to ensure that a key is not removed after deactivation until it is safe to do so, producing warnings if triggered.
There may be installations or circumstances where the current 9.7.0 and 9.7.1 behavior is needed. To accommodate this, we plan to introduce a new control that allows the immediate behavior to be enabled. I believe a "re-signing roll duration" set to "automatic" or "immediate" may suffice. This will allow administrators to choose the old immediate behavior if they really want it, or for urgent situations like a compromised key. The default balances updates over time, choosing the best rate of re-signing to ensure all signatures are updated before the old key is scheduled to be removed.
- By Adib Behjat on May 24, 2010
I seem to read all the time that open source projects must be less secure, since the bad guys can look through the source code to find vulnerabilities. I was pleased to see an article today that takes the point of view that security through obscurity is not the right direction and that open source projects can be more secure than competing proprietary software.
Ram Mohan has written an article “In Defense of BIND: Open Source DNS Software Yields a Better Breed of Secure Product” that is quite worth a read.
- By Adib Behjat on May 10, 2010
The press seems to love stories of doom and gloom. And for almost as long as the Internet has been around, there have been dire predictions of some resource exhaustion, success disaster or security flaw that will destroy the internet. And who is the villain in this week’s piece? DNSSEC and the signing of all the root servers.
While I love a good story as much as the next person, it seems time to actually throw a few facts on the fire.
What is DNSSEC and what is “signing”?
The Domain Name System Security Extensions (DNSSEC) is a way of ensuring that when your name server queries the authoritative servers for example.com, you have a high degree of certainty that the answers you get back actually came from the example.com servers and haven’t been altered in transit.
Based on the trust anchors that you configure for your name server, when your name server validates a DNS response, it will go link by link in the chain. Each link “signs” its piece by using public key cryptography to create a digital signature. Your resolver can use that signature to validate that the server you think should sign a link has actually signed it and that what you receive was correctly signed.
For the DNS name "host.example.com.", you would go to the root and get the signature that lets you validate "com". From "com", you would get the signature that lets you validate "example.com", and so on. In order for this to work, the root servers sign all of their RRsets, including the RRset for "com". The "com" servers sign all of their RRsets, including the ones for "example.com".
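The walk down the chain can be sketched as follows. This is a deliberate simplification (real validation fetches DS and DNSKEY RRsets and checks a signature at every link; this just lists the zones visited):

```python
def validation_chain(name):
    """List the zones walked, root first, when validating a name.
    A simplified sketch of the chain of trust, not real validation."""
    labels = name.rstrip(".").split(".")
    chain = ["."]  # start at the root trust anchor
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(validation_chain("host.example.com."))
# ['.', 'com.', 'example.com.', 'host.example.com.']
```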
What really happened on May 5?
The root server operators didn’t want to just turn on DNSSEC, start signing the root zone and hope nothing broke. They came up with a detailed, phased plan for deployment. The first stage was to hand out signature records that were not validatable but would test if handing out DNSSEC signatures would break anything.
On 25 Jan 2010, the first of the 13 root servers started handing out a Deliberately Unvalidatable Root Zone (DURZ). This was done in phases over several months so that if something did break, not all root servers would be serving the new data, and name servers that couldn't accept the DURZ would still have access to root servers.
What happened on 5 May 2010 was that the last of the 13 root servers started serving the DURZ. And if you’re reading this, I think you can assume that the Internet is still working.
It is important to keep in mind that while the roots are "signed", they are still not serving a validatable root zone (i.e. you can't yet use the roots as a trust anchor). The current schedule is for a real signed root zone to start being served in July 2010. This will make the roots usable as a trust anchor.
While this will be a real milestone, the Top Level Domains (TLDs), such as .NET, .COM, etc. will also all need to be signed before you can have just the root as a trust anchor. Until then, additional trust anchors, such as dlv.isc.org, will still need to be used.
What’s the real problem?
The underlying issue is DNS response size. “Conventional wisdom” for years for configuring firewalls, proxies and load balancers was that UDP packets were 512 bytes or smaller and that DNS only used TCP for zone transfers, not queries. Sadly, both of these are wrong. For quite a while, DNS responses did fit in 512 byte packets. However, as the Internet grew and DNS was used for more and more things not originally envisioned by its designers, packets did get bigger.
DNS did have a backup plan for larger responses: the truncation (TC) bit. If a nameserver had a response where all the required records in an RRset would not fit into a 512 byte packet, the nameserver would send back an incomplete response with the TC bit set. That was supposed to tell the querying server that the response was truncated and that it should retry the query over TCP, which doesn't have the 512 byte restriction. Too bad various middleware boxes were configured to not allow DNS over TCP…
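Like the AA bit, the TC bit lives in the flags word of the DNS header, so a client can detect truncation before deciding to retry over TCP. A Python sketch (the fabricated header bytes are mine):

```python
import struct

def is_truncated(response: bytes) -> bool:
    """Return True if the TC (truncation) bit is set in a raw DNS
    response, meaning the client should retry the query over TCP."""
    (flags,) = struct.unpack_from("!H", response, 2)  # bytes 2-3 are the flags
    return bool(flags & 0x0200)  # TC is the 0x0200 bit

# Fabricated header with QR and TC set, all counts zero.
header = struct.pack("!HHHHHH", 0x1234, 0x8200, 0, 0, 0, 0)
print(is_truncated(header))  # True
```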
The IETF developed the Extension Mechanisms for DNS (EDNS0) to allow querying nameservers that can accept UDP packets larger than 512 bytes to tell the authoritative nameserver how big a packet they can accept. And life should have been good. Except for folks that blocked packets larger than 512 bytes… And all sorts of intervening network boxes and middleware boxes did all sorts of broken things with fragmented UDP packets, broke EDNS0, filtered DNS queries over TCP, etc.
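Mechanically, EDNS0 works by appending an OPT pseudo-record to the query, with the CLASS field repurposed to carry the advertised UDP payload size. A minimal sketch of that wire format (the 4096 default is just a common example value, not a recommendation):

```python
import struct

def edns0_opt(udp_payload_size=4096):
    """Build the wire-format OPT pseudo-record (RFC 2671) that a querier
    appends to advertise the UDP payload size it can accept."""
    name = b"\x00"               # root domain name
    rr_type = 41                 # OPT
    rr_class = udp_payload_size  # CLASS field carries the payload size
    ttl = 0                      # extended RCODE and flags, all zero here
    rdlen = 0                    # no EDNS options attached
    return name + struct.pack("!HHIH", rr_type, rr_class, ttl, rdlen)

print(edns0_opt(4096).hex())  # 0000291000000000000000
```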
The big thing this first phase of the signed-root deployment is trying to catch is exactly these broken middleware boxes, letting folks find and fix them before a signed and validatable root zone file is deployed.
These broken boxes really should be fixed and the vendors have no excuse. EDNS0 & queries over TCP have been around for more than ten years. IPv4 fragmentation has been around for decades. This is not DNSSEC specific; DNS needs this fixed even if you are not planning on deploying DNSSEC.
Will my DNS stop working?
Probably not. OARC and the root operators have been monitoring this rollout carefully. By watching increases in TCP query traffic and packet sniffing, they have determined that the number of nameservers behind truly broken middleware boxes is quite small. Those boxes should be fixed, since this breaks any large DNS response; this is not a problem particular to DNSSEC.
DNS is very robust. You may see timeouts on UDP, fallbacks to smaller UDP packet sizes, or fallbacks to TCP, but it takes a lot of brokenness to truly make DNS unusable.
Now that the root is signed, how long do I have to sign all my zones?
As long as it takes you to determine you have a business case for doing DNSSEC, determine your trust anchor policy, your signing policy and do a phased rollout plan of your own.
There is no requirement for you to sign your own zones just because the root is signed. You should sign your zones when you have a good reason to sign them.
I would like to start using DNSSEC validation. Do I have to sign my zones?
You do not have to sign your own zones in order to start doing DNSSEC validation. Odds are that you’ll want to start signing your zones at some point. But all you need to do for DNSSEC validation is to choose a trust anchor, enable DNSSEC validation in your nameserver configuration and put the trust anchor in your nameserver configuration.
What if I don’t want to do DNSSEC? Will DNS still work for me?
Yes. One of the design requirements for DNSSEC was that it shouldn't break DNS for people not using it. If you never sign your zones and never turn on DNSSEC validation, even if the rest of the Internet does, you will never notice. Your DNS will continue to work as it has in the past. The only caveat is that if you are behind one of the really broken middleware boxes, you should get it fixed. But you'll need that for regular DNS responses larger than 512 bytes anyway.
Where do I go from here?
- Test your resolver’s network path’s ability to deal with EDNS0 and large UDP packets
- RFC 2671: Extension Mechanisms for DNS (EDNS0)
- DNSSEC related RFCs