- By Adib Behjat on May 10, 2010
DNSSEC is coming. Is your organization ready?
The DNS community is buzzing with activity around the implementation of the DNS Security Extensions, DNSSEC. In simple terms, DNSSEC provides a “chain of trust” within the DNS hierarchy and authentication of DNS responses. Once deployed across the DNS, DNSSEC will render man-in-the-middle attacks on DNS data a thing of the past.
But DNSSEC adds many new twists to running a DNS service, both for authoritative and recursive customer-facing servers. At ISC, we are frequently asked what “being ready for DNSSEC” really means. Here are some things that you can expect when deploying DNSSEC.
DNSSEC changes, in general
Regardless of the type of server (authoritative or recursive), many changes to the operational environment will come about with the adoption of DNSSEC.
Larger UDP packets
DNSSEC adds additional data to each answer that a server returns. These larger answers can sometimes exceed the packet sizes expected by some software and hardware. These larger packets may be returned even when the client is not using DNSSEC.
Increased TCP usage
Larger UDP packets may trigger a more frequent fall-back to TCP. TCP is less efficient than UDP for DNS because it causes more network packets to be transferred for a single query and places additional memory requirements on both client and server.
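The mechanism that allows UDP responses larger than the classic 512-byte limit is EDNS0: the client appends an OPT pseudo-record advertising its buffer size, and sets the “DNSSEC OK” (DO) bit to request signed data. As a rough illustration (a minimal sketch, not a complete DNS message builder), the OPT record a client would append looks like this:

```python
import struct

def build_edns_opt(udp_payload_size=4096, do_bit=True):
    """Build a minimal EDNS0 OPT pseudo-record. The record's CLASS field
    carries the advertised UDP payload size, and the high bit of its TTL
    field is the "DNSSEC OK" (DO) flag."""
    name = b"\x00"                      # root name (OPT records are owned by ".")
    rtype = 41                          # OPT
    ttl = 0x00008000 if do_bit else 0   # DO bit lives in the TTL field
    rdata = b""                         # no EDNS options in this sketch
    return (name
            + struct.pack("!HHIH", rtype, udp_payload_size, ttl, len(rdata))
            + rdata)

opt = build_edns_opt()
advertised = struct.unpack("!H", opt[3:5])[0]   # CLASS field = buffer size
print(advertised)  # 4096
```

A client advertising 4096 bytes this way tells servers it can accept the larger, signature-laden answers without an immediate fall-back to TCP.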
Increased memory requirements
A DNSSEC-signed zone is anywhere from 4 to 14 times the size of the original zone when using RSA keys and signatures; the increase depends on the key size. Memory requirements will also increase when a new key is being “rolled.” This has a particularly heavy impact on the DNS infrastructure of large TLD registries and others who serve very large zones.
Firewalls, load balancing, and DNS
Firewalls can cause no end of problems for the larger responses DNSSEC generates. Load balancer hardware may block larger packets or packets containing new DNSSEC record types. Both of these network components may decide an incoming packet is not acceptable and block it or change it in ways which break the DNSSEC protocol.
This list is for people who run servers that serve DNS zones and who wish to sign them.
Additional procedures when changing records
DNS used to be simpler. Zones only changed when new records were added or removed. This is no longer the case with DNSSEC. A signed zone is quite simple in concept: keys are added to the zone and used to generate signatures on the records in the zone. The complexities are in the details.
A DNSSEC zone has an imposed record order. This order is a necessary component of the means by which DNSSEC signals that a name does not exist. This ordering is handled automatically by the signing tools. However, if a new record is added to a signed zone, and the zone was not re-signed to add the signatures and this ordering, that new record would not be accepted by a validating client or resolver.
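The imposed order above is the “canonical name order” that NSEC chains rely on: labels are compared right to left, case-insensitively. A simplified sketch (ASCII names only; real signing tools operate on wire-format labels and raw bytes):

```python
def canonical_sort_key(name):
    """Sort key approximating DNSSEC canonical name order (RFC 4034,
    section 6.1): compare labels right to left, case-insensitively."""
    labels = name.rstrip(".").lower().split(".")
    return list(reversed(labels))

# NSEC chains require the zone's names in this imposed order:
names = ["z.example.com", "a.example.com", "example.com", "www.a.example.com"]
print(sorted(names, key=canonical_sort_key))
# ['example.com', 'a.example.com', 'www.a.example.com', 'z.example.com']
```

A record inserted without re-signing falls outside this chain, which is why validators reject it.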
Typically, signatures are valid for a short time, such as one month. These signatures need to be refreshed by re-signing the records (or the entire zone) periodically. If a signature is allowed to expire, clients and resolvers cannot validate that data and will treat it as a failure. It is critical that zone signatures be maintained.
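The re-signing schedule above is policy-driven; as an illustration only (the one-week refresh margin here is an assumption, not a recommendation), the decision reduces to a simple time comparison:

```python
from datetime import datetime, timedelta

def needs_resigning(expiration, now, refresh_margin=timedelta(days=7)):
    """Return True if an RRSIG's expiration is close enough that the zone
    should be re-signed.  The margin is a local policy choice; re-signing
    a week before expiry is used here purely as an example."""
    return expiration - now <= refresh_margin

# A signature valid for one month, checked 25 days into its lifetime:
inception = datetime(2010, 5, 1)
expiration = inception + timedelta(days=30)
print(needs_resigning(expiration, now=datetime(2010, 5, 26)))  # True
```

Automating a check like this, rather than relying on humans to remember, is what “maintaining zone signatures” means in practice.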
Just like maintaining keys to an office building or your house, keys are critical to the security of DNSSEC. Without sufficiently strong keys and policies to prevent them from being compromised, DNSSEC adds no security.
The length of a key is one measure of its strength. Choosing a key that is too long creates more work for every resolver and slows the signing process; it also generates larger signatures, increasing load in other areas. Choosing a key that is too short decreases security. Because DNSSEC uses two keys, a smaller 1024-bit key may be sufficient for the “zone signing key,” which can change much more rapidly and signs the majority of the zone data. A larger key, perhaps 2048 bits, may be desirable for the “key signing key.” Please choose key lengths with caution and understand the implications carefully.
Key compromise might occur when an employee leaves a company and they choose to take a copy of the private parts of the DNSSEC keys with them, or through an attacker gaining access to a server, or because you generated weak keys and attackers were able to guess them.
Careful selection of TTL
Each record in DNS has a “time to live” (TTL) value associated with it. This value tells a cache how long it may retain its copy of the given record. In general cases, the TTL on one record has no effect on other records in the zone. DNSSEC still uses a TTL, but there are subtle changes in how the TTLs of various types of records interact. In some cases, the TTL on the DNSKEY record may decrease the effective TTL on other records in the cache, for instance.
Keys and their signatures are among the largest answers a server will generate. It is desirable to keep a large TTL value on keys, and indeed on many types of records. However, a larger TTL can cause problems when performing certain DNSSEC procedures, such as rolling a new key. It may also increase recovery time when something bad happens, such as letting signatures expire.
Because each zone is different and its usage patterns vary, we do not suggest a single one-size-fits-all TTL value. However, we do not suggest longer than a week nor shorter than an hour for most TTL values which are not expected to change rapidly. A shorter TTL assists with agility for key rollover but puts more strain on the servers and network.
To effectively deal with any unforeseen disasters, emergency procedures must be developed to re-generate keys and re-sign the zone in a worst-case scenario. For large zone files, this can further impact the requirements for memory, bandwidth, and performance.
This list is for people who run servers that serve customers or other recursive clients.
New failure types
Queries that used to succeed may suddenly fail in new and creative ways. Previously, the most common reason for a query not working was a simple network reachability problem; DNSSEC increases the varieties of failure that can occur. Queries can also fail for other reasons, such as signature failures caused by poorly maintained DNSSEC-signed zones.
Some organizations attempt to monetize failed DNS lookups, or attempt to be helpful in some way by providing an automatic search for possible terms when a user types an invalid address in a browser. This will break DNSSEC for the clients of this resolver if these clients are also performing DNSSEC validation. Allowing customers to opt-in or opt-out of any redirection service is required for end-to-end DNSSEC validation.
What to ask your vendors
Firewall, load balancers, and other middleware boxes
- The first and most important question to ask them is: Will this software or hardware interfere with DNSSEC? If they hedge or do not know, there may be danger ahead.
- Are UDP packets larger than 512 bytes handled correctly?
- Will the new record types (DNSKEY, RRSIG, NSEC, NSEC3, etc.) be handled properly?
- What changes will be made to a packet by this software or hardware? If the records are changed or filtered in any way, chances are things will break.
DNS service providers
If you outsource any part of your DNS infrastructure, you may want to ask some questions about their DNSSEC plans.
- How are keys protected?
- How well do they understand DNSSEC?
- Who can you call when things are broken, and is it likely to be handled immediately or will it require waiting for “the DNSSEC expert” to wake up?
Geo-DNS and other global methods
Many companies use some form of load balancing based on the perceived location of the client. This generally involves rewriting the IP addresses or names returned. These services can still be used with DNSSEC by splitting the records out into their own zone and delegating that zone to the load balancing system.
Each record variant could also be signed, but this may be much harder to do in practice due to lack of production and experimental knowledge in this area. For now, using DNSSEC with any form of a server which generates answers on the fly or based on client location should be avoided.
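The zone-splitting approach described above might look like this in the signed parent zone (all names here are hypothetical placeholders):

```
; In the signed example.com zone, delegate the dynamically answered names
; to a separate, unsigned child zone served by the load balancer:
geo.example.com.    3600 IN NS     lb1.example.net.
geo.example.com.    3600 IN NS     lb2.example.net.

; Point the public name into the delegated zone:
www.example.com.    3600 IN CNAME  www.geo.example.com.
```

Because no DS record is published for the delegated zone, validators treat it as insecure rather than bogus, so the load balancer’s on-the-fly answers continue to work while the rest of the zone remains signed.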
DNSSEC adds many new complications to DNS. However, with careful planning, its usefulness will outweigh them.
If you need help
- ISC offers expert support and consulting services to help migrate into DNSSEC.
- ISC has training for all things DNS, and a class specifically for DNSSEC.
- If outsourcing service of zones is desired, ISC’s SNS is a commercial-grade DNSSEC DNS service.
- ISC is the author of BIND 9, which as of version 9.7 will perform automatic zone maintenance and re-signing.
- BIND 9.7 also supports automatic maintenance of trust anchors for early adopters wishing to gain experience with a validating resolver.
- If maintaining a pile of trust anchors is undesirable, ISC’s DLV service can assist.
- By Adib Behjat on May 2, 2010
In this interview we see yet another attempt by a technology executive to discredit all roads that do not lead to their products and services. Since in this case the creative pot shots are aimed at my company’s products and services, and since this is far from the first time these canards have been trotted out, I’ve decided to respond for the record.
[DNS] is an industry that has seen very little innovation.
This is false by inspection. The DNS industry innovates both on the wire with protocol extensions such as dynamic updates, transaction signatures, real time change notification, incremental zone transfers, and data authenticity — to name just a few of the dozens of protocol extensions defined by the IETF in the years since DNS was first defined, and off the wire with implementation improvements in areas such as performance, usability, and correctness, and with transparent on the wire changes in operational practice such as load balancing and global anycast.
…unfortunately, this has resulted in [BIND] being the most commonly exploited DNS server.
I think that since BIND has an 85% market share, it’s natural that we’d be the most commonly attacked DNS server. Fortunately our source code is open and we have a software auditing team of planetary proportions. My own experience with software has been that software with a smaller market size has the same bug count (per million lines of code) as software with a larger market size, and that proprietary software without constant public inspection has a much higher bug count than open source software.
Many of the distributed DoS attacks that knock nameservers offline are related to inherent weaknesses in BIND’s technology.
Can you provide a live repeatable example, even one, among the “many” you are claiming here?
Unless your company proactively stays on top of all the latest BIND news, you’re vulnerable.
How is this different from any other software an enterprise might run, such as Windows or Mac/OS or Oracle or Linux? It seems to me that the choice to outsource one’s operations should (and will) be made on a cost:benefit basis or on a philosophic basis, but not based on FUD.
We have many enterprises and service providers as customers of ISC’s paid support services for BIND, and also our consulting, training, and software enhancement services. These customers range in size from SME to the Fortune 500, and they’ve determined that in-house open source fits their business plans and corporate philosophies. Please don’t imply that they are idiots just because they don’t subscribe to your business model.
- By Adib Behjat on April 18, 2010
There have been some questions about why BIND 10's first milestone release only supports SQLite3 for storing zone information. I hope I can answer some of them by explaining how and why we came to this decision.
Part of the decision was a simple matter of time. We knew we would only have resources to implement a single data store. We ended up implementing two, but one is a trivial one: authors.bind and other static zone content.
That explains why we chose to implement only one, but why was it SQLite3?
BIND 9, which we are improving on here, has a rock-solid in-memory database implementation. When BIND 9 was written, this database was our only working data store, which led us to take its particular behavior for granted. For example, some things that are very easy to do when you have all the data at hand are very hard to do efficiently when you must search for the answer. When we ran into the need for some special attribute of a stored name or data, we would just add it.
This sort of thinking made our code very dependent on the characteristics and behavior of our in-memory database implementation. Worse, it blurred the boundary between the application code and the database in many ways, since if we didn’t like what it did, we could just change it. We controlled both sides of the API.
We have had many users requesting that BIND 9 perform SQL or LDAP queries for authoritative data. There is some contributed code for SQL, but it is extremely inefficient, as there is no caching of answers. It also adds an additional layer on top of the full BIND 9 database API, called the “sdb” or “simple database” API. Simple here is a relative term.
We did not want this sort of thinking to happen in BIND 10. We knew from the start that we could do an in-memory database, but the SQL beast was a new thing for us. It has many possible ways it can be used: Can we write to the database? Does it have necessary ordering capability for DNSSEC? How long should we cache query results for, if at all? What does the schema look like?
After all those questions are answered in SQL terms, the design and implementation of an in-memory database looks like a walk in the park. This was the compelling reason to choose SQL.
Once we decided to do SQL first, SQLite3 was the logical choice. It requires no server, no license, and has a very easy-to-use API. It is fairly fast, and has language bindings for pretty much every modern language out there. In short, it was a painless way to get SQL without making it hard on those trying out our release.
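To give a flavor of that ease of use, here is a toy zone store in Python’s built-in sqlite3 module. This schema is an illustration only; the actual BIND 10 schema is more involved (DNSSEC ordering, zone versioning, and so on):

```python
import sqlite3

# A toy records table loosely in the spirit of what a zone store needs.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE records (
                    zone  TEXT, name  TEXT, ttl  INTEGER,
                    rtype TEXT, rdata TEXT)""")
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", [
    ("example.com.", "example.com.",     3600, "SOA",
     "ns1.example.com. hostmaster.example.com. 1 7200 3600 604800 3600"),
    ("example.com.", "www.example.com.", 3600, "A", "192.0.2.1"),
])

# Answering a query becomes a simple indexed lookup:
row = conn.execute("SELECT rtype, rdata FROM records WHERE name = ?",
                   ("www.example.com.",)).fetchone()
print(row)  # ('A', '192.0.2.1')
```

No server process, no configuration, and the database lives in a single file (or, as here, in memory), which is exactly what made it painless for a first release.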
I hope this helps explain some of what we thought about before deciding on an SQL back end for the first-year release. We fully intend to implement in-memory, non-SQL on-disk, and many SQL variants before the project is done.
- By Adib Behjat on April 15, 2010
2010 is shaping up to be a banner year in at least two areas: major steps toward the deployment of DNSSEC, and discoveries of operational snags affecting the deployment of DNSSEC.
An example of the former took place on March 25, when it was announced that the ARPA TLD had been signed. ARPA contains the sub-zones in-addr.arpa and ip6.arpa, which are used for reverse DNS: converting IP addresses to DNS names. It is an essential piece of the DNS infrastructure, and the signing of ARPA makes it possible for reverse lookups to be cryptographically authenticated via DNSSEC.
Unfortunately, an example of the latter took place a short time later. The public key for ARPA was placed in IANA’s Interim Trust Anchor Repository (ITAR), then detected and published in ISC’s DNSSEC Lookaside Validation (DLV) zone, dlv.isc.org. Suddenly, and for several hours afterward, recursive resolvers that relied on ISC DLV for DNSSEC validation were unable to answer reverse DNS queries at all.
The problem was caused by obsolete data persisting in resolver caches, and presents a good opportunity for a discussion of things that can go wrong with DNSSEC at transitional moments, such as the initial signing of a zone.
Caching trust chains
To check the validity of DNSSEC signatures, a resolver must fetch a copy of the zone’s public DNSSEC key (DNSKEY), and then it must prove that the DNSKEY it fetched is itself valid… which may involve checking it against yet another DNSKEY, and so on, until the validator reaches something that it unambiguously knows to be valid. That last thing is called a “trust anchor”, and is configured into named.conf using the “trusted-keys” or “managed-keys” statement. For DNSSEC validation to work, a resolver must be configured with at least one trust anchor.
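A trust anchor configuration in named.conf looks roughly like this (the zone name and key material below are placeholders, not a real key):

```
trusted-keys {
    // Hypothetical trust anchor: zone name, flags (257 = KSK),
    // protocol (3), algorithm, and base64 key data.
    "example." 257 3 8 "AwEAAc...placeholder-key-material...";
};
```

With BIND 9.7’s “managed-keys” statement, the same entry can instead be maintained automatically across key rollovers.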
A DNSKEY is considered valid if one of the following conditions is met:
- The resolver has a trust anchor that exactly matches that DNSKEY
- The resolver is configured to use DNSSEC Lookaside Validation (DLV) and has a trust anchor for a DLV zone (such as dlv.isc.org), and the DLV zone contains a record matching the DNSKEY, or
- The zone’s parent contains a delegation signer (DS) record matching the DNSKEY, and that DS record can, in turn, be validated using the parent zone’s DNSKEY.
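The three conditions can be sketched as a single decision function. This is an illustration of the logic only; real validators also verify signatures, algorithms, and digests, all of which are omitted here:

```python
def dnskey_is_valid(dnskey, trust_anchors, dlv_records=None, validated_parent_ds=None):
    """Mirror the three acceptance conditions listed above (heavily
    simplified: membership tests stand in for cryptographic checks)."""
    if dnskey in trust_anchors:                                # condition 1
        return True
    if dlv_records and dnskey in dlv_records:                  # condition 2
        return True
    if validated_parent_ds and dnskey in validated_parent_ds:  # condition 3
        return True
    return False

print(dnskey_is_valid("key-A", trust_anchors={"key-A"}))                     # True
print(dnskey_is_valid("key-B", trust_anchors=set(), dlv_records={"key-B"}))  # True
print(dnskey_is_valid("key-C", trust_anchors=set()))                         # False
```

The caching trouble described next arises because the inputs to this decision can expire from the cache at different times.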
When the DNSKEY, DS, or DLV records are fetched, they are cached by the resolver. (When they don’t exist, the fact of their nonexistence is cached instead.) After a time, cached information expires and is removed, and is refreshed by new queries. But the different records can expire at different times, leading to inconsistency in the cache if, for example, a new DLV record is found that is inconsistent with an old DNSKEY still in the cache. Such inconsistencies can cause validation failures, which will continue until the last obsolete record has expired from the cache.
ARPA: What went wrong
In the case of last month’s signing of ARPA, here is what happened:
- The ARPA zone was signed and a DNSKEY was inserted at the zone apex. DNSSEC-aware resolvers began receiving signed answers.
- Attempting to validate, a resolver using DLV would fetch a copy of the ARPA zone DNSKEY record, then look for a matching DLV record at arpa.dlv.isc.org. It didn’t find one, so the zone was deemed not to be secure.
- The DNSKEY was stored in the cache, with a trust level indicating that DNSSEC validation had not taken place.
At this point, everything in the resolver cache was consistent. The zone didn’t validate, but that’s okay; it wasn’t supposed to. But then:
- The new DLV record was inserted into dlv.isc.org
- The “negative cache” record indicating that arpa.dlv.isc.org did not exist expired, and was removed from the cache.
- The resolver received an answer from the ARPA zone, and found a cached DNSKEY record, but no information about the DLV record. So it looked up the DLV record, and this time it found one.
- Now the resolver had a valid DLV record indicating that ARPA was secure… but a DNSKEY record in its cache which had never been validated. It therefore incorrectly assumed that the cached DNSKEY had failed validation, and so it returned SERVFAIL.
The good news is that a workaround was available to resolver operators: Remove the old unvalidated DNSKEY by flushing cached data for the ARPA zone apex. The command to do this is “rndc flushname arpa”, and it forces the resolver to fetch a new copy of the DNSKEY that can now be fully validated. The bad news is, not every resolver operator was in a position to know about this.
Other transition issues
It’s not only DLV users who’ll have difficulties of this sort. An identical problem can arise if a new DS record is inserted into the parent zone, or if a trust anchor is configured into the resolver, without the cache being cleared. Similar problems can also happen further up the DNS, e.g., if your zone already has a DS record in the parent, but then a trust anchor is created for the parent zone.
Problems like this can happen when any zone is signed, but they are more likely to occur if the zone is a popular one, such as a well-known search engine or a top-level domain. These are more likely to be in a resolver’s cache when the transition occurs.
ISC is currently working on fixes to BIND 9 to minimize or eliminate all disruptions of this type. We’re taking our time on this one, in hopes of ensuring that we cover all the possible failure modes. We don’t want to just fix the specific ARPA problem and miss some other bug that’s waiting to bite us next month. The fixes are in progress and will be available in future versions of BIND 9.
What you can do
In the meantime, we can offer a few tips, for both authoritative and recursive name server operators, that should help with transitions.
As an example, let us suppose a TLD is being newly signed: WX, controlled by the beautiful and exotic island nation of West Xylophone. The WX registry operator generates keys, signs and publishes the zone, then places a public key for “wx” in the IANA ITAR and ISC DLV (or, assuming this takes place after the signing of the root zone later in 2010, submits DS records into root). How can she ensure minimal DNS disruption?
- Reduce TTL values
Every record in a zone has an associated TTL (time to live) value, which indicates how long it should be stored in a resolver’s cache before being discarded and refreshed. Every zone also has a “negative cache” TTL value set in its SOA record, which indicates how long a resolver should remember the fact that a record does not exist before looking for it again.
Longer TTL values are often a good thing: they reduce the load on an authoritative server by ensuring that repeat queries from resolvers come in less frequently. But longer TTL values also lengthen the possible disruptions when things change.
So before placing DS records for your zone in the parent zone or submitting DLV records into dlv.isc.org, consider temporarily reducing your TTLs. If your authority servers can handle the additional load, reduce the negative cache TTL, DNSKEY TTL, and SOA TTL values to five minutes (300 seconds). Wait for at least as long as the longest of the former TTL values; this ensures that all the old records will be purged from caches and only records with the new TTL values will still be around. Now you can publish the DS or DLV record; there may still be validation failures for some resolvers, but they will last at most five minutes. After ten minutes, it’s safe to restore the TTLs to their original settings.
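The timing arithmetic above can be condensed into a small helper. The numbers are taken from the procedure just described (a 300-second reduced TTL, a wait of at least the longest old TTL, restoration after ten minutes); treat it as a sketch of the schedule, not an official tool:

```python
def ds_publication_schedule(old_ttls, reduced_ttl=300):
    """Given the former negative-cache, DNSKEY, and SOA TTLs (seconds),
    return (seconds to wait after lowering TTLs before publishing DS/DLV,
            worst-case validation outage, seconds until TTLs can be restored)."""
    wait_before_publish = max(old_ttls)   # until all old records leave caches
    max_outage = reduced_ttl              # failures last at most one reduced TTL
    restore_after = 2 * reduced_ttl       # "after ten minutes" for a 300s TTL
    return wait_before_publish, max_outage, restore_after

# Old TTLs of a day, an hour, and two days:
print(ds_publication_schedule([86400, 3600, 172800]))  # (172800, 300, 600)
```

In other words, with a two-day TTL still in play, the operator must wait two full days after lowering TTLs before publishing the DS or DLV record.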
Meanwhile, resolver operators concerned about the ability to validate responses from the WX zone after it is signed and trusted can take steps as well:
- Flush the cache when adding a trust anchor
Some operators, instead of using ISC DLV, prefer to configure trust anchors themselves; they will track changes in the IANA ITAR or other trust anchor repositories, and update their resolver configuration whenever a new key is published. If you do this, make sure that the old key is flushed out of the cache.
The simplest way to do this is kill and restart the resolver. But that involves some downtime and some increase in latency, so you may prefer to keep your resolver running. To do this, add the newly published “wx” key to your “trusted-keys” or “managed-keys” statement in named.conf, run “rndc reconfig” to load the new configuration, and finally run “rndc flushname wx” to remove the cached DNSKEY record, forcing it to be re-fetched and validated against the trust anchor.
- Stay informed of new DNSSEC deployments
Additions of top-level domains and other critical zones to DLV are announced on the “dlv-announce” mailing list. This can provide some forewarning for DLV users so that they can run “rndc flushname” quickly if it turns out to be necessary.
- Update resolvers
When the fixes to these problems are complete, installing the latest versions of BIND 9 will make the hoop-jumping much less necessary.
- By jreed on March 10, 2010
For the past few months, the BIND 10 developers have been using a test-driven development model. As classes and functions are coded, corresponding unit tests are also coded to help verify that the routines do what is expected, producing correct results for both good and bad input. Sometimes the unit tests are written before the new code, and sometimes soon after. (We don’t always follow the strict definition described on Wikipedia, in which the developer first writes a failing test case.) Providing full unit testing is a requirement of the human code review process we follow before importing code for an official public release.
BIND 10 is developed in C++ and Python. Back in October, one of our developers, Jinmei, experimented with some simple C++ test cases using CppUnit, CxxTest, googletest, and Boost.Test. (These historical experiments are in our Subversion repo.) We decided to use googletest (aka gtest) for C++ and the standard PyUnit for Python. (We didn’t run any experiments with Python unit testing frameworks.)
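A PyUnit (unittest) test in this style looks like the following. The function under test is a made-up toy, not BIND 10 code, but it shows the pattern of checking both ordinary and edge-case input:

```python
import unittest

def label_count(name):
    """Toy function under test: number of labels in a dotted DNS name."""
    name = name.rstrip(".")
    return 0 if not name else len(name.split("."))

class LabelCountTest(unittest.TestCase):
    # Good and unusual input should both produce correct, predictable results.
    def test_ordinary_name(self):
        self.assertEqual(label_count("www.example.com."), 3)

    def test_root(self):
        self.assertEqual(label_count("."), 0)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(LabelCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Each new class or function gets a companion test case like this, which is what the code review process checks for before import.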
We provide code coverage reports to show whether all code is adequately exercised by our unit tests. The reports also indicate which lines of code and functions aren’t used. To build BIND 10 with this support, configure with the --with-lcov option and, after building, use “make coverage” to generate an HTML report.
We use LCOV, a front-end to GCC’s gcov, to report which lines of code are actually executed when running the unit tests. LCOV generates HTML pages that highlight the code (lines covered and not covered) and show coverage rates with bar graphs.
The GCC gcov(1) manual page says:
“Software developers also use coverage testing in concert with testsuites, to make sure software is actually good enough for a release. Testsuites can verify that a program works as expected; a coverage program tests to see how much of the program is exercised by the testsuite. Developers can then determine what kinds of test cases need to be added to the testsuites to create both better testing and a better final product.”
We automatically generate new reports when changes are committed to the subversion source repository. The latest C++ unit test code coverage report is at http://bind10.isc.org/~tester/LATEST_UNITTEST_COVERAGE/ and the latest Python unit test code coverage report is at http://bind10.isc.org/~tester/LATEST_PYTHON_UNITTEST_COVERAGE/. (Note that we are still working on automating and improving the python tests.)
The unit tests and the unit test coverage reports have been useful to recognize bugs and to improve the code. (For example, see revisions 361, 387, 404, 426, 451 and many others.) In addition, when we have noticed bugs (outside of the unit tests framework), new test cases have been added to catch them.
We also do other automated source code checking which I will introduce in a different blog article. We will be expanding our automated testing to include higher-level feature tests on different hardware platforms and operating systems and microbenchmarks for charting performance of key components and functions. If you’d like to participate by coding or implementing unit tests or higher level test suites or by running a build slave, please let me know.