An analysis of the DNSKEY query storm problem
Summary:
We have developed a patch to the BIND 9 DNSSEC validator to address a recently reported problem: when configured with a stale trust anchor, the validator can send a massive number of DNSSEC-related queries at a high rate. The patch suppresses such queries by caching the trust anchor mismatch and temporarily caching other DNSSEC-related responses on the way to the secure entry point, and should reduce the number of unnecessary queries by 1-2 orders of magnitude. However, the patched validator still periodically (and unsuccessfully) tries to check the validity of the trust anchor, and a thought experiment indicates that this could increase the number of queries to top level servers by 8% in some worst case scenarios. (Before reacting to this number, please read the analysis: it is based on an unrealistically bad scenario and on old data, hence "thought experiment".) We believe the patch is good enough to deploy to solve the current problem in practice, but further investigation of possible tuning, implementation optimizations, and realistic impacts on top level servers will be necessary. Feedback on the various possible tuning and optimization approaches would be welcome.
Introduction:
A recently published "The ISP Column" article titled "Roll Over and Die?" reported that the BIND 9 DNSSEC validator implementation can cause a "storm" of DNSSEC-related queries when it's configured with a stale trust anchor, and that this behavior seemingly contributed to a sudden and continuous increase of query load at some top level authoritative servers.
On receiving a problem report of this behavior, we confirmed the problem, recognized its severity, and started developing a patch to fix it. Meanwhile, due to the severity and possible scale of impact, we thought we should carefully assess the problem and the initial patch we came up with, separately from the development. The idea was to solve the issue in a reliable manner, not by an ad hoc first-aid patch that could subsequently cause another problem or might only mitigate the problem partially.
I was asked to perform the analysis (so I was not involved in developing the patch per se). In what follows I'll explain what I found in some detail. To be specific, I'll refer to the following example case provided in the ISP Column:
> Consider TXT RRset for test.example.com in a signed example.com zone,
> and assume example.com has 2 name server addresses and com has 14.
> If the BIND 9 recursive validator is configured with a stale trust
> anchor for the com zone, it could send up to 844 DNSSEC related queries:
> 784 for com/DNSKEY to 14 com server addresses,
> 56 example.com/DS to 14 com server addresses,
> and 4 example.com/DNSKEY to 2 example.com server addresses.
(There are actually 15 IPv4/IPv6 server addresses for com as of this writing, but I'll use the number given in the column.)
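The totals in the quoted example can be checked with a few lines of Python. This is only arithmetic on the numbers from the column; the variable names are mine, and the per-address figures are simply the quoted totals divided by the address counts, not a statement about how the validator schedules its retries.

```python
# Query counts quoted from the ISP Column example.
COM_ADDRESSES = 14       # server addresses for com
EXAMPLE_ADDRESSES = 2    # server addresses for example.com

# query type -> (quoted total, number of server addresses it was sent to)
queries = {
    "com/DNSKEY": (784, COM_ADDRESSES),
    "example.com/DS": (56, COM_ADDRESSES),
    "example.com/DNSKEY": (4, EXAMPLE_ADDRESSES),
}

total = sum(count for count, _ in queries.values())
per_address = {name: count // addrs for name, (count, addrs) in queries.items()}

print(total)        # 844
print(per_address)  # {'com/DNSKEY': 56, 'example.com/DS': 4, 'example.com/DNSKEY': 2}
```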
Confirming the Problem:
I first tried to reproduce the reported problem with the head branch version of BIND 9 (which is mostly equivalent to BIND 9.7.0, though we had already confirmed this problem was present in previous versions of BIND). The problem can easily be reproduced: I only needed to add a fake DNSKEY for the .org zone as a trust anchor using the `trusted-keys` statement:
trusted-keys {
org. 257 3 5 "FAKEFAKEFAKEFAKEFAKEFAKEFAKE";
};
I then started a BIND 9 recursive validator and asked it for a name under .org that could be validated with a valid .org key (e.g. www.isc.org/A). The validator sent a large number of org/DNSKEY and isc.org/DS queries.
I then looked into the code to understand how exactly this happened. There are three causes of this behavior in BIND 9:
- When BIND 9 encounters a mismatch between a trust anchor and the corresponding fetched DNSKEY during validation, it treats the event as a general validation failure, and tries the original validation with other server addresses. This is why the queries could be sent to all 2 or 14 server addresses in the ISP Column example.
- If the validation chain consists of more than one secure delegation, BIND 9 doesn't cache the intermediate results (e.g., DNSKEY/DS + RRSIG for subzones under the Secure Entry Point, SEP) unless the complete trust chain is established. So, if a validation fails due to trust anchor mismatch, subsequent validation attempts that lead to the mismatching trust anchor also involve re-queries for these intermediate results. This is why, in the ISP Column example, the queries for example.com/DS and example.com/DNSKEY were repeated, not just for the stale trust anchor, com/DNSKEY.
- BIND 9 doesn't remember the fact that a configured trust anchor doesn't match a fetched DNSKEY. So, whenever it needs to validate data under this SEP, it queries for the (mismatching) DNSKEYs.
Analysis:
In general, none of the behaviors described above makes sense (irrespective of whether they cause a packet storm), and all of them could be eliminated without lowering the security level. The following is a detailed analysis of each behavior:
- Behavior #1 doesn't make sense since the most likely cause of this failure is a stale trust anchor and the retry will also fail for the same reason. It should also be noted that the retry wouldn't help if this is a DoS attack by sending forged DNSKEYs: Since all possible server addresses were tried when the mismatch was found (except in the case the validation failed due to timeout), the attacker would be able to respond to queries to any NS addresses, which also means it's likely that it's an on-path attack from an attacker relatively close to the victim validator. So, even if the validator retries the whole process, the attacker would be able to respond to these queries with forged data again.
- Behavior #2 might be considered a safety guard in that the validator doesn't cache untrusted intermediate results, but we could actually cache these results as "pending" data. Even if these are forged data sent by an attacker, we won't return them as validated data unless the complete trust chain is confirmed. On the other hand, if we cache the intermediate results, we could reduce the total number of re-queries when trust anchor mismatch happens.
- Behavior #3 doesn't make much sense, just like behavior #1. When a mismatch is found it's most likely because the configured trusted key is stale, in which case re-queries won't change the result. Even if the mismatch is a result of an attack, (immediate) re-query wouldn't help as explained.
Our "BCP" Solution to the Problem:
We've developed a patch to BIND 9 to address each of the above problematic behaviors. The patch works as follows:
- For behavior #1, the patch suppresses the fallback mechanism with other name server addresses when validation fails due to a mismatching DNSKEY. (The server will still try other addresses on a general failure such as a server failure (SERVFAIL), or when an RRSIG is proven to be invalid.)
- For behavior #2, the patch avoids the redundant intermediate queries by caching the intermediate results as pending answers.
- For behavior #3, the patch introduces a cache (separate from the DNS cache) of the mismatching DNSKEY and doesn't bother to re-fetch it until the cache entry expires. We call this the "bad cache". Cache entries expire in 10 minutes by default (and the expiry can be configured via the lame-ttl option in the range between 30 seconds and 30 minutes).
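The idea behind the "bad cache" for behavior #3 can be sketched roughly as follows. This is an illustrative Python model, not the code in the patch: the class name, method names, and the injectable clock are hypothetical, and clamping an out-of-range TTL (rather than rejecting it) is my assumption.

```python
import time

DEFAULT_TTL = 600              # 10 minutes, the default described above
MIN_TTL, MAX_TTL = 30, 1800    # configurable range: 30 seconds to 30 minutes


class BadCache:
    """Remembers (name, rrtype) pairs whose fetched data did not match a trust anchor."""

    def __init__(self, ttl=DEFAULT_TTL, clock=time.monotonic):
        # Assumption: out-of-range values are clamped into the permitted range.
        self.ttl = max(MIN_TTL, min(MAX_TTL, ttl))
        self._clock = clock
        self._expires = {}     # (name, rrtype) -> expiry time in seconds

    def add(self, name, rrtype):
        """Record a mismatch so that re-fetches are skipped until the entry expires."""
        self._expires[(name.lower(), rrtype)] = self._clock() + self.ttl

    def is_bad(self, name, rrtype):
        """Return True while a recorded mismatch is still within its TTL."""
        key = (name.lower(), rrtype)
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[key]   # expired: the next validation may fetch again
            return False
        return True


# Small demonstration with a fake clock so the expiry can be shown without waiting.
now = [0.0]
cache = BadCache(clock=lambda: now[0])
cache.add("com.", "DNSKEY")
print(cache.is_bad("com.", "DNSKEY"))   # True: skip the re-fetch
now[0] = 601.0
print(cache.is_bad("com.", "DNSKEY"))   # False: the entry has expired
```

Keeping this separate from the ordinary DNS cache, as the patch does, means a recorded mismatch never has to be represented as an answer; it only gates whether a new fetch is attempted.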
Revisiting the "test.example.com" example of the ISP Column article, this patch can reduce the number of unnecessary queries by a factor of about 50 (from 844 to 16: 1 for each of DNSKEY and DS of example.com, and com/DNSKEY to 14 com server addresses) for a single triggering query (such as "test.example.com/TXT"). Moreover, since the com/DNSKEY mismatch is recorded in the "bad cache", any subsequent DNSSEC queries that would require com/DNSKEY will be suppressed for 10 minutes (by default), whether or not they are triggered by "test.example.com/TXT". Assuming this is a busy recursive resolver that could cause this every minute, the effect of the patch can be a factor of 600.
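As a quick check of the single-trigger figure, the patched query count and the resulting reduction factor can be computed from the numbers above (a sketch; the variable names are mine):

```python
UNPATCHED = 844        # queries per triggering query in the ISP Column example
COM_ADDRESSES = 14     # com server addresses used in the example

# Patched: com/DNSKEY to each com address, plus one example.com/DS query
# and one example.com/DNSKEY query.
patched = COM_ADDRESSES + 1 + 1

print(patched)               # 16
print(UNPATCHED / patched)   # 52.75
```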
In addition, the patched version will emit log messages about the most likely cause of the failure:
notice: validating @0x100906200: org DNSKEY: \
please check the 'trusted-keys' for 'org' in named.conf.
so that the validator administrator will be able to notice and fix the configuration error sooner and more easily.
Preliminary Experiments on Revised Impact:
The patch explained above should significantly improve the situation, but it will still periodically send DNSKEY queries for the stale trust anchor. It's difficult to tell exactly how severe it will be, but I tried to estimate some very worst case impacts using old data I happened to have.
First, I examined a query log of an instance of the F-root server taken between 4:18am and 5:18am on October 28th, 2005. It contained 3,701,880 UDP queries from 50,731 unique IPv4 addresses. A worst case scenario would be that all of these 50K clients install a stale trust anchor for the root zone and send the DNSKEY queries every time the "bad cache" expires (note: many of these hosts sent only a few queries to the F-root server, but this DNSKEY query can be triggered by a query for a name in any subdomain). By default the cache expires every 10 minutes, so these 50K+ clients would send about 300K DNSKEY queries per hour. This would increase the total number of queries to the server by about 8.2%.
Note that this is a very unlikely worst case: it assumes that all recursive servers out there are DNSSEC validators with a stale root trust anchor, and that all these servers send the DNSKEY queries every time the cache expires. In reality the number of such validators is much, much smaller, and some or many of them are less busy and send the redundant queries less frequently. On the other hand, it should also be noted that this analysis only considers the average query load. If many validators with a stale trust anchor synchronize in sending the DNSKEY queries for some reason, the peak query load might be much higher.
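The worst case estimate for the F-root log is plain arithmetic and can be reproduced as follows. The sketch assumes, as the scenario does, that every client sends one DNSKEY query to this server each time its bad cache entry expires; the variable names are mine.

```python
TOTAL_QUERIES = 3_701_880     # UDP queries in the one-hour log
UNIQUE_CLIENTS = 50_731       # unique IPv4 source addresses
BAD_CACHE_TTL = 600           # seconds (the default of 10 minutes)

expirations_per_hour = 3600 // BAD_CACHE_TTL            # 6
extra_queries = UNIQUE_CLIENTS * expirations_per_hour   # extra DNSKEY queries per hour
increase_pct = 100 * extra_queries / TOTAL_QUERIES

print(extra_queries)            # 304386
print(round(increase_pct, 1))   # 8.2
```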
Next, I examined a query log collected at a busy recursive server (handling about 700qps with a cache hit rate of about 80%) to see how this patch would affect a moderately busy recursive validator. I set up a test recursive (non-validating) resolver that ran the head branch version of BIND 9, replayed the collected queries to the test server at approximately the same query rate, and counted the number of queries sent to the root (and some other major TLD) servers. The test server received 2,486,630 queries in one hour, and sent 63,706 queries to root servers in that period.
Now, if the test server were configured with a stale trust anchor for the root zone and periodically (every 10 minutes by default) sent DNSKEY queries to root servers, it would send 120 more queries to the root servers as there are currently 20 root server (IPv4 + IPv6) addresses. This is 0.19% of the original queries, so, at least from this result the revised impact on recursive validators will be marginal.
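The same kind of check works for the recursive server experiment. The sketch below uses only the figures given above and assumes one DNSKEY query per root server address on each bad cache expiration; the variable names are mine.

```python
RECEIVED_QUERIES = 2_486_630     # queries replayed to the test server in one hour
BASELINE_ROOT_QUERIES = 63_706   # queries the test server sent to root servers
ROOT_ADDRESSES = 20              # root server IPv4 + IPv6 addresses
BAD_CACHE_TTL = 600              # seconds (the default of 10 minutes)

qps = RECEIVED_QUERIES / 3600
extra = (3600 // BAD_CACHE_TTL) * ROOT_ADDRESSES   # extra DNSKEY queries per hour
pct = 100 * extra / BASELINE_ROOT_QUERIES

print(round(qps))      # 691
print(extra)           # 120
print(round(pct, 2))   # 0.19
```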
Open Issue 1: Number of Retries on Bad Cache Expiration:
One possible concern of our current patch is that it repeats the query for the mismatching DNSKEY with all name server addresses unless it finds a matching DNSKEY (once the cache expires and the DNSKEY is needed for subsequent validation). If, for example, we further limit the retried DNSKEY queries after cache expiration to 1, the hypothetical worst case on the root server shown above would be an increase of 0.4% (because on each cache expiration the client would send the query only to 1 of 20 root server addresses).
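The 0.4% figure follows from dividing the F-root worst case by the number of root server addresses. A sketch of that arithmetic, under the assumption that a client limited to a single retry picks one of the 20 addresses uniformly (the variable names are mine):

```python
TOTAL_QUERIES = 3_701_880    # UDP queries in the one-hour F-root log
UNIQUE_CLIENTS = 50_731      # unique IPv4 source addresses in that log
EXPIRATIONS_PER_HOUR = 6     # bad cache TTL of 10 minutes
ROOT_ADDRESSES = 20          # root server IPv4 + IPv6 addresses

# Worst case when every expiration leads to a query reaching this server.
all_addresses_pct = 100 * UNIQUE_CLIENTS * EXPIRATIONS_PER_HOUR / TOTAL_QUERIES

# With one retry per expiration, this server sees on average 1/20 of them.
single_retry_pct = all_addresses_pct / ROOT_ADDRESSES

print(round(all_addresses_pct, 1))   # 8.2
print(round(single_retry_pct, 2))    # 0.41
```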
In general it's a tradeoff between increased query load/traffic and increased possibility of successful validation. But, in reality, it's less likely that trying multiple servers helps find a matching key:
- In the vast majority of cases the mismatch is due to a stale trust anchor. In this case the only solution is to update the validator's configuration. Query retries won't help whether or not it sends to multiple addresses or no matter how often it retries.
- In the rare case of DoS attacks, if the attack comes from an on-path attacker, trying multiple addresses won't help; an attack from an off-path attacker may be rejected by the multiple queries, but in that case the first attack attempt would have been rejected in the first place (since at that time the queries were sent to all server addresses).
So, it seems to me we can safely reduce the number of re-queries (even to one) once we identify a mismatch between a trust anchor and fetched DNSKEYs. In our current patch, however, we didn't adopt this optimization, because it would make the implementation more complicated and we basically thought the improved query load would be acceptable in practice.
Open Issue 2: Bad Cache TTL:
Another possible issue is the (default) TTL of the bad cache. As explained, our current patch uses the default value of lame-ttl, which is 10 minutes. But this is probably not a reasonable choice, because the source of the problem is different in these cases: Lame server errors are a problem at the remote server side; unnecessary DNSKEY queries are most likely to be a local configuration error.
So, basically, the problem won't be solved just by waiting for some period, and, in that sense, the cache TTL could be very long. However, in the very rare case of a hit-and-run (on-path) attack, using a reasonably short TTL may make sense. Of course, if we use a shorter TTL, the resulting load of unnecessary DNSKEY queries will be higher, so we should be careful about selecting the default value.
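To get a feel for how the TTL choice affects load, the F-root thought experiment from the earlier section can be parameterized by the TTL. This is the same unrealistic worst case (every client has a stale root trust anchor and re-queries on every expiration), so the numbers are only useful for comparing TTL values; the function name is hypothetical.

```python
TOTAL_QUERIES = 3_701_880    # UDP queries in the one-hour F-root log
UNIQUE_CLIENTS = 50_731      # unique IPv4 source addresses in that log


def worst_case_increase_pct(ttl_seconds):
    """Percent increase at the server if every client re-queries once per TTL expiry."""
    extra_per_hour = UNIQUE_CLIENTS * (3600 / ttl_seconds)
    return 100 * extra_per_hour / TOTAL_QUERIES


# The minimum, default, and maximum values of the configurable range.
for ttl in (30, 600, 1800):
    print(f"TTL {ttl:>4}s: +{worst_case_increase_pct(ttl):.1f}%")
```

The added load is inversely proportional to the TTL, so the 30-second minimum produces 20 times the load of the 10-minute default in this model, while the 30-minute maximum produces a third of it.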
Next Steps:
Right now, the most important action is to distribute our "BCP" patch promptly. It will definitely make the situation much better.
Then, it would be helpful if we could get more real-world statistics to estimate the possible impact of our revised patch on root and other higher level servers. It would be a good idea to apply the analysis explained above to more recent query logs from the F-root servers, and, if we can get help from others, it would be nice to perform the same analysis using statistics from other higher level servers.



Comments
How about this - on start up of the named process, if the configured trusted-keys contains any stale key (one that is not in agreement with what can be queried from the 'net), named won't "come up" unless the administrator has done something to tell it to ignore staleness (because they are testing a key or something that is not yet fully in use in the zone). Something like an option: "obscure-bypass-of-a-stale-trusted-key-check-result".
I realize that this would only be a check on servers 'booting' up with stale keys, not servers that are continually running. But, this would help raise flags.
Hi Jinmei, great blog, some remarks, some questions:
Your blog confirms the behavior _exactly_ as I reported it in the article (and to ISC) a month ago. Good.
The numbers of com (14) and root (19) servers in the article were taken from the lab environment. We assumed the numbers were close enough to match reality. Given that the actual number of servers (15 and 20 respectively) is higher, the load is even more severe.
Please note that this problem is also triggered by a stale DS or DNSKEY record anywhere in the chain, not just a stale trust-anchor.
Why is this called a "BCP" patch, and not a bug-fix? Is that marketing? Clearly, understanding the severity and the possible scale of impact, this can't be the intended behavior, hence it must be a bug.
Why has ISC issued two major full releases (9.6.2 and 9.7.0) with this bug, knowing the severity and possible scale of impact of it? Wasn't it a better decision to postpone distribution of these? I regard this as irresponsible.
Is ISC going to back port RFC5011 support to 9.6.2 as well? The 9.6.2 release is now able to validate root without the possibility to automatically roll the key, and has this severe bug.
Kind Regards,
Roy Arends
Nominet UK
Roy --
Stepping in for a moment here... Jinmei's quite busy finishing the BIND 10 Y1 deliverable this week... I want to say how much I appreciate that you are reading and responding to our new blog. Thank you.
This is indeed a major bug and we're treating it as such. Michael explained in an earlier blog post (http://www.isc.org/community/blog/201002/surprise-bugs-and-release-sched...) some of the reasons why we chose not to defer our planned release schedules. Here's some more background to that decision.
The most important step for us in resolving this issue with the packet "storms" was to develop the best possible solution for this problem promptly. The second most important step was to ensure that it gets installed where it matters - on as many of those validating resolvers already running older versions of BIND as possible.
This means patches - properly-researched, thoroughly-tested and well-publicized patches. Administrators install patches to fix problems; OS vendors port and distribute patches - they don't leap on new major versions in the same way. Many administrators tell us they don't install point releases. Holding up BIND 9.6.2 and 9.7.0 wasn't going to make things better any more quickly, and releasing 9.7.0 has been of great benefit to those running authoritative servers who want to start signing their zones.
So - patched versions will be released using something akin to our security vulnerability notification processes - and we're pleased to say that they will be coming out ahead of schedule too.
Meanwhile - the patches are our best effort now ("best current practice", that's the "BCP" from Jinmei's post) for resolving this problem. There will very likely be "better current practices" in future releases. We welcome ideas and input towards that goal, always.
We had not planned to port RFC5011 back to 9.6, but we're open to feedback and discussion on that point.
Best,
Larissa
Hi Ed,
I thought I'd step in here a moment. Jinmei's very involved in the BIND 10 Y1 deliverable... so I thought I'd take a stab at responding to your feedback. We're so pleased that people are now reading and responding to the blog!
This is a very interesting idea, I will make sure it gets into our suggestion queue for upcoming versions. Any more ideas in this vein or elaboration are very welcome.
Best,
Larissa