RTT Banding Removal From BIND 9
In response to feedback from our customers and colleagues, ISC has chosen to remove the RTT Banding feature from BIND 9, starting with BIND 9.8.0. Other supported versions will have RTT Banding removed in their next releases.
BIND 9.8.0 is scheduled to go out on March 1st, 2011. 9.8.1 will follow around a month later.
Before Banding
Before RTT Banding was implemented, BIND 9 used a simple method of measurement with decay to ensure that the best known server was used to resolve queries. If one server for a domain was 10 ms away and another was 80 ms away, BIND 9 would use the 10 ms server most often, but still periodically try the one that was further away. This provided a good mix of performance and adaptability in changing network situations.
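As a rough illustration of "measurement with decay", the sketch below keeps a smoothed RTT estimate per server, always queries the server with the lowest estimate, and slowly shrinks the estimates of servers that were not used so that they are eventually retried. The constants SMOOTHING and DECAY, the function names, and the choice to start unknown servers at zero are assumptions made for this example; they are not taken from the BIND 9 source.

```python
# Illustrative sketch only: constants and names are assumptions, not BIND 9 internals.
SMOOTHING = 0.7  # assumed weight given to the previous estimate
DECAY = 0.98     # assumed per-query shrink factor for servers that were not used


def pick_server(srtt):
    """Return the server with the lowest smoothed RTT estimate."""
    return min(srtt, key=srtt.get)


def record_query(srtt, chosen, measured_ms):
    """Update all estimates after one query to `chosen` took `measured_ms`."""
    for server in srtt:
        if server == chosen:
            # Blend the new measurement into the running estimate.
            srtt[server] = SMOOTHING * srtt[server] + (1 - SMOOTHING) * measured_ms
        else:
            # Unused servers slowly look better, so they are retried eventually.
            srtt[server] *= DECAY


def simulate(true_rtt, queries):
    """Count how often each server is used over `queries` lookups."""
    srtt = {s: 0.0 for s in true_rtt}   # unknown servers start at 0 so each is tried
    counts = {s: 0 for s in true_rtt}
    for _ in range(queries):
        server = pick_server(srtt)
        record_query(srtt, server, true_rtt[server])
        counts[server] += 1
    return counts


if __name__ == "__main__":
    print(simulate({"near": 10, "far": 80}, 1000))
```

With these assumed constants, the near server handles the large majority of queries while the far server is still probed from time to time, which is the behavior described above.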
About RTT Banding
RTT Banding was a security tool intended to add a small amount of randomness to make spoofing harder. It is implemented in the BIND 9.5.x, 9.6.x, and 9.7.x release series, and it is a recursive server feature.
RTT stands for “Round Trip Time” – the time it takes for a question to reach a remote DNS server, and for the answer to come back. Typical times are a few milliseconds for servers close by, up to 10 ms for servers at exchange points or within the same local area, and around 60 to 80 ms for US east to west coast traffic.
RTT Banding relies on a domain having at least two name servers that fall within the same band. If a domain has two name servers, both of which fall into the first band of 0 - 128 ms, an attacker cannot know which server was used for resolution, which makes a spoofing attack twice as hard. If a domain has four servers in the same band, an attack is four times harder.
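The selection rule described above can be sketched as follows: group servers into bands by RTT, then pick uniformly at random among the servers in the lowest occupied band. Only the first band (0 - 128 ms) is given in this article; treating every band as 128 ms wide, placing exactly 128 ms in the second band, and the function names are assumptions made for this example.

```python
import random

# The article gives the first band as 0 - 128 ms; a uniform width for all
# further bands is an assumption made for this sketch.
BAND_WIDTH_MS = 128


def band_of(rtt_ms):
    """Return the index of the band an RTT falls into (0 is the fastest band)."""
    return int(rtt_ms // BAND_WIDTH_MS)


def pick_with_banding(rtt_by_server, rng=random):
    """Pick uniformly at random among the servers in the lowest occupied band."""
    best_band = min(band_of(rtt) for rtt in rtt_by_server.values())
    candidates = [s for s, rtt in rtt_by_server.items() if band_of(rtt) == best_band]
    return rng.choice(candidates)


if __name__ == "__main__":
    servers = {"ns1": 10, "ns2": 80, "ns3": 300}
    # ns1 and ns2 share the first band, so either may be returned; ns3 never is.
    print(pick_with_banding(servers))
```

Because the 10 ms and 80 ms servers share a band, an off-path attacker cannot tell which of the two was queried, but the resolver also no longer prefers the closer one.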
Compared to other anti-spoofing mechanisms, this two- to four-fold increase is minor. Port randomization alone can make an attacker’s likelihood of success between 4,000 and 65,000 times lower.
Why Banding is Bad
A very common scenario is for a large content provider to spend some time optimizing their DNS service. They may choose to install authoritative servers closer to customers to make DNS resolution as fast as possible.
RTT Banding defeats the benefit of locating servers closer to customers, because queries may go to remote servers as frequently as to those which are closer.
For example, suppose a content provider has two servers which BIND may choose from, one 10 ms away and another 80 ms away (typical for US coast-to-coast traffic). With banding, both servers fall into the same band and are used equally often, so the average latency for queries is 45 ms. Without banding, it would be only slightly higher than 10 ms.
The minor security improvement gained from RTT Banding, weighed against its serious effect on performance, has persuaded ISC to remove this feature from BIND 9.
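The arithmetic behind these figures is a weighted mean of the two round trip times. In the sketch below, the equal split under banding follows from the description above; the 1-in-50 share of queries sent to the far server without banding is purely an assumed figure chosen to illustrate "slightly higher than 10 ms".

```python
def mean_latency_ms(rtts_ms, weights):
    """Weighted mean latency, where weights are relative query shares."""
    total = sum(weights)
    return sum(rtt * w for rtt, w in zip(rtts_ms, weights)) / total


# With banding: both servers share a band and are used equally often.
banded = mean_latency_ms([10, 80], [1, 1])

# Without banding: assume (for illustration only) 1 query in 50 probes the far server.
unbanded = mean_latency_ms([10, 80], [49, 1])

if __name__ == "__main__":
    print(banded)    # 45.0
    print(unbanded)  # 11.4
```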



Comments
I'm curious why you'd decide to remove RTT banding entirely rather than making this a configurable option defaulted to off. It seems to me there'd be some use cases where you'd want the security over performance and some where performance was far more important.
We will be doing exactly that in 9.7.4 and 9.6-ESV-R2, the upcoming releases on the 9.7 and 9.6 release trains.
At a .0 release we chose to disable it fully rather than make it an option. We will revisit whether this should be an option again based on feedback like yours, and it may be reintroduced in 9.8.1 as a configurable option.
That said, we believe the same level of security can be obtained without the locality performance hit. Banding provides a typical 2x difficulty in certain types of attacks and a 4x typical maximum. Adding just 2 or 4 ports to the random selection pool would attain this same level of security in all cases rather than depending on the number of NS servers in the cluster and their timing, and still provide the latency characteristics that are desired.
Another option would be to make the bands smaller, e.g. 10-30 ms instead of 128 ms. 128 ms is so large that it allows users to go overseas, not just from coast to coast, and that's the big killer.
If you add this back with a configuration option, you could make the band size configurable.
We thought of that as well.
There are better ways to solve this that don't require configuration. BIND 10 uses a different approach that I think we may want to use in BIND 9 as it "feels right" both from an operational/performance view and a balanced security view.
BIND 9 will select the best server, and degrade the others over time so eventually they are tried again, and then lock onto the best again.
BIND 10 uses a probabilistic approach where how often a server is used depends on the relative measured RTT values. So, for instance, if one server is 40 ms away, one is 50 ms away, and one is 200 ms away, the 40 ms server will be used slightly more than the 50 ms one, and each of those would be used on average 4x more often than the 200 ms one, and collectively 8x more often.
BIND 10's algorithm has the benefit that even slow servers are tried periodically, but weight is given heavily to servers which are closest.
This, I think, will be the best solution going forward.
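The probabilistic selection described for BIND 10 above can be sketched as follows. The exact weighting formula is not given in this thread, so the example assumes each server's weight is inversely proportional to its measured RTT; the function names are likewise hypothetical. Under that assumption, servers at 40, 50, and 200 ms receive 50%, 40%, and 10% of queries, which is roughly in line with the figures quoted above.

```python
import random


def selection_weights(rtt_by_server):
    """Return per-server selection probabilities.

    Assumption for this sketch: weight is inversely proportional to RTT.
    """
    inverse = {server: 1.0 / rtt for server, rtt in rtt_by_server.items()}
    total = sum(inverse.values())
    return {server: w / total for server, w in inverse.items()}


def pick_weighted(rtt_by_server, rng=random):
    """Pick one server at random according to the selection probabilities."""
    weights = selection_weights(rtt_by_server)
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


if __name__ == "__main__":
    servers = {"ns1": 40, "ns2": 50, "ns3": 200}
    print(selection_weights(servers))
    print(pick_weighted(servers))
```

Every server keeps a nonzero chance of being chosen, so slow servers are still measured occasionally without any separate decay step.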