Why forwarding is a Bad Thing

Thu Mar 22 16:57:39 UTC 2001

At 3:07 PM +0000 3/22/01, Jim Reid wrote:

>  Well I've made my views about forwarding known many times here. And I
>  suppose you must have been asleep when the topic briefly came up in my
>  tutorial at SANE last year. :-)

	I seem to recall having missed most of your tutorial at SANE. 
Yes, being on the Program Committee means that you get to sit in on 
any tutorials you want (if space is available, and it usually is), 
but it also means that there are certain other commitments made for 
your time.

	Speaking of which, I'm on the program committee again this year, 
and the announcement/call for papers should be going out very soon. 
You may want to keep a lookout for it.

>  [1] Clueless admins don't understand the concept. They mistakenly
>  believe that if the first forwarded target doesn't give the desired
>  answer, the name server will try the second. And so on.

	When you say "the name server will try the second", which name 
server are you referring to?  The first one to which queries are 
being forwarded, or the one that is forwarding the queries?  If the 
latter, then does the following really make any sense:

	997.   [support]       forwarders are now used in order by measured RTT.

	Or does this mean that only one forwarder is ever used, and the 
name server that is forwarding the queries is now just a little more 
intelligent in choosing the machine to which it is forwarding queries?
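
	To make the question concrete, here is a minimal Python sketch of 
what "used in order by measured RTT" could mean.  It is an 
illustration only, not BIND's actual algorithm: the class name 
ForwarderPool, the smoothing weight alpha, and the 192.0.2.x addresses 
are all hypothetical.

```python
class ForwarderPool:
    """Hypothetical model: keep a smoothed RTT per forwarder, fastest first."""

    def __init__(self, addresses, alpha=0.3):
        self.alpha = alpha  # assumed smoothing weight given to each new sample
        self.srtt = {addr: None for addr in addresses}

    def record(self, addr, rtt_ms):
        """Fold one measured round-trip time into the running estimate."""
        old = self.srtt[addr]
        if old is None:
            self.srtt[addr] = float(rtt_ms)
        else:
            self.srtt[addr] = (1 - self.alpha) * old + self.alpha * rtt_ms

    def order(self):
        """Return the forwarders in the order they would be tried."""
        # Unmeasured forwarders sort first so that each one gets sampled once.
        return sorted(
            self.srtt,
            key=lambda a: (self.srtt[a] is not None, self.srtt[a] or 0.0),
        )


pool = ForwarderPool(["192.0.2.1", "192.0.2.2"])
pool.record("192.0.2.1", 40)
pool.record("192.0.2.2", 5)
print(pool.order())  # fastest measured forwarder comes first
```

	In this model the list only decides who gets asked first; it says 
nothing about whether a second forwarder is consulted after the first 
one has answered.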

>  [2] Forwarding set ups are usually not documented at all. This gives
>  rise to all sorts of nasty operational problems. Server A forwards to
>  B which forwards ... to A. Debugging those subtle SERVFAIL errors can
>  be entertaining. Or if some server's cache has bad data, finding out
>  how it got there and tracing it back to the source of the problem can
>  be troublesome.

	The documentation is the /etc/named.conf file itself, right?  I 
mean, it's pretty obvious when a machine is forwarding queries, isn't 
it?
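
	If the forwarders lines from each server's named.conf were 
collected into one table, the "A forwards to B which forwards ... to 
A" case could be checked mechanically.  A rough Python sketch, 
assuming a hypothetical mapping forwards_to from server name to the 
list of servers it forwards to (the names A, B, C are placeholders):

```python
def find_forwarding_loop(forwards_to, start):
    """Return a forwarding loop reachable from start as a list, or None."""
    path = []  # servers on the chain currently being followed

    def visit(server):
        if server in path:
            # We came back to a server already on the chain: that is a loop.
            return path[path.index(server):] + [server]
        path.append(server)
        for target in forwards_to.get(server, []):
            loop = visit(target)
            if loop:
                return loop
        path.pop()
        return None

    return visit(start)


print(find_forwarding_loop({"A": ["B"], "B": ["C"], "C": ["A"]}, "A"))
```

	This only works if someone can actually read every server's 
configuration, which is the point being argued in [2].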


	Now the poisoned cache propagation problem, that I can 
understand.  But this is why we run all recursive caching name 
servers in non-authoritative mode, so that even if they get their 
cache poisoned somehow, this won't be propagated authoritatively to 
anyone else.

	IMO, there's only so much you can do about poisoned caches.  
Beyond setting them up to be recursive/non-authoritative, the best 
thing you can do is run the latest stable version of BIND, so that 
you are at least as resistant as possible to poisoning.

>  [3] The addresses of the forwarding targets get hard-wired into config
>  files. [Why not let your server find the addresses of other name
>  servers for itself by following the NS records?]

	But the target machines are themselves recursive/caching-only 
servers, and therefore would not be advertised in the NS RRset for 
any zone.  So, how else would you find out about them?

>                                                    They also end up in
>  folklore: the details are passed by word of mouth between system
>  admins without knowing if the info is correct or even
>  appropriate. Having the addresses bolted into config files makes it
>  very difficult to renumber or relocate the target(s) of forwarded
>  queries.

	I can see that.  Indeed, it would be very helpful to be able to 
forward to name servers by their own name, in addition to/instead of 
by their IP address.  This would allow you to renumber them at will, 
while keeping the forwarding structure intact.

>            Suppose that everything forwards to the firewall and a new
>  firewall has to be deployed... If all the forwarding name servers are
>  not under one single control, it can be almost impossible to do
>  anything to the target servers.

	I can also see that, although this hasn't been a problem in any 
environment I've ever personally seen.

>  [4] Forwarding is just dumb. A name server has the intelligence to
>  navigate the name space, locate other name servers, work around
>  dead/lame ones, measure round trip times and pick the closest servers
>  and generally look after itself.

	Right, but if you have a large farm of machines and a caching 
name server running on each of them, simply because of asking similar 
questions at different times, each of them is guaranteed to build up 
a slightly different picture of the world than each of the others.

	If what you're selling to your customers is a distributed, 
replicated, fault-tolerant virtual system, then the *LAST* thing you 
want is for the individual machines that are a part of that system to 
build up differing views of the world.


	This also causes a lot of duplication of effort, as each caching 
name server goes out to find 75-90%, or even 99%, of the same answers 
as all the others.  That traffic could be caught and reduced or 
eliminated by having a "level two" DNS cache to which unknown queries 
are forwarded.
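
	The duplication can be illustrated with a toy count in Python.  
This is a deliberately simplified sketch: it assumes every host asks 
for every name exactly once, ignores TTLs entirely, and uses made-up 
names under example.com; the function name upstream_lookups is 
hypothetical.

```python
def upstream_lookups(hosts, names, shared_cache):
    """Count lookups that leave the site when every host resolves every name."""
    central = set()  # contents of the hypothetical "level two" cache
    lookups = 0
    for _ in range(hosts):
        local = set()  # each host starts with its own empty cache
        for name in names:
            if name in local:
                continue  # answered from the host's own cache
            local.add(name)
            if shared_cache:
                if name in central:
                    continue  # answered by the level-two cache
                central.add(name)
            lookups += 1  # somebody had to go out and find the answer
    return lookups


names = [f"host{i}.example.com" for i in range(100)]
print(upstream_lookups(10, names, shared_cache=False))
print(upstream_lookups(10, names, shared_cache=True))
```

	Real traffic overlaps less than completely, so the saving would 
be smaller than in this idealized case.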

>                                    Instead of letting the server use
>  those capabilities, forwarding constrains it to blindly throw queries
>  at a small number of servers. This is a bit like buying a Ferrari and
>  then chaining it to a lamp-post so it can only go round in circles at
>  5-10kph. At least with 8.2.3 and BIND9, the forwarding server can at
>  least exploit the RTT to the targets to pick the fastest one.

	So long as you don't do forward-only, this is more like checking 
with your neighbors to see if someone else has already gotten a copy 
of the newspaper you want, before you go hop in your Ferrari and 
sprint down to the corner store (or halfway across the world) to pick 
up the issue you want.
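
	The difference between the two modes can be sketched in a few 
lines of Python.  This is a simplified model, not BIND's code: the 
function resolve and its parameters are hypothetical, and a forwarder 
returning None stands for "no response", not for a negative answer.

```python
def resolve(name, forwarders, iterate, forward_only=False):
    """Ask each forwarder in turn; optionally fall back to resolving directly.

    forwarders: callables returning an answer, or None when there is no response.
    iterate: callable that resolves the name by walking the name space itself.
    """
    for ask in forwarders:
        answer = ask(name)
        if answer is not None:
            return answer
    if forward_only:
        return None  # the forwarders were the only route, so the query fails
    return iterate(name)  # hop in the Ferrari and go get it yourself


def dead(name):
    return None  # a forwarder that never responds


def direct(name):
    return "192.0.2.10"  # placeholder answer from resolving directly


print(resolve("www.example.com", [dead], direct))
print(resolve("www.example.com", [dead], direct, forward_only=True))
```

	With the fallback in place, a dead forwarder costs time but the 
query can still be answered; with forward-only it cannot.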

>  [5] The target(s) of forwarding become single points of failure. If
>  they stop, DNS stops and the sky falls in.

	True enough, but this is why you build farms of these things and 
you do standard load-balancing/high-availability stuff to make them 
much more reliable than they already are to begin with.

>  [6] The benefits of having a single central cache for all to share are
>  over-hyped IMHO. They may have been valid in the days when a
>  campus-sized net had one 56/64k line and every off-site packet was
>  expensive, but not today. Is there *really* a lot of synergy from
>  having all the organisation's web servers and mail servers (say)
>  resolve on one name server cache? And what about having all the eggs
>  in one basket?

	In my personal experience, there is a big benefit to having a 
central cache which is shared by all machines.  At AOL, I saw a huge 
degree of locality of reference, on the order of 75-90% or more.  At 
least, I saw this during the very brief periods of time when I could 
afford to turn on query logging, because using this option sapped so 
much of the power of the machine that it could only be left on for 
the briefest of possible moments.

	At Skynet, our central caching name server systems (a pair of 
identically configured machines, plus a third that was much less 
powerful) were each handling on the order of 200-250 queries per 
second (on average).  When I took a single mail server and removed 
the caching forwarding name server that was running locally on it, I 
immediately saw a jump on the order of 50-100 queries per second 
added to the central servers.  Do this for a few other machines on 
the network, and you suddenly bury the central caching name servers 
by asking them to handle at least one order of magnitude more DNS 
queries per second than they had previously been handling.


	In the experience of Nick Christenson (Internet mail expert at 
Sendmail, Inc. and my co-author for the invited talk "Design and 
Implementation of Highly Scalable Internet E-mail Systems", which I 
presented at LISA 2000), there is a big benefit to having caching 
name servers running on each mail server, so that as you scale the 
network of mail servers up, you automatically scale up the name 
server power that they are making use of.

	Nick and I have compromised by saying that you should have a 
central set of caching name servers to which all unknown queries will 
be forwarded, and local caching forwarding name servers on each mail 
server (running BIND 8.2.3 or BIND 9, so that you get the proper use 
of RTT to determine the server(s) to which queries should be forwarded).

	This gets you the automatic scaling factor for the 75-90% 
locality of reference queries, plus the centralized "consistent one 
world" view, plus the level two caching effect.


	If anyone on this list would like to see the paper Nick and I 
wrote, sling over to 
<http://www.shub-internet.org/brad/papers/dihses/> and pick your 
favourite version.  From there, pick your file format (including PDF, 
if you want).

>  [7] Forwarding name server architectures tend to become baroque. As a
>  result they are prone to be vulnerable to subtle N-th order
>  problems. Suppose server A forwards to B which forwards to C and B
>  fails. Or gets switched off. Or renumbered. What breaks? How does the
>  problem manifest itself? How is it debugged? Remember too that the
>  administrator of A probably doesn't know their name server forwards,
>  let alone where it forwards to. And the administrator of B doesn't
>  realise that A forwards to it. See [2]. For added amusement, now add
>  per-zone forwarding to the mix. Or wildcarding.

	Sorry, I don't do any of this.  I have only one level of 
forwarding on the machines I've set up.  I have never considered 
additional levels of forwarding, and I never would have considered 
them for many of the same reasons you've mentioned here.

>  [8] A name server will usually be quicker resolving things for itself
>  than forwarding the queries elsewhere for resolving. The work to
>  resolve the name will be the same whether the forwarding or target
>  server does the job. So why introduce another (unnecessary) link --
>  in reality a single point of failure -- in the chain?

	This is true, except for the case where another client has 
recently asked for the same information you're looking for, and 
therefore this is already available on the central caching name 
servers.  Again, I have seen quite a lot of this in my personal 
experience.

>  [9] Forwarding can create extra and unnecessary traffic on the
>  internal net. The numbers of queries and answers are usually doubled.
>  This can sometimes be amazingly stupid. Suppose a forwarding server in
>  London can only forward to a server running on a firewall in New
>  York. The London server has to resolve a name that lives on name
>  servers elsewhere in London. So instead of a hopefully quick local
>  lookup the query goes from London to New York to London and back
>  again. Ho hum.

	I would never forward queries from one remote site to another, 
for the same reasons.  I only ever forward queries within a site. 
Indeed, it never would have occurred to me to even think about 
forwarding queries between remote sites.

>  [10] When using forwarding with split DNS, every forwarding target
>  server has to be configured with the details of every apex zone in the
>  internal name space. This can be very messy to set up and
>  maintain. And it probably won't be documented....

	I've never done forwarding in a split DNS environment, but if you 
keep the set of servers that are handling the forwarding queries to a 
minimum (as I have always done in all implementations I've ever been 
involved with), this shouldn't be a problem.

>  Now sometimes forwarding is the only option: say because of firewall
>  access policies or dial-up internet connectivity or (shudder!) NAT. In
>  these cases, I would grudgingly use forwarding if someone put a gun to
>  my head. Other than that, I'd avoid using or setting up forwarding
>  name servers if at all possible.

	I'll agree that there are cases where you simply can't avoid 
using forwarding, but again I've never been involved in any of them.

	I've only ever used forwarding when I had the choice as to 
whether or not to do so, and have found that using forwarding in this 
manner has greatly improved performance (e.g., minimized the average 
time it takes to answer any query), given me a relatively consistent 
"one world view" which my customers have demanded, and has not 
otherwise caused any problems I am aware of.

>  I realise your experiences at AOL have given you a warm feeling about
>  forwarding. I wasn't in that environment, so I can't say for sure if
>  this was the right way of solving the problem you described. It
>  doesn't seem to me that forwarding was needed or that your
>  "consistency" rationale justified it. [Maybe I don't understand the
>  problem or haven't thought hard enough about it.] There's always a
>  window when name servers are inconsistent: when the master has loaded
>  new data for a zone but the slaves haven't. I don't see how your
>  forwarding setup solves or mitigates that problem.

	Because only one of the central servers receives the forwarded 
query, it asks the question at only one point in time.  Therefore, 
regardless of what answer it receives from which server, that answer 
is recorded and the server is internally consistent with itself on 
this matter.

	Once that server is internally consistent on this matter, it 
hands that same answer back out in a consistent fashion, until such 
time as the TTL on that information dies out.  However, that TTL dies 
out at essentially the same time everywhere, at which point in time 
the entire process might be repeated.
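
	A small Python sketch of that behaviour, under stated 
assumptions: the class CentralCache, the injected clock, and the 
upstream lookup that returns an (answer, TTL) pair are all 
hypothetical, and the model ignores everything a real cache does 
beyond honouring the TTL.

```python
class CentralCache:
    """Hypothetical shared cache: one upstream lookup per name per TTL window."""

    def __init__(self, lookup, clock):
        self.lookup = lookup  # callable: name -> (answer, ttl_seconds)
        self.clock = clock    # callable: () -> current time in seconds
        self.entries = {}     # name -> (answer, expiry_time)

    def resolve(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # every client sees the same recorded answer
        # Expired or never seen: ask once and record whatever comes back.
        answer, ttl = self.lookup(name)
        self.entries[name] = (answer, now + ttl)
        return answer


# Upstream gives a different answer each time, as during a zone update.
answers = iter([("192.0.2.1", 60), ("192.0.2.2", 60)])
now = [0]
cache = CentralCache(lambda name: next(answers), lambda: now[0])
print(cache.resolve("www.example.com"))
now[0] = 30
print(cache.resolve("www.example.com"))  # still the first recorded answer
```

	Whichever answer the central server recorded is the one handed 
out until the TTL runs out, even if the upstream servers disagree 
among themselves in the meantime.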

--
Brad Knowles, <brad.knowles at skynet.be>

/*     efdtt.c     Author:  Charles M. Hannum <root at ihack.net>             */
/*                                                                         */
/*     Thanks to Phil Carmody <fatphil at asdf.org> for additional tweaks.    */
/*                                                                         */
/*     Length:  434 bytes (excluding unnecessary newlines)                 */
/*                                                                         */
/*     Usage is:  cat title-key scrambled.vob | efdtt >clear.vob           */
/*     where title-key = "153 2 8 105 225" or other similar 5-byte key     */

#define m(i)(x[i]^s[i+84])<<
unsigned char x[5],y,s[2048];main(n){for(read(0,x,5);read(0,s,n=2048);write(1,s
,n))if(s[y=s[13]%8+20]/16%4==1){int i=m(1)17^256+m(0)8,k=m(2)0,j=m(4)17^m(3)9^k
*2-k%8^8,a=0,c=26;for(s[y]-=16;--c;j*=2)a=a*2^i&1,i=i/2^j&1<<24;for(j=127;++j<n
;c=c>y)c+=y=i^i/8^i>>4^i>>12,i=i>>8^y<<17,a^=a>>14,y=a^a*8^a<<6,a=a>>8^y<<9,k=s
[j],k="7Wo~'G_\216"[k&7]+2^"cr3sfw6v;*k+>/n."[k>>4]*2^k*257/8,s[j]=k^(k&k*2&34)
*6^c+~y;}}