Why forwarding is a Bad Thing

Thu Mar 22 19:00:19 UTC 2001

>>>>> "Brad" == Brad Knowles <brad.knowles at skynet.be> writes:

    >> [1] Clueless admins don't understand the concept. They
    >> mistakenly believe that if the first forwarded target doesn't
    >> give the desired answer, the name server will try the
    >> second. And so on.

    Brad> 	When you say "the name server will try the second",
    Brad> which name server are you referring to?  The first one to
    Brad> which queries are being forwarded, or the one that is
    Brad> forwarding the queries? 

The forwarding server. We see this question on bind-users fairly
frequently. "My server forwards to A and B. When I look up foo, A says
it doesn't exist. Why doesn't my forwarding server then forward to B,
which knows foo does exist?"

    Brad> 	Or does this mean that only one forwarder is ever
    Brad> used, and the name server that is forwarding the queries is
    Brad> now just a little more intelligent in choosing the machine
    Brad> to which it is forwarding queries?

In BIND 8.2.3 and BIND 9, yes. Modulo the usual RTT smoothing in case
a faster forwarder emerges.
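
To make that concrete, here is a toy model in Python. It is a sketch
under stated assumptions, not BIND's actual code: the names Forwarder,
resolve and ALPHA, the smoothing weight and the RTT figures are all
invented for illustration. It only shows the two points made above: the
forwarder with the lowest smoothed RTT gets the query, and a "no such
name" reply from it is a final answer, not a cue to ask the next one.

```python
# Toy model of forwarder selection. NOT BIND's implementation.
# Forwarder, resolve and ALPHA are hypothetical names for this sketch.

ALPHA = 0.7  # assumed weight given to the old RTT estimate


class Forwarder:
    def __init__(self, name, answers, rtt_ms):
        self.name = name
        self.answers = answers        # query name -> address it knows about
        self.rtt_ms = rtt_ms          # simulated round-trip time of a reply
        self.srtt = float(rtt_ms)     # smoothed RTT estimate

    def query(self, qname):
        # Fold the latest RTT sample into the smoothed estimate.
        self.srtt = ALPHA * self.srtt + (1 - ALPHA) * self.rtt_ms
        # A name this forwarder does not know is answered with NXDOMAIN.
        return self.answers.get(qname, "NXDOMAIN")


def resolve(forwarders, qname):
    # Ask only the forwarder that currently looks fastest.
    best = min(forwarders, key=lambda f: f.srtt)
    # Whatever it says, NXDOMAIN included, is the answer. No second try.
    return best.name, best.query(qname)


if __name__ == "__main__":
    a = Forwarder("A", {}, rtt_ms=10)
    b = Forwarder("B", {"foo": "192.0.2.1"}, rtt_ms=50)
    print(resolve([a, b], "foo"))   # A is faster, so A's answer is used
    a.rtt_ms = 500                  # A becomes slow
    print(resolve([a, b], "foo"))   # still A; its estimate gets worse
    print(resolve([a, b], "foo"))   # now B looks faster and is asked
```

In this model B is only consulted once A's smoothed RTT climbs above
B's. It never gets asked merely because A said the name doesn't exist.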

    >> [2] Forwarding setups are usually not documented at all. This
    >> gives rise to all sorts of nasty operational problems. Server A
    >> forwards to B which forwards ... to A. Debugging those subtle
    >> SERVFAIL errors can be entertaining. Or if some server's cache
    >> has bad data, finding out how it got there and tracing it back
    >> to the source of the problem can be troublesome.

    Brad> 	The documentation is the /etc/named.conf file itself,
    Brad> right?  I mean, it's pretty obvious when a machine is
    Brad> forwarding queries, isn't it?

Yes, but that presumes you can read someone else's named.conf file or
find the admin for that other server, if there is one. Even though the
config files document the setup, that wasn't the documentation I was
alluding to. I was thinking of a DNS architecture document describing
the name server setup, the interaction between them, server locations,
who's authoritative for what, who can query through any firewall, how
queries are handled, etc, etc.
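
Once the forwarding relationships are written down, a loop such as "A
forwards to B which forwards ... to A" can be found mechanically. The
Python below is a minimal sketch, assuming the architecture document
yields a simple map of which server forwards to which. The function
name and the server names are made up for the example.

```python
# Hypothetical forwarding map: each server lists the servers it forwards to.
# find_forwarding_loop is an invented helper, not part of any DNS tool.

def find_forwarding_loop(forwards_to, start):
    """Return a forwarding loop reachable from start, or None."""
    path = []        # servers on the current forwarding chain, in order
    on_path = set()  # same servers, for fast membership tests

    def visit(server):
        if server in on_path:
            # We came back to a server already on the chain: a loop.
            return path[path.index(server):] + [server]
        path.append(server)
        on_path.add(server)
        for target in forwards_to.get(server, []):
            loop = visit(target)
            if loop:
                return loop
        path.pop()
        on_path.remove(server)
        return None

    return visit(start)


if __name__ == "__main__":
    setup = {"A": ["B"], "B": ["C"], "C": ["A"]}
    print(find_forwarding_loop(setup, "A"))
```

Of course this only works if somebody has recorded the map in the first
place, which is the point.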

    Brad> 	Now the poisoned cache propagation problem, that I can
    Brad> understand.  But this is why we run all recursive caching
    Brad> name servers in non-authoritative mode, so that even if they
    Brad> get their cache poisoned somehow, this won't be propagated
    Brad> authoritatively to anyone else.

And one central cache beaten on by everything else in the world is a
wonderful place to spread that infection.

    Brad> 	IMO, there's only so much you can do about poisoned
    Brad> caches, and beyond setting them up to be
    Brad> recursive/non-authoritative, the best thing you can do is
    Brad> run the latest stable version of BIND, so that you should at
    Brad> least be as resistant as possible to poisoning.

True, but that's only part of the story. You should have added "and
get your servers to only query authoritative name servers if at all
possible".

    >> [3] The addresses of the forwarding targets get hard-wired into
    >> config files. [Why not let your server find the addresses of
    >> other name servers for itself by following the NS records?]

    Brad> 	But the target machines are themselves
    Brad> recursive/caching-only servers, and therefore would not be
    Brad> advertised in the NS RRset for any zone.  So, how else would
    Brad> you find out about them?

You don't. I meant only query authoritative name servers that are
found by following NS records. Why forward a query to another name
server for resolution when the forwarding server is perfectly capable
of resolving the query for itself? Put more simply: what's the point
of having a dog and barking yourself?

    Brad> 	I can see that.  Indeed, it would be very helpful to
    Brad> be able to forward to name servers by their own name, in
    Brad> addition to/instead of by their IP address.  This would
    Brad> allow you to renumber them at will, while keeping the
    Brad> forwarding structure intact.

Er, but what if your server has to forward the queries to resolve the
address of those names? :-)

    Brad> 	Right, but if you have a large farm of machines and a
    Brad> caching name server running on each of them, simply because
    Brad> of asking similar questions at different times, each of them
    Brad> is guaranteed to build up a slightly different picture of
    Brad> the world than each of the others.

Fine. That's how it should be, shouldn't it?

    Brad> 	At Skynet, our central caching name server systems (a
    Brad> pair of identically configured machines, plus a third that
    Brad> was much less powerful) were each handling on the order of
    Brad> 200-250 queries per second (on average), and when I took a
    Brad> single mail server and removed the local forwarding caching
    Brad> name server that was running locally, I immediately saw a
    Brad> jump on the order of 50-100 queries per second added to the
    Brad> central servers.  Do this for a few other machines on the
    Brad> network, and you suddenly bury the central caching name
    Brad> servers by asking them to handle at least one order of
    Brad> magnitude more DNS queries per second than it had previously
    Brad> been handling.

Fair enough, but this still doesn't convince me. Why not run a
caching-only name server on each mail server and leave them to get on
with it?
Why should anyone care if the cache on server foo is different from
the one on server bar? I just don't understand why you'd want or need
to have identical caches on these two systems: it's not as if the
names being looked up by foo and bar are identical after all.