subdomain delegation issues

Mon Aug 18 22:44:08 UTC 2003

Phillip L wrote:

> Hi All.
>
> Until recently, i've been running bind for our enterprise network, with
> two single nameservers.  This has worked fine (for ever!)
>
> My most recent experiments involve installing new bind DNS servers at each
> remote office, these are interconnected with godawful slow 56k data
> lines...  Each remote office also has broadband internet access...
>
> Here's the picture.   At the top of my tree, i have :
>
>         example.net
>
> at each remote office, i have a dns server, which has been delegated
> responsibility for its own clients :
>
>         office.example.net
>
> This delegation works downwards.  Clients of example.net can ping by name,
> clients of office.example.net  however...
>
> to save bandwidth, i have configured each dns server at the remote offices
> to retrieve and cache dns data directly from the internet.  The net result
> is that when an office.example.net client tries to ping anything above
> itself in the tree..  eg..
>
>         client.example.net or even
>         client.office2.example.net
>
> The server tries to resolve example.net from the internet root servers,
> which have no idea about our internal dns system...
>
> I have temporarily told all office.example.net dns configurations to
> forward all requests to ns.example.net which works, but uses precious
> (56k!) bandwidth.. i'm sure there's a better way

Well, I'm not sure how you expect a nameserver to resolve names for which it
is not master without some form of forwarding, iteration or replication.
Basically this boils down not so much to "how can the nameserver
(supernaturally) know the answers to the queries without using the network?"
but rather "how can I minimize the bandwidth usage of my remote
nameservers?". It's a common problem; unfortunately, it is also subject to a
number of variables, so there is no one Right Answer. I'll step you through
my thought process.

First of all, you could simply stick with what you have and fiddle with the
TTL values on your resource records. Raising the TTLs will increase cache
persistence and thus reduce bandwidth consumption. The down side, of course,
is that changes will propagate more slowly.
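
As a rough illustration only (the host names, addresses and TTL values below
are made up, not recommendations), raising TTLs in the example.net zone file
might look something like this:

```
; Fragment of a hypothetical example.net zone file (SOA and NS records omitted).
$TTL 86400                               ; default TTL: 1 day, used when a record has none
fileserver   172800  IN  A   10.0.0.10   ; rarely changes: cacheable for 2 days
client               IN  A   10.0.0.20   ; no explicit TTL: inherits the 1-day default
laptop       300     IN  A   10.0.0.99   ; changes often: cacheable for 5 minutes
```

Records that change often can keep a short TTL while the stable ones get a
long TTL, so you only pay the bandwidth cost where you need fast propagation.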

The next factor that comes into play is the topology of your network relative
to the topology of your nameservice infrastructure. Is it better for your
nameservers to talk directly to each other, or would all of the packets
end up going through a central WAN "hub" regardless (for that matter, since
all locations have broadband Internet, couldn't you set up a VPN between
them)? To the extent that you have a decentralized nameservice infrastructure
*and* a decentralized network topology, then it may make more sense to
abandon forwarding in favor of an example.net "stub" zone, which will enable
each remote nameserver to talk directly -- and therefore more efficiently --
to the delegated nameserver(s) for any given office.example.net zone, without
having to go through the central forwarders (then again, if your TTLs are
fairly high and your query variety is fairly low, then a centralized cache
may make sense regardless of your network topology). Of course, you could
mimic the effect of these stub zones by defining every office.example.net
zone as a "type forward" zone in every named.conf, but why bother with that
huge maintenance headache when you could just define a single example.net
stub zone in each remote nameserver?
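
For what it's worth, here is a sketch of the two approaches as they might
appear in a remote office's named.conf. The addresses and file name are
placeholders I made up; substitute whatever ns.example.net and the office
nameservers really use.

```
// Option A: one stub zone for the parent (sketch only).
zone "example.net" {
    type stub;
    masters { 10.0.0.1; };        // assumed address of ns.example.net
    file "stub/example.net.db";   // assumed path for the cached NS data
    forwarders { };               // only needed if "options" sets global
                                  // forwarders: disables them for this zone
};

// Option B: the "type forward" alternative. One block like this is needed
// per office zone, in every remote named.conf.
zone "office2.example.net" {
    type forward;
    forward only;
    forwarders { 10.2.0.1; };     // assumed address of office2's nameserver
};
```

With option A the remote server learns the delegations for the other office
zones from example.net itself, so a new office needs no change on the
existing remote servers.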

Lastly, I'd consider making the remote servers slaves of example.net and/or
some or all of the office.example.net subzones other than their own. As such,
all of the factors which relate to the decision between slaving and
forward/stub come into play:
1) the frequency with which the zone changes (you want to avoid repeated
replication of records that the slave doesn't really care about)
2) the REFRESH setting for the zone, relative to the TTL settings on the
individual records in it (the lower the ratio, the more expensive slaving is
relative to forwarding or stubbing, although again there's that tradeoff
between bandwidth-conservation and speed-of-change-propagation to consider)
3) the variety of queries typically experienced for the zone (a wide variety
of queries devalues the effects of caching, perhaps to the point that slaving
might make sense because all the queries could then be answered locally)
4) whether you're running BIND 9 or something earlier (Incremental Zone
Transfer (IXFR) doesn't work reliably prior to BIND 9, but if you can get
IXFR working it makes a huge difference in your zone-transfer bandwidth).
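
A slave definition on a remote server is short. Something along these lines
should do, assuming (my invention) that the master for example.net is at
10.0.0.1:

```
// named.conf on a remote-office nameserver (sketch only).
zone "example.net" {
    type slave;
    masters { 10.0.0.1; };          // assumed address of ns.example.net
    file "slave/example.net.db";    // assumed path for the local copy of the zone
};
```

The REFRESH interval mentioned in (2) is not set here; it comes from the SOA
record of the zone on the master.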

Note that if you're slaving simply as a way of conserving bandwidth, you
probably *don't* want to publish the remote servers in the NS records of the
zone. Doing so causes queries for the zone to be sent to them by other,
non-authoritative nameservers, which, when added up, could be rather
expensive; again, dependent on your network topology. Generally, you'll want
to leave the remote servers out of the NS records, i.e. they would be
"stealth" slaves. If you still want them to get zone changes quickly, you'll
have to add them to the "also-notify" list, because stealth slaves don't
automatically get NOTIFY messages -- that too is another tradeoff between
bandwidth-conservation and speed-of-change-propagation.
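
On the master, that might look roughly like this. The slave addresses and
file name are hypothetical, and the allow-transfer line only matters if you
restrict zone transfers:

```
// named.conf on the master for example.net (sketch only).
zone "example.net" {
    type master;
    file "master/example.net.db";             // assumed path
    also-notify { 10.1.0.1; 10.2.0.1; };      // assumed addresses of the stealth slaves
    allow-transfer { 10.1.0.1; 10.2.0.1; };   // let those same servers pull the zone
};
```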

Of course, you're not completely locked into the same methodology for every
zone. You could have a variety of different zone types -- slave, forward,
stub -- or in some cases, no definition at all, for different
office.example.net zones, finely tuned to their exact query profile,
zone-level settings and/or record-level (e.g. TTL) settings, etc., although a
polyglot config like that would probably be hard to maintain.

Since your main concern seemed to be bandwidth-conservation, I didn't even
mention the redundancy aspects of the equation. Obviously, if redundancy is
important to you, then this tilts the scales heavily towards making your
remote servers slaves of as many zones as possible.

- Kevin