seeking comments on setting up local copy of root zone
Danny Thomas
d.thomas at its.uq.edu.au
Wed May 3 08:32:05 UTC 2006
RATIONALE
=============================================================================
Our primary network provider does not implement anycast routing to any of
the instances of root name-servers in Australia:
http://www.apnic.net/services/rootserver/
Apart from making use of these instances that people have gone to the
trouble of deploying, using the anycast instances would reduce load on the
'real' roots, including deflecting any DOS traffic generated in that
provider's routing community, and should also improve resiliency. Having
a nearby root would reduce latency for a small proportion of queries, but
having local instances of gTLD and ccTLD/2LD name-servers would presumably
be a bigger win for latency.
NB it is not that the provider won't do anycast, but that their current
routing policies do not prefer the Australian instances. Our secondary
provider does anycast to local instances of f & k.
In any case we are looking at setting up the root zone locally, mainly to
reduce (potential) load on the root name-servers. We don't generate that
much traffic to the roots in the normal course of events (see below) and
while only a minority is from bogus queries (0.7%, 33% and 18% of root
queries for ns1, ns2 and ns3 respectively; see below), there is always the
possibility this proportion could rise significantly. I don't
know whether previous DOS attacks on the root name-servers have included
botnets making large numbers of random bogus queries, perhaps incidentally,
but the possibility does exist.
Having a local copy of root could help in case of DOS or other problems
with the root name-servers.
Negative caching would tend to reduce root lookups for repetitive
bogus queries, e.g. unqualified hostnames.
While the root zone has an ncache-ttl (aka MINIMUM) of 1 day, the default
bind setting of max-ncache-ttl should limit that to 3 hours. However the
third column in the results for each name-server shows that the most popular
queries resulting in NXDOMAIN happen hundreds of times per day, so I must be
mis-calculating or mis-understanding the negative caching time.
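The expected figure can be sketched as the smaller of the zone's SOA
MINIMUM and the server's max-ncache-ttl; a minimal shell check of the
numbers above (both values taken from the text, not queried live):

```shell
# Effective negative-caching TTL is min(SOA MINIMUM, max-ncache-ttl).
# Figures from above: the root's MINIMUM is 86400s (1 day) and BIND's
# default max-ncache-ttl is 10800s (3 hours).
soa_minimum=86400
max_ncache_ttl=10800
if [ "$soa_minimum" -lt "$max_ncache_ttl" ]; then
    effective=$soa_minimum
else
    effective=$max_ncache_ttl
fi
echo "$effective"    # 10800, so a repeatedly-asked bogus name should reach
                     # the roots at most ~8 times per day per cache
```

On those assumptions, hundreds of hits per day for one name implies many
distinct querying caches or something defeating the negative cache.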
NB we do master the reverse zones listed in
http://ietf.org/internet-drafts/draft-andrews-full-service-resolvers-02.txt
in fact we use 10/8, 172.16/12 and 192.168/16 blocks, but we do not master
versions of the rfc2606 reserved zones such as example (except for localhost).
Even though it doesn't seem to be covered officially yet (Kato's draft has
expired, though perhaps dnsop has passed something to the IESG), even
without a copy of the root zone it would probably be a good idea to master
something for .local; .internal, however, seems to have been used by a
local OrgUnit.
IMPLEMENTATION CHOICES
=============================================================================
There would seem to be two ways of copying the root zone:
1) using a zone transfer from the f-root; policy on availability:
   http://marc.theaimsgroup.com/?l=bind-users&m=11025331560946
The concern is that if the zone transfer starts failing for any reason,
the root zone has an expiry time of only one week (i.e. shorter than the
xmas break here). Unless there is something special about the way the
root zone is handled, expiry of the root would probably lead to ugly
consequences ...
2) fetching ftp://ftp.internic.net/domain/root.zone.gz on a regular basis
(how often are versions produced?), unpacking and if the last record
is that special TXT record, rndc reloading. The advantage of this
approach is that transfer failures only result in an out-of-date
copy, and I would expect changes in the root zone to be fairly gradual.
Other important zones are available from that site including arpa,
in-addr.arpa, edu and int.
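Approach (2) could be sketched as below; the " TXT " test for the
end-of-zone marker and the paths in the commented procedure are my
assumptions, not a tested recipe:

```shell
# zone_complete FILE: succeed only if FILE's last record is a TXT record,
# i.e. the end-of-zone marker mentioned above (matching " TXT " on the last
# line is an assumption about the marker's shape).
zone_complete() {
    tail -1 "$1" | grep -q ' TXT '
}

# An assumed nightly procedure using it (paths are placeholders):
#   wget -q ftp://ftp.internic.net/domain/root.zone.gz -O /tmp/root.zone.gz
#   gunzip -c /tmp/root.zone.gz > /tmp/root.zone.new
#   zone_complete /tmp/root.zone.new &&
#       mv /tmp/root.zone.new /var/named/root.zone && rndc reload
```

Reloading only after the marker check means a truncated download leaves the
previous copy in service rather than serving a partial root.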
With either of these approaches we would:
* use our normal notify statement which directs them only to our
name-server infrastructure.
* probably refuse zone transfers except from our name-server
  infrastructure, unless interested people commit to restricting notifies
* deny queries from outside our network. When we implement split-dns
the root zone would not appear in the external view
* in our monitoring system, check our root serial number against that of
  the real roots and alert if it is too far out of sync
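The serial check in the last bullet could be sketched as follows; since
root serials follow the YYYYMMDDNN convention, a plain numeric difference
gives a rough "how far behind" figure (the server name in the commented
usage is a placeholder):

```shell
# serial_lag LOCAL REAL: how far our copy's serial trails the real root's.
# Root serials are YYYYMMDDNN, so a lag around 100 is roughly one day.
serial_lag() {
    echo $(( $2 - $1 ))
}

# Assumed monitoring usage (our server name is a placeholder):
#   local=$(dig +short . SOA @ns1.example.edu.au | awk '{print $3}')
#   real=$(dig +short . SOA @a.root-servers.net  | awk '{print $3}')
#   [ "$(serial_lag "$local" "$real")" -gt 200 ] && echo "root copy stale" >&2
```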
Comments ?
Are there any gotchas with the root zone on bind ?
Danny
STATS INCOMING/OUTGOING QUERIES FROM OUR NAME-SERVERS (all bind-9.3.2)
=============================================================================
The overall numbers for a day are summarized in the table below, a more
detailed breakdown of root queries per name-server is present later.
         total    internal  fwd    fwd      outgoing  root     root
         queries  queries   ns3    rbldnsd  queries   queries  NXDOMAIN
   ns1   6.9M     4.7M      2.5M   229K     1.1M      19,390   134
   ns2   7.8M     4.8M      -      232K     4.4M      47,773   15,833
   ns3   2.4M     1.9M      -      9K       4.4M      36,057   6,491
Our name-servers are combined resolving+authoritative and deny queries from
outside our network, except for zones we master. They each have a complete
set of local zones, i.e. they do not have to consult other local name-servers
with the exception ns1 forwards to ns3 and ns1/2/3 forward dnsbl queries
to some local rbldnsd servers.
The first two columns are derived from a cron job which analyses the query
logs, logged to a file channel which is considerably more efficient than
syslog. "Total queries" is the total seen in these log files, while "internal
queries" are those originating from within our address-space, i.e. excluding
external queries to the authoritative service. Ideally, internal queries
for locally mastered zones would have been separated out.
Outgoing queries were captured using tcpdump with
"-tttt -nvl udp dst port 53 and src <ip>"
where <ip> is the query-source address. The queries were grouped into three
categories depending on destination ip-address: forwarded to ns3, forwarded
to the rbldnsd servers, and the remainder, which should be genuinely
outgoing.
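That three-way split can be sketched with awk; the addresses below are
placeholders for ns3 and the rbldnsd servers, not our real ones:

```shell
# classify_dst: read one destination ip-address per line and count how many
# went to ns3, to the rbldnsd servers, or elsewhere (genuinely outgoing).
# The addresses are hypothetical placeholders.
classify_dst() {
    awk 'BEGIN { rbl["192.0.2.8"]; rbl["192.0.2.9"] }
         $1 == "192.0.2.3" { ns3++; next }
         $1 in rbl         { rb++;  next }
                           { out++ }
         END { printf "ns3=%d rbldnsd=%d outgoing=%d\n", ns3, rb, out }'
}
```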
Queries to the roots were captured using tcpdump (rather than dnstop) with
"-tttt -vln host 198.41.0.4 or ..."
The "root queries" column is is a simple count of the outgoing packets.
Any incoming traffic will be replies as the roots are authoritative-only.
The number of replies is only shown in the middle table below. We want to
know the number of NXDOMAIN replies, but also want to match the reply to the
query so we know what the question was. This is done by recording the id of
outgoing queries and matching them against incoming replies. I would prefer
to check the question section, but it does not seem to be supplied by the
roots, presumably to leave more space for the referral (dig seems to
synthesize the question section, placing a ';' at the front). The
"unmatched replies" shown below are replies whose id we could not match;
the proportion (1-5%) is higher than I would have expected and probably
deserves some inspection.
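The id-matching step can be sketched as a join on the id field. The two
input files are assumed to hold one "id question" line per query and one
"id rcode" line per reply, pre-extracted from the tcpdump output; that
layout is my assumption, not tcpdump's raw format:

```shell
# match_replies QUERIES REPLIES: print "question rcode" for each reply
# whose id matches a recorded query, or "unmatched <id>" otherwise.
match_replies() {
    awk 'NR == FNR { q[$1] = $2; next }    # first file: remember each query
         $1 in q   { print q[$1], $2; next }
                   { print "unmatched", $1 }' "$1" "$2"
}
```

A real version would also key on source port and retire matched ids, since
16-bit query ids get reused over a day's traffic.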
SUMMARY
=============================================================================
Queries to the root servers occur at a rate of 0.8% - 1.7% of the outgoing
queries from these three name-servers. However the number of root queries
producing NXDOMAIN varies markedly: ns1 has a much lower rate, presumably
because it serves a different population of machines. Since our
name-servers have long-running caches, queries to the roots seem to involve
either names with bogus TLDs or name-server lookups.
A concern in the above table, which I have not had time to look into, is
that ns2 and ns3 generate nearly as many outgoing queries as come from
inside our network, as if internal queries rarely looked up local names.
NB the 4.4M outgoing from ns3 equals internal queries plus those forwarded
from ns1. There is evidence of a reasonable number of SERVFAIL replies
coming back to ns3, so each query on average might be generating several
outgoing queries.
The more detailed breakdown of root queries is listed below. Three tables
have been produced for each name-server and are placed side-by-side:
* the first gives a breakdown of root queries by QTYPE
* the second is the number of queries to each root
* the third lists the most popular NXDOMAIN lookups (NB leftmost parts of
  the query are truncated to fit)
The b & c roots showed consistent packet loss for all three name-servers.
With the exception of the f & k roots used by ns3, each name-server spreads
queries across the 13 roots reasonably evenly, although queries to b & c
are consistently higher (retries?); the number of replies from each root is
more consistent.
The reason for the high usage of f & k is that ns3 sends its queries out
through a different network provider who is using anycast; traceroute
shows about 40ms to these instances compared to a few hundred milliseconds
for the others.
NS1
==============================================================================
19,390 queries ROOT queries replies top NXDOMAIN queries
17,359 replies a: 1,360 1,358 11 riel-bl1rb74./SOA
104 unmatched replies b: 2,021 1,264 3 6.32.81.187./AAAA
QTYPE: QUERIES NXDOMAIN c: 2,302 1,227 2 3.221.168.2./AAAA
A: 4,492 58 d: 1,388 1,388 2 03.25.120.9./AAAA
AAAA: 12,607 46 e: 1,392 1,390 2 6.32.81.186./AAAA
ANY: 1 0 f: 1,427 1,267 2 216.32.81.187./A
MX: 6 0 g: 1,383 1,381 2 216.93.167.80./A
NS: 5 0 h: 1,380 1,375 2 6.93.167.80./AAAA
PTR: 116 2 i: 1,254 1,254 2 02.100.4.15./AAAA
SOA: 27 27 j: 1,267 1,267 2 ics.internal./SOA
SRV: 1 1 k: 1,304 1,298 2 ics.internal./SOA
=========================== l: 1,501 1,479 2 .ag.ad.local./SOA
TOTAL: 17,255 134 m: 1,411 1,411 1 12.159.234.2./A
1 30.102.11.1./AAAA
NS2
==============================================================================
47,773 queries ROOT queries replies top NXDOMAIN queries
45,873 replies a: 3,188 3,178 629 riel-bl1rb74./SOA
2,262 unmatched replies b: 5,397 4,922 423 bhishek-dell./SOA
QTYPE: QUERIES NXDOMAIN c: 5,745 4,512 204 otebook-webb./SOA
A: 34,366 7,007 d: 3,364 3,363 190 domain.local./SOA
A6: 92 0 e: 5,445 5,438 185 qjoybook7000./SOA
AAAA: 1,071 972 f: 3,432 3,345 153 home-pc./SOA
ANY: 7 4 g: 3,316 3,313 149 domain.local./SOA
MX: 364 348 h: 3,043 3,023 148 compaq.inet./SOA
NS: 42 8 i: 2,204 2,202 128 jolene./SOA
PTR: 264 121 j: 2,204 2,203 105 shivers./SOA
SOA: 7,208 7,208 k: 2,332 2,325 95 domain.local./SOA
SRV: 149 149 l: 4,546 4,492 93 domain.local./SOA
TXT: 16 16 m: 3,557 3,557 90 domain.local./SOA
=========================== 88 domain.local./SOA
TOTAL: 43,579 15,833
NS3
==============================================================================
36,057 queries ROOT queries replies top NXDOMAIN queries
34,486 replies a: 1,229 1,209 636 riel-bl1rb74./SOA
424 unmatched replies b: 1,771 1,471 82 otebook-webb./SOA
QTYPE: QUERIES NXDOMAIN c: 2,491 1,708 72 home-pc./SOA
A: 30,390 3,065 d: 1,215 1,199 59 ics.internal./SOA
A6: 9 4 e: 1,443 1,432 59 .ag.ad.local./SOA
AAAA: 790 717 f: 12,981 12,707 58 ics.internal./SOA
ANY: 3 2 g: 1,204 1,185 57 ics.internal./SOA
MX: 88 70 h: 1,204 1,193 57 ics.internal./SOA
NS: 44 18 i: 806 753 57 ics.internal./SOA
PTR: 167 90 j: 1,047 1,043 57 ics.internal./SOA
SOA: 2,420 2,384 k: 7,833 7,774 56 ics.internal./SOA
SRV: 129 129 l: 1,476 1,468 45 mswork./SOA
TXT: 12 12 m: 1,357 1,344 39 ics.internal./SOA
=========================== 38 ics.internal./SOA
TOTAL: 34,052 6,491
PS it seems curious the l root (198.32.64.12) has a reverse mapping of
AS-20144-has-not-REGISTERED-the-use-of-this-prefix.
--
d.thomas at its.uq.edu.au Danny Thomas,
+61-7-3365-8221 Software Infrastructure,
http://www.its.uq.edu.au ITS, The University of Queensland