seeking comments on setting up local copy of root zone

Danny Thomas d.thomas at its.uq.edu.au
Wed May 3 08:32:05 UTC 2006


RATIONALE
=============================================================================
Our primary network provider does not implement anycast routing to any of
the instances of root name-servers in Australia:
  http://www.apnic.net/services/rootserver/
Apart from making use of these instances that people have gone to the
trouble of deploying, using the anycast instances would reduce load on the
'real' roots (including deflecting any DOS traffic generated within that
provider's routing community) and should also improve resiliency. Having
a nearby root would reduce latency for a small proportion of queries, though
having local instances of gTLD and ccTLD/2LD name-servers would presumably
be a bigger win for latency.
NB it is not that the provider won't do anycast, but that their current
routing policies would not prefer the Australian instances. Our secondary
provider does route to local anycast instances of f & k.

In any case we are looking at setting up the root zone locally, mainly to
reduce (potential) load on the root name-servers. We don't generate that
much traffic to the roots in the normal course of events (see below) and
while only a minority is from bogus queries (0.7%, 33% and 18% on our three
servers), there is always the possibility this proportion could rise
significantly. I don't know whether previous DOS attacks on the root
name-servers have included botnets making large numbers of random bogus
queries, perhaps incidentally, but the possibility does exist.

Having a local copy of root could help in case of DOS or other problems
with the root name-servers.

Negative caching would tend to reduce root lookups for repetitive
bogus queries, e.g. unqualified hostnames.
While the root zone has an ncache-ttl (aka the SOA MINIMUM) of 1 day, the
default bind setting of max-ncache-ttl should limit that to 3 hours.
However, the third column in the results for each name-server shows that the
most popular queries resulting in NXDOMAIN happen hundreds of times per day,
so I must be mis-calculating or misunderstanding the negative caching time.
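As a sanity check on that arithmetic, here is a minimal sketch of the
effective negative TTL, assuming bind's documented behaviour (the smaller of
the zone's SOA MINIMUM, the SOA record's own TTL, and max-ncache-ttl):

```python
# Effective negative-caching TTL: the minimum of the zone's SOA MINIMUM
# field, the SOA record's own TTL, and the resolver's max-ncache-ttl cap
# (bind's default cap is 10800s, i.e. 3 hours).
def effective_ncache_ttl(soa_minimum, soa_ttl, max_ncache_ttl=10800):
    return min(soa_minimum, soa_ttl, max_ncache_ttl)

# Root zone: MINIMUM is 86400 (1 day), so the 3-hour cap wins.
print(effective_ncache_ttl(86400, 86400))  # -> 10800
```

At 3 hours, one long-running cache should re-ask the roots about a given
bogus name at most ~8 times a day, so the hundreds of hits per name seen
below really do want explaining.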

NB we do master the reverse zones listed in
  http://ietf.org/internet-drafts/draft-andrews-full-service-resolvers-02.txt
in fact we use the 10/8, 172.16/12 and 192.168/16 blocks, but we do not
master versions of the rfc2606 reserved zones such as example (except for
localhost). Even though it doesn't seem to be covered officially yet (Kato's
draft has expired, though perhaps dnsop has passed something to the IESG),
even without a copy of the root zone it would probably be a good idea to
master something for .local; .internal, however, seems to have been used by
a local OrgUnit.
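For mastering .local, an empty zone is enough to keep those queries inside;
a sketch, in which the file name and ACL name are illustrative:

```
// named.conf fragment (file and ACL names are illustrative)
zone "local" {
    type master;
    file "empty.zone";         // just an SOA and an NS record
    allow-query { our-nets; };
};
```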



IMPLEMENTATION CHOICES
=============================================================================
There would seem to be two ways of copying the root zone:

1) using a zone transfer from the f-root, policy on availability:
     http://marc.theaimsgroup.com/?l=bind-users&m=11025331560946
   The concern is that if the zone transfer starts failing for any reason,
   the root zone has an expiry time of only one week (i.e. shorter than the
   xmas break here). Unless there is something special about the way the
   root zone is handled, expiry of the root would probably lead to ugly
   consequences ...
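   For option 1, the zone statement might look something like the sketch
   below (the master address is f.root-servers.net's well-known unicast
   address; the ACL name is illustrative). The expectation, worth testing,
   is that if the slaved zone expires, bind would SERVFAIL rather than fall
   back to the root hints.

```
// named.conf sketch for option 1: slave the root zone from f-root.
zone "." {
    type slave;
    file "root.zone";
    masters { 192.5.5.241; };     // f.root-servers.net
    allow-transfer { our-ns; };   // illustrative ACL
    notify explicit;              // also-notify our infrastructure only
};
```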

2) fetching ftp://ftp.internic.net/domain/root.zone.gz on a regular basis
   (how often are versions produced?), unpacking it and, if the last record
   is that special TXT record, doing an rndc reload. The advantage of this
   approach is that transfer failures only result in an out-of-date copy,
   and I would expect changes in the root zone to be fairly gradual.
   Other important zones are available from that site including arpa,
   in-addr.arpa, edu and int.
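For option 2, the completeness check before reloading might look like the
sketch below; it assumes, per the description above, that a complete copy
ends with the trailing TXT record, and the sample zone text (including the
TXT contents) is purely illustrative:

```python
# Sketch: sanity-check a freshly unpacked root.zone before "rndc reload".
# Assumes a complete copy ends with the special TXT record; verify that
# assumption against a real copy of root.zone.gz before relying on it.
def zone_looks_complete(text):
    records = [l for l in text.splitlines()
               if l.strip() and not l.startswith(";")]
    return bool(records) and (" TXT " in records[-1]
                              or "\tTXT\t" in records[-1])

def soa_serial(text):
    # serial is the third field after the SOA RRtype (MNAME, RNAME, serial)
    for line in text.splitlines():
        fields = line.split()
        if "SOA" in fields:
            return int(fields[fields.index("SOA") + 3])
    return None

sample = (". 86400 IN SOA a.root-servers.net. nstld.verisign-grs.com. "
          "2006050300 1800 900 604800 86400\n"
          '. 86400 IN TXT "illustrative end-of-zone marker"\n')
print(zone_looks_complete(sample), soa_serial(sample))
```

Only if both checks pass (and perhaps named-checkzone as well) would the
cron job install the file and run rndc reload.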
   

With either of these approaches we would:
  * use our normal notify statement which directs them only to our
    name-server infrastructure.
  * probably refuse zone transfers except from our name-server
    infrastructure, unless interested people commit to restricting notifies
  * deny queries from outside our network. When we implement split-dns
    the root zone would not appear in the external view
  * in our monitoring system, check our root serial number against that of
    the real roots and alert if it is too far out of sync
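For the last point, the serial comparison should use RFC 1982 serial-number
arithmetic so the eventual wrap of the 32-bit serial doesn't cause false
alerts; a sketch, with an arbitrary drift threshold:

```python
# RFC 1982 style check: how far 'theirs' is ahead of 'ours' modulo 2^32,
# alerting only when we genuinely lag (gap in the "ahead" half-space)
# by more than an arbitrary threshold.
def serial_gap(ours, theirs):
    return (theirs - ours) % (2 ** 32)

def serial_out_of_sync(ours, theirs, max_gap=4):
    gap = serial_gap(ours, theirs)
    return 0 < gap <= 2 ** 31 and gap > max_gap

print(serial_out_of_sync(2006050300, 2006050301))  # -> False (one behind)
print(serial_out_of_sync(2006050200, 2006050301))  # -> True  (far behind)
```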

Comments ?
Are there any gotchas with the root zone on bind ?

Danny



STATS INCOMING/OUTGOING QUERIES FROM OUR NAME-SERVERS (all bind-9.3.2)
=============================================================================
The overall numbers for a day are summarized in the table below; a more
detailed breakdown of root queries per name-server is presented later.

      total    internal    fwd      fwd     outgoing     root        root
      queries   queries    ns3    rbldnsd    queries   queries     NXDOMAIN
ns1      6.9M     4.7M    2.5M     229K      1.1M      19,390         134
ns2      7.8M     4.8M       -     232K      4.4M      47,773      15,833
ns3      2.4M     1.9M       -       9K      4.4M      36,057       6,491

Our name-servers are combined resolving+authoritative and deny queries from
outside our network, except for zones we master. They each have a complete
set of local zones, i.e. they do not have to consult other local
name-servers, with the exceptions that ns1 forwards to ns3 and that ns1/2/3
forward dnsbl queries to some local rbldnsd servers.
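The dnsbl forwarding mentioned above is the usual forward-zone arrangement;
a sketch in which the zone name and addresses are illustrative:

```
// named.conf sketch: forward dnsbl lookups to local rbldnsd instances
// (zone name and addresses illustrative).
zone "dnsbl.example" {
    type forward;
    forward only;
    forwarders { 10.0.0.10; 10.0.0.11; };
};
```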

The first two columns are derived from a cron job which analyses the query
logs, logged to a file channel which is considerably more efficient than
syslog. "Total queries" is the total seen in these log files, while "internal
queries" are those originating from within our address-space, i.e. excluding
external queries to the authoritative service. Ideally, internal queries
for locally mastered zones would have been separated out.

Outgoing queries were captured using tcpdump with
  "-tttt -nvl udp dst port 53 and src <ip>"
where <ip> is the query-source address. The queries were grouped into three
categories by destination ip-address: forwarded to ns3, forwarded to the
rbldnsd servers, and the remainder, which should be genuinely outgoing.
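The grouping step can be sketched as below; the addresses are placeholders,
not our real ones:

```python
# Classify each captured query's destination into the three categories
# used in the table.  Addresses here are placeholders.
from collections import Counter

NS3 = {"10.0.0.3"}                    # where ns1 forwards
RBLDNSD = {"10.0.0.10", "10.0.0.11"}  # local rbldnsd servers

def classify(dst_ip):
    if dst_ip in NS3:
        return "fwd ns3"
    if dst_ip in RBLDNSD:
        return "fwd rbldnsd"
    return "outgoing"

counts = Counter(classify(ip)
                 for ip in ["10.0.0.3", "10.0.0.10", "192.203.230.10"])
print(dict(counts))
```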

Queries to the roots were captured using tcpdump (rather than dnstop) with
  "-tttt -vln host 198.41.0.4 or ..."
The "root queries" column is a simple count of the outgoing packets.
Any incoming traffic will be replies, as the roots are authoritative-only.
The number of replies is only shown in the middle table below. We want to
know the number of NXDOMAIN replies, but also want to match each reply to
its query so we know what the question was. This is done by recording the
id of outgoing queries and matching it against incoming replies. I would
prefer to check the question section, but it does not seem to be supplied
by the roots, presumably to leave more space for the referral; dig seems to
synthesize the question section, placing a ';' at the front. The "unmatched
replies" shown below correspond to replies whose id we could not match; at
1-5% this is higher than I would have expected and deserves some inspection.
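The id-matching step can be sketched as follows, with (id, name) and
(id, rcode) tuples standing in for what was distilled from the tcpdump
output. Matching on id alone is weak, of course, and id collisions or reuse
could account for part of the unmatched 1-5%:

```python
# Pair outgoing queries with incoming replies by DNS message id; replies
# whose id was never seen going out are counted as unmatched.
def match_replies(queries, replies):
    pending = {qid: qname for qid, qname in queries}
    matched, unmatched = [], []
    for qid, rcode in replies:
        if qid in pending:
            matched.append((pending.pop(qid), rcode))
        else:
            unmatched.append(qid)
    return matched, unmatched

m, u = match_replies([(0x1a2b, "ics.internal.")],
                     [(0x1a2b, "NXDOMAIN"), (0x9999, "NOERROR")])
print(m, u)
```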


SUMMARY
=============================================================================
Queries to the root servers occur at a rate of 0.8% - 1.7% of the outgoing
queries from these three name-servers. However the number of root queries
producing NXDOMAIN varies markedly; ns1's much lower rate presumably
reflects a different population of client machines. Since our name-servers
have long-running caches, the queries that do reach the roots seem to
involve either names with bogus TLDs or name-server lookups.

A concern in the above table, which I have not had time to look into, is
that ns2 and ns3 generate nearly as many outgoing queries as come from
inside our network, as if internal queries rarely looked up local names.
NB the 4.4M outgoing from ns3 equals its internal queries plus those
forwarded from ns1. There is evidence of a reasonable number of SERVFAIL
replies coming back to ns3, so each query might on average be generating
several outgoing queries.

The more detailed breakdown of root queries is listed below.

Three tables have been produced for each name-server and are placed
side-by-side:
  The first gives a breakdown of root queries by QTYPE.
  The second is the number of queries to each root.
  The third lists the most popular NXDOMAIN lookups.
    NB leftmost parts of the query names are truncated to fit.

The b & c roots showed consistent packet loss for all three name-servers.

With the exception of the f & k roots used by ns3, each name-server
spreads queries across the 13 roots reasonably evenly, although queries
to b & c are consistently higher (retries?); the number of replies
from each root is more consistent.
The reason for the high usage of f & k is that ns3 sends queries out
through a different network provider who does anycast. traceroute
shows about 40ms to these instances compared to a few hundred milliseconds
for the others.




NS1
==============================================================================
19,390 queries                 ROOT  queries  replies    top NXDOMAIN queries  
17,359 replies                   a:    1,360    1,358     11 riel-bl1rb74./SOA 
104 unmatched replies            b:    2,021    1,264      3 6.32.81.187./AAAA 
QTYPE:   QUERIES   NXDOMAIN      c:    2,302    1,227      2 3.221.168.2./AAAA 
    A:     4,492         58      d:    1,388    1,388      2 03.25.120.9./AAAA 
 AAAA:    12,607         46      e:    1,392    1,390      2 6.32.81.186./AAAA 
  ANY:         1          0      f:    1,427    1,267      2  216.32.81.187./A 
   MX:         6          0      g:    1,383    1,381      2  216.93.167.80./A 
   NS:         5          0      h:    1,380    1,375      2 6.93.167.80./AAAA 
  PTR:       116          2      i:    1,254    1,254      2 02.100.4.15./AAAA 
  SOA:        27         27      j:    1,267    1,267      2 ics.internal./SOA 
  SRV:         1          1      k:    1,304    1,298      2 ics.internal./SOA 
===========================      l:    1,501    1,479      2 .ag.ad.local./SOA 
TOTAL:    17,255        134      m:    1,411    1,411      1   12.159.234.2./A 
                                                           1 30.102.11.1./AAAA 


NS2
==============================================================================
47,773 queries                 ROOT  queries  replies    top NXDOMAIN queries  
45,873 replies                   a:    3,188    3,178    629 riel-bl1rb74./SOA 
2,262 unmatched replies          b:    5,397    4,922    423 bhishek-dell./SOA 
QTYPE:   QUERIES   NXDOMAIN      c:    5,745    4,512    204 otebook-webb./SOA 
    A:    34,366      7,007      d:    3,364    3,363    190 domain.local./SOA 
   A6:        92          0      e:    5,445    5,438    185 qjoybook7000./SOA 
 AAAA:     1,071        972      f:    3,432    3,345    153      home-pc./SOA 
  ANY:         7          4      g:    3,316    3,313    149 domain.local./SOA 
   MX:       364        348      h:    3,043    3,023    148  compaq.inet./SOA 
   NS:        42          8      i:    2,204    2,202    128       jolene./SOA 
  PTR:       264        121      j:    2,204    2,203    105      shivers./SOA 
  SOA:     7,208      7,208      k:    2,332    2,325     95 domain.local./SOA 
  SRV:       149        149      l:    4,546    4,492     93 domain.local./SOA 
  TXT:        16         16      m:    3,557    3,557     90 domain.local./SOA 
===========================                               88 domain.local./SOA 
TOTAL:    43,579     15,833                                                    


NS3
==============================================================================
36,057 queries                 ROOT  queries  replies    top NXDOMAIN queries  
34,486 replies                   a:    1,229    1,209    636 riel-bl1rb74./SOA 
424 unmatched replies            b:    1,771    1,471     82 otebook-webb./SOA 
QTYPE:   QUERIES   NXDOMAIN      c:    2,491    1,708     72      home-pc./SOA 
    A:    30,390      3,065      d:    1,215    1,199     59 ics.internal./SOA 
   A6:         9          4      e:    1,443    1,432     59 .ag.ad.local./SOA 
 AAAA:       790        717      f:   12,981   12,707     58 ics.internal./SOA 
  ANY:         3          2      g:    1,204    1,185     57 ics.internal./SOA 
   MX:        88         70      h:    1,204    1,193     57 ics.internal./SOA 
   NS:        44         18      i:      806      753     57 ics.internal./SOA 
  PTR:       167         90      j:    1,047    1,043     57 ics.internal./SOA 
  SOA:     2,420      2,384      k:    7,833    7,774     56 ics.internal./SOA 
  SRV:       129        129      l:    1,476    1,468     45       mswork./SOA 
  TXT:        12         12      m:    1,357    1,344     39 ics.internal./SOA 
===========================                               38 ics.internal./SOA 
TOTAL:    34,052      6,491                                                    



PS it seems curious that the l root (198.32.64.12) has a reverse mapping of
   AS-20144-has-not-REGISTERED-the-use-of-this-prefix.

-- 
   d.thomas at its.uq.edu.au    Danny Thomas,                                    
          +61-7-3365-8221    Software Infrastructure,
 http://www.its.uq.edu.au    ITS, The University of Queensland


