Disaster Recovery Bind architecture - UPDATED

Fri May 23 16:04:18 UTC 2008

On May 21, 10:31 pm, Emery <atlan... at comcast.net> wrote:
> *****************************************************************************************
> Please disregard my last email. I mistakenly typed secondary where
> primary should have been. This email is correct. :-)
> *****************************************************************************************
>
> Kevin,
>
> Thank you so much for the detailed response! That is the type of help I
> need in making these crucial decisions.
>
> Let me explain further, especially because I made a mistake in my prior
> explanation. The scripted solution I referred to in the prior email
> actually runs between the primary site master and the secondary site
> internal server.
>
> The primary master and secondary internal master zone files are
> completely different. The primary site has strictly the primary site
> resources, and the secondary internal site only has information about the
> resources located at that site. The reason I am doing the "munging" is
> that both sites have the same domain name, so the servers at the
> secondary site cannot perform lookups on the primary site: BIND
> servers will not perform recursive lookups on domains for which they are
> also authoritative. The only solutions I could come up with were:
>
>          1) change the domain name for the secondary site.
>          2) pass the primary site data file(s) to the secondary server,
> then load it through a $INCLUDE directive.
>
> I chose to use solution 2. Before I can perform the include, I have to
> strip out the first lines, which include the SOA, serial number, TTL,
> refresh, retry and expire information. Once loaded, the internal
> resources at the secondary site can find the resources at the primary
> site. All of this is automated and happens whenever the serial number in
> the primary forward/reverse zone files changes.
>
> A more natural or elegant solution would be for the secondary site
> internal server to be able to perform recursive lookups on the external
> nameserver for resources in the _example.com_ domain, but because both
> the internal and external are authoritative for the same domain name, the
> internal nameserver will not look outside of itself for any
> _example.com_ resource.
>
> Because the secondary site external server is a slave to the primary
> site, the secondary site servers could effectively resolve primary site
> resources if I configured their resolv.conf files with the external server
> as first choice, but then the problem would be reversed: the
> external nameserver would not look up internal secondary resources
> because of the authoritative domain issue I pointed out earlier.
>
> I would love to totally separate the two environments, so that they
> would not need to know about each other's resources, but there are many
> databases and other applications that are performing replication between
> the sites. If this were not the case, I would make each site totally
> autonomous and not dependent on each other.
>
> With regard to the NAT issue: the external nameserver will hold the
> NAT'd addresses of only a few resources. All other servers available to
> the public are web/app servers, which are in the DMZ.
>
> Another person (Ken Hays) suggested I look into implementing views,
> which I will do tomorrow.
>
> I hope this clarifies things a little. I value your input.
>
> E. R.
>
> Kevin Darcy wrote:
> > atlantic wrote:
>
> >> Hello,
>
> >> I've searched, but not found anything on this specific topic. I am about
> >> to implement two disaster recovery site nameservers: one internal, one
> >> external. I want to keep the internal entries strictly internal. The
> >> external will serve nat'd addresses of the internal nameserver as well
> >> as function as a slave to the primary site nameserver.
>
> >> I would have no problem implementing this model if the domain names at
> >> the DR site were different from the primary site. My issue is that
> >> because I am using the same domain name, I have had to create a custom
> >> scripted solution to allow the loading of split domain resource records
> >> (using $INCLUDE directives, and sed/awk to remove SOA and header
> >> information from the imported data files.) The fact that this does work
> >> does not negate the issue that I find the solution cumbersome. The issue
> >> would be much simpler if I changed the DR site to a different domain
> >> name, since it would then have its own SOA record.
>
> > I'm confused: why do you need to do this "munging" of the zonefile? As
> > far as I can understand it, the only difference between the original
> > version of the zone and the "munged" version would be the SOA record and
> > the apex NS records (that's what you mean by "header information"
> > right?). But nothing really cares about the SOA record (except Dynamic
> > Update clients and, in a multi-level slaving hierarchy, mid-level
> > slaves, who use the MNAME field of the SOA record in determining who
> > gets NOTIFYs), and if you put the "primary" NS(es) and the DR NS(es) at
> > the apex of the zone, Internet resolvers will quickly find and use the
> > DR nameservers if the primary one(s) is/are down. So there's no real
> > reason for the "header" of the zone to be different on different
> > nameservers, and no "munging" should be required.
>
> > Secondly, I don't know what you're getting at with "The external will
> > serve nat'd addresses of the internal nameserver". NAT or no NAT, why
> > would you want Internet resolvers querying your internal nameserver?
> > That seems like a bad security practice to me. A lot of DNS-based
> > exploits have been identified over the years, so I'd rather only expose
> > nameservers that are on the "edges" of my network.
>
> >> Now that I have stated my issue, my real questions are:
>
> >> 1) How do most businesses address this issue?
>
> > A variety of different ways, I'd imagine. In our case we have two main
> > production datacenters that have (diverse) connectivity to the Internet
> > and for most apps (e.g. web stuff) we use "global", DNS-based load
> > balancing to allow the servers to run in both datacenters with the
> > failover being automatic if the server(s) in one datacenter are down,
> > e.g. in the worst case, the whole datacenter is down. For DNS itself,
> > since it can't really be load-balanced using DNS (slight chicken-and-egg
> > problem there), we have one VIP (virtual IP) for each set of DNS servers
> > at each datacenter, i.e. "local" load-balancing. So Internet DNS
> > resolvers will only see two VIPs associated with the nameservers for our
> > external zones, but there are multiple machines "behind" each VIP so
> > that we have transparent fault-tolerance within any given datacenter,
> > and if one datacenter should go down completely, we still have
> > functioning nameservers in the other datacenter.
>
> >> 2) Is it normal to have a DR DNS function as both a slave to the primary
> >> site and a primary to different DR resources?
>
> > I doubt it. Mixing up master and slave roles on various Internet-facing
> > nameservers seems to me to be unmanageable and arguably insecure. Much
> > simpler for them all to be slaves. (Note that I'm using the term
> > "slaves" loosely here; if one wants to use another replication method
> > besides AXFR/IXFR, then that's fine, and I'd still call the replicas
> > "slaves" in the loose sense).
>
> > In our case, we centralize all of our external DNS maintenance on an
> > internal server (with another internal server as backup), and then all
> > of the Internet-facing nameservers are simply slaves for that data.
>
> >> 3) Is it acceptable to have all three nameservers (primary site, DR
> >> primary, DR secondary) serve the same domain name?
>
> > Hmmm... why not? The more authoritative nameservers that are published
> > for the zone, the more the query load is spread out, and the less impact
> > there will be if any given one of them fails or becomes unavailable.
>
> > Some registries have limits on how many nameservers they'll allow in a
> > delegation, but even if you just have a subset of your authoritative
> > nameservers in the delegation records, as long as they are all in the
> > apex NS records they'll get used (assuming that a sufficient number of
> > the resolvers cache NS records according to ranking rules in RFC 2181,
> > which ranks in-zone data above referral data). Don't go overboard with
> > NS records, though; you don't want to have so many that you force older
> > DNS resolvers into TCP retries. Try to keep the referral responses (NS
> > RRset + glue) within 512 bytes, taking into account label compression.
>
> >                            - Kevin
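
Kevin's guideline of keeping a referral (NS RRset plus glue) within 512 bytes can be checked with a little arithmetic. The Python sketch below is a rough estimator, not a real DNS encoder: it assumes standard RFC 1035 name compression (a 2-byte pointer replaces any name suffix already written), no EDNS OPT record, and one address per glue record. The nameserver names ns1/ns2/dr-ns1.example.com and their addresses are hypothetical. Real servers may compress less aggressively, so treat the result as a lower bound.

```python
import ipaddress


def name_size(name: str, seen: set) -> int:
    """Encoded size of `name`, reusing any suffix already in the message."""
    labels = [lab for lab in name.rstrip(".").split(".") if lab]
    size = 0
    for i, label in enumerate(labels):
        suffix = tuple(part.lower() for part in labels[i:])
        if suffix in seen:
            return size + 2  # 2-byte compression pointer to the earlier copy
        seen.add(suffix)  # this suffix is now written and can be pointed to
        size += 1 + len(label)  # length byte + label bytes
    return size + 1  # terminating zero-length root label


def referral_size(qname: str, zone: str, nameservers: list, glue: dict) -> int:
    """Rough size of a referral: question, NS RRset (authority), glue (additional)."""
    seen: set = set()
    size = 12  # fixed DNS header
    size += name_size(qname, seen) + 4  # QNAME + QTYPE + QCLASS
    for ns in nameservers:
        # owner name + TYPE, CLASS, TTL, RDLENGTH (10 bytes) + NSDNAME
        size += name_size(zone, seen) + 10 + name_size(ns, seen)
    for host, addrs in glue.items():
        for addr in addrs:
            rdata = len(ipaddress.ip_address(addr).packed)  # 4 (A) or 16 (AAAA)
            size += name_size(host, seen) + 10 + rdata
    return size


size = referral_size(
    "www.example.com.",
    "example.com.",
    ["ns1.example.com.", "ns2.example.com.", "dr-ns1.example.com."],
    {
        "ns1.example.com.": ["192.0.2.1"],
        "ns2.example.com.": ["192.0.2.2"],
        "dr-ns1.example.com.": ["198.51.100.1"],
    },
)
print(size, size <= 512)  # 138 True
```

In this model each extra nameserver named like ns3.example.com. with one IPv4 glue address adds 34 bytes (18 for the NS record, 16 for the glue A record), so eleven more of them would bring this referral to exactly 512 bytes.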

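The header stripping E.R. describes (removing the $TTL line and the SOA record from the primary site's zone file before pulling it in with $INCLUDE, and redoing this whenever the primary serial changes) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not E.R.'s actual sed/awk script: split_header and serial_newer are hypothetical helper names, comments are recognised naively (any ';'), and only the first SOA record is removed, so apex NS records pass through unchanged. serial_newer uses RFC 1982 serial-number arithmetic, the rule DNS uses to decide whether an SOA serial is newer, so it also copes with 32-bit wraparound.

```python
from __future__ import annotations


def _strip_comment(line: str) -> str:
    # Naive: every ';' starts a comment, so a ';' inside a quoted TXT
    # string would be cut off too.
    return line.split(";", 1)[0]


def split_header(zone_text: str) -> tuple[int, str]:
    """Return (serial, body), where body lacks $TTL lines and the first SOA record."""
    body_lines: list[str] = []
    soa_tokens: list[str] = []
    soa_seen = False
    in_soa = False  # True while inside a parenthesised, multi-line SOA
    for line in zone_text.splitlines():
        code = _strip_comment(line)
        tokens = code.replace("(", " ").replace(")", " ").split()
        if in_soa:
            soa_tokens += tokens
            in_soa = ")" not in code  # the closing ')' ends the record
            continue
        if tokens and tokens[0].upper() == "$TTL":
            continue
        if not soa_seen and "SOA" in (t.upper() for t in tokens):
            soa_seen = True
            soa_tokens = tokens
            in_soa = "(" in code and ")" not in code
            continue
        # Caveat: a record with a blank owner field right after the SOA
        # would inherit a different owner once $INCLUDEd; not handled here.
        body_lines.append(line)
    if not soa_seen:
        raise ValueError("no SOA record found")
    i = [t.upper() for t in soa_tokens].index("SOA")
    # SOA RDATA order: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM
    serial = int(soa_tokens[i + 3])
    return serial, "\n".join(body_lines) + "\n"


def serial_newer(new: int, old: int) -> bool:
    """RFC 1982 comparison for 32-bit SOA serials: is `new` later than `old`?"""
    diff = (new - old) % 2**32
    return 0 < diff < 2**31


EXAMPLE_ZONE = """\
$TTL 86400
@   IN  SOA ns1.example.com. hostmaster.example.com. (
        2008052301 ; serial
        3600       ; refresh
        900        ; retry
        604800     ; expire
        86400 )    ; minimum
www IN  A   192.0.2.10
db1 IN  A   192.0.2.20
"""

serial, body = split_header(EXAMPLE_ZONE)
print(serial)        # 2008052301
print(body, end="")  # only the www and db1 records remain
if serial_newer(serial, 2008052300):
    print("serial changed: re-copy and reload the included file")
```
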
FYI -

I took the advice of a poster (Ken Hays) to use BIND views to solve my
problem and they worked perfectly!
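
Views fit this problem because BIND answers each query from the first view, in named.conf order, whose match-clients list matches the client's source address, and each view carries its own definition of the zone. One server can therefore give internal clients and Internet clients different contents for the same example.com without any $INCLUDE munging. The Python sketch below only models that first-match selection; it is not BIND code. The view names, address ranges, and zone file names are hypothetical, and real match-clients lists also accept named ACLs, TSIG keys, and negation, which this sketch ignores.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class View:
    name: str
    match_clients: tuple  # CIDR strings, e.g. ("10.0.0.0/8",)
    zone_file: str        # hypothetical file holding this view's example.com data


def select_view(views: list, client_ip: str) -> Optional[View]:
    """Return the first view whose match-clients list contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for view in views:  # order matters: the first match wins
        if any(addr in ipaddress.ip_network(net) for net in view.match_clients):
            return view
    return None  # no view matched


VIEWS = [
    # Hypothetical layout: internal clients see internal records for both sites.
    View("internal", ("10.0.0.0/8", "192.168.0.0/16"), "db.example.com.internal"),
    # Everyone else ("any") sees only the public, NAT'd addresses.
    View("external", ("0.0.0.0/0", "::/0"), "db.example.com.external"),
]

print(select_view(VIEWS, "10.20.30.40").name)  # internal
print(select_view(VIEWS, "203.0.113.7").name)  # external
```

Because the first match wins, the catch-all external view has to come last; listed first, it would also capture internal clients.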

Thank you,

E. R.