True DNS Backup

Chris Buxton cbuxton at menandmice.com
Tue Mar 11 05:12:55 UTC 2008


On Mar 4, 2008, at 9:14 PM, D. Stussy wrote:

> "Chris Buxton" <cbuxton at menandmice.com> wrote in message
> news:fqkv8k$2g0q$1 at sf1.isc.org...
>> On Mar 3, 2008, at 1:36 PM, D. Stussy wrote:
>>> "Chris Buxton" <cbuxton at menandmice.com> wrote in message
>>> news:fqhb7l$ska$1 at sf1.isc.org...
>>>> On Feb 28, 2008, at 8:35 PM, D. Stussy wrote:
>>>>> There should be another server (in another geographic location)
>>>>> as both of these are on the same local network (regardless of
>>>>> which address is used to reach them).
>>>>
>>>>
>>>> If the OP plans to use views to route web requests, for example,
>>>> over the same network connection that the DNS requests come over,
>>>> then having an offsite slave will spoil this.
>>>
>>> Noted, but that's not sufficient reason to disregard the "best
>>> current practice" for disaster recovery.  Regardless, shouldn't it
>>> be the client which determines which address is "closer" after it
>>> receives ALL addresses?  A client, when doing a DNS lookup, doesn't
>>> necessarily pick the closest DNS server first - but a RANDOM one
>>> from the list (which may be topologically further away than the
>>> other choices).  Using views to serve a preferred order of
>>> addresses makes assumptions that may not be true in many cases.
>>
>>
>> I said nothing about a preferred order of addresses, which, given
>> that resolving name servers will effectively randomize them, is a
>> generally useless practice. And the OP said nothing about disaster
>> recovery, just about handling the case when one or the other ISP
>> connection (but not both) goes down - fault tolerance (and some
>> load balancing) for upstream connections.
>
> Actually, you implied it when you suggested that the OP planned to
> route the subsequent HTTP connection over the same path as the DNS
> request came in on.  That implies an order because it implies a
> route where multiple routes are possible.
>
>> The client (the stub resolver) doesn't pick a DNS server to query;
>> the resolving name server does. And while for the first query into
>> the zone, the resolver will choose randomly, once it has sufficient
>> data, it will tend to query whichever address responds fastest.
>> This provides the load balancing aspect.
>
> If the stub resolver has multiple DNS servers available to query, it
> DOES pick one.  Then that DNS server chooses one from the zone's NS
> RRset.  Sufficient data assumes a continuity of connection or
> multiple requests in a data stream - a condition you can't assume if
> one and only one web request is going to be made to the host.
> Obviously, when the first DNS request comes in, there hasn't yet
> been a recent web request (it may have been so long ago that the DNS
> info expired out of the cache).
>
>> For fault tolerance, consider that if a line goes down, the DNS
>> server will retry the other available name server addresses until
>> it gets an answer - this should typically be speedy enough that the
>> web browser will never notice. In order for the browser to connect
>> to the web server using the line that is still up, the response
>> from the DNS server should contain just one A record, giving the
>> address of the web server that is available over the same ISP
>> connection as the DNS server itself. This can be accomplished using
>> views on the DNS servers, with different IP interfaces for the
>> different routers.
>
> And just how do you expect the DNS server to know that "route 1"
> from A to B is working when "route 2" is used and the break in route
> 1 is not adjacent to host A or B (where A and B are the server and
> client)?  Just because the traffic isn't seen doesn't mean it's not
> working.  Your position implies that a DNS server should test all
> routes between itself and any (or all) querying clients - and that's
> just not going to happen (too much network "noise").  That's the
> only way that the DNS server is going to know that it should
> withhold a record it would otherwise provide in the answer.
>
>> (I'm sure this strategy has been described endlessly on the list
>> before, but I don't have a handy link to any archive or FAQ. It was
>> faster to retype it than to find such a reference.)
>
> I doubt it.  What you are stating isn't feasible nor practical.


Yes, it is both feasible and practical. This technique is in
production use, and is employed by at least one load-balancing
appliance vendor.

The resolving DNS server doesn't test the available routes. It simply  
sends a query to one of the four apparent servers. If there is a  
response, then it uses that response to reply back to the stub  
resolver, which forwards it back to the web browser. The web browser  
then makes a connection to the indicated web server address.

The magic here is that the response from servers A1 and B1 only  
contains the address C1 of the web server. Similarly, the response  
from servers A2 and B2 only contains the address C2 of the web server.  
Servers A1, B1, and C1 are available over route 1. Servers A2, B2, and  
C2 are available over route 2.
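As a sketch, the two views on one of these servers might look like the
following named.conf fragment. All names, addresses, and file names
here are invented for illustration; BIND selects the view based on
which interface address the query arrived at:

```
// Hypothetical fragment for server A, which has two interfaces:
// A1 (198.51.100.53, reached over route 1) and A2 (203.0.113.53,
// reached over route 2). Server B would be configured the same way.
view "route1" {
    match-destinations { 198.51.100.53; };  // queries arriving at A1
    zone "example.com" {
        type master;
        file "example.com.route1";  // www points at C1 only
    };
};
view "route2" {
    match-destinations { 203.0.113.53; };   // queries arriving at A2
    zone "example.com" {
        type master;
        file "example.com.route2";  // www points at C2 only
    };
};
```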

If route 1 fails but route 2 is up, then an incoming query from a
resolving name server has to travel over route 2. If the resolving
name server tries to contact server A1 or server B1, the query will
time out because route 1 is down, and the resolver will then retry
the other listed servers, eventually reaching either server A2 or B2.
The response from these servers will contain only C2, not C1, and so
the web browser will connect to the web server over the route that is
still available.
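This retry behavior can be sketched in a few lines of Python. The
addresses are invented, and a real resolver's timeout and server
selection logic is more involved; this just shows why any successful
answer names a reachable web server address:

```python
import random

# Hypothetical addresses: A1/B1 answer over route 1, A2/B2 over route 2.
ROUTE_1_SERVERS = {"198.51.100.53", "198.51.100.54"}   # A1, B1
ROUTE_2_SERVERS = {"203.0.113.53", "203.0.113.54"}     # A2, B2
C1, C2 = "198.51.100.80", "203.0.113.80"               # web server addresses
ANSWER = {ns: C1 for ns in ROUTE_1_SERVERS}
ANSWER.update({ns: C2 for ns in ROUTE_2_SERVERS})

def resolve(down_routes):
    """Try the four apparent servers in random order; a server behind a
    failed route times out, so the resolver moves on to the next one."""
    servers = list(ANSWER)
    random.shuffle(servers)
    for ns in servers:
        if any(ns in route for route in down_routes):
            continue  # timeout - retry with the next listed server
        return ANSWER[ns]  # first response wins; it holds one A record
    raise RuntimeError("all name servers unreachable")

# With route 1 down, every successful answer points the browser at C2.
print(resolve(down_routes=[ROUTE_1_SERVERS]))  # 203.0.113.80
```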

To reiterate, there are two copies of the zone, and they differ in
exactly one respect: the address of the web server. They each list
all of the available name servers in NS records, they each list the
same MX records, and so on. It's just the web server address that's
different.
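For example, the two zone files might look like this (all names,
addresses, and timer values are invented for illustration):

```
; example.com.route1 - the copy served to queries arriving over route 1
$TTL 300                 ; short TTL, to limit failover lag (see below)
@    IN SOA ns1.example.com. hostmaster.example.com. (
            2008031101 3600 900 604800 300 )
     IN NS  ns1.example.com.
     IN NS  ns2.example.com.
     IN MX  10 mail.example.com.
www  IN A   198.51.100.80        ; C1 - the only line that differs

; example.com.route2 is identical except for:
; www  IN A   203.0.113.80       ; C2
```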

Obviously, this is not 100% foolproof. When a route goes down, a
certain number of web browsers will still try that route (and fail)
until the TTL has expired. And any current sessions with the web
server will fail, because web browsers don't do a fresh lookup for
each request to the server - they retain the address for as long as
the browser chooses. (Some browsers pay some attention to TTLs, some
do not.) But by using a very short TTL (e.g. 5 minutes or less), the
damage can be minimized as much as is possible without resorting to
BGP.

Chris Buxton
Professional Services
Men & Mice



More information about the bind-users mailing list