Global Availability
Brad Knowles
brad.knowles at skynet.be
Thu Aug 23 15:12:48 UTC 2001
At 9:29 AM -0500 8/23/01, Joseph K Gainey wrote:
> Client------------[Client DNS]
>      |
>      +-----------------+ [Office]
>      |                   DNS(MASTER)
>      |                       |
>      +---------------------+---------------------+
>   (t1)|                 (t1)|                 (t1)|
> SITEA(Seattle)      SITEB(New York)      SITEC(Houston)
> DNS1(SLAVE)         DNS2(SLAVE)          DNS3(SLAVE)
> WWW(1)              WWW(2)               WWW(3)
If any and all of these sites can go down, and cannot be trusted
to remain operational, then you've got a very real problem. You
could have Layer Four load-balancing switches at each site that
constantly monitor the accessibility of the other sites, and which
are configured to hand off connections to a less loaded site once you
reach a certain threshold, but someone somewhere is going to have to
have a set of IP addresses that gets this thing started somehow, and
gets those connections pointed towards a hopefully operational server.
I imagine that you could "anycast" a shared IP address of one or
two virtual slave nameservers (actually, IP aliases set up on each of
the slave nameservers, but which are configured to do TCP connections
via their real IP address), and have the IP address of the web server
sit off in a subdomain of its own, with a very low TTL. You could
then have the L4 load-balancing switches automatically update this
zone whenever they noticed that one of the other sites went down, so
as to remove that IP address from the list to be handed out.
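The automatic zone update described above can be sketched in BIND's
`nsupdate` input format. Here is a small generator for that input,
assuming hypothetical names and addresses (a dyn.example.com
subdomain, 192.0.2.x site addresses, a 30-second TTL) and some
external health check that tells it which site has died:

```python
# Sketch: build an nsupdate batch that removes a failed site's address
# from a low-TTL subdomain and reasserts the surviving ones. Zone name,
# host name, TTL, and addresses are all hypothetical placeholders.

SITES = {
    "seattle": "192.0.2.1",
    "newyork": "192.0.2.2",
    "houston": "192.0.2.3",
}

def make_update(zone, name, failed_site, sites, ttl=30):
    """Return nsupdate stdin that drops the dead A record."""
    lines = [
        f"zone {zone}",
        f"update delete {name}. A {sites[failed_site]}",
    ]
    for site, addr in sites.items():
        if site != failed_site:
            lines.append(f"update add {name}. {ttl} A {addr}")
    lines.append("send")
    return "\n".join(lines)

# Feed this to `nsupdate` when the New York site fails its health check.
print(make_update("dyn.example.com", "www.dyn.example.com", "newyork", SITES))
```

In practice you would pipe this to `nsupdate` (ideally with a TSIG key)
against the master; the low TTL on the subdomain is what keeps stale
answers from lingering in caches.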
This would at least minimize the chance that someone would get
and cache for a long time a non-functional IP address for your
virtual web server, and then you could leave the actual
load-balancing issues to the L4 load-balancing switches (such as the
RadWARE WSD Pro+, or other related members of the RadWARE family).
But this is getting dangerously close to DNS-based load-balancing
schemes that I am violently opposed to.
If you go this route, make absolutely damn bloody sure that you
don't ever cause DNS packet truncation, because anycasting only works
properly with UDP, and a DNS query would have to be retried over TCP
if it resulted in truncation. There would be too much chance of a
network route changing in the middle of a TCP connection setup,
leaving you talking to two or more servers answering for the same IP
address, and thus resulting in a connection reset and retry, which
would be far worse than just making things work properly with UDP.
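A monitoring hook could at least detect the truncation hazard
described above by checking the TC bit in response headers. A minimal
sketch of that check, run here against crafted sample headers rather
than live queries (the flags layout is per RFC 1035, section 4.1.1):

```python
import struct

def is_truncated(dns_message: bytes) -> bool:
    """Check the TC (truncation) bit in a raw DNS message.

    The 16-bit flags word is bytes 2-3 of the 12-byte header;
    TC is bit 9, i.e. mask 0x0200 (RFC 1035, section 4.1.1).
    """
    if len(dns_message) < 12:
        raise ValueError("short DNS message")
    (flags,) = struct.unpack_from("!H", dns_message, 2)
    return bool(flags & 0x0200)

# Crafted 12-byte headers: id=0x1234, one question, no answer records.
# flags 0x8380 = QR|TC|RD|RA (truncated); 0x8180 = QR|RD|RA (not).
print(is_truncated(bytes.fromhex("123483800001000000000000")))  # True
print(is_truncated(bytes.fromhex("123481800001000000000000")))  # False
```

If a monitored response ever comes back with TC set, your answers have
grown past what a UDP datagram carries, and resolvers will start
retrying over TCP against the anycast address.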
> We have entered into contracts with clients that we will have
> 99.99% uptime; if 1/3 (33%) of connections made in a 24hr period fail,
> then this is not 99.99% uptime. The problem I've run into is that the
> only way for clients (and their DNS servers) to not see the down site
> is to remove the down site from DNS. Not a problem, right? Except I'd
> rather not be called at 2am to remove something from DNS; I'd rather
> have DNS do it itself.
Then you should have comparable SLAs from your ISPs, and when
you get hit with the consequences of a failure, you should be able to
get full restitution from the ISP that caused that failure. It's
insane to give out guarantees of 100% (or very near 100%)
reliability without in turn requiring that same level of reliability
from the sources you're using to build that system.
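A quick sanity check on the numbers quoted above makes the gap vivid:
99.99% over a 24-hour period tolerates only a few seconds of downtime,
while losing one of three sites for a day fails roughly a third of
connections. Simple arithmetic:

```python
# How much downtime a given SLA actually permits per period.
def allowed_downtime_seconds(sla: float, period_seconds: int = 86400) -> float:
    """Seconds of downtime an availability SLA tolerates per period."""
    return period_seconds * (1.0 - sla)

print(allowed_downtime_seconds(0.9999))        # 8.64 seconds per day
print(allowed_downtime_seconds(0.9999) * 365)  # about 52.6 minutes per year
```

Against an 8.64-second daily budget, a single unmasked site outage of
any real duration blows the SLA by orders of magnitude.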
--
Brad Knowles, <brad.knowles at skynet.be>
More information about the bind-users mailing list