Global Availability

Thu Aug 23 15:52:16 UTC 2001

Like I said, no screaming at me please ... this was not my choice.  Now the
fact remains that the chance of more than one site failing at any one time is
extremely low.  But basically what you're telling me is that someone will
actually have to go in and twiddle the records if a site goes down.  Fine ...
now I do NOT NOT NOT NOT NOT (have I said it enough?) care about having true
load balancing, because in fact any single site (and there are more than the
three listed) can handle the traffic for all of them.  That being said, I just
need some dumb DNS probe that asks "are you alive?" and, if the answer is no,
removes that IP address from the list.  It's that simple ... in fact, if I
really and truly wanted to, I would just write some dumb Perl script that does
dynamic updates to the DNS servers and bingo, it's done.  Now I'm asking: is
there some smarter way to do this?
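
For what it's worth, here is roughly what the "dumb script" version would
look like, sketched in Python rather than Perl.  Everything named in it is an
assumption for illustration: the 192.0.2.x addresses, the example.com names,
the 60-second TTL, and treating "a TCP connect to port 80 succeeds" as
"alive".  It only builds the input text for BIND's nsupdate tool; it assumes
the master is configured to accept dynamic updates for the zone, and it does
nothing about authenticating the update.

```python
import socket


def is_alive(ip, port=80, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def refresh(sites, probe=is_alive):
    """Return the subset of sites that answer the probe.

    If every probe fails, the monitoring host itself is probably cut off,
    so the full list is kept rather than publishing an empty record set.
    """
    live = [ip for ip in sites if probe(ip)]
    return live or list(sites)


def build_nsupdate_script(server, zone, name, ttl, live_ips):
    """Build nsupdate input that replaces every A record for name."""
    lines = [
        f"server {server}",
        f"zone {zone}",
        f"update delete {name} A",
    ]
    lines += [f"update add {name} {ttl} A {ip}" for ip in live_ips]
    lines.append("send")
    return "\n".join(lines) + "\n"


# Hypothetical usage (placeholder addresses and names, not the real sites):
#   sites = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]
#   text = build_nsupdate_script("master.example.com", "example.com",
#                                "www.example.com.", 60, refresh(sites))
#   ...then feed `text` to nsupdate on its standard input from cron.
```

A low TTL on the record matters as much as the script itself, since a
resolver that has already cached the dead address keeps handing it out until
the TTL expires.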

> At 9:29 AM -0500 8/23/01, Joseph K Gainey wrote:

> 
> >        Client------------[Client DNS]
> >                          |
> >        +-----------------+                       [Office]
> >        |                                         DNS(MASTER)
> >        |                                           |
> >        |
> >        +---------------------+---------------------+
> >    (t1)|                 (t1)|                 (t1)|
> >        SITEA(Seattle)    SITEB(New York) SITEC(Houston)
> >        DNS1(SLAVE)       DNS2(SLAVE)     DNS3(SLAVE)
> >        WWW(1)            WWW(2)          WWW(3)
> 
> 	If any and all of these sites can go down, and cannot be trusted 
> to remain operational, then you've got a very real problem.  You 
> could have Layer Four load-balancing switches at each site that are 
> constantly monitoring the accessibility of the other sites, and which 
> are configured to hand-off connections to a less loaded site once you 
> reach a certain threshold, but someone somewhere is going to have to 
> have a set of IP addresses that gets this thing started somehow, and 
> gets those connections pointed towards a hopefully operational server.
> 
> 
> 	I imagine that you could "anycast" a shared IP address of one or 
> two virtual slave nameservers (actually, IP aliases set up on each of 
> the slave nameservers, but which are configured to do TCP connections 
> via their real IP address), and have the IP address of the web server 
> sit off in a subdomain of its own, and with a very low TTL.  You 
> could then have the L4 load-balancing switches automatically update 
> this zone whenever they noticed that one of the other sites went 
> down, so as to remove that IP address from the list to be handed out.
> 
> 	This would at least minimize the chance that someone would get 
> and cache for a long time a non-functional IP address for your 
> virtual web server, and then you could leave the actual 
> load-balancing issues to the L4 load-balancing switches (such as the 
> RadWARE WSD Pro+, or other related members of the RadWARE family).
> 
> 
> 	But this is getting dangerously close to DNS-based load-balancing 
> schemes that I am violently opposed to.
> 
> 	If you go this route, make absolutely damn bloody sure that you 
> don't ever cause DNS packet truncation, because anycasting only works 
> properly with UDP, and a DNS query would have to be retried with TCP 
> if it resulted in truncation -- there would be too much chance of a 
> network route changing in the middle of a TCP connection setup that 
> would result in your talking to two or more servers answering for the 
> same IP address, and thus resulting in a connection reset and retry, 
> which would be far worse than just making things work properly with 
> UDP.
> 
> >  We have entered into contracts with our clients saying that we will
> >  have 99.99% uptime; if 1/3 (33%) of the connections made in a 24hr
> >  period fail, then this is not 99.99% uptime.  The problem I've run into
> >  is that the only way for clients (and their DNS servers) to not see the
> >  down site is to remove the down site from DNS.  Not a problem, right?
> >  Except I'd rather not be called at 2am to remove something from DNS;
> >  I'd rather have DNS do it itself.
> 
> 	Then you should have comparable SLAs from your ISPs, and when 
> you get hit with the consequences of a failure, you should be able to 
> get full restitution from the ISP that caused that failure.  It's 
> insane to give out guarantees of 100% (or very near 100%) 
> reliability, without in turn requiring that same level of reliability 
> from the sources you're using to try to build that system.
> 
> -- 
> Brad Knowles, <brad.knowles at skynet.be>