Two site failover / load distribution

Tue Mar 22 00:24:51 UTC 2005

Can2002 wrote:

>I'm trying to put together a low cost HA solution based around two sites
>each with xDSL connections.  At each site I have a web server & DNS server
>running Bind 9.  My goal is to provide a solution that distributes users
>across the two sites and, as seamlessly as possible, cope with either site
>failing.
>
>I had originally hoped to achieve this using Round Robin, although searches
>on Usenet indicate this will satisfy the load distribution requirement, but
>not the failure requirement.  
>
It _technically_ meets the failure requirement, but some browsers take 
so ridiculously long to do address failover that in practical terms 
there is no failover. The browser user gives up before the failover 
actually occurs.
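
To make the timing problem concrete, here is a minimal sketch of what client-side address failover amounts to: try each A record in turn and move on only after a connect attempt fails or times out. This is not how any particular browser is implemented; the function name, the addresses and the injectable `connect` parameter are all hypothetical, chosen for illustration.

```python
import socket


def connect_with_failover(addresses, port, per_address_timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in order and return (address, connection) for the
    first one that accepts a connection.

    The worst-case delay before the last address is even tried is roughly
    per_address_timeout * (len(addresses) - 1), which is why a long
    per-address timeout makes failover useless in practice.
    """
    last_error = None
    for address in addresses:
        try:
            conn = connect((address, port), per_address_timeout)
            return address, conn
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

With a timeout of a few seconds the user barely notices a dead site; with a timeout of a minute or more per address, the user has usually given up first.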

>An alternate approach would be to make the DNS
>servers at both sites masters and hold A records only for the relevant site,
>combined with a low TTL (e.g. the A record on the DNS server at site one
>points only at the web server at site one; similar for site two).  This
>addresses failures, but not load distribution.
>
In this model, load distribution will occur as a rough function of how 
quickly the respective *nameservers* respond (or whether they respond 
at all, hence the implicit failover capability in case of total site 
failure). But this probably has little or no bearing on how quickly the 
*webservers* or other application-level components respond, so you may 
find that even under normal situations, your traffic is heavily skewed 
to one site or the other.

Also, you can have a situation where the nameserver at one of the sites 
is up and running fine, there is network connectivity to the site, but 
the webserver or some other component(s) at the site is down. This 
dual-master model can be refined to have an automatic process which 
monitors the infrastructure and changes the relevant A record -- 
possibly using the Dynamic Update protocol -- if one site or another 
becomes non-functional. Of course, at that point one is starting to 
re-invent commercial load-balancing technology...
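
As a rough sketch of such a monitoring process: check whether the local webserver accepts TCP connections, fall back to the other site's address if it does not, and generate an nsupdate script that replaces the A record. The function names, addresses, zone name and the 60-second TTL are assumptions for illustration only, and a bare TCP connect is a much weaker health check than a real deployment would want.

```python
import socket


def site_is_healthy(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout):
            return True
    except OSError:
        return False


def choose_address(local_address, remote_address, is_healthy=site_is_healthy):
    """Prefer the local webserver; fall back to the remote one.

    Returns None when neither responds, so the caller can leave the
    zone untouched rather than publish a dead address.
    """
    if is_healthy(local_address):
        return local_address
    if is_healthy(remote_address):
        return remote_address
    return None


def build_nsupdate_script(dns_server, zone, name, address, ttl=60):
    """Build nsupdate input that replaces all A records for name."""
    lines = [
        f"server {dns_server}",
        f"zone {zone}",
        f"update delete {name} A",
        f"update add {name} {ttl} A {address}",
        "send",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text could be piped to `nsupdate -k <keyfile>` from a cron job or a small loop, assuming the zone is configured to accept signed dynamic updates.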

>Having searched further it sounds as though something like lbnamed may be
>the solution, but I wondered what experiences others had on the NG?
>
Never used lbnamed. We use commercially-available load-balancing 
devices. However, even with those we end up having to reduce our TTLs to 
anti-social levels in order to get the load-balancing and/or failover 
granularity we require. A-record-based load-balancing/failover is always 
going to be quite imperfect. SRV-record-based load-balancing/failover 
shows more promise, but client-software (e.g. browser) 
developers/providers are taking a long time to adopt it.
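
For comparison, here is a sketch of the target-ordering rule an SRV-aware client would apply, following the priority/weight selection described in RFC 2782: lower priority values are tried first, and within one priority the order is a weighted random draw. The record tuples and hostnames below are made up for illustration, and this is not taken from any existing client.

```python
import random
from collections import defaultdict


def order_srv_targets(records, rng=random):
    """Order SRV records the way an RFC 2782 client would try them.

    Each record is a (priority, weight, port, target) tuple.
    """
    by_priority = defaultdict(list)
    for record in records:
        by_priority[record[0]].append(record)

    ordered = []
    for priority in sorted(by_priority):  # lowest priority value first
        # Zero-weight records go to the front of the candidate list.
        remaining = sorted(by_priority[priority], key=lambda r: r[1] != 0)
        while remaining:
            total = sum(r[1] for r in remaining)
            pick = rng.randint(0, total)
            running = 0
            for i, record in enumerate(remaining):
                running += record[1]
                if running >= pick:
                    ordered.append(remaining.pop(i))
                    break
    return ordered


# Hypothetical records: two sites sharing load 3:1, plus a backup.
example = [
    (10, 75, 80, "www1.example.com."),
    (10, 25, 80, "www2.example.com."),
    (20, 0, 80, "backup.example.com."),
]
targets = [r[3] for r in order_srv_targets(example)]
```

Here the priority field gives failover (the backup is only tried after both priority-10 targets) and the weight field gives load distribution, neither of which plain A records can express.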

                                       - Kevin