Round-robin for high availability?

cdevidal cdevidal at thedoghousemail.com
Fri Jul 14 05:02:45 UTC 2006


==== My real address is Chris (AT) deVidal (DOT) tv ====

I've been experimenting with multiple A records for both
load-distributing AND high availability.

Until now I had always been told that round-robin is for load
distribution ONLY and should not be used for high-availability
failover.  In practice that is not proving to be true, and I'm
beginning to think it was just FUD.

Do a lookup on roundrobintest8.strangled.net and
roundrobintest9.strangled.net.  Notice the A records:
roundrobintest8.strangled.net. 3600 IN	A	127.0.0.1
roundrobintest8.strangled.net. 3600 IN	A	63.95.68.129  # Real server

roundrobintest9.strangled.net. 3600 IN	A	10.69.96.69   # Bogus IP
roundrobintest9.strangled.net. 3600 IN	A	63.95.68.129  # Real server

Now, disable anything running on localhost:443 and make sure you do
*not* have a host at 10.69.96.69.

Browse https://roundrobintest8.strangled.net/ and
https://roundrobintest9.strangled.net/.  You should never get a DNS
error.  You should always get an SSL warning (hostname mismatch) first
and then a login prompt.  The browser will pause while it tries the bad
IP, but after about 5 seconds it moves on to the real server.
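
A rough Python sketch of the fallback behaviour described above: walk
the addresses in the order the resolver returned them and keep the
first one that accepts a connection.  The function name try_addresses
and the injectable connect parameter are assumptions made for
illustration; this is not taken from any particular client's source.

```python
import socket


def try_addresses(addresses, port, timeout=5.0,
                  connect=socket.create_connection):
    """Return (address, connection) for the first address that answers.

    addresses: IP strings in the order the resolver returned them.
    connect:   callable taking ((address, port), timeout); the default
               opens a real TCP connection.
    """
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            # Covers timeouts, refused connections and unreachable hosts.
            last_error = exc
    raise ConnectionError(
        f"no address accepted a connection: {last_error}")
```

With the roundrobintest9 records, a call such as
try_addresses(["10.69.96.69", "63.95.68.129"], 443) would wait out the
timeout on the bogus IP and then connect to the real server.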

Now load up an SSL web server on localhost.  I used Apache+mod_ssl on
Linux and TinySSL on Windows.  Set up an index page with links to
several other pages.

(Sorry to require SSL; it is the only web server I have control over
that no one is using at the moment, so I can kill the web service any
time I want.  You could also load up an FTP or SSH server on localhost
instead of SSL.  My server has all three.)

Flush your cache (e.g. ipconfig /flushdns) and reload the website.
Sometimes you will get localhost, sometimes my server.  That's the
load-distributing action we all know and love.
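
The load-distributing effect can be simulated in a few lines of Python.
This sketch assumes the name server rotates the record set by one
position on every answer and that the client always tries the first
address; real servers may order records differently (for example,
randomly), and first_choices is a hypothetical helper name.

```python
from collections import Counter, deque


def first_choices(records, lookups):
    """Count how often each address is listed first over `lookups`
    uncached queries, assuming a strictly cyclic record order."""
    ring = deque(records)
    counts = Counter()
    for _ in range(lookups):
        counts[ring[0]] += 1   # the client tries the first address
        ring.rotate(-1)        # the server rotates for the next answer
    return counts
```

Under those assumptions, ten uncached lookups of roundrobintest8 would
be split evenly between 127.0.0.1 and 63.95.68.129.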

If you don't get localhost, keep flushing your cache until you do.
Then kill your local server and click on a link in the web page that is
still up on your screen.  The browser will fail over to my server,
which returns a 404 since it does not have that page.  That's high
availability!  Even though it generates an error, the response is
coming from my server nonetheless!

-No- client I've tried (browser, FTP client, MySQL, SSH etc.) fails on
the bad IP (10.69.96.69).  Each one waits a few seconds and then tries
the good IP.

Nor do they fail when the IP is reachable, as in the case of localhost,
but no service is listening on that port.
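
The two failure cases cost the client different amounts of time.  A
small Python sketch of the arithmetic, using the roughly 5 second pause
observed for the bogus IP and assuming a refused connection fails
almost immediately (an assumption; some stacks retry before giving up).
The name estimated_delay and its parameters are hypothetical.

```python
def estimated_delay(outcomes, timeout=5.0, refused_delay=0.0):
    """Seconds spent on failed addresses before the good one is reached.

    outcomes: one entry per address tried before the working one, each
    either "timeout" (nothing answers, as with 10.69.96.69) or
    "refused" (host is up but nothing listens, as with localhost).
    """
    costs = {"timeout": timeout, "refused": refused_delay}
    return sum(costs[outcome] for outcome in outcomes)
```

For example, estimated_delay(["timeout"]) gives 5.0 seconds for one
dead address ahead of the live server, and each additional dead
address adds another timeout.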


I've tried this on:
Windows 95
Windows 98
Windows 2000
Windows XP
Ubuntu 6.06
Debian 3.1
CentOS 3
CentOS 4

With these clients:
Netscape 4.5 (Nice and old!!!)
IE 5.5
IE 6
Firefox 1.0
Firefox 1.5
DOS FTP
Linux FTP
Linux NcFTP
MySQL client
OpenSSH client


My idea is to set up a live server running web/mail/DNS/DB/FTP and a
warm standby, such as:
www.example.com. 3600 IN	A	1.1.1.1
www.example.com. 3600 IN	A	2.2.2.2

The warm standby is powered on but no services are started.  Live is
synchronized to warm standby.  If the live fails I bring up the
standby.  Bing bang boom, the client automatically goes to the standby.

In practice it'll be just web/POP/SSH/FTP, because DNS and SMTP already
have built-in load-distributing and high-availability capabilities.  No
database ports will be exposed to the outside world, but if I ever do
expose them they should work too.


If this works, so cool!  Replacement for expen$ive and complicated HA
solutions :-)

Was clued into this by Mr. Tenereillo:
http://www.tenereillo.com/GSLBPageOfShame.htm


What am I missing?  Do I need to do more testing?

Am I crazy?  Or crazy like a fox?  ;-)

Someone check me on this because I'm not sure I'm testing it right...


CD




More information about the bind-users mailing list