How to connect to a multi-homed server over TCP

With the worldwide deployment of IPv6 in parallel with IPv4, it has become apparent that the traditional connection loop is no longer good enough.

In fact, this is a large part of the reason why Google is whitelisting resolvers and Yahoo only wants to return AAAA records for DNS queries made over IPv6.  The traditional connection loop does not behave well in the presence of some network errors: it introduces excessive delays even when there are good alternate addresses to use.

This has not been a big problem in the past, as most sites have been single-homed, so there were no alternate addresses to try. But with the deployment of IPv6 alongside IPv4, almost all sites will become multi-homed, with a minimum of two addresses, so now is the time to fix this problem.

With a traditional connection loop, each address returned from gethostbyname() or getaddrinfo() is tried in turn, and the application stalls until that connection attempt succeeds or fails. Only then is the next address tried, and so on.  While most successful connections take less than 500 milliseconds, a failed connection attempt can take up to half a minute before we move on to the next address, adding a lot of unnecessary latency.

The traditional loop looks like this. Because connect() blocks and can take 30 seconds to fail, a broken first address holds up every address after it:

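    fd = -1;
    for (ai = ai0; ai; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;   /* no socket for this family; try the next address */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) < 0) {
            close(fd);  /* connect() blocked until it failed */
            fd = -1;
            continue;
        }
        break;  /* success */
    }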

You see this sort of connection loop in most textbooks on socket programming and in the man page for getaddrinfo().

The first observation to be made is that we can make these connection attempts in parallel. This works, but it leads to lots of unnecessary connections being made if we start them all at once.  Most of the time, the first connection attempt will succeed, so we should give it an opportunity to do so before making a second attempt.

The sample code attached takes the output of getaddrinfo() and tries each address in turn, waiting a decreasing amount of time between subsequent connection attempts.  When one of the connection attempts completes, it will abort the others.  The initial timeout is 500 milliseconds, which is enough time to connect to a European server from Sydney, Australia using terrestrial paths.

Code samples for poll(), select() and pthread-based C are found here:



Last modified: June 17, 2013 at 5:59 pm