Round robin load distribution among servers does not work properly

Tue Apr 7 02:31:57 UTC 2009

Mallappa Pallakke wrote:
> Hi Mark,
>
>    I do not see any additional section in the response. Can you please
> tell me what exactly you are asking me to change?
>   
You're delegating the zone to the same name you're trying to 
round-robin. named therefore fetches that name more than once 
internally each time it answers a query, and so "rotates" the RRset 
more than once per response. You can't see this "multiple fetching" 
directly because, as Mark pointed out, the Resource Records fetched for 
the Additional Section are suppressed later in the algorithm as 
duplicates of what is already in the Answer Section. The fetches still 
"rotate" the RRset when they occur, though. Since the effect is 
"invisible", the only way to know that the "multiple fetching" is 
happening is to be intimately familiar with named's resolver algorithm, 
which Mark is.
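
To make the arithmetic concrete, here is a toy model in Python. It is 
only a sketch of the effect described above, not named's actual code: 
the function name first_addresses and the idea of a single shared 
rotation pointer per RRset are assumptions made for illustration.

```python
from collections import deque


def first_addresses(addresses, fetches_per_query, queries):
    """Return the address that appears first in each of `queries` answers.

    Toy model (an assumption, not named's real algorithm): the RRset is
    rotated by one position for every internal fetch, and each query
    triggers `fetches_per_query` fetches.
    """
    rrset = deque(addresses)
    tops = []
    for _ in range(queries):
        # The client sees the order as it stands at the start of the query.
        tops.append(rrset[0])
        # Every fetch made while building the response rotates the set once.
        rrset.rotate(-fetches_per_query)
    return tops


if __name__ == "__main__":
    four = ["10.10.68.1", "10.10.68.2", "10.10.68.3", "10.10.68.4"]
    three = four[:3]
    print(sorted(set(first_addresses(four, 2, 8))))   # two fetches, 4 records
    print(sorted(set(first_addresses(three, 2, 9))))  # two fetches, 3 records
    print(sorted(set(first_addresses(four, 1, 8))))   # one fetch, 4 records
```

In this model the number of distinct "top" addresses is n / gcd(n, k) 
for n records and k rotations per query, which is why an odd record 
count looks fine while four records show only two different leaders.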

You need to get www.mycompany.com out of the zone delegation, i.e. stop 
using it as the NS target, if you want to prevent this "multiple 
fetching" phenomenon. It's generally bad form anyway to use the same 
device for DNS infrastructure and application-level content. If you 
absolutely must, you could give different names to the same IP address, 
which prevents the "multiple fetch" problem while still using the 
device to host the zone. This might complicate your forward/reverse 
record consistency, however.

>    I selected cyclic instead of random since I want my client requests
> to go to servers in exactly round-robin order. Please tell me, is there
> anything wrong with this?
>   
Consider what happens when one of the nodes fails. Every time that node 
is given as the first A record in the set, clients will, presumably, 
fail over to the *next* node in the sequence. This is in addition to the 
regular traffic that that "next" node gets whenever its address is first 
in the sequence. So, basically, you shift all (or at least _most_, 
depending on the failover capabilities and/or timeout settings of the 
clients) of the traffic from the failed node to the next node in the 
sequence, which is not very balanced.

When you choose "random", then, in the case of node failure, the load 
gets more evenly -- albeit less predictably -- distributed among the 
remaining nodes.
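
A rough way to see the difference is the simulation below. It is a 
sketch under stated assumptions, not a description of BIND internals: 
the simulate function is hypothetical, "cyclic" is modelled as one 
rotation per query, "random" as a uniform shuffle per query, and every 
client is assumed to walk the list in order until it reaches a live 
node.

```python
import random
from collections import Counter, deque


def simulate(nodes, failed, queries, order, seed=0):
    """Count how many requests each surviving node receives.

    Assumptions (illustrative only): clients try addresses in the order
    returned and fail over to the next one when they hit `failed`.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    rrset = deque(nodes)
    load = Counter()
    for _ in range(queries):
        if order == "cyclic":
            answer = list(rrset)
            rrset.rotate(-1)  # next query starts one record later
        elif order == "random":
            answer = list(nodes)
            rng.shuffle(answer)  # fresh uniform permutation per query
        else:
            raise ValueError(f"unknown order: {order!r}")
        # The client ends up on the first address that is not the dead node.
        target = next(addr for addr in answer if addr != failed)
        load[target] += 1
    return load


if __name__ == "__main__":
    nodes = ["10.10.68.1", "10.10.68.2", "10.10.68.3", "10.10.68.4"]
    for order in ("cyclic", "random"):
        print(order, dict(simulate(nodes, "10.10.68.2", 4000, order)))
```

With 10.10.68.2 down, the cyclic run sends 10.10.68.3 twice the traffic 
of either other survivor, while the random run splits the load roughly 
in thirds.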

            - Kevin

> Thanks,
> Mallappa Pallakke
>
>
> On Mon, Apr 6, 2009 at 6:55 PM, Mark Andrews <Mark_Andrews at isc.org> wrote:
>   
>> In message <96c8e9660904061734t61414549o22a535e681f5866b at mail.gmail.com>, Mallappa Pallakke
>>  writes:
>>     
>>> Hi,
>>>
>>>  I tried with 9.5.1.P2, but still I am not getting the expected round
>>> robin results:
>>>
>>>  Please see below my named.conf and zone file:
>>>
>>> named.conf:
>>> =========
>>> options {
>>>        directory "/var/named";
>>>
>>>        // Uncommenting this might help if you have to go through a
>>>        // firewall and things are not working out.  But you probably
>>>        // need to talk to your firewall admin.
>>>
>>>        //query-source port 53;
>>>
>>>        rrset-order {
>>>                order cyclic;    // fixed, random, cyclic
>>>        };
>>> };
>>>
>>> zone "mycompany.com" {
>>>        type master;
>>> //        notify no;
>>>        file "db.mycompany.com";
>>>
>>>        allow-update { any; };
>>> //      allow-update { 127.0.0.1; };
>>> notify yes;
>>> };
>>>
>>>
>>> db.mycompany.com:
>>> ===============
>>> $ORIGIN .
>>> $TTL 0  ;
>>> mycompany.com           IN SOA  www.mycompany.com. hostmaster.mycompany.com. (
>>>                                199813404 ; serial
>>>                                1         ; refresh (1 second)
>>>                                1         ; retry (1 second)
>>>                                1         ; expire (1 second)
>>>                                1         ; minimum (1 second)
>>>                                )
>>>                        NS      www.mycompany.com.
>>> $ORIGIN mycompany.com.
>>> localhost               A       127.0.0.1
>>> $TTL 0  ;
>>> www                     A       10.10.68.1
>>>                        A       10.10.68.2
>>>                        A       10.10.68.3
>>>                        A       10.10.68.4
>>>       
>>        Change the nameserver's name to be something other than
>>        www.mycompany.com.
>>
>>        www.mycompany.com is being retrieved twice: once for the
>>        answer section and once for the additional section.  Each
>>        retrieval rotates the RRset once. The latter gets thrown
>>        away when named suppresses duplicate RRsets in the answer,
>>        so you see 2 rotations per query, and 2 divides evenly into 4.
>>
>>        B.T.W. one should choose "random" rather than "round-robin" if
>>        you want uniform load on failure.
>>
>>        Mark
>>
>>     
>>> I always get the following answers repeatedly. I am not getting
>>> 10.10.68.1 and 10.10.68.3 as top records in response messages:
>>> =================================================
>>> atcafs-n4s1:/kwlogs/msp# dig www.mycompany.com
>>>
>>> ; <<>> DiG 9.3.2 <<>> www.mycompany.com
>>> ;; global options:  printcmd
>>> ;; Got answer:
>>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13961
>>> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 1, ADDITIONAL: 0
>>>
>>> ;; QUESTION SECTION:
>>> ;www.mycompany.com.             IN      A
>>>
>>> ;; ANSWER SECTION:
>>> www.mycompany.com.      0       IN      A       10.10.68.4
>>> www.mycompany.com.      0       IN      A       10.10.68.1
>>> www.mycompany.com.      0       IN      A       10.10.68.2
>>> www.mycompany.com.      0       IN      A       10.10.68.3
>>>
>>> ;; AUTHORITY SECTION:
>>> mycompany.com.          0       IN      NS      www.mycompany.com.
>>>
>>> ;; Query time: 1 msec
>>> ;; SERVER: 10.10.68.1#53(10.10.68.1)
>>> ;; WHEN: Sun Apr  6 00:21:07 2008
>>> ;; MSG SIZE  rcvd: 113
>>>
>>>
>>> =================================================
>>>
>>> atcafs-n4s1:/kwlogs/msp# dig www.mycompany.com
>>>
>>> ; <<>> DiG 9.3.2 <<>> www.mycompany.com
>>> ;; global options:  printcmd
>>> ;; Got answer:
>>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65208
>>> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 1, ADDITIONAL: 0
>>>
>>> ;; QUESTION SECTION:
>>> ;www.mycompany.com.             IN      A
>>>
>>> ;; ANSWER SECTION:
>>> www.mycompany.com.      0       IN      A       10.10.68.2
>>> www.mycompany.com.      0       IN      A       10.10.68.3
>>> www.mycompany.com.      0       IN      A       10.10.68.4
>>> www.mycompany.com.      0       IN      A       10.10.68.1
>>>
>>> ;; AUTHORITY SECTION:
>>> mycompany.com.          0       IN      NS      www.mycompany.com.
>>>
>>> ;; Query time: 1 msec
>>> ;; SERVER: 10.10.68.1#53(10.10.68.1)
>>> ;; WHEN: Sun Apr  6 00:21:09 2008
>>> ;; MSG SIZE  rcvd: 113
>>>
>>> ===================================================
>>>
>>>
>>> Please let me know if anything is missing.
>>>
>>> Regards,
>>> Mallappa Pallakke
>>>
>>>
>>> On Sun, Apr 5, 2009 at 8:55 AM, Kirk <bind at kirkb.net> wrote:
>>>       
>>>> Mallappa Pallakke wrote:
>>>>         
>>>>> Hi,
>>>>>
>>>>>    I was trying to load-balance client requests among the
>>>>> configured servers using an internal DNS server. I get proper load
>>>>> balancing (DNS responses with the topmost IP address rotating in
>>>>> proper round-robin fashion) for an odd number of IP addresses, but
>>>>> I do not get the same behavior for an even number of IP addresses.
>>>>>
>>>>> For example:
>>>>>
>>>>>  If I have configured x.y.z.1, x.y.z.2, x.y.z.3, I get following
>>>>> combinations in dig response:
>>>>>
>>>>>  x.y.z.1
>>>>>  x.y.z.2
>>>>>  x.y.z.3
>>>>>
>>>>>  x.y.z.2
>>>>>  x.y.z.3
>>>>>  x.y.z.1
>>>>>
>>>>>  x.y.z.3
>>>>>  x.y.z.1
>>>>>  x.y.z.2
>>>>>
>>>>> And this repeats, giving round robin distribution.
>>>>>
>>>>> However, if I add one more IP address to the zone list (x.y.z.4), I
>>>>> get only following combinations:
>>>>>
>>>>> x.y.z.1
>>>>> x.y.z.2
>>>>> x.y.z.3
>>>>> x.y.z.4
>>>>>
>>>>> and
>>>>>
>>>>> x.y.z.3
>>>>> x.y.z.4
>>>>> x.y.z.1
>>>>> x.y.z.2
>>>>>
>>>>> It gets repeated. I will never get x.y.z.2 and x.y.z.4 as top entries
>>>>> in this response.
>>>>>
>>>>> Can anybody tell me why this limitation exists, and is there any
>>>>> solution to resolve this problem?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Mallappa
>>>>>           
>>>> Not sure what version of BIND you are using, but here I am using 9.5.1-P2.
>>>>  I just loaded a zone with 10 www records and different IPs, and they are
>>>> handed out round robin just fine.
>>>>
>>>> The idea of using DNS for load balancing has been brought up here so many
>>>> times it's hard to count.  The answer is always the same. DNS was *never*
>>>> meant to provide this functionality.  Spend the big bucks and get a device
>>>> meant to do *load balancing*.
>>>>
>>>> Search the archive for previous threads on this subject.
>>>> http://marc.info/?l=bind9-users&w=2&r=1&s=load+balancing&q=b
>>>>
>>>>         
>>> _______________________________________________
>>> bind-users mailing list
>>> bind-users at lists.isc.org
>>> https://lists.isc.org/mailman/listinfo/bind-users
>>>       
>> --
>> Mark Andrews, ISC
>> 1 Seymour St., Dundas Valley, NSW 2117, Australia
>> PHONE: +61 2 9871 4742                 INTERNET: Mark_Andrews at isc.org
>>
>>     