BIND srtt algorithm not working as expected

Wed May 16 09:25:41 UTC 2018

Hello,

I am researching an issue we are seeing with significant volumes of DNS traffic being sent to non-local forwarders. I think I understand how the srtt algorithm works, but I am seeing more traffic going to the non-local forwarders than I was expecting.

To give you some context, we have 2 forwarders in the UK and 2 in Hong Kong; all 4 servers are responsible for outbound internet resolution. We also have a number of resolving servers (in the UK and Hong Kong) that list all 4 of these servers in their local "forwarders" statement. I am therefore expecting the HK resolvers to forward mainly to the 2 local HK forwarders, with the occasional query out to the 2 UK forwarders so that the rtt can be measured.
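The expectation described above can be illustrated with a toy model. This is only a sketch and not BIND's actual code: it assumes the forwarder with the lowest srtt is always chosen, that the chosen forwarder's srtt is smoothed towards its measured rtt, and that every other forwarder's srtt is decayed so it is eventually retried. The names `simulate`, `SMOOTH` and `DECAY`, and the values given to them, are hypothetical and chosen purely for illustration; the rtts are taken from the ping figures quoted later in this message.

```python
# Toy model of srtt-based forwarder selection. NOT BIND's actual code.
# SMOOTH and DECAY are made-up illustrative values, not taken from BIND.
SMOOTH = 0.7   # weight kept from the old srtt when a new rtt is measured
DECAY = 0.98   # factor applied to the srtt of every forwarder not chosen


def simulate(true_rtt_ms, queries):
    """Return how many queries each forwarder receives in the toy model."""
    srtt = dict.fromkeys(true_rtt_ms, 0.0)   # all forwarders start untested
    counts = dict.fromkeys(true_rtt_ms, 0)
    for _ in range(queries):
        # Always pick the forwarder with the lowest smoothed rtt.
        chosen = min(srtt, key=srtt.get)
        counts[chosen] += 1
        # Move the chosen forwarder's srtt towards its measured rtt.
        srtt[chosen] = SMOOTH * srtt[chosen] + (1 - SMOOTH) * true_rtt_ms[chosen]
        # Decay everyone else so they are eventually tried again.
        for name in srtt:
            if name != chosen:
                srtt[name] *= DECAY
    return counts


# 1 ms is an upper bound for the local HK rtt; 185 ms is the middle of 180-190 ms.
rtts = {"hk1": 1.0, "hk2": 1.0, "uk1": 185.0, "uk2": 185.0}
counts = simulate(rtts, 10_000)
remote_share = (counts["uk1"] + counts["uk2"]) / sum(counts.values())
print(counts, f"remote share: {remote_share:.1%}")
```

Under these assumptions the UK forwarders receive only a small fraction of the queries, which is roughly the split expected here. The real resolver may behave quite differently, for example when timeouts or penalties come into play.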

When I do a packet capture on a Hong Kong resolver over a 5 minute period, 22% of all packets captured are DNS queries being forwarded to the local HK forwarders, and 14% are queries being sent to the UK forwarders. This seems high to me. I had always believed that the number of queries sent to non-local forwarders would be a lot lower, but having looked into this in detail, that doesn't seem to be the case.
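To put the capture figures in perspective, the share of forwarded queries that leave the region can be worked out directly. This small sketch assumes the two percentages count forwarded queries only and that no other forwarders are involved; the variable names are illustrative.

```python
# Percentages of all captured packets, as quoted from the capture above.
hk_pct = 22.0   # queries forwarded to the local HK forwarders
uk_pct = 14.0   # queries forwarded to the UK forwarders

# Share of forwarded queries that went to the non-local (UK) forwarders.
forwarded_pct = hk_pct + uk_pct
uk_share = uk_pct / forwarded_pct * 100

print(f"{uk_share:.1f}% of forwarded queries went to the UK forwarders")
```

On those figures, roughly 4 in every 10 forwarded queries are going to the UK.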

When I do a ping from Hong Kong, the rtt to the UK forwarders is 180-190ms; in contrast, the rtt to the local HK forwarders is <1ms. I can see from dumping the cache on the HK resolver that the srtt is indeed much lower for the HK servers:

;       10.<HK IP> [srtt 478560] [flags 00004000] [edns 146/5/4/4/4] [plain 0/0] [udpsize 2448] [ttl -1033437]
;       10.<HK IP> [srtt 648550] [flags 00004000] [edns 153/4/4/4/4] [plain 0/0] [udpsize 2270] [ttl -1033437]
;       10.<UK IP> [srtt 2774590] [flags 00004000] [edns 133/4/4/4/2] [plain 0/0] [udpsize 1160] [ttl -1033437]
;       10.<UK IP> [srtt 3477510] [flags 00004000] [edns 170/6/6/6/4] [plain 0/0] [udpsize 1012] [ttl -1033437]
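The srtt values in the dump above can be compared without assuming anything about their units. The following is a minimal sketch that parses the four lines exactly as pasted (the address placeholders are kept as-is) and compares the average srtt for each region; the regular expression and the helper `mean_srtt` are illustrative, not part of any BIND tooling.

```python
import re

# The four cache dump lines, copied verbatim from above.
DUMP = """\
;       10.<HK IP> [srtt 478560] [flags 00004000] [edns 146/5/4/4/4] [plain 0/0] [udpsize 2448] [ttl -1033437]
;       10.<HK IP> [srtt 648550] [flags 00004000] [edns 153/4/4/4/4] [plain 0/0] [udpsize 2270] [ttl -1033437]
;       10.<UK IP> [srtt 2774590] [flags 00004000] [edns 133/4/4/4/2] [plain 0/0] [udpsize 1160] [ttl -1033437]
;       10.<UK IP> [srtt 3477510] [flags 00004000] [edns 170/6/6/6/4] [plain 0/0] [udpsize 1012] [ttl -1033437]
"""

# Capture the address label and the srtt value from each dump line.
pattern = re.compile(r"^;\s+(.+?)\s+\[srtt (\d+)\]", re.MULTILINE)
entries = [(label, int(srtt)) for label, srtt in pattern.findall(DUMP)]


def mean_srtt(region):
    """Average srtt over the entries whose label mentions the region."""
    values = [srtt for label, srtt in entries if region in label]
    return sum(values) / len(values)


ratio = mean_srtt("UK") / mean_srtt("HK")
print(f"UK/HK srtt ratio: {ratio:.2f}")
```

On these four entries the UK srtt values are only about 5.5 times the HK ones, a much smaller gap than the ping times (180-190ms versus <1ms) would suggest.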

I did some digging and came across this presentation: https://www.nanog.org/meetings/nanog54/presentations/Tuesday/Yu.pdf

Slide 16 seems to imply that, at lower query rates, BIND 9.8 has a habit of sending fairly significant volumes of queries to DNS servers with higher rtts. I am wondering whether this is still the case in BIND 9.10 or 9.11, and whether there is anything that can be done about it?

In BIND 8 I think we could have used the topology statement to influence the behaviour but I gather that is not an option in BIND 9?

Is there a solution to this? The slow responses coming back from the UK are impacting application performance for users in HK.

We need to keep the UK servers as part of the configuration for failover/redundancy, so removing them is not an option.

Thanks,

Paul

Paul Roberts
Calleva Networks Ltd.
Email: paul at callevanetworks.com
