Bind 9.10.3 on CentOS 7.1 - Recv-q on vmware

Rasmus Edgar regj at arch-ed.dk
Tue Dec 15 14:20:05 UTC 2015


Hi Bind-users,

A colleague recently posted a question on this list about latency and a
full recv-q on vmware with bind 9.10.3, and we have since carried out
some tests. As I only joined the list recently, I am starting a new
thread.

We started noticing latency of more than 1 s on clients resolving
through the vmware guest at a load of around 6000 qps.

Test setup:

1 x x86_64 vmware guest on Esx 5.5
8 x vCPU
8G RAM
vmxnet3 10Gb virtual interface
CentOS 7.1
Bind 9.10.3 resolver

1 x IBM x86_64 physical machine
24 x CPU cores
16G RAM
1Gb interface
CentOS 7.1
Bind 9.10.3 resolver

Both bind servers are on the same VLAN.

Both bind servers have an identical bind configuration.

The test client is on the same VLAN as both servers, but it is virtual
and runs on the same hypervisor as the vmware guest.

Sysctl tuning:
/etc/sysctl.d/tuning.conf
# 32M receive buffer
net.core.rmem_max=33554432
# 32M send buffer
net.core.wmem_max=33554432
net.core.netdev_max_backlog=2000
net.ipv4.ip_local_port_range=1024 65000
net.netfilter.nf_conntrack_max=1048576

/etc/modprobe.d/nf_conntrack.conf
options nf_conntrack hashsize=262144
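
A quick way to confirm that the running kernel actually carries the
sysctl values listed above is to compare the file against /proc/sys. The
following is only a minimal sketch: the function names are made up for
illustration, and it assumes a Linux host where sysctl keys map to files
under /proc/sys by replacing dots with slashes. Note that
net.core.rmem_max is only a ceiling; a socket gets a larger receive
buffer only if the application asks for one.

```python
from pathlib import Path

# The fragment from /etc/sysctl.d/tuning.conf, copied from the post.
TUNING_CONF = """\
# 32M receive buffer
net.core.rmem_max=33554432
# 32M send buffer
net.core.wmem_max=33554432
net.core.netdev_max_backlog=2000
net.ipv4.ip_local_port_range=1024 65000
net.netfilter.nf_conntrack_max=1048576
"""


def parse_sysctl_conf(text):
    """Return {key: value} for every 'key=value' line, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Collapse whitespace so "1024 65000" matches the kernel's "1024\t65000".
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key):
    """Map a sysctl key such as net.core.rmem_max to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_live(key):
    """Return the running kernel's value, or None if the file is unreadable."""
    try:
        return " ".join(proc_path(key).read_text().split())
    except OSError:
        # For example, nf_conntrack_max is absent until the module is loaded.
        return None


def report(expected):
    """Return (key, wanted, running, matches) for each expected setting."""
    rows = []
    for key, want in expected.items():
        have = read_live(key)
        rows.append((key, want, have, have == want))
    return rows
```

Calling report(parse_sysctl_conf(TUNING_CONF)) on each server and
printing the rows shows at a glance whether the guest and the physical
machine really run with the same values.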

How to reproduce:

The tests were done with dnsperf using the following test data:

http://pkgs.fedoraproject.org/repo/pkgs/dnsperf/queryfile-example-10million-201202.bz2/0ff3de3eaf30a4ed94031fb89997369a/queryfile-example-10million-201202.bz2

And the following command:

./dnsperf -f inet -s <redacted ip> -d queryfile-example-10million-201202 \
    -l 30 -q 15000
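
To run the same test repeatedly against both servers, the command above
can be wrapped in a small script. This is a sketch only: the helper
names are hypothetical, and any server address passed in is a
placeholder for the redacted one. According to the dnsperf man page, -l
is the run time limit in seconds and -q is the maximum number of
outstanding queries (the query rate itself is limited with -Q), so the
options are labelled accordingly.

```python
import shlex
import subprocess


def dnsperf_argv(server, datafile, seconds=30, outstanding=15000,
                 binary="./dnsperf"):
    """Build the argument list for the dnsperf run described in the post."""
    return [
        binary,
        "-f", "inet",            # address family, as in the post
        "-s", server,            # resolver under test
        "-d", datafile,          # query file
        "-l", str(seconds),      # run time limit in seconds
        "-q", str(outstanding),  # maximum outstanding queries
    ]


def run_dnsperf(server, datafile):
    """Run dnsperf once and return its text output (raises if it fails)."""
    argv = dnsperf_argv(server, datafile)
    print("running:", shlex.join(argv))
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout
```

Keeping the argument list in one place makes it harder for the two runs
to differ in anything but the server address.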

The same dnsperf tests were run with powerdns as the resolver on the
same two servers, and no full recv-q was seen on either the physical or
the virtual machine. With powerdns, the performance on vmware was on par
with the physical machine.

We are having a hard time pinpointing the reason why recv-q gets full on 
vmware with bind 9.10.3.

I have attached some data which illustrates how the recv-q fills up on
the vmware guest compared to the physical machine. The recv-q on the
physical machine was approximately half of what was seen on the virtual
machine. The netstat extracts netstat_*-2-3 are the most illustrative.
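
For anyone wanting to collect comparable numbers without netstat, the
per-socket receive queue can also be read from /proc/net/udp. The sketch
below is not how the attached data was produced; it assumes the usual
Linux layout of that file (local address and port in hex, tx_queue and
rx_queue in hex in the fifth column, a decimal drops counter in the last
column), and the function names are made up for illustration.

```python
import time
from pathlib import Path


def parse_udp_table(text, port=53):
    """Return (local_addr_hex, rx_queue_bytes, drops) for sockets on port."""
    results = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        addr_hex, _, port_hex = fields[1].rpartition(":")
        if int(port_hex, 16) != port:
            continue
        _tx_hex, rx_hex = fields[4].split(":")
        # The drops column is the last one; tolerate kernels that lack it.
        drops = int(fields[12]) if len(fields) > 12 else None
        results.append((addr_hex, int(rx_hex, 16), drops))
    return results


def snapshot(port=53):
    """Read the IPv4 and IPv6 UDP tables and return the matching sockets."""
    rows = []
    for name in ("/proc/net/udp", "/proc/net/udp6"):
        try:
            rows.extend(parse_udp_table(Path(name).read_text(), port))
        except OSError:
            pass  # for example, IPv6 is disabled on this host
    return rows


def watch(samples=10, interval=1.0, port=53):
    """Print one line per socket per sample while a test is running."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        for addr, rx_bytes, drops in snapshot(port):
            print(f"{stamp} {addr}:{port} recv-q={rx_bytes} drops={drops}")
        time.sleep(interval)
```

Running watch(samples=30) on both servers during a 30 second dnsperf run
gives one time series per listening socket, and a rising drops counter
shows when the queue has actually overflowed.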

Suggestions for further ways to troubleshoot the issue and possible 
solutions are welcome.

Br,
Rasmus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: netstat-stat.tar.gz
Type: application/x-gzip
Size: 2562 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/bind-users/attachments/20151215/ff7390a0/attachment.bin>

