In my previous blog article I talked about the need to generate an even spread of traffic across the queues of a modern multi-core NIC to achieve optimal performance.
The Intel X710 cards that we use in our performance testing lab distribute packets to queues using a hash calculated for each incoming packet from its source and destination IP addresses and port numbers.
Whilst this works well for traffic arriving from the internet, in a lab environment with a very limited range of traffic sources we found that the hashing produced a very uneven load on each CPU core.
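To illustrate why a small set of flows spreads badly, here is a minimal sketch that maps flow 4-tuples to queues. It is not the hash the X710 actually uses: SHA-256 stands in for the NIC's hash, and the addresses, port ranges, queue count and function names are all assumptions chosen for the example.

```python
import hashlib
from collections import Counter

def queue_for_flow(src_ip, dst_ip, src_port, dst_port, num_queues):
    """Map a flow 4-tuple to a queue index.

    SHA-256 is only a stand-in for the NIC's real hash function.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_queues

def queue_load(src_ports, num_queues=8):
    """Count how many flows land on each queue for one client/server pair."""
    counts = Counter(
        queue_for_flow("192.0.2.1", "192.0.2.2", port, 53, num_queues)
        for port in src_ports
    )
    return [counts.get(q, 0) for q in range(num_queues)]

# 28 source ports (a dnsperf-like limit) versus a much wider port range.
few = queue_load(range(1024, 1024 + 28))
many = queue_load(range(1024, 1024 + 28000))

if __name__ == "__main__":
    print("28 flows:   ", few)
    print("28000 flows:", many)
```

With only 28 flows, each queue receives a handful at most, so a difference of one or two flows is a large relative imbalance. With tens of thousands of flows the per-queue counts should sit much closer to their mean.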
Our lab uses the well-known dnsperf package from Nominum to generate streams of queries, but there are quite low limits on the number of unique source ports that we were able to use without causing dnsperf itself to drop in performance. We were able to set the number of clients (and hence source ports) to 28 but no more.
Since we were also hitting the apparent per-packet limits of what a single instance of dnsperf can generate (approximately 1 Mpps) I set out to see what’s possible by bypassing the operating system. The result is dnsgen, and this is now replacing dnsperf as our preferred test application (the Perflab code has also been updated to allow user selection of traffic generation method).
Unlike dnsperf, it uses AF_PACKET raw sockets and therefore only runs under Linux. This use of raw sockets is what allows for the use of a far larger range of source ports and higher performance than using “normal” UDP sockets.
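As a rough illustration of the raw socket approach (not dnsgen's own code), the sketch below builds a complete Ethernet/IPv4/UDP frame by hand, which is what lets the sender choose any source port per packet. The helper names `build_frame` and `send_frames` are hypothetical, the payload is treated as opaque bytes, and the UDP checksum is left at zero, which IPv4 permits. Sending requires Linux and raw-socket privileges.

```python
import socket
import struct

def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over an even- or odd-length buffer."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_frame(src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port, payload):
    """Return Ethernet + IPv4 + UDP headers followed by the payload."""
    eth = dst_mac + src_mac + struct.pack("!H", 0x0800)  # EtherType IPv4
    udp_len = 8 + len(payload)
    # UDP checksum 0 means "not computed", which is allowed over IPv4.
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + udp_len,        # version/IHL, TOS, total length
        0, 0,                          # identification, flags/fragment
        64, socket.IPPROTO_UDP, 0,     # TTL, protocol, checksum placeholder
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    return eth + ip + udp + payload

def send_frames(ifname, frames):
    """Send prebuilt frames on an interface (Linux only, needs CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        for frame in frames:
            sock.send(frame)
```

A caller could generate one frame per source port, for example `[build_frame(src, dst, "192.0.2.1", "192.0.2.2", p, 53, query) for p in range(1024, 65536)]`, where `src`, `dst` and `query` are 6-byte MAC addresses and a DNS query supplied by the caller.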
To reduce CPU load, dnsgen does not attempt to correlate received packets with those it has transmitted. It simply counts those packets that arrive back on the network interface. It is therefore best used on a network interface that is directly connected to the server under test and not shared with any other services.
In normal operation the packets-per-second value reported is the peak rolling average of the received packet rate observed during the run. To find this value, dnsgen starts sending packets at the specified initial sending rate and then measures the rate at which packets are received. The sending rate is then adjusted every 0.1s to be the midpoint between the maximum observed rate so far and the received rate, plus a specified “increment” rate. Eventually a steady state should be achieved when the difference between the received rate and the transmitted rate is equal to the increment rate, and where that increment represents a small overhead in lost packets. In the alternative “ramp” mode, packets are transmitted at the specified starting rate, with the rate increasing thereafter by the specified increment every 0.1s without regard to the inbound received rate.
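The two rate rules described above can be sketched as follows. This is a toy model rather than dnsgen's implementation: it assumes that the “maximum observed rate” means the highest received rate seen so far, and it replaces the real server with a hypothetical one that answers at most `capacity` packets per second. All function and parameter names are illustrative.

```python
def next_rate_adaptive(max_observed, received, increment):
    """Normal mode: midpoint of the best and current received rate, plus increment."""
    return (max_observed + received) / 2 + increment

def next_rate_ramp(current, increment):
    """Ramp mode: increase by a fixed increment, ignoring the received rate."""
    return current + increment

def simulate(capacity, start_rate, increment, steps=200):
    """Run the adaptive rule against a toy server capped at `capacity` pps.

    Each step stands for one 0.1s adjustment interval. Returns a list of
    (sent_rate, received_rate) pairs.
    """
    rate = float(start_rate)
    max_observed = 0.0
    history = []
    for _ in range(steps):
        received = min(rate, capacity)  # toy server: drops anything above capacity
        max_observed = max(max_observed, received)
        history.append((rate, received))
        rate = next_rate_adaptive(max_observed, received, increment)
    return history

if __name__ == "__main__":
    sent, received = simulate(1_000_000, 100_000, 10_000)[-1]
    print(f"sent={sent:.0f} received={received:.0f} gap={sent - received:.0f}")
```

In this model, while no packets are lost the adaptive rule behaves like the ramp, because the received rate equals the maximum observed rate. Once the server saturates, the sending rate settles at the capacity plus the increment, matching the steady state described above.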
With this code we’re now able to test performance up to the 3 – 4 Mpps range. The packet generation code itself can actually push 10 Mpps but when handling bi-directional traffic flows the total performance does drop somewhat. The source code repository also includes an “echo” server that uses raw sockets to receive and return packets.
NB: This is not an official ISC product release. It’s an application developed for internal use that we believe may be of use to other DNS researchers. While the application already meets our current needs there’s still plenty of room for improvement, both in performance and features. In particular there’s no IPv6 support yet.
The project is hosted on our GitHub at https://github.com/isc-projects/dnsgen and we’d welcome feedback and contributions from the community.