Operation Cancelled Error

Thu Jul 12 15:30:49 UTC 2012

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Ben,

On 7/12/12 10:32 AM, Ben wrote:

> Still, my question is open..

I'm not from ISC and I'm not an authoritative source, but I have an
idea what causes this. You can look up the details in the BIND source
code.

Every caching DNS server (BIND or any other product) can only work on
a finite number of outstanding queries to other (authoritative) DNS
servers.

If an outstanding query to an external server takes too long (the
answer does not come in), the DNS server needs to cancel the operation
in order to free up resources that can be better used for new queries.
Without this mechanism, it would be possible for one client to eat up
all resources and block the whole server.
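
To illustrate the idea, here is a toy model in Python. It is only a
sketch and not how BIND implements it: the class name, the limit of 2
outstanding queries and the 10 second timeout are all made up for the
example, and time is passed in by hand instead of read from a clock.

```python
class OutstandingQueries:
    """Toy model of a resolver's table of in-flight upstream queries."""

    def __init__(self, max_outstanding, timeout_seconds):
        self.max_outstanding = max_outstanding  # hypothetical limit
        self.timeout_seconds = timeout_seconds  # hypothetical timeout
        self.started = {}  # query id -> time the query was sent

    def start(self, query_id, now):
        """Register a new upstream query; return False if the table is full."""
        if len(self.started) >= self.max_outstanding:
            return False
        self.started[query_id] = now
        return True

    def answer(self, query_id):
        """An answer arrived, so the slot is free again."""
        self.started.pop(query_id, None)

    def cancel_expired(self, now):
        """Cancel every query that has waited too long; return their ids."""
        expired = [q for q, sent in self.started.items()
                   if now - sent >= self.timeout_seconds]
        for q in expired:
            del self.started[q]
        return expired


table = OutstandingQueries(max_outstanding=2, timeout_seconds=10)
table.start("a.example", now=0)
table.start("b.example", now=1)
blocked = table.start("c.example", now=2)    # table is full
cancelled = table.cancel_expired(now=10)     # "a.example" never got an answer
accepted = table.start("c.example", now=10)  # the freed slot can be reused
```

Without cancel_expired(), the two unanswered queries would hold their
slots forever and "c.example" could never be resolved.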

The log messages you are seeing are from these long queries that never
got an answer.

This might be because you are abusing a public DNS server for load
testing. That is not good. It is possible that the Google DNS is
rate-limiting or even blacklisting your server (and no-one can blame
them for doing that).

For proper DNS caching benchmarking, you should:

* Create a closed DNS system (not connected to any production network
or the Internet), containing authoritative servers with a root zone,
TLD zones (com, net, org ...) and all second level (and further down
level) zones you would like to query. The data in the zones doesn't
matter, you can make up data, but make sure that you get the
delegations correct.

* Use more than one IP address on each of these servers (or more
physical servers) to give BIND some "round trip time" work to do. Use
high TTLs on the records you are testing, so that they stay in cache.

* Load a root hints zone into your caching BIND server that points
to the root server in the lab.

* Make sure the network link between the authoritative servers and the
caching server is fast (if you use cheap 100 Mbit Ethernet switches,
you are possibly not testing the caching server, you are benchmarking
the switches).

* Then start benchmarking with one client, just to fill the cache.
Throw away that result. Now your cache is "hot": it contains the
cached information.

* Do more test runs. Add more clients for every test run. Measure the
queries per second that each client gets from the caching server.
Add up all results. Once the total QPS across all clients no longer
goes up when you add a new client, you have reached the maximum QPS of
that caching server.

* If you use just one client to generate load, you are testing the
speed of that one client. That is not realistic. Caching servers can
often do a higher total QPS when using multiple clients. I often use
4-6 fast client machines to saturate one caching server.

* Use different tools to generate the load traffic (queryperf,
resperf, perf from Unbound ...). Compare the results. Make sure they
make sense to you. Do not believe benchmarking tools until you
understand the results. Read the source. Make sure you are not
benchmarking the load generation tool.
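
The "add clients until the total stops growing" step can be sketched
in a few lines of Python. This is only an illustration: the function
name, the 2% threshold for "does not go up" and all QPS numbers below
are made up, and the warm-up run is assumed to be thrown away already.

```python
def find_saturation(runs, min_gain=0.02):
    """Find the run after which adding a client no longer helps.

    runs: one list of per-client QPS values per test run, ordered by
          number of clients, with the warm-up run already removed.
    min_gain: relative growth of the total QPS below which we treat the
              total as "not going up" (assumed threshold, pick your own).

    Returns (run_index, total_qps) of the last run that still improved
    the total, or None if the total was still growing in the last run.
    """
    totals = [sum(per_client) for per_client in runs]
    for i in range(1, len(totals)):
        if totals[i] <= totals[i - 1] * (1 + min_gain):
            return i - 1, totals[i - 1]
    return None


# Made-up measurements: 1, 2, 3 and 4 clients.
runs = [
    [20000],
    [19500, 19800],
    [18000, 18200, 17900],
    [13600, 13500, 13400, 13700],
]
result = find_saturation(runs)
```

In this made-up data the total for four clients (54200) is hardly
higher than for three (54100), so the server is saturated at roughly
54000 QPS, even though every single client reports a much lower number.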

Proper load testing and benchmarking is an art. It requires time and
work. And more time. And more work.

- -- Carsten
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/+7akACgkQsUJ3c+pomYFqhwCfZqaV+dDqIpak8Ngf7sPhr4Kq
Mq8AoKrfkjiysncAxx3kGHCX5kp+xZZG
=Owgv
-----END PGP SIGNATURE-----