recursive-clients queue size & clean-up
Ladislav Vobr
lvobr at ies.etisalat.ae
Tue Aug 17 08:10:58 UTC 2004
Do any guidelines exist on how to size the recursive-clients queue?
I have a public recursive server handling around 2000 req/sec.
Is each slot in the recursive-clients queue cleaned up after the
timeout expires if there is no response, or are some slots occupied
longer? It seems to me that once I reach this limit there is really no
way back to a stable bind. All the cpu is used, and even if I leave it
overnight, when the traffic sometimes drops as low as 300-400 req/sec,
it does not recover. The messages still keep coming from time to time,
the cpu is very high (out of proportion to the number of incoming
requests), and the number of requests logged to the query.log file is
barely half of what the box is really supposed to receive (it looks like
bind or the os is dropping the traffic).
There is no weird traffic; maybe there was a weird spike, but the server
should recover from that. When I stop and start bind, service resumes,
the cpu drops, traffic comes back to its normal rate instead of roughly
half of it as during the problem, and the recursive-clients queue is no
longer overflowed.
I recently moved to Solaris 9 with the latest patches (I had Solaris 8
before) and have tried several ways of compiling bind. I have even tried
several bind versions, single-threaded, multithreaded, 32-bit code and
64-bit code, but I still face this problem from time to time. I
managed to trace it back a little: it looks to me like the problem is
always preceded by some spike in the traffic, like a temporary flood
(say, a few seconds or minutes of 500-600 req/sec for an unreachable
domain). The recursive-clients queue gets full and doesn't really
recover afterwards...
The server is an e280 with 2 CPUs (Sparc3), running bind 9.2.1 and 9.2.3.
Does rndc flush also flush the recursive-clients queue?
If I assume a 90-second timeout for each slot in the queue, it basically
means that 11 unreachable req/sec will fill a 1000-slot queue in 90
seconds. Once it is full, how will it recover? Unless the traffic
contains fewer than 11 unreachable req/sec, it cannot recover. How many
such requests are there in public traffic received at a rate of 2000
req/sec? Definitely 11 of them will be there, and not just eleven but
IMHO 100 (one hundred), maybe 200 or 300... What should the queue size
be? 300/11*1000=27272 (twenty-seven thousand)???
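The arithmetic above can be sketched in a few lines of Python. This is
only a back-of-the-envelope model, not a description of how bind really
manages the queue: it assumes that every unreachable request holds one
slot for the full timeout, and the 90-second figure is just the
assumption made above. The function names are made up for illustration.

```python
def max_sustainable_rate(queue_slots, timeout_s):
    """Highest rate of unanswered requests (req/sec) the queue can absorb
    without filling, assuming each one holds a slot for the full timeout."""
    return queue_slots / timeout_s


def slots_needed(unreachable_rate, timeout_s):
    """Slots occupied in steady state: arrival rate times holding time."""
    return unreachable_rate * timeout_s


if __name__ == "__main__":
    timeout_s = 90  # assumed slot timeout, not a confirmed bind value
    # Rate at which a 1000-slot queue fills up (roughly 11 req/sec).
    print(max_sustainable_rate(1000, timeout_s))
    # Queue size needed for a few guessed rates of unreachable requests.
    for rate in (100, 200, 300):
        print(rate, slots_needed(rate, timeout_s))
```

Under these assumptions, 300 unreachable req/sec works out to 27000
slots, in line with the rough 27272 figure above, which uses the
rounded rate of 11 req/sec.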
I posted a similar issue some time back, but couldn't draw any
conclusion from the answers. Does it really seem to be such a minor
thing, or is there really no clue how to set the queue size, since it is
not clear how the queue is used?
Do we need commercial support to get somebody to answer: yes, this is
how the queue is managed, these are the guidelines for setting it, and
this is how to recover if it becomes full? The queue size doesn't depend
purely on the number of users or requests, but also on the weirdness of
the traffic, which, especially in a public environment, is becoming
increasingly weird. If there were guidelines and a general understanding
of the queue management, each of us could tune it to his own traffic
characteristics.
Ladislav