Recursion ceases for 5-10 minutes at random intervals throughout the day
Bill Springall
springall at fuse.net
Fri Feb 15 19:48:24 UTC 2008
Thanks for your reply.
Correct, the requests themselves were answered, but only with "Server
Failure" messages (and the responses always seemed to come back quickly).
When it has happened to me, I was unable to get anything but the error
message, although the graphs indicate ~100 qps still succeeding (perhaps
answered from cache?).
(Graph: http://home.fuse.net/springall/dns-3.png - 5 min poll)
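To tell cached answers apart from failed recursion during one of these windows, a small probe that records the response code and latency of each query might help. The following is only a rough sketch using the Python standard library, not something I have run against these servers; the function names (build_query, rcode_of, probe) are made up for illustration, and the server address and query name are whatever you choose to pass in. Probing a name that is unlikely to be cached should exercise recursion rather than the cache.

```python
import random
import socket
import struct
import time

# Response codes defined in RFC 1035; anything else is reported numerically.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, query_id, qtype=1):
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response):
    """Return the RCODE name found in the header of a DNS response."""
    if len(response) < 12:
        raise ValueError("short DNS message")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, "RCODE%d" % code)


def probe(server, name, timeout=2.0, port=53):
    """Send one UDP query; return (result, elapsed_seconds).

    result is an RCODE name, "TIMEOUT", or "BADREPLY".
    """
    query_id = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name, query_id), (server, port))
        try:
            data, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT", time.monotonic() - start
    elapsed = time.monotonic() - start
    # Ignore anything that is not a reply to the query we just sent.
    if len(data) < 12 or struct.unpack("!H", data[:2])[0] != query_id:
        return "BADREPLY", elapsed
    return rcode_of(data), elapsed
```

Calling probe() in a loop every few seconds and logging the result with a timestamp would show exactly when SERVFAIL starts and stops, at a finer grain than a 5 minute poll.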
The server itself has been relatively flat when it comes to memory
usage. It sits at about 750M. I can set up a process memory graph if
needed.
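For the process memory graph, something along these lines could feed a grapher. This is a minimal sketch that assumes a Linux host exposing /proc/<pid>/status (other systems would need ps or similar instead); the helper names are hypothetical.

```python
import re


def rss_kib(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return rss_kib(handle.read())
```

Sampling read_rss_kib() for the named PID once a minute should be enough to see whether memory moves when the failures begin.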
CPU load does jump from 10% to 25%, at least during the last spike I
checked.
Unfortunately, I haven't tried BIND without thread support. We have had
good luck with threads in testing and production (especially with 2x dual
Opterons), so I haven't tried disabling them.
Thanks again!
- Bill
JINMEI Tatuya wrote:
> At Wed, 13 Feb 2008 17:32:41 -0500,
> Bill Springall <springall at fuse.net> wrote:
>
>> Each server handles anywhere between 500-1500 qps throughout the
>> day, under normal load. Problem occurs at all loads.
>> I've tried port, "monitoring", tcpdumping the traffic, and sifting
>> through the requests and nothing seems out of the ordinary. Numerous
>> tweaks of the OS have not helped (state table within limits and then
>> disabled, firewall deactivated/activated, eth stats good). When the
>> problem happens I can get onto the machine and it is ok (network
>> upstream good, routing table hasn't inherited anything new, server calm).
>> When I turn logging up to a level that can help, named can't keep up.
>> We now have a troubleshooting process in the works that
>> involves different hardware and 9.4.2, environment re-architecture, as
>> well as, <shiver>, other caching dns software.
>> Is there a known problem, that I haven't been able to find, that
>> could be causing this? As I understand the, "Server Failure", message
>> is a general message, could someone help to point me to the next thing
>> to try? Any help would be appreciated!
>
> I cannot think of a reason, but please let me ask something first.
>
> - according to your description, the queries were not dropped, but
> were simply responded with server failure, right?
> - how much of memory does named use when this occurs?
> - how busy (in terms of CPU utilization) is named when this occurs?
> - does this change if you disable threads?
>
> Thanks,
>
> ---
> JINMEI, Tatuya
> Internet Systems Consortium, Inc.
>
--
Bill Springall
Systems Engineer/UNIX Administrator
Cincinnati Bell/ZoomTown.com/Fuse.net
Email: springall at fuse.net
Desk: 513.565.9787
______________________________________________________________
Learn avidly.
Question repeatedly what you have learned.
Analyze it carefully.
Then put what you have learned into practice intelligently.
- Confucius
_______________________________________________________________