lookup timeouts

Fri Jan 21 09:23:09 UTC 2000

>I'm having some trouble with our main nameserver:
>
>the setup: Solaris 2.6, BIND 8.2.2p3, about 850 zones loaded locally,
>allow-query is set to any.
>
>the problem:
>we have a single domain (it may be more, but no other domain has come to
>our attention) that will time out on any query to our nameserver.  Our
>secondary and tertiary nameservers can find this domain fine, but when
>our primary does a lookup on it, it times out.  I have used ndc to
>perform a named_db.db dump, and have found most of them right in the
>cache part of the dump, including the SOA for the domain we are having
>trouble with.
>the weird part is this:  a complete shutdown and restart will allow
>lookups for this domain to happen correctly for a while, but a HUP will
>not.
>
>
>Has anyone seen this happening?  Does anyone have any ideas on what
>could be causing this?
>
>I'd appreciate any pointers or ideas regarding this seemingly strange
>behavior of bind.
I also have, and have reported to this list, a problem with authoritative
lookups becoming slow and then timing out. It was suggested to me that
named needs almost zero swapping (although I don't see why it should
start to have a latency of seconds when no other daemon on the system
does). I have not tried testing domain by domain when this happens; I
had not considered that it may only affect one domain. strace shows
some kind of heartbeat loop at these times, so my guess is that
named is limiting the number of concurrent lookups. This is a problem
I have when connections to the rest of the world become slow, but
obviously I would expect local domains to be available immediately. Your
observation of it being different for each domain makes me wonder about
the swapping idea. Maybe sections of the cache are on disk and named has a
"special" problem with getting the OS (Linux here) to perform disk access.
Yours
Ian
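
A domain-by-domain test of the kind mentioned above could be sketched
roughly as follows. This is only an illustration, not something from the
thread: it hand-builds a plain SOA query per RFC 1035 and times the UDP
reply from one server. The function names (build_query, time_lookup,
report) and the default timeout are assumptions made up for the example,
and nothing here is specific to BIND.

```python
import socket
import struct
import time


def build_query(name, qtype=6, txid=0x1234):
    """Build a minimal DNS query packet (qtype 6 = SOA, class IN)."""
    # Header: id, flags (RD bit set), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def time_lookup(server, name, timeout=5.0, port=53):
    """Return seconds until any UDP reply arrives, or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        start = time.monotonic()
        sock.sendto(build_query(name), (server, port))
        try:
            sock.recvfrom(512)
        except socket.timeout:
            return None
        return time.monotonic() - start
    finally:
        sock.close()


def report(server, zones, timeout=5.0):
    """Print one line per zone: the reply time, or 'timeout'."""
    for zone in zones:
        elapsed = time_lookup(server, zone, timeout=timeout)
        print(zone, "timeout" if elapsed is None else "%.3f s" % elapsed)
```

Calling report() with the primary's address and the list of locally
loaded zones, once after a restart and again after a HUP, would show
whether the timeouts are confined to a single zone. The sketch does not
check the reply's contents or response code, only that something came
back in time.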