URGENT, PLEASE READ: 9.5.0-P1 now available

Rupam Choudhury rupamchoudhury at yahoo.com
Wed Jul 9 20:49:55 UTC 2008


All
I am seeing the same behavior with 9.4.2-P1, running on a Solaris x86 4100M2 server. I get "named socket: too many open file descriptors" in /var/adm/messages.
I tried to raise the OS limit by setting a very high value, rlim_fd_max=1040000, in /etc/system, with no luck.
Thanks
Rupam
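One way to see which descriptor limits a process actually ends up with is to ask the kernel directly. The Python sketch below assumes a POSIX system with the standard resource module; the helper names are hypothetical. It reads the soft and hard RLIMIT_NOFILE values and raises the soft limit no further than the hard one, which is all an unprivileged process may do. As I understand the Solaris tunables, rlim_fd_max in /etc/system sets the hard limit, while the default soft limit (what 'ulimit -n' shows) comes from rlim_fd_cur, and /etc/system changes only take effect after a reboot, so raising rlim_fd_max alone may leave the soft limit unchanged.

```python
import resource

# Hypothetical helper names; only the resource module calls are standard.


def nofile_limits():
    """Return (soft, hard) RLIMIT_NOFILE; resource.RLIM_INFINITY means unlimited."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Try to raise the soft descriptor limit to `target`.

    An unprivileged process may raise its soft limit only up to the hard
    limit, so the target is capped there.  Returns the limits afterwards.
    """
    soft, hard = nofile_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return nofile_limits()


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft (ulimit -n): {soft}  hard: {hard}")
    print("after raising:", raise_soft_limit(4096))
```

Per Jinmei's reply, a higher limit on its own does not help the P1 releases; this only confirms what the process really gets.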


JINMEI Tatuya / 神明達哉 <Jinmei_Tatuya at isc.org> wrote:
At Wed, 09 Jul 2008 13:21:07 -0500,
Walter Gould wrote:

> > This situation could happen if named simultaneously opens many
> > sockets bound to random UDP ports.  But as long as the query rate
> > and cache hit rate are moderate, this should be rare in practice.
> > So my first guess is that the system default for the maximum number
> > of open sockets (file descriptors) is too small.  Please check the
> > value (e.g., with 'ulimit -n', which works in many shells), and try
> > a larger value if it's too small.
> >
> > If it's 1024 or larger and you still see this problem, your
> > operational environment might be one of those rare unlucky cases.
> > In that case, I'd recommend you try 9.5.1b1 (or 9.4.3b1), which
> > handles such cases much better.
> 
> # ulimit -n
> 1024

Hmm... I'm curious whether the server really consumes all 1024
available sockets.  Can you do some diagnosis, including:

- check whether the server constantly has such a large number of
  sockets open, e.g., by using lsof
- check how many clients the server is normally handling, by
  running 'rndc status' several times.  (Note: you may have to
  specify a smaller value for the recursive-clients option so that
  at least one TCP socket is available for rndc.)
- check the query rate, the cache hit rate, and the number of queries
  sent by the server per unit of time.  You can get these numbers by
  running 'rndc stats' several times at regular intervals.
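The first check can be approximated without lsof on systems that expose per-process descriptor directories under /proc (Linux and Solaris both do). This is a minimal Python sketch under that assumption; count_open_fds and sample_fd_counts are hypothetical helpers, and inspecting named's process normally requires root.

```python
import os
import time

# Hypothetical helpers; assumes /proc/<pid>/fd exists (Linux, Solaris).


def count_open_fds(pid="self"):
    """Return how many descriptors process `pid` currently has open."""
    # Each entry in /proc/<pid>/fd is one open descriptor, sockets included.
    # With pid="self", the directory handle used by listdir may be counted too.
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid, samples=5, interval=1.0):
    """Take `samples` readings `interval` seconds apart, to tell whether the
    count stays high constantly or only spikes now and then."""
    counts = []
    for i in range(samples):
        if i:
            time.sleep(interval)
        counts.append(count_open_fds(pid))
    return counts
```

For example, sample_fd_counts(<pid of named>, samples=10, interval=5) gives ten readings taken five seconds apart; values that stay close to the 'ulimit -n' figure would suggest the server really is exhausting its descriptors.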

I'd also like to see your named.conf if possible.

> I increased it to 2048 - but got the same results...

This doesn't help with the P1 releases because of an underlying API limitation.
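The API is not named here; assuming it is the classic select() limit, the reason is that a descriptor numbered at or above FD_SETSIZE (commonly 1024) cannot be placed in an fd_set, so raising the descriptor limit lets a process open more sockets but not watch them with select(). The Python sketch below probes this: CPython's select.select() raises ValueError for such descriptors on POSIX systems, while select.poll() takes an array of descriptors and has no such ceiling. The helper names are hypothetical.

```python
import fcntl
import os
import select

# Assumption: the "API limitation" is select()'s FD_SETSIZE ceiling.
# Helper names are hypothetical.


def select_accepts(fd):
    """True if select() can watch `fd`.  CPython raises ValueError for a
    descriptor that does not fit in an fd_set (fd >= FD_SETSIZE)."""
    try:
        select.select([fd], [], [], 0)
        return True
    except ValueError:
        return False


def poll_accepts(fd):
    """poll() takes an array of descriptors, so it has no FD_SETSIZE ceiling."""
    p = select.poll()
    try:
        p.register(fd, select.POLLIN)
        p.poll(0)  # timeout in milliseconds; 0 returns immediately
    except (ValueError, OSError):
        return False
    return True


def compare_at(min_fd=4096):
    """Duplicate a pipe end onto the lowest free descriptor >= min_fd and
    report (select_ok, poll_ok).  Needs a soft RLIMIT_NOFILE above min_fd,
    otherwise F_DUPFD fails with OSError."""
    r, w = os.pipe()
    try:
        hi = fcntl.fcntl(r, fcntl.F_DUPFD, min_fd)
        try:
            return select_accepts(hi), poll_accepts(hi)
        finally:
            os.close(hi)
    finally:
        os.close(r)
        os.close(w)
```

On a typical Linux box with a generous soft limit, compare_at(4096) should report that poll() accepts the high descriptor while select() does not, which is the kind of gap the newer event interfaces in the beta releases are meant to close.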

> I've not had this happen with any BIND versions that I've upgraded to in 
> the past.

This is because older versions of BIND use only a fixed (small)
number of sockets.
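The difference can be sketched as follows, assuming, as described earlier in the thread, that the P1 releases give each outstanding query its own UDP socket bound to a randomly chosen source port. The descriptor count then grows with the number of queries in flight. The helpers, the 1024-65535 port range, and the retry count are hypothetical, illustrative choices, not named's actual settings.

```python
import random
import socket

# Hypothetical helpers illustrating per-query random source ports; the port
# range and retry count are illustrative, not named's actual settings.

_rng = random.SystemRandom()  # OS entropy source: ports should be unpredictable


def open_random_port_udp(host="127.0.0.1", lo=1024, hi=65535, attempts=50):
    """Return a new UDP socket bound to a randomly chosen port in [lo, hi]."""
    for _ in range(attempts):
        port = _rng.randint(lo, hi)
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.bind((host, port))
            return s
        except OSError:
            s.close()  # port in use or unavailable: pick another one
    raise OSError("could not find a free UDP port")


def open_query_sockets(n):
    """One socket per outstanding query: n queries in flight -> n descriptors."""
    return [open_random_port_udp() for _ in range(n)]
```

By contrast, reusing a fixed set of long-lived sockets keeps the descriptor count constant no matter how many queries are in flight, but all queries then share the same source port(s), which is what the P1 releases set out to change.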

> Any other suggestions other than using the beta code?

Further diagnosis like the above may suggest something, but at the
moment I have no specific workaround to offer other than upgrading to
a beta.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.





