File Descriptor limit and malfunction bind

JINMEI Tatuya / 神明達哉 jinmei at isc.org
Fri Jan 8 06:54:39 UTC 2010


At Tue, 05 Jan 2010 10:36:27 +0200,
Imri Zvik <imriz at inter.net.il> wrote:

> > i have a high load DNS server running bind 9.4.3 on RH -
> > yesterday we experienced a problem with the bind  (the bind froze) , and
> > when looking at the logs i saw the following error :
> > named error: socket: file descriptor exceeds limit (4096/4096)
> > i looked at my OS file descriptor limit and using ulimit -n   - 1024 .
> > where the number 4096 come from?

It's the hard-coded default maximum number of file descriptors (which
is nearly equal to the maximum allowable number of open sockets).
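
As a side note on the "ulimit -n" figure: that command reports the soft
limit of the shell it is run in, which is not necessarily the limit in
effect for an already running named process. The sketch below is only
an illustration, not anything shipped with BIND. It reads the limit of
the current process and parses the "Max open files" line from the text
of /proc/<pid>/limits on Linux. The function names are hypothetical,
and this does not explain the 4096 itself, which is the BIND default
described above.

```python
import resource


def nofile_limits():
    """Return (soft, hard) RLIMIT_NOFILE for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_proc_limits(text):
    """Extract (soft, hard) from the 'Max open files' line of the
    contents of /proc/<pid>/limits (Linux only).

    Values are returned as strings because the kernel may print
    'unlimited' instead of a number. Returns None if the line is absent.
    """
    for line in text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return fields[3], fields[4]
    return None
```

To inspect a running server, one would pass the contents of
/proc/<pid>/limits for the named process to parse_proc_limits(); reading
that file may require root or the same user as named.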

> If I'm not mistaken, you should either recompile with a higher value for 
> ISC_SOCKET_MAXSOCKETS or restart named with the -S <maxsockets> argument.

I'm afraid the answer is yes and no.  Yes, you can raise the hard-coded
default value with the -S command-line option.  But no, I suspect that
won't solve the problem.  In my experience, 4096 should be sufficient
even for a very busy server.  If named still consumes all available
sockets, it more likely indicates an unexpected, serious bug that
can't be mitigated by raising that limit.

I've heard similar reports (named seemingly consuming all available
sockets and then "freezing"), but unfortunately I couldn't reproduce
the problem myself, and since it seems to be quite rare I haven't
figured out the cause.
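
Since the problem is hard to reproduce, it may help to watch how many
descriptors named actually holds before it freezes. The following is a
minimal sketch, assuming a Linux system with /proc mounted and
permission to read the target process's fd directory. The helper names
and the 0.9 threshold are hypothetical choices for illustration; the
default limit of 4096 is the figure from the log message above.

```python
import os


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd (Linux only).

    Returns None if the directory cannot be read, for example when
    /proc is unavailable, the process is gone, or permission is denied.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def near_limit(open_fds, limit=4096, threshold=0.9):
    """Return True when open_fds has reached threshold * limit."""
    return open_fds >= limit * threshold
```

Calling count_open_fds() periodically with the PID of named and logging
the result would show whether the count climbs steadily (suggesting a
leak) or jumps suddenly under load.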

One possible workaround you may want to try is to *disable* epoll, the
efficient I/O API on Linux, when building BIND:
./configure --disable-epoll

This means named will use the less efficient select API, but depending
on the machine's power and the server load, it may provide acceptable
performance and more stable behavior, as select is (seemingly) the
more stable API.
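
For readers unfamiliar with the two APIs, here is a small sketch of
the difference in how they are used. It is not named's code: it uses
Python's standard select module, and the function names are
hypothetical. With select, the caller passes the full set of
descriptors on every call; with epoll, descriptors are registered once
and the kernel reports only those that are ready.

```python
import select
import socket


def ready_with_select(sock, timeout=1.0):
    """Return True if sock is readable, using select()."""
    # The whole list of descriptors is handed to the kernel on each call.
    readable, _, _ = select.select([sock], [], [], timeout)
    return sock in readable


def ready_with_epoll(sock, timeout=1.0):
    """Return True if sock is readable, using epoll.

    Returns None on platforms where epoll is unavailable (non-Linux).
    """
    if not hasattr(select, "epoll"):
        return None
    ep = select.epoll()
    try:
        # Register once; poll() then returns only the ready descriptors.
        ep.register(sock.fileno(), select.EPOLLIN)
        events = ep.poll(timeout)
        return any(fd == sock.fileno() and (ev & select.EPOLLIN)
                   for fd, ev in events)
    finally:
        ep.close()
```

Both functions answer the same question for a single socket; the
efficiency difference only becomes visible when many descriptors are
watched at once.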

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.


