Too many open files Re: Patch 9.4.2->9.4.2-P1 breaking tcp-responder?

Thomas Jacob jacob at internet24.de
Sun Jul 27 21:41:05 UTC 2008


On Fri, Jul 25, 2008 at 03:35:41PM -0500, Jason Bratton wrote:
> After posting my email, I finally figured out the problem.  For some 
> reason, I had to set tcp-listen-queue.  I never had it set before, so 
> something changed in the code, but yeah, that fixed it.  I set both 
> tcp-clients and tcp-listen-queue to 1000 and haven't had any problems 
> like that since.

That didn't seem to do it for us: the BIND instance in question ran
for about 38 hours and then refused to accept TCP connections again.

I found the following error message in the logs at about the
time of the outage:

27-Jul-2008 15:35:12.234 resolver: notice: clients-per-query decreased to 17
27-Jul-2008 15:35:34.440 general: error: socket.c:1996: unexpected error:
27-Jul-2008 15:35:34.440 general: error: internal_accept: fcntl() failed: Too many open files
27-Jul-2008 15:35:34.452 general: error: socket.c:1996: unexpected error:
27-Jul-2008 15:35:34.452 general: error: internal_accept: fcntl() failed: Too many open files

Since there seem to be several messages about the file handle limit being
exceeded on the list already, I presume it's the same problem that other people
are having with the 9.5.0-P1 patch.

Does anybody have any suggestions as to what specifically I should be
looking at during the next outage?
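
One thing that could be captured while the server is wedged is how many
file descriptors named actually holds compared to its limit, and how many
sockets are sitting in SYN_RECV. The following is only a rough sketch,
assuming a Linux host where /proc/<pid>/fd, /proc/<pid>/limits and
/proc/net/tcp are available; the function names are made up for
illustration and are not part of BIND or any existing tool.

```python
import os


def parse_nofile_soft_limit(limits_text):
    """Return the soft 'Max open files' limit from the text of
    /proc/<pid>/limits, or None if it is unlimited or not listed."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft = line[len(label):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def count_syn_recv(proc_net_tcp_text):
    """Count IPv4 sockets in SYN_RECV in the text of /proc/net/tcp.
    The fourth column is the state in hex; 03 is SYN_RECV."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == "03":
            count += 1
    return count


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process.
    Reading another user's /proc/<pid>/fd normally requires root."""
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    with open("/proc/%d/limits" % pid) as f:
        soft_limit = parse_nofile_soft_limit(f.read())
    return open_fds, soft_limit
```

Calling fd_usage() with the pid of named every few minutes and logging the
result should show whether the descriptor count creeps up toward the limit
before the TCP side stops answering, or whether it jumps all at once.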


> -- Jason
> 
> Thomas Jacob wrote:
> > Hello list,
> > 
> > We're having problems with the -P1 version, some time after
> > starting the server (could be minutes or hours), the tcp request
> > handler seems to get stuck, and all (or almost all) new requests
> > get stuck in the SYN_RECV TCP state. We haven't found out what
> > exactly triggers this yet, could be load, could be specific
> > types of queries.
> > 
> > This seems to be the same problem as described
> > in the following post by Jason Bratton:
> > 
> > http://marc.info/?l=bind-users&m=121628960603391&w=2
> > 
> > The main difference should be that we're running
> > the version of bind that comes with Ubuntu 8.04 LTS x86_64,
> > and the problems happen when upgrading from
> > version bind9_9.4.2-10 to bind9_9.4.2-10ubuntu0.1, a diff
> > between these two shows the exact same -P1 patch as in the upstream
> > version.
> > 
> > Our tcp related settings:
> > 
> > transfers-out 100;
> > transfers-per-ns 100;
> > tcp-clients 5000;
> > recursive-clients 10000;
> > 
> > Is anyone else seeing this? Is this really a bind bug? And if yes, is
> > there a workaround?
> > 
> > 
> >      Regards,
> >        Thomas
> > 
> 
> 

