file descriptor exceeds limit

Mike Hoskins (michoski) michoski at cisco.com
Thu Jun 18 16:27:39 UTC 2015


Inline... I'll respond to each of these soon, including Kathy's (thanks to the
community for the responses). I'm following this with interest, as we've seen
this for a while, though we are possibly a special case, which I'll describe
in more detail in another response.


On 6/18/15, 7:00 AM, "Matus UHLAR - fantomas" <uhlar at fantomas.sk> wrote:

>On 17.06.15 22:39, Shawn Zhou wrote:
>>BIND on my resolvers reaches the max open file limit and I am getting
>> lots of SERVFAILs:
>>http://pastebin.com/SxRsHLff
>>http://pastebin.com/SxRsHLff
>
>>After I increased max-socks to 8192 (-S 8192), I no longer saw the
>> file limit error in the log; however, I am still seeing many SERVFAILs.
>
>no other errors?


When we've dug into it (really, the investigation is ongoing), we haven't
noticed anything "abnormal". That is, plenty of things are being logged,
but nothing you don't always see in the modern world of broken DNS
servers, firewalls, network paths, etc.
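The original report mentions hitting the open file limit. On RHEL/CentOS 6, one way to confirm which open-file limit a running named process actually received is to read /proc/<pid>/limits. This is a minimal sketch assuming a Linux /proc filesystem; `max_open_files` and `read_limits` are hypothetical helper names, not part of BIND or any BIND tool.

```python
import os
import sys


def max_open_files(limits_text):
    """Parse the 'Max open files' row of a Linux /proc/<pid>/limits file.

    Returns (soft, hard) as ints; the string 'unlimited' becomes None.
    Hypothetical helper for illustration, not part of BIND.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def read_limits(pid):
    """Return the contents of /proc/<pid>/limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return f.read()


if __name__ == "__main__":
    # Pass named's PID as the first argument; defaults to this script's own PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    try:
        soft, hard = max_open_files(read_limits(pid))
    except OSError as exc:
        print(f"cannot read limits for pid {pid}: {exc}", file=sys.stderr)
    else:
        print(f"pid {pid}: max open files soft={soft} hard={hard}")
```

Run it with named's PID (for example from `pidof named`). It only reports the limit the process is running with; it does not change anything.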


>>Our resolvers were doing about 15k queries per second when this was
>> happening, and that was legitimate traffic. I am aware that I am setting
>> recursive-clients to a very high number. Those resolvers are running on
>> hardware with a 12-core CPU and 24 GB of RAM. CPU utilization was about
>> 20%, with plenty of RAM left.
>
>>I am wondering if I've reached the limit of BIND for the amount of
>> recursive queries it can serve.  Any other tunings I should try?
>
>Maybe try changing recursive-clients and max-clients-per-query.


We have tweaked all of these repeatedly, first following community best
practice and then shooting for the sky (big iron) just to see what impact
it had. None, really.
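Since recursive-clients caps how many recursive lookups named performs concurrently, one rough way to judge whether a setting is even in the right range is Little's law (L = λW): the average number of lookups in flight equals the rate of cache misses times the mean time to resolve one. This is only a back-of-the-envelope sketch; `expected_outstanding` is a hypothetical helper, and apart from the 15k queries per second from the report, the miss ratio and resolution time are made-up placeholders.

```python
def expected_outstanding(qps, miss_ratio, mean_resolution_s):
    """Little's law, L = lambda * W.

    qps               -- client queries per second
    miss_ratio        -- fraction of queries needing recursion (cache misses)
    mean_resolution_s -- average seconds to complete one recursive lookup
    Returns the average number of recursive lookups in flight.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must be between 0 and 1")
    if qps < 0 or mean_resolution_s < 0:
        raise ValueError("qps and mean_resolution_s must be non-negative")
    arrival_rate = qps * miss_ratio  # lambda: recursions started per second
    return arrival_rate * mean_resolution_s


if __name__ == "__main__":
    # 15000 qps comes from the report; 0.3 and 0.2 s are placeholders.
    avg = expected_outstanding(15000, 0.3, 0.2)
    print(f"about {round(avg)} recursive lookups in flight on average")
```

Bursts and slow or unresponsive authoritative servers push the instantaneous count well above this average, so the result is at best a floor for sizing, not a recommended setting.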


>Does EDNS work for you? EDNS problems often result in an increased number
>of TCP queries, which slows down resolution...


Yeah, works fine and passes all tests (manual digs, OARC, etc).


>
>> By the way, the resolvers are running RHEL 6.x.
>
>The precise BIND version would help a bit more... it seems RHEL 6.6 ships
>9.8.2, but that may be different for older RHEL 6 versions.


We're running CentOS 6.x, but use the latest BIND 9.9.x releases.


