9.4.3 oddities

Cathy Almond cathya at isc.org
Mon Jan 11 17:35:32 UTC 2010


The problem reported below proves to have been resolved by this change:

2797. [bug] Don't decrement the dispatch manager's maxbuffers.
[RT #20613]

When randomized query ports were implemented, the increase in the number
of concurrently used sockets brought an equivalent increase in demand for
another resource - the dispatch manager buffer pool.  This was of course
enlarged too, but an oversight meant that it could be reduced again in
some circumstances.

Running rndc reconfig buys temporary relief because it runs through the
configuration file again and, in doing so, reapplies the initial large
pool size decision.

The fix is currently available in 9.7.0rc1 and 9.6.2b1, and will be
included in the upcoming BIND Extended Support Versions (ESVs).
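
To check that an upgraded server no longer shows the symptom, the
reproduction loop quoted below can be adapted to tally response codes
rather than print each failure.  This is only a rough sketch: the
function names tally_status and run_probe, the default iteration count
and the server argument are illustrative assumptions, not part of the
original report.

```shell
#!/bin/sh
# Summarise the "status:" field of dig header lines read from stdin.
# Prints one "COUNT STATUS" line per distinct response code.
tally_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | sort | uniq -c | awk '{print $1, $2}'
}

# Hypothetical driver: send the same query repeatedly to one resolver
# and summarise the response codes.  Server and count are assumptions;
# adjust them (and the query name) to suit the server under test.
run_probe() {
    server=${1:-localhost}
    count=${2:-1000}
    i=0
    while [ "$i" -lt "$count" ]; do
        dig mx www.cnn.com "@$server"
        i=$((i + 1))
    done | tally_status
}
```

Calling "run_probe localhost 1000" on a loaded server should then give a
one-line count per response code; any SERVFAIL line suggests the problem
is still present.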

Imri Zvik wrote:
> Hi,
> 
> We've recently upgraded our caching servers to 9.4.3-P4/P3 (2 of them running 
> 9.4.3-P4 and 2 running 9.4.3-P3). A few days ago I noticed something 
> strange: when the server is loaded, some queries randomly fail (SERVFAIL). 
> It seems that only queries for which the answer is NOT cached are affected.
> I've verified with host/dig and tcpdump that there is no network issue (no 
> unanswered packets). Digging deeper, I found that the problem appears 
> when the number of sockets used by named approaches ~1024 (checked with 
> netstat/lsof). The weirdest part is that if I run "rndc reconfig", named is 
> suddenly able to use more than 1024 sockets (I've seen it using ~4000-5000 
> sockets), and the problem goes away for about an hour.
> 
> If I downgrade to 3.4.2-P2 the problem goes away.
> 
> I used the following command to reproduce the problem:
> for i in {1..100000}; do dig mx www.cnn.com @localhost | grep status | grep -v NOERROR; done
> 
> My servers are running RHEL 5.4 (2.6.18-164.9.1.el5) and FreeBSD 7.0 (the 
> problem is seen on both), and they are split across two unrelated 
> networks at two separate physical locations.
> 
> I've compiled bind from the vanilla ISC sources using the following configure 
> command:
> 
> ./configure --enable-threads --enable-largefile --prefix=/usr/local
> 
> I've also tried the following (I've also raised the OS limits, of course):
> STD_CDEFINES="-DISC_SOCKET_FDSETSIZE=1048576" ./configure --enable-threads --enable-largefile --prefix=/usr/local
> 
> I tried this because I was seeing the "general: error: socket: file 
> descriptor exceeds limit (4096/4096)" error a couple of days ago.
> 
> My best guess is that the problem is related to the recent move to epoll...
> 
> Any ideas on how I should proceed from here? 



