bind 9.6-esv-r1 segfault

kalpesh varyani kalpesh.link at gmail.com
Tue Nov 1 14:40:47 UTC 2011


Hi,

I seem to have hit the same issue on BIND 9.7.3.

=== [Test environment] ===

- The affected system is a caching server. It is not the master
(authoritative) server for any zone.

- When the server receives a recursive query, it queries each server in
turn, from the root servers down to the last server in the chain.

 The last server then returns the A, MX, and TXT records.

Due to a high inflow of queries, the available socket file descriptors were
exhausted:
"general: error: socket: file descriptor exceeds limit (2048/2048)"
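
One thing worth checking alongside that message is the per-process descriptor limit the operating system gives named. I am not certain whether the (2048/2048) figure is the OS limit or named's own internal socket limit, so the following is only a minimal sketch, assuming a POSIX system; the helper names fd_limits and fd_raise_soft_limit are hypothetical and are not part of BIND.

```c
#include <sys/resource.h>

/* Hypothetical helper: read the soft and hard RLIMIT_NOFILE values of the
 * calling process. Returns 0 on success, -1 on failure. */
int fd_limits(rlim_t *soft, rlim_t *hard) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Hypothetical helper: raise the soft limit toward `wanted`, capped at the
 * hard limit. Never lowers the current soft limit.
 * Returns 0 on success, -1 on failure. */
int fd_raise_soft_limit(rlim_t wanted) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (wanted > rl.rlim_max)
        wanted = rl.rlim_max;       /* an unprivileged process cannot exceed this */
    if (wanted <= rl.rlim_cur)
        return 0;                   /* already high enough */
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Calling fd_limits() from a small test program run as the same user as named shows what the OS would allow; if the soft limit is well above 2048, the limit in the log line is probably enforced inside named itself.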

The server received many queries for domains whose name servers were either
not authoritative or returned no usable response even though they are listed
as authoritative. Those name servers were therefore marked as lame:
"lame-servers: info: lame server resolving "

At this point we started running out of memory:
"resolver: error: could not mark server as lame: out of memory"

Code and dump analysis suggest that threads were stuck in select() while the
resolver was sending queries.

Is there a known workaround or fix for this issue?

Since too many requests are waiting for a socket fd, I think that running the
nameserver with epoll instead of select and increasing the number of
available sockets should help relieve the backlog.

I would very much appreciate any suggestions or ideas on this issue.
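
To illustrate why I expect epoll to behave better here: select() can only watch descriptors numbered below FD_SETSIZE, while epoll has no such fixed cap. The following is a minimal sketch, assuming Linux; select_can_watch and epoll_wait_readable are hypothetical helper names for illustration and are not taken from the BIND source.

```c
#include <string.h>
#include <sys/epoll.h>
#include <sys/select.h>
#include <unistd.h>

/* select() uses fixed-size fd_set bitmaps, so a descriptor can only be
 * watched if its number is below FD_SETSIZE. */
int select_can_watch(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Wait up to timeout_ms for fd to become readable, using epoll.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int epoll_wait_readable(int fd, int timeout_ms) {
    struct epoll_event ev, out;
    int ep, n;

    ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;            /* interested in "readable" only */
    ev.data.fd = fd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) != 0) {
        close(ep);
        return -1;
    }

    n = epoll_wait(ep, &out, 1, timeout_ms);
    close(ep);
    if (n < 0)
        return -1;
    return (n > 0 && (out.events & EPOLLIN)) ? 1 : 0;
}
```

With epoll the kernel keeps the interest list, so the cost of a wait does not depend on the highest descriptor number the way a select() bitmap scan does. This only shows the API difference; it says nothing about how named itself uses either call.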

Regards,
Kalpesh

On Sun, Sep 26, 2010 at 5:53 PM, Sergey V. Lobanov <sergey at lobanov.in> wrote:

> OK. I sent the bug report to bind9-bugs at isc.org (Ticket [ISC-Bugs
> #22208])
>
> 26.09.10, 01:41, "Cathy Almond" <cathya at isc.org>:
>
> > Hi Sergey,
> >
> >  At the moment this doesn't sound like anything we've seen before.
> >  Please could you report it to bind9-bugs at isc.org:
> >  https://www.isc.org/software/bind/news
> >
> >  We'll need the core dump, the binary that generated it and the libs
> >  associated with the binary (ldd named should capture the list we need)
> >  in order to analyze it.
> >
> >  Thanks,
> >
> >  Cathy
> >
> >  On 24/09/10 19:09, Sergey V. Lobanov wrote:
> >  > Some info from the core dump:
> >  >
> >  > General info:
> >  > Core was generated by `/usr/local/sbin/named -4 -c /etc/named.conf -t
> >  > /var/lib/named -u named -n 4'.
> >  > Program terminated with signal 11, Segmentation fault.
> >  > #0  0x0813d4d7 in resquery_udpconnected (task=0x8230ef88, event=0xa5bbf068)
> >  >     at resolver.c:1202
> >  > 1202        QTRACE("udpconnected");
> >  >
> >  > Backtrace:
> >  > #0  0x0813d4d7 in resquery_udpconnected (task=0x8230ef88, event=0xa5bbf068)
> >  >     at resolver.c:1202
> >  > #1  0x081c4916 in dispatch (manager=) at task.c:862
> >  > #2  run (manager=) at task.c:1005
> >  > #3  0xb753c725 in start_thread () from /lib/libpthread.so.0
> >  > #4  0xb73181ee in clone () from /lib/libc.so.6
> >  >
> >  > Program listing:
> >  > 1197    resquery_udpconnected(isc_task_t *task, isc_event_t *event) {
> >  > 1198        resquery_t *query = event->ev_arg;
> >  > 1199
> >  > 1200        REQUIRE(event->ev_type == ISC_SOCKEVENT_CONNECT);
> >  > 1201
> >  > 1202        QTRACE("udpconnected");
> >  > 1203
> >  > 1204        UNUSED(task);
> >  > 1205
> >  > 1206        INSIST(RESQUERY_CONNECTING(query));
> >  >
> >  >
> >  > *event:
> >  > $7 = {ev_size = 48, ev_attributes = 0, ev_tag = 0x0, ev_type = 131076,
> >  >   ev_action = 0x813d490 , ev_arg = 0x89db2c80,
> >  >   ev_sender = 0x8232d180, ev_destroy = 0x81a8970 ,
> >  >   ev_destroy_arg = 0x821d0e8, ev_link = {prev = 0xffffffff, next =
> >  > 0xffffffff}}
> >  >
> >  > *query:
> >  > $8 = {magic = 2312763280, fctx = 0xdededede, mctx = 0xdededede,
> >  >   dispatchmgr = 0xdededede, dispatch = 0xdededede,
> >  >   exclusivesocket = 3739147998, addrinfo = 0xdededede, tcpsocket =
> >  > 0xdededede,
> >  >   start = {seconds = 3739147998, nanoseconds = 3739147998}, id = 57054,
> >  >   dispentry = 0xdededede, link = {prev = 0xdededede, next = 0xdededede},
> >  >   buffer = {magic = 3739147998, base = 0xdededede, length = 3739147998,
> >  >     used = 3739147998, current = 3739147998, active = 3739147998, link = {
> >  >       prev = 0xdededede, next = 0xdededede}, mctx = 0xdededede},
> >  >   tsig = 0xdededede, tsigkey = 0xdededede, options = 3739147998,
> >  >   attributes = 3739147998, sends = 3739147998, connects = 3739147998,
> >  >   data =
> >  > "\336\336\336[...]\336\336\336\336\336\336\336\336\336\336\336\336\336\336\336\336\336",
> >  > }
> >  >
> >  > Any ideas?
> >  >
> >  > On 09/24/2010 09:13 AM, Sergey V. Lobanov wrote:
> >  >> Yesterday BIND crashed with the following error:
> >  >>
> >  >> # grep segfault messages
> >  >> Sep 23 20:21:10 ns kernel: [5079807.029465] named[19531]: segfault at
> >  >> dededf1e ip 0813d4d7 sp b618f320 error 5 in named[8048000+1c9000]
> >  >>
> >  >> Is it possible to determine the cause of this failure?
> >  >>
> >  >> # uname -a
> >  >> Linux ns 2.6.32.13-0.4-pae #1 SMP 2010-06-15 12:47:25 +0200 i686 i686
> >  >> i386 GNU/Linux
> >  >>
> >  >> bind configuration options:
> >  >> $ ./configure --enable-largefile --enable-ipv6 --enable-epoll
> >  >> --enable-threads
> >  >>
> >
> >  _______________________________________________
> >  bind-users mailing list
> >  bind-users at lists.isc.org
> >  https://lists.isc.org/mailman/listinfo/bind-users
> >
> >
>
> --
> wbr,
> Sergey V. Lobanov
> E-mail: sergey at lobanov.in
> Jabber ID: sergey at lobanov.in
> Tel.: +79200222866
>