Named process suddenly down

AOYAGI Takashi svu00012 at partner.nri.co.jp
Wed Jan 15 10:18:36 UTC 2014


Hello

I have a very serious problem with BIND, and we are in a hurry, so please help us.

Environment:
"BIND 9.9.3-P2 (Extended Support Version)".


Here is the situation:

The named process suddenly went down at 19:58 on 23 Dec, and a core file was written.
The same kind of problem also occurred on several other servers.
Because the incident happened on several servers at the same time, I do not think
it comes from a hardware problem such as faulty memory.

Below I have included some information (/var/log/messages, the core, and some other logs) that may indicate the problem.
Could someone answer the following three questions based on this information?


1) What caused the process to go down?
2) Is this a known issue or an unknown one?
3) Is there a workaround for it?


Any advice is welcome.

Sincerely,
AOYAGI


messages 
############################################################################
Dec 23 19:58:13 hogedns021 kernel: named[1844]: segfault at 0 ip 000000000048fbba sp 00007f6921778840 error 4 in named[400000+303000]
Dec 23 19:58:15 hogedns021 abrtd: Directory 'ccpp-2013-12-23-19:58:13-1843' creation detected
Dec 23 19:58:15 hogedns021 abrt[19323]: Saved core dump of pid 1843 (/usr/local/sbin/named) to /var/spool/abrt/ccpp-2013-12-23-19:58:13-1843 (166387712 bytes)
Dec 23 19:58:15 hogedns021 abrtd: Executable '/usr/local/sbin/named' doesn't belong to any package
Dec 23 19:58:15 hogedns021 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-12-23-19:58:13-1843' exited with 1
Dec 23 19:58:15 hogedns021 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-12-23-19:58:13-1843, deleting
############################################################################
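One way to read the kernel line above: on x86 the `error` field is the page-fault error code, in which bit 0 set means a protection violation (clear means the page was not present), bit 1 means a write (clear means a read), bit 2 means the access came from user mode, and bit 4 means an instruction fetch. `segfault at 0` is the faulting data address, and `named[400000+303000]` gives the start address and size (in hex) of the mapping that held the faulting instruction. The Python sketch below decodes such a line once its wrapped halves are joined back together; the regex and the `decode_segfault` helper are hypothetical, written only for illustration.

```python
import re

# Kernel message format: "segfault at ADDR ip IP sp SP error CODE in OBJECT[BASE+SIZE]"
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Decode an x86 kernel segfault log line (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        "object": m["obj"],
        "ip_in_object": base <= ip < base + size,  # did the fault happen inside this mapping?
        "offset_in_object": ip - base,
        "protection_violation": bool(err & 0x01),  # clear: page was not present
        "write": bool(err & 0x02),                 # clear: read access
        "user_mode": bool(err & 0x04),
        "instruction_fetch": bool(err & 0x10),
    }


if __name__ == "__main__":
    line = ("Dec 23 19:58:13 hogedns021 kernel: named[1844]: segfault at 0 "
            "ip 000000000048fbba sp 00007f6921778840 error 4 in named[400000+303000]")
    info = decode_segfault(line)
    print(hex(info["fault_addr"]), hex(info["ip"]), info["write"], info["user_mode"])
```

Here `error 4` decodes as a user-mode read of a page that was not present, at address 0, which is consistent with a NULL pointer dereference. The `ip` value (0x48fbba) lies inside the named executable mapping rather than in a shared library, and it is the same address shown in frame #0 of the gdb backtrace.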

core info
############################################################################
[root at hogedns021 log]# named -V
BIND 9.9.3-P2 (Extended Support Version) <id:d8a6fe8b> built with '--prefix=/usr/local/' '--disable-openssl-version-check' '--enable-filter-aaaa' '--enable-threads' '--with-gssapi=no' 'CFLAGS=-DDIG_SIGCHASE'
using OpenSSL version: OpenSSL 1.0.0 29 Mar 2010
[root at hogedns021 log]# uname -a
Linux hogedns021 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[root at hogedns021 log]# ls -la /var/named/chroot/var/named/core.1843
-rw------- 1 dns dns 166387712 Dec 23 19:58 2013 /var/named/chroot/var/named/core.1843
[root at hogedns021 log]# file /var/named/chroot/var/named/core.1843
/var/named/chroot/var/named/core.1843: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/local/sbin/named -u dns -c /etc/named.conf -t /var/named/chroot'
############################################################################

gdb backtrace ("thread apply all bt")
############################################################################
[root at hogedns021 tmp]# ldd /usr/local/sbin/named
        linux-vdso.so.1 =>  (0x00007fffefdff000)
        libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x0000003a0c800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003a0ac00000)
        libcap.so.2 => /lib64/libcap.so.2 (0x0000003a11400000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a0b000000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003a0a800000)
        libz.so.1 => /lib64/libz.so.1 (0x0000003a0b400000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003a0a400000)
        libattr.so.1 => /lib64/libattr.so.1 (0x0000003a0f000000)

[root at hogedns021 sue]# ls -la
total 170176
drwxr-xr-x 2 root root      4096 Dec 24 11:36 .
drwxrwxrwt 5 root root      4096 Dec 24 11:36 ..
-rw------- 1 dns  dns  166387712 Dec 23 19:58 core.1843
-rwxr-xr-x 1 root root    156872 Oct 12  2012 ld-linux-x86-64.so.2
-rwxr-xr-x 1 root root     21152 Aug  8  2011 libattr.so.1
-rwxr-xr-x 1 root root   1922112 Oct 12  2012 libc.so.6
-rwxr-xr-x 1 root root     19016 Aug 23  2011 libcap.so.2
-rwxr-xr-x 1 root root   1665328 Aug 16  2012 libcrypto.so.10
-rwxr-xr-x 1 root root     22536 Oct 12  2012 libdl.so.2
-rw-r--r-- 1 root root     65928 Dec 24 11:35 libnss_files.so.2
-rwxr-xr-x 1 root root    145720 Oct 12  2012 libpthread.so.0
-rwxr-xr-x 1 root root     34008 Dec 24 10:55 libthread_db-1.0.so
-rwxr-xr-x 1 root root     90952 Aug 10  2011 libz.so.1
-rwxr-xr-x 1 root root   3491405 Aug  9 00:28 named


[root at hogedns021 sue]# gdb
GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) set solib-absolute-prefix /tmp/sue
(gdb) set solib-search-path /tmp/sue
(gdb) file /tmp/sue/named
Reading symbols from /tmp/sue/named...(no debugging symbols found)...done.
(gdb) core-file /tmp/sue/core.1843
[New Thread 1844]
[New Thread 1843]
[New Thread 1847]
[New Thread 1846]
[New Thread 1845]
Reading symbols from /tmp/sue/libcrypto.so.10...(no debugging symbols found)...done.
Loaded symbols for /tmp/sue/libcrypto.so.10
Reading symbols from /tmp/sue/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /tmp/sue/libdl.so.2
Reading symbols from /tmp/sue/libcap.so.2...(no debugging symbols found)...done.
Loaded symbols for /tmp/sue/libcap.so.2
Reading symbols from /tmp/sue/libpthread.so.0...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /tmp/sue/libpthread.so.0
Reading symbols from /tmp/sue/libc.so.6...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /tmp/sue/libc.so.6
Reading symbols from /tmp/sue/libz.so.1...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /tmp/sue/libz.so.1
Reading symbols from /tmp/sue/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /tmp/sue/ld-linux-x86-64.so.2
Reading symbols from /tmp/sue/libattr.so.1...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /tmp/sue/libattr.so.1
Reading symbols from /tmp/sue/libnss_files.so.2...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /tmp/sue/libnss_files.so.2
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/usr/local/sbin/named -u dns -c /etc/named.conf -t /var/named/chroot'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000048fbba in socket_search ()
(gdb) thread apply all bt

Thread 5 (Thread 1845):
#0  0x0000003a0b00e054 in __lll_lock_wait () from /tmp/sue/libpthread.so.0
#1  0x0000003a0b009388 in _L_lock_854 () from /tmp/sue/libpthread.so.0
#2  0x0000003a0b009257 in pthread_mutex_lock () from /tmp/sue/libpthread.so.0
#3  0x0000000000490a9c in deactivate_dispsocket ()
#4  0x0000000000491678 in udp_recv ()
#5  0x0000000000491493 in udp_exrecv ()
#6  0x0000000000637622 in dispatch ()
#7  0x0000000000637916 in run ()
#8  0x0000003a0b007851 in start_thread () from /tmp/sue/libpthread.so.0
#9  0x0000003a0a8e811d in clone () from /tmp/sue/libc.so.6

Thread 4 (Thread 1846):
#0  0x0000003a0b00b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /tmp/sue/libpthread.so.0
#1  0x000000000064fba0 in isc_condition_waituntil ()
#2  0x000000000063a70f in run ()
#3  0x0000003a0b007851 in start_thread () from /tmp/sue/libpthread.so.0
#4  0x0000003a0a8e811d in clone () from /tmp/sue/libc.so.6

Thread 3 (Thread 1847):
#0  0x0000003a0a8e8713 in epoll_wait () from /tmp/sue/libc.so.6
#1  0x0000000000649bfa in watcher ()
#2  0x0000003a0b007851 in start_thread () from /tmp/sue/libpthread.so.0
#3  0x0000003a0a8e811d in clone () from /tmp/sue/libc.so.6

Thread 2 (Thread 1843):
#0  0x0000003a0a832c54 in sigsuspend () from /tmp/sue/libc.so.6
#1  0x000000000063b6e2 in isc__app_ctxrun ()
#2  0x000000000063b747 in isc__app_run ()
#3  0x0000000000419fd7 in main ()

Thread 1 (Thread 1844):
#0  0x000000000048fbba in socket_search ()
#1  0x0000000000490148 in get_dispsocket ()
#2  0x0000000000497265 in dns_dispatch_addresponse2 ()
#3  0x0000000000568581 in resquery_send ()
#4  0x0000000000567d5b in fctx_query ()
#5  0x0000000000576f3c in resquery_response ()
#6  0x0000000000637622 in dispatch ()
#7  0x0000000000637916 in run ()
#8  0x0000003a0b007851 in start_thread () from /tmp/sue/libpthread.so.0
#9  0x0000003a0a8e811d in clone () from /tmp/sue/libc.so.6
(gdb)
############################################################################
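The gdb session above relies on copies of named's shared libraries in /tmp/sue. A small Python sketch, assuming `ldd` output in the format shown, that lists which libraries are still missing from such a directory; `parse_ldd` and `missing_from` are hypothetical helpers. Note that `ldd` only lists link-time dependencies, so libraries that glibc loads at run time, such as libnss_files.so.2 in the listing above, still have to be copied by hand.

```python
import os
import re
from typing import List, Tuple

# "libc.so.6 => /lib64/libc.so.6 (0x...)": a soname resolved to a path
ARROW_RE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-f]+\)")
# "/lib64/ld-linux-x86-64.so.2 (0x...)": the dynamic loader, listed by path only
DIRECT_RE = re.compile(r"^\s*(/\S+)\s+\(0x[0-9a-f]+\)")


def parse_ldd(output: str) -> List[Tuple[str, str]]:
    """Return (soname, path) pairs from ldd output.

    The vDSO line (no file on disk) and "=> not found" lines are skipped.
    """
    libs = []
    for line in output.splitlines():
        m = ARROW_RE.match(line)
        if m:
            libs.append((m.group(1), m.group(2)))
            continue
        m = DIRECT_RE.match(line)
        if m:
            libs.append((os.path.basename(m.group(1)), m.group(1)))
    return libs


def missing_from(libdir: str, libs: List[Tuple[str, str]]) -> List[str]:
    """Sonames with no file of the same name in libdir (a flat copy like /tmp/sue)."""
    return [soname for soname, _ in libs
            if not os.path.exists(os.path.join(libdir, soname))]


if __name__ == "__main__":
    # On a live system the input would come from, e.g.:
    #   subprocess.run(["ldd", "/usr/local/sbin/named"], capture_output=True, text=True).stdout
    sample = ("        linux-vdso.so.1 =>  (0x00007fffefdff000)\n"
              "        libc.so.6 => /lib64/libc.so.6 (0x0000003a0a800000)\n"
              "        /lib64/ld-linux-x86-64.so.2 (0x0000003a0a400000)\n")
    libs = parse_ldd(sample)
    print(libs)
    print(missing_from("/tmp/sue", libs))
```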


gdb backtrace ("thread apply all bt full")
############################################################################
# gdb
GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) set solib-absolute-prefix /tmp/sue
(gdb) set solib-search-path /tmp/sue
(gdb) file /tmp/
.ICE-unix/               libthread_db-1.0.so_org  sue-bak/
.font-unix/              named_bak_20131224_sue   sue.tar.gz
libnss_files.so.2        named_debug_cadns        sue2.tar.gz
libpthread-2.12.so       sue/                     test/
(gdb) file /tmp/named_debug_cadns
Reading symbols from /tmp/named_debug_cadns...done.
(gdb) core-file /tmp/sue/core.1843
warning: core file may not match specified executable file.
[New Thread 1844]
[New Thread 1843]
[New Thread 1847]
[New Thread 1846]
[New Thread 1845]
Reading symbols from /tmp/sue/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /tmp/sue/ld-linux-x86-64.so.2
Core was generated by `/usr/local/sbin/named -u dns -c /etc/named.conf -t /var/named/chroot'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000048fbba in load_text (lctx=0xffffffffffffff90) at master.c:1611
1611                            GETTOKEN(lctx->lex, 0, &token, ISC_FALSE);
(gdb) thread apply all bt full

Thread 5 (Thread 1845):
#0  0x0000003a0b00e054 in ?? ()
No symbol table info available.
#1  0x00000000000035c8 in ?? ()
No symbol table info available.
#2  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 4 (Thread 1846):
#0  0x0000003a0b00b7bb in ?? ()
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 1847):
#0  0x0000003a0a8e8713 in ?? ()
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 1843):
#0  0x0000003a0a832c54 in ?? ()
No symbol table info available.
#1  0x0000003a0a883a12 in ?? ()
No symbol table info available.
#2  0x00007ffffd02c180 in ?? ()
No symbol table info available.
#3  0x0000000000000057 in ?? ()
No symbol table info available.
#4  0x000000000063b6e2 in getservbyname ()
No symbol table info available.
#5  0x00007f6921782010 in ?? ()
No symbol table info available.
#6  0x0000000000915de0 in ?? ()
No symbol table info available.
#7  0x00000000004058f0 in ?? ()
No symbol table info available.
#8  0x0000000000610ad7 in ?? ()
No symbol table info available.
#9  0x00007ffffd02c170 in ?? ()
No symbol table info available.
#10 0x00007f692179b030 in ?? ()
No symbol table info available.
#11 0x0000000021796540 in ?? ()
No symbol table info available.
#12 0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 1844):
#0  0x000000000048fbba in load_text (lctx=0xffffffffffffff90) at master.c:1611
        rdclass = 0
        type = 20
        covers = <value optimized out>
        ttl_offset = 0
        new_name = <value optimized out>
        current_has_delegation = isc_boolean_false
        done = isc_boolean_false
        finish_origin = isc_boolean_false
        finish_include = isc_boolean_false
        read_till_eol = 561482224
        initialws = isc_boolean_false
        include_file = 0x0
        token = {type = 561485888, value = {as_char = -25 '\347',
            as_ulong = 4682727, as_region = {
              base = 0x4773e7 "D$\034\205\300\017\205", <incomplete sequence \313>, length = 6863648}, as_textregion = {
              base = 0x4773e7 "D$\034\205\300\017\205", <incomplete sequence \313>, length = 6863648}, as_pointer = 0x4773e7}}
        result = 65552
        glue_list = {head = 0x7f69193cd310, tail = 0x7f6921779760}
        current_list = {head = 0x908e70, tail = 0xb85300}
        this = <value optimized out>
        rdatalist = 0x0
        new_rdatalist = 0x7f6910fe7f80
        rdlcount = 0
        rdlcount_save = 0
        rdatalist_size = 3
        buffer = {magic = 0, base = 0x7f6900000000, length = 6863648,
          used = 0, current = 561485664, active = 32617, link = {
            prev = 0x7f6921779760, next = 0x68bb20}, mctx = 0x1421779770}
        target = {magic = 0, base = 0x7f69217796d0, length = 561485744,
          used = 32617, current = 4682727, active = 0, link = {
            prev = 0x908e70, next = 0x7ffffd02c100}, mctx = 0x7f6921779740}
        target_ft = <value optimized out>
        target_save = Asked for position 0 of stack, stack only has 0 elements on it.
(gdb)
(gdb) quit

############################################################################
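The same backtraces can be collected without the interactive pager prompts by running gdb in batch mode. The sketch below only builds the command line (`gdb -batch -nx` plus one `-ex` per command, reusing the settings from the sessions above); the helper name `gdb_batch_argv` is hypothetical. gdb can only map addresses to the right functions when `file` names the same build that produced the core, so the executable passed here should be that exact binary.

```python
import shlex
from typing import List


def gdb_batch_argv(binary: str, core: str, libdir: str, full: bool = False) -> List[str]:
    """Build a non-interactive gdb command line (hypothetical helper)."""
    backtrace = "thread apply all bt full" if full else "thread apply all bt"
    commands = [
        "set pagination off",                   # no "---Type <return> to continue---" prompts
        f"set solib-absolute-prefix {libdir}",  # same settings as the interactive session
        f"set solib-search-path {libdir}",
        f"file {binary}",                       # must be the build that produced the core
        f"core-file {core}",
        backtrace,
    ]
    argv = ["gdb", "-batch", "-nx"]             # -nx: ignore ~/.gdbinit for reproducible output
    for command in commands:
        argv += ["-ex", command]
    return argv


if __name__ == "__main__":
    argv = gdb_batch_argv("/tmp/sue/named", "/tmp/sue/core.1843", "/tmp/sue", full=True)
    # Print a shell-quoted command line; to run it, pass argv to
    # subprocess.run(argv, capture_output=True, text=True).
    print(shlex.join(argv))
```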



