BIND 9.6.1-P1 crashing

Dario Miculinic dario.miculinic at t-com.hr
Tue Jan 5 07:24:16 UTC 2010


I dont't have the same core dump, but this is from one that happend yesterday:

#0  0x080db986 in ttl_sooner (v1=0x0, v2=0x59375628) at rbtdb.c:752
752     ttl_sooner(void *v1, void *v2) {
(gdb) where
#0  0x080db986 in ttl_sooner (v1=0x0, v2=0x59375628) at rbtdb.c:752
#1  0x0819e708 in isc_heap_delete (heap=0xb0f54068, index=1) at heap.c:218
#2  0x080e039f in free_rdataset (rbtdb=0xb0f4f008, mctx=0x864bea0, rdataset=0x59375628) at rbtdb.c:1273
#3  0x080e04c3 in clean_stale_headers (rbtdb=0xb0f4f008, mctx=0x864bea0, top=0x4af6f3e0) at rbtdb.c:1331
#4  0x080e10c4 in decrement_reference (rbtdb=0xb0f4f008, node=0x411b1368, least_serial=0,
     nlock=isc_rwlocktype_read, tlock=isc_rwlocktype_none, pruning=isc_boolean_false) at rbtdb.c:1348
#5  0x080ea711 in detachnode (db=0xb0f4f008, targetp=0xb42fe2e4) at rbtdb.c:4877
#6  0x080ea9b1 in rdataset_disassociate (rdataset=0xb05c5a48) at rbtdb.c:7173
#7  0x0812e55a in dns_rdataset_disassociate (rdataset=0xb05c5a48) at rdataset.c:101
#8  0x08132f9e in fctx_destroy (fctx=0xb05c5988) at resolver.c:3081
#9  0x0813548e in fctx_doshutdown (task=0xb0f0ea30, event=0xb05c59e0) at resolver.c:3246
#10 0x081b9221 in run (uap=0xb7f09008) at task.c:862
#11 0x0094c73b in start_thread () from /lib/libpthread.so.0
#12 0x008a1cfe in clone () from /lib/libc.so.6


This is the output of "thread apply all bt full" command (it's quite long):

(gdb) thread apply all bt full

Thread 11 (process 11988):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x007f9367 in sigsuspend () from /lib/libc.so.6
No symbol table info available.
#2  0x081bcc74 in isc_app_run () at app.c:534
         event = (isc_event_t *) 0x0
         next_event = <value optimized out>
         task = (isc_task_t *) 0x0
         sset = {__val = {0 <repeats 32 times>}}
         strbuf = 
"č\220a\b\000\020\005\000\000đ˙˙\000\000\003\000\030\021ńˇy\000\000\000\002\000\000\000ü\023ňˇ\000\000\000\000\000\000\003\000P \032\b at qđˇž\000\000\000\030\000\000\000ô\037\221\000 at 1\221\000Pqđˇ\bR˝ż\207˝\203\000\021\000\000\000\024\000\000\000`\000\000\000Đěa\by.\000\000Pqđˇ8R˝ż\bŻ\031\bČs\001\000\b_ňˇÔ.\000\000č\220a\b\024\000\000"
#3  0x08059f7c in main (argc=0, argv=0xbfbd53c4) at ./main.c:932
         result = <value optimized out>

Thread 10 (process 11989):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x009509e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081b90ae in run (uap=0xb7f09008) at task.c:810
No locals.
#3  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x008a1cfe in clone () from /lib/libc.so.6
---Type <return> to continue, or q <return> to quit---
No symbol table info available.

Thread 9 (process 11990):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x009509e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081b90ae in run (uap=0xb7f09008) at task.c:810
No locals.
#3  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 8 (process 11991):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x009509e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081b90ae in run (uap=0xb7f09008) at task.c:810
No locals.
#3  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 7 (process 11992):
#0  0x00fe4410 in __kernel_vsyscall ()
---Type <return> to continue, or q <return> to quit---
No symbol table info available.
#1  0x009509e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081b90ae in run (uap=0xb7f09008) at task.c:810
No locals.
#3  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 6 (process 11993):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x009509e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081b90ae in run (uap=0xb7f09008) at task.c:810
No locals.
#3  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 5 (process 11994):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x009509e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081b90ae in run (uap=0xb7f09008) at task.c:810
---Type <return> to continue, or q <return> to quit---
No locals.
#3  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 4 (process 11996):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00953e38 in sendmsg () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081c5962 in doio_send (sock=0xae734710, dev=0xae035788) at socket.c:1630
         cc = -1282421656
         iov = {{iov_base = 0xafa08cd4, iov_len = 48}, {iov_base = 0xb7f21030, iov_len = 236}, {iov_base = 0x3c0,
     iov_len = 2968577464}, {iov_base = 0xb0f0e5c0, iov_len = 2968577472}, {iov_base = 0x94ea47, iov_len = 0}, {
     iov_base = 0x861a0b0, iov_len = 2927195036}, {iov_base = 0x2edc, iov_len = 140611816}, {iov_base = 0xec,
     iov_len = 0}}
         write_count = 48
         msghdr = {msg_name = 0x0, msg_namelen = 0, msg_iov = 0xb38fcc68, msg_iovlen = 1, msg_control = 0x0,
   msg_controllen = 0, msg_flags = 0}
         addrbuf = "Ě\217ł[Ź\032\bü\220a\b\020GsŽxÍ\217łňB\034\b\030GsŽ4\000\000\000ťś\037\bě\000\000\000 
\214\023\bč\220a\b\bÍ\217ł\021Ř\031\bč\220a\bě\000\000"
         attempts = 0
         send_errno = 0
         strbuf = 
"ŘR\"Ž\037y\n\bŕBôˇ\000\000\000\0000\000\000\000P \032\b8Şŕ\202ž\000\000\0000\000\000\000\233\207\n\b@Ě\217ł\020GsŽ¸Ě\217łţV\033\b@Ě\217ł0\000\000\000Ŕ\000\000\000÷­\n\b\000\000\000\000¨$YŽGę\224\000\000\000\000\000°\224a\bXUCüđ\000\000\000P \032\b\210W\003Žž\000\000\000đ\000\000\000\b\020ňˇ0\020ňˇ°\000\000"
#3  0x081c5f41 in socket_send (sock=0xae734710, dev=0xae035788, task=0xb0f0e5b8, address=0xae79739c, pktinfo=0x0,
---Type <return> to continue, or q <return> to quit---
     flags=0) at socket.c:4291
         io_state = <value optimized out>
         have_lock = isc_boolean_false
         ntask = (isc_task_t *) 0x0
         result = <value optimized out>
#4  0x08136ae5 in resquery_send (query=0xafa08c60) at resolver.c:1921
         fctx = (fetchctx_t *) 0xad698268
         result = <value optimized out>
         qname = (dns_name_t *) 0x0
         qrdataset = (dns_rdataset_t *) 0x0
         r = {base = 0xafa08cd4 "\2076", length = 48}
         res = (dns_resolver_t *) 0xb0f0d008
         task = (isc_task_t *) 0xb0f0e5b8
         socket = (isc_socket_t *) 0xae734710
         tcpbuffer = {magic = 0, base = 0x274, length = 2528, used = 135051914, current = 2909287056,
   active = 2909373308, link = {prev = 0x94ea47, next = 0x0}, mctx = 0x88700f8}
         address = (isc_sockaddr_t *) 0xae79739c
         buffer = (isc_buffer_t *) 0xafa08c98
         ipaddr = {family = 2, type = {in = {s_addr = 4232271192}, in6 = {in6_u = {
         u6_addr8 = "XUCü?\000\000\000\b°đˇxĐ\217ł", u6_addr16 = {21848, 64579, 63, 0, 45064, 47088, 53368, 45967},
         u6_addr32 = {4232271192, 63, 3086004232, 3012546680}}},
     un = 
"XUCü?\000\000\000\b°đˇxĐ\217ł)ĺ\031\b\bĆrŽh\203v­\000\000\000\000˙˙˙˙\220Đ\217ł\bĆrŽř\001\000\000źĐ\217ł\000\000\000\000\bĆrŽŘĐ\217ł8Ń\217ł(Ń\217ł\000\000\000\0008Ń\217łĽÖ\034\b(Ń\217ł\000\000\000\000\001\000\000\000\004Ń\217ł8ĆrŽ\001\000\000\001"}, 
zone = 0}
         tsigkey = (dns_tsigkey_t *) 0x0
         peer = (dns_peer_t *) 0x0
         useedns = 2909373032
         cctx = {magic = 0, allowed = 0, edns = -1, table = {0x0 <repeats 64 times>}, initialnodes = {{r = {
---Type <return> to continue, or q <return> to quit---
         base = 0xafa03320 "\003www\vcorrupttube\003com", length = 21}, offset = 12, count = 0, labels = 4 '\004',
       next = 0x0}, {r = {base = 0xafa03324 "\vcorrupttube\003com", length = 17}, offset = 16, count = 1,
       labels = 3 '\003', next = 0x0}, {r = {base = 0xafa03330 "\003com", length = 5}, offset = 28, count = 2,
       labels = 2 '\002', next = 0x0}, {r = {base = 0x0, length = 0}, offset = 0, count = 0, labels = 0 '\0',
       next = 0xb38fd044}, {r = {base = 0xad616b08 "NbdanSNDř\220a­\023", length = 23}, offset = 53688,
       count = 45967, labels = 244 'ô', next = 0xb38fd168}, {r = {base = 0xad616b0c "nSNDř\220a­\023", length = 28},
       offset = 65433, count = 19265, labels = 1 '\001', next = 0x1}, {r = {base = 0x0, length = 0}, offset = 53144,
       count = 45967, labels = 104 'h', next = 0x0}, {r = {base = 0x4b41ff99 "ŰAK\200ŻFy(ç\020\035ćź\n",
         length = 0}, offset = 27404, count = 44385, labels = 88 'X', next = 0x0}, {r = {
         base = 0xa30f328 "\220\004", length = 9515960}, offset = 21358, count = 17486, labels = 104 'h',
       next = 0x1}, {r = {base = 0x1 <Address 0x1 out of bounds>, length = 1}, offset = 53188, count = 45967,
       labels = 68 'D', next = 0xffffffff}, {r = {base = 0xffffffff <Address 0xffffffff out of bounds>, length = 0},
       offset = 0, count = 0, labels = 0 '\0', next = 0x0}, {r = {base = 0x0, length = 4104}, offset = 53416,
       count = 45967, labels = 8 '\b', next = 0xb7f0c008}, {r = {base = 0xb38fd0a8 "", length = 3012546744},
       offset = 8, count = 0, labels = 8 '\b', next = 0xb38fd0b8}, {r = {
         base = 0x9538a4 "X=\001đ˙˙s\001Ăč\204˘˙˙\201ÁB\207", length = 8}, offset = 36582, count = 131,
       labels = 7 '\a', next = 0xb38fd0a4}, {r = {base = 0x8 <Address 0x8 out of bounds>, length = 135914793},
       offset = 50680, count = 2783, labels = 236 'ě', next = 0x3c0}, {r = {base = 0xb38fd0d8 "ŕ\t",
         length = 2910225256}, offset = 50696, count = 44658, labels = 72 'H', next = 0x81bb002}}, count = 3,
   mctx = 0x86190e8}
         secure_domain = isc_boolean_false
#5  0x08137579 in fctx_query (fctx=0xad698268, addrinfo=0xae797398, options=0) at resolver.c:1505
         res = (dns_resolver_t *) 0xb0f0d008
         task = (isc_task_t *) 0xb0f0e5b8
         result = 2927167124
         query = (resquery_t *) 0x30
         addr = {type = {sa = {sa_family = 61584, sa_data = "5\000i\003\000\000\220\006yŽ\bka­"}, sin = {
       sin_family = 61584, sin_port = 53, sin_addr = {s_addr = 873}, sin_zero = "\220\006yŽ\bka­"}, sin6 = {
---Type <return> to continue, or q <return> to quit---
       sin6_family = 61584, sin6_port = 53, sin6_flowinfo = 873, sin6_addr = {in6_u = {
           u6_addr8 = "\220\006yŽ\bka­HŇ\217ł\000\000\000", u6_addr16 = {1680, 44665, 27400, 44385, 53832, 45967, 0,
             0}, u6_addr32 = {2927167120, 2908842760, 3012547144, 0}}}, sin6_scope_id = 143405224}, sunix = {
       sun_family = 61584,
       sun_path = 
"5\000i\003\000\000\220\006yŽ\bka­HŇ\217ł\000\000\000\000¨0\214\b\231˙AKÂ\005\000\000-o\020\büŇ\217ł\000\000\000\0008Ó\217ł!É\177\000\200 
\221\000Ŕ 
\221\000ô\037\221\000|\204i­\000\000\000\000\fŇ\217łÖĹ\177\000\b#\221\000\004Ň\217łë?\017\017ô\037\221\000\030Ň\217ł\024Ě\177\000\n\r"}}, 
length = 3012547128, link = {prev = 0x81af974,
     next = 0xb38fd2fc}}
         have_addr = isc_boolean_false
         srtt = <value optimized out>
#6  0x08137784 in fctx_try (fctx=0xad698268, retrying=isc_boolean_false) at resolver.c:3013
         result = <value optimized out>
         addrinfo = (dns_adbaddrinfo_t *) 0x0
#7  0x081b9221 in run (uap=0xb7f09008) at task.c:862
No locals.
#8  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#9  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 3 (process 11997):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00950d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x081cda66 in isc_condition_waituntil (c=0xb7f0a040, m=0xb7f0a010, t=0xb7f0a038) at condition.c:59
         presult = <value optimized out>
         result = 0
---Type <return> to continue, or q <return> to quit---
         ts = {tv_sec = 1262616473, tv_nsec = 971230000}
         strbuf = 
"4\000\000\000ŕ\000\000\000ůë\031\bôż\225\000\030Ăď˛Gę\224\000\000\000\000\0000\225a\b\000\000\000\000Ý.\000\000č\220a\b(J.Ž\210äăŽ8Ăď˛\002°\033\bŘäăŽxJ.ŽHĂď˛\002°\033\bÉ\001\000\000\b°đˇxĂď˛Lć\031\b\210äăŽ(J.ŽxĂď˛\bç\031\b\210äăŽHť\005Žťś\037\b\210äăŽ\222\003\000"
#3  0x081bb583 in run (uap=0xb7f0a008) at timer.c:722
         now = {seconds = 1262616473, nanoseconds = 955861000}
         result = <value optimized out>
#4  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#5  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 2 (process 11998):
#0  0x00fe4410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x008a2376 in epoll_wait () from /lib/libc.so.6
No symbol table info available.
#2  0x081ca494 in watcher (uap=0xb7f0c008) at socket.c:3468
         manager = <value optimized out>
         cc = 1
         strbuf = '\0' <repeats 127 times>
#3  0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.

Thread 1 (process 11995):
#0  0x080db986 in ttl_sooner (v1=0x0, v2=0x59375628) at rbtdb.c:752
---Type <return> to continue, or q <return> to quit---
No locals.
#1  0x0819e708 in isc_heap_delete (heap=0xb0f54068, index=1) at heap.c:218
         elt = (void *) 0x0
         less = <value optimized out>
#2  0x080e039f in free_rdataset (rbtdb=0xb0f4f008, mctx=0x864bea0, rdataset=0x59375628) at rbtdb.c:1273
         size = <value optimized out>
         idx = 3
#3  0x080e04c3 in clean_stale_headers (rbtdb=0xb0f4f008, mctx=0x864bea0, top=0x4af6f3e0) at rbtdb.c:1331
         d = (rdatasetheader_t *) 0x4b726d
         down_next = (rdatasetheader_t *) 0x0
#4  0x080e10c4 in decrement_reference (rbtdb=0xb0f4f008, node=0x411b1368, least_serial=0,
     nlock=isc_rwlocktype_read, tlock=isc_rwlocktype_none, pruning=isc_boolean_false) at rbtdb.c:1348
         result = <value optimized out>
         write_locked = <value optimized out>
         nodelock = (rbtdb_nodelock_t *) 0x88099a4
         no_reference = <value optimized out>
#5  0x080ea711 in detachnode (db=0xb0f4f008, targetp=0xb42fe2e4) at rbtdb.c:4877
         rbtdb = (dns_rbtdb_t *) 0x0
         node = (dns_rbtnode_t *) 0x411b1368
         inactive = <value optimized out>
         nodelock = (rbtdb_nodelock_t *) 0x88099a4
#6  0x080ea9b1 in rdataset_disassociate (rdataset=0xb05c5a48) at rbtdb.c:7173
         db = (dns_db_t *) 0xb0f4f008
         node = (dns_dbnode_t *) 0x411b1368
#7  0x0812e55a in dns_rdataset_disassociate (rdataset=0xb05c5a48) at rdataset.c:101
No locals.
#8  0x08132f9e in fctx_destroy (fctx=0xb05c5988) at resolver.c:3081
         res = (dns_resolver_t *) 0xb0f0d008
---Type <return> to continue, or q <return> to quit---
         bucketnum = <value optimized out>
         sa = (isc_sockaddr_t *) 0x0
         next_sa = (isc_sockaddr_t *) 0x0
#9  0x0813548e in fctx_doshutdown (task=0xb0f0ea30, event=0xb05c59e0) at resolver.c:3246
         fctx = (fetchctx_t *) 0xb05c5988
         bucket_empty = <value optimized out>
         res = (dns_resolver_t *) 0xb0f0d008
         bucketnum = 28
         validator = <value optimized out>
#10 0x081b9221 in run (uap=0xb7f09008) at task.c:862
No locals.
#11 0x0094c73b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#12 0x008a1cfe in clone () from /lib/libc.so.6
No symbol table info available.




JINMEI Tatuya / 神明達哉 wrote:
> At Wed, 30 Dec 2009 10:23:17 +0100,
> Dario Miculinic <dario.miculinic at t-com.hr> wrote:
> 
>> I'm administrating 4 DNS servers running CentOS release 5.4 and Red Hat Enterprise Linux Server release 5.2. with BIND 
>> version 9.6.1-P1. On 3 of them BIND crashed 7 times in last 10 days. There's nothing in log files, but we have core dump 
>> file. I found this in the core dump:
>>
>> #0  0x080db986 in ttl_sooner (v1=0x0, v2=0x3385b628) at rbtdb.c:752
>> 752     ttl_sooner(void *v1, void *v2) {
>> (gdb) where
>> #0  0x080db986 in ttl_sooner (v1=0x0, v2=0x3385b628) at rbtdb.c:752
> 
> What's the result of the following gdb command?
> 
> (gdb) thread apply all bt full
> 
> We've seen crash like this one, but we've not figured out how this
> happens.  This is pretty likely an inter-thread race, and it may be
> tricky.  According to the v1/v2 values in your stack trace, a full
> backtrace with information of other threads may provide more useful
> hint.
> 
> If you need immediate workaround rather than chasing the bug,
> rebuilding named with --disable-atomic may help (we cannot be sure
> because we don't yet know how this bug happens in the first place).
> This will use locks in a more conservative way and may avoid the
> tricky race condition at the cost of lower performance (so if you want
> to try that you'll also need to watch the server load).
> 
> ---
> JINMEI, Tatuya
> Internet Systems Consortium, Inc.
> 




More information about the bind-users mailing list