BIND 9.5.0 Crashes Solaris 9
bsfinkel at anl.gov
Tue Jul 8 18:41:35 UTC 2008
We are running BIND 9.5.0 on Solaris 9. Every two hours a cron
job runs
mv /var/log/named.query.log ...
returncode=$?
/export/home/named/rndc reconfig
to rename the current query.log and run "reconfig" to start a new
query log. In /var/adm/messages I see
Jul 8 11:58:15 oberon named[18737]:
[ID 873579 daemon.info] too many timeouts resolving
'5.224.241.207.sbl.spamhaus.org/TXT' (in 'sbl.spamhaus.org'?):
disabling EDNS
Jul 8 11:58:16 oberon last message repeated 2 times
Jul 8 11:59:01 oberon named[18737]:
[ID 873579 daemon.info] received control channel command 'reconfig'
Jul 8 11:59:01 oberon named[18737]:
[ID 873579 daemon.info] loading configuration from
'/export/home/named.oberon/named.conf.oberon'
Jul 8 11:59:01 oberon named[18737]:
[ID 873579 daemon.info] default max-cache-size (33554432) applies
Jul 8 11:59:01 oberon named[18737]:
[ID 873579 daemon.info] default max-cache-size (33554432) applies:
view _bind
Jul 8 11:59:01 oberon named[18737]:
[ID 873579 daemon.info] reloading configuration succeeded
Jul 8 11:59:01 oberon named[18737]:
[ID 873579 daemon.info] any newly configured zones are now loaded
It appears that at 11:59:01 the "reconfig" has completed successfully.
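For reference, the rotation step described at the top can be sketched as
follows. This is only an illustration, not the actual cron job: the
function name rotate_query_log and its three arguments are hypothetical,
and the destination file name is left to the caller. The one change from
the job as quoted is that the saved return code of "mv" is checked before
"reconfig" is run.

```shell
#!/bin/sh
# Sketch only. rotate_query_log and its arguments are hypothetical names,
# not part of the original cron job.
# usage: rotate_query_log LOG DEST RNDC
rotate_query_log() {
    log=$1
    dest=$2
    rndc=$3

    # Rename the current query log and keep the result of the rename.
    mv "$log" "$dest"
    returncode=$?
    if [ "$returncode" -ne 0 ]; then
        # Do not ask named to reconfigure after a failed rename.
        echo "rename of $log failed (rc=$returncode)" >&2
        return "$returncode"
    fi

    # Run "reconfig" through rndc, as the job above does.
    "$rndc" reconfig
}
```

It would be called from cron with the log path, the chosen destination
name, and the path to rndc (/export/home/named/rndc in our case).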
We have a monitor script that checks (via "ps -ef") whether BIND is
running. If it sees that BIND is not running, it starts it. The
monitor script sleeps for 60 seconds between checks. It does not run
via cron, so I do not know exactly at what point in the minute it
performs each check. A few times now the monitor script has found
that BIND was not running; this has occurred just after a "reconfig"
to rename the query log and start a new one. This is what I see in
/var/adm/messages after the
Jul 8 11:59:01 "... zones are now loaded" message:
Jul 8 11:59:57 oberon.it.anl.gov named[24233]:
[ID 873579 daemon.notice] starting BIND 9.5.0
-c /export/home/named.oberon/named.conf.oberon
/var/adm/messages contains no message indicating that BIND 9.5.0 crashed.
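The monitor behaves roughly like the sketch below. It is not the real
script: the names named_is_running, monitor_once, and START_CMD are
hypothetical, and the grep pattern assumes that named appears in the
"ps -ef" listing with its "-c" argument. The sketch also prints the time
of each restart, which would show at what point in the minute a check
found BIND missing.

```shell
#!/bin/sh
# Sketch only. named_is_running, monitor_once and START_CMD are
# hypothetical names; START_CMD must be set by the caller to the
# command that starts named.

named_is_running() {
    # The [n] keeps grep from matching its own entry in the ps listing.
    ps -ef | grep '[n]amed -c' >/dev/null
}

monitor_once() {
    named_is_running || {
        # Record when the restart happens so it can be compared with
        # the timestamps in /var/adm/messages.
        echo "$(date): named not running, restarting"
        $START_CMD
    }
}

# Main loop, left commented out so the functions can be sourced alone:
# while :; do monitor_once; sleep 60; done
```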
I looked at the core file (160146520 Jul 8 11:59 core) with
gdb, and it tells me (there are some long lines with hex characters):
oberon.it.anl.gov# /usr/afsws/local/bin/gdb bind/sbin/named core
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.9"...
warning: Can't read pathname for load map: I/O error.
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libpthread.so.1...done.
Loaded symbols for /usr/lib/libpthread.so.1
Reading symbols from /usr/lib/libthread.so.1...done.
Loaded symbols for /usr/lib/libthread.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
warning: Can't read pathname for load map: I/O error.
warning: Can't read pathname for load map: I/O error.
Core was generated by `/export/home/named.oberon/bind/sbin/named -c /export/home/named.oberon/named.co'.
Program terminated with signal 11, Segmentation fault.
#0 0x0007d4d4 in dns_dispatch_detach (dispp=0x1c) at dispatch.c:1931
1931 REQUIRE(dispp != NULL && VALID_DISPATCH(*dispp));
(gdb) where
#0 0x0007d4d4 in dns_dispatch_detach (dispp=0x1c) at dispatch.c:1931
#1 0x00104300 in disppooltimer_update (task=0x900dfd0, event=0x0)
at resolver.c:7548
#2 0x0016ec84 in dispatch (manager=0x1eacb8) at task.c:862
#3 0x0016ee30 in run (uap=0x1eacb8) at task.c:1005
#4 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
#5 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread apply all bt full
Thread 5 (process 149809 ):
#0 0xff351bb0 in pthread_mutex_lock () from /usr/lib/libthread.so.1
No symbol table info available.
#1 0x000815b8 in dns_iptable_detach (tabp=0x17332e0) at iptable.c:152
tab = (dns_iptable_t *) 0x105cd18
refs = 1773520
#2 0x00069638 in destroy (dacl=0x17332b8) at acl.c:458
i = 0
#3 0x0006977c in dns_acl_detach (aclp=0x205a58) at acl.c:471
acl = (dns_acl_t *) 0x17332b8
refs = 0
#4 0x0011e010 in destroy (view=0x205930) at view.c:295
name = (dns_name_t *) 0x2059e8
i = 256
#5 0x0011b5d8 in req_shutdown (task=0x205970, event=0x0) at view.c:542
view = (dns_view_t *) 0x205930
done = isc_boolean_true
#6 0x0016ec84 in dispatch (manager=0x1eacb8) at task.c:862
dispatch_count = 2
done = isc_boolean_false
requeue = isc_boolean_false
finished = isc_boolean_false
#7 0x0016ee30 in run (uap=0x1eacb8) at task.c:1005
No locals.
#8 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
#9 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 4 (process 84273 ):
#0 0xff21cadc in _libc_sigtimedwait () from /usr/lib/libc.so.1
No symbol table info available.
#1 0xff34e49c in sigwait () from /usr/lib/libthread.so.1
No symbol table info available.
#2 0xff2177c0 in __posix_sigwait () from /usr/lib/libc.so.1
No symbol table info available.
#3 0x00171d50 in isc_app_run () at app.c:503
result = -4195472
event = (isc_event_t *) 0x0
next_event = (isc_event_t *) 0xffbffb70
task = (isc_task_t *) 0x0
sset = {__sigbits = {16387, 0, 0, 0}}
strbuf = "\000\000\b<\000\000\b<\000\000\b<\000\000\000\000ÿÿqqÿÿqqÿÿqqÿÿqq\000\000\000\000\035Íe\000ÿ¿û\030\000\003Qð\000\035\214\000\000\030S¨\000\037\034¸\000\035\214\000\000\030`\000\000\035\216ð", '\0' <repeats 17 times>, "\035\214\000\000\035\217\030\000\000\000\000\000\000\000\005\000\000\000n\000\000\000lÿ¿û\220\000\003V8\000\000\000\000\000\035¶p\000\000\000\000ÿ\024â@"
sig = 1936384
#4 0x0003564c in main (argc=1593344, argv=0x185000) at main.c:879
result = 0
Thread 3 (process 346417 ):
#0 0xff21e23c in _poll () from /usr/lib/libc.so.1
No symbol table info available.
#1 0xff1d24d8 in _select () from /usr/lib/libc.so.1
No symbol table info available.
#2 0xff34e1cc in select () from /usr/lib/libthread.so.1
No symbol table info available.
#3 0x0017cedc in watcher (uap=0x226cc0) at socket.c:2527
done = isc_boolean_false
ctlfd = 8488
cc = 1825792
readfds = {fds_bits = {938475552, -255050557, 98304,
0 <repeats 29 times>}}
writefds = {fds_bits = {0 <repeats 32 times>}}
msg = -2
fd = -1
maxfd = 116
strbuf = '\0' <repeats 127 times>
#4 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
#5 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (process 280881 ):
#0 0xff3554b4 in __lwp_park () from /usr/lib/libthread.so.1
No symbol table info available.
#1 0xff3526c0 in cond_wait_queue () from /usr/lib/libthread.so.1
No symbol table info available.
#2 0xff352c38 in cond_wait_common () from /usr/lib/libthread.so.1
No symbol table info available.
#3 0xff3530c8 in _ti_cond_timedwait () from /usr/lib/libthread.so.1
No symbol table info available.
#4 0xff3530fc in cond_timedwait () from /usr/lib/libthread.so.1
No symbol table info available.
#5 0xff35313c in pthread_cond_timedwait () from /usr/lib/libthread.so.1
No symbol table info available.
#6 0x00181cfc in isc_condition_waituntil (c=0x1eccf0, m=0x1eccc0, t=0x1ecce8)
at condition.c:59
presult = 77
result = 2018544
ts = {tv_sec = 1215536341, tv_nsec = 278829000}
strbuf = "\000\216µí\000\000\000\000Hs Y\004²M\020ÿ\fþÈÿ5.°\000\000\000\000\000\000\000\000\000í5\210\000\000\000\001\005;F\210\000\001\000\000\000\000\000\000\000\000\000\001\000\033´\000\000\035¤\000\000\036̸ÿ\fÿ\210\000\033¸`\000\033ºà\000\000\000\000\000\000\000\000ÿ\fÿ(\000\027\016\210\000\036Ìè", '\0' <repeats 12 times>, "ÿ\fÿ(\000\027\0170\000\000\000\000\000\000\000"
#7 0x00170eb0 in run (uap=0x1eccb8) at timer.c:719
now = {seconds = 1215536341, nanoseconds = 78794000}
result = 77
#8 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
#9 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (process 215345 ):
#0 0x0007d4d4 in dns_dispatch_detach (dispp=0x1c) at dispatch.c:1931
killit = 28
#1 0x00104300 in disppooltimer_update (task=0x900dfd0, event=0x0)
at resolver.c:7548
res = (dns_resolver_t *) 0x7aa440
addr4 = {type = {sa = {sa_family = 2,
sa_data = '\0' <repeats 13 times>}, sin = {sin_family = 2, sin_port = 0,
sin_addr = {S_un = {S_un_b = {s_b1 = 0 '\0', s_b2 = 0 '\0',
s_b3 = 0 '\0', s_b4 = 0 '\0'}, S_un_w = {s_w1 = 0, s_w2 = 0},
S_addr = 0}}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {
sin6_family = 2, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {
_S6_un = {_S6_u8 = '\0' <repeats 15 times>, _S6_u32 = {0, 0, 0, 0},
__S6_align = 0}}, sin6_scope_id = 0, __sin6_src_id = 0}, sunix = {
sun_family = 2, sun_path = '\0' <repeats 107 times>}}, length = 16,
link = {prev = 0xffffffff, next = 0xffffffff}}
addr6 = {type = {sa = {sa_family = 26,
sa_data = '\0' <repeats 13 times>}, sin = {sin_family = 26,
sin_port = 0, sin_addr = {S_un = {S_un_b = {s_b1 = 0 '\0',
s_b2 = 0 '\0', s_b3 = 0 '\0', s_b4 = 0 '\0'}, S_un_w = {s_w1 = 0,
s_w2 = 0}, S_addr = 0}},
sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 26,
sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {_S6_un = {
_S6_u8 = '\0' <repeats 15 times>, _S6_u32 = {0, 0, 0, 0},
__S6_align = 0}}, sin6_scope_id = 0, __sin6_src_id = 0}, sunix = {
sun_family = 26, sun_path = '\0' <repeats 107 times>}}, length = 32,
link = {prev = 0xffffffff, next = 0xffffffff}}
disp4 = (dns_dispatch_t *) 0x72cbe58
disp6 = (dns_dispatch_t *) 0x7aeb80
result = 1933312
nxt = 7
attrs = 1933312
attrmask = 28
#2 0x0016ec84 in dispatch (manager=0x1eacb8) at task.c:862
dispatch_count = 0
done = isc_boolean_false
requeue = isc_boolean_false
finished = isc_boolean_false
#3 0x0016ee30 in run (uap=0x1eacb8) at task.c:1005
No locals.
#4 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
#5 0xff355378 in _lwp_start () from /usr/lib/libthread.so.1
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) quit
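A backtrace like the one above can also be captured non-interactively,
which keeps gdb's pager from prompting in the middle of long output. The
sketch below is one way to do that, assuming a gdb that accepts "-batch"
and "-x"; the helper names write_gdb_cmds and dump_backtrace and the
temporary file name are hypothetical.

```shell
#!/bin/sh
# Sketch only. write_gdb_cmds and dump_backtrace are hypothetical helpers.

write_gdb_cmds() {
    # $1: path of the gdb command file to create.
    # "set height 0" turns off the pager; "set width 0" turns off wrapping.
    cat > "$1" <<'EOF'
set height 0
set width 0
where
thread apply all bt full
EOF
}

dump_backtrace() {
    # $1: gdb binary, $2: executable, $3: core file, $4: output file
    cmds=${TMPDIR:-/tmp}/gdb-cmds.$$
    write_gdb_cmds "$cmds"
    # -batch makes gdb exit after running the commands given with -x.
    "$1" -batch -x "$cmds" "$2" "$3" > "$4" 2>&1
    rc=$?
    rm -f "$cmds"
    return $rc
}
```

For the session above this would be invoked as
"dump_backtrace /usr/afsws/local/bin/gdb bind/sbin/named core backtrace.txt",
where backtrace.txt is an arbitrary output name.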
Is there any other information that I can provide to debug this?
Thanks.
----------------------------------------------------------------------
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory Phone: +1 (630) 252-7277
9700 South Cass Avenue Facsimile:+1 (630) 252-4601
Building 222, Room D209 Internet: BSFinkel at anl.gov
Argonne, IL 60439-4828 IBMMAIL: I1004994