Ongoing CPU usage issues...
Kelsey Cummings
kgc at sonic.net
Mon Apr 25 17:09:02 UTC 2005
All three of my primary name servers went into the pegged-CPU state overnight.
I wasn't really prepared to gather detailed debugging information from them
while they were in the semi-broken state, but I did grab an strace -c from one
of the servers while it was broken and then again after I restarted it. I
couldn't let it run too long, and this information doesn't mean too much to
me, but maybe it'll help someone else.
While one of the threads was maxing out the CPU (I really should have let this
run longer):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.65   10.071460        2643      3811      3811 rt_sigsuspend
  4.58    0.509362          48     10517           sendmsg
  1.96    0.218140           5     47573           gettimeofday
  1.75    0.194695          32      6178           write
  0.86    0.095313           6     16498      6172 recvmsg
  0.08    0.008782           2      3812           rt_sigprocmask
  0.07    0.007632           2      3811      3811 sigreturn
  0.02    0.002156          20       106        53 utime
  0.01    0.001192          99        12           send
  0.01    0.001108          17        65           kill
  0.00    0.000047           2        24           rt_sigaction
  0.00    0.000046          15         3           accept
  0.00    0.000023           2        12           time
  0.00    0.000020           2        12           getpid
  0.00    0.000020           2         9           fcntl64
  0.00    0.000007           2         3           close
------ ----------- ----------- --------- --------- ----------------
100.00   11.110003                 92446     13847 total
real 0m45.105s
user 0m0.760s
sys 0m1.680s
After restarting bind:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.63   67.550079        2598     25996     25996 rt_sigsuspend
  2.22    1.571516          49     31969           sendmsg
  1.05    0.742186          26     28402           write
  0.45    0.317469           2    199020           gettimeofday
  0.33    0.232188           4     59225     27813 recvmsg
  0.08    0.055330        1581        35           fsync
  0.07    0.046812           2     25997           rt_sigprocmask
  0.07    0.046250           2     25997     25997 sigreturn
  0.04    0.029707          11      2710      1355 utime
  0.03    0.020598         108       190           send
  0.02    0.011762          21       548           kill
  0.00    0.003050          87        35           rename
  0.00    0.002125          61        35           open
  0.00    0.001430          40        36        36 connect
  0.00    0.000723           2       380           rt_sigaction
  0.00    0.000723           2       296           fcntl64
  0.00    0.000610           3       233           brk
  0.00    0.000604           5       122           close
  0.00    0.000571          16        36           socket
  0.00    0.000426           8        51           accept
  0.00    0.000309           2       190           time
  0.00    0.000301           9        35           old_mmap
  0.00    0.000268           1       190           getpid
  0.00    0.000257           7        35           munmap
  0.00    0.000216           6        36           bind
  0.00    0.000157           4        35        35 rmdir
  0.00    0.000096           3        36           getsockopt
  0.00    0.000087           2        35           getsockname
  0.00    0.000086           2        36           setsockopt
  0.00    0.000083           2        35           _llseek
  0.00    0.000080           2        35           fstat64
------ ----------- ----------- --------- --------- ----------------
100.00   70.636099                402011     81232 total
real 2m18.274s
user 0m3.180s
sys 0m6.060s
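Because the two captures ran for different lengths of time (about 45 s versus about 138 s), raw call counts are hard to compare directly. One option is to normalize each syscall's call count by the trace's wall-clock duration. The Python sketch below is illustrative only: the helper names are hypothetical, it assumes the six-column strace -c layout shown above (with a blank errors column when no call failed), and it treats the `real` time of each capture as the trace length, which is only an approximation.

```python
import re

# A few rows copied from the two strace -c summaries above.
BROKEN = """
  4.58    0.509362          48     10517           sendmsg
  1.96    0.218140           5     47573           gettimeofday
  0.86    0.095313           6     16498      6172 recvmsg
"""
HEALTHY = """
  2.22    1.571516          49     31969           sendmsg
  0.45    0.317469           2    199020           gettimeofday
  0.33    0.232188           4     59225     27813 recvmsg
"""

# Data rows start with the "% time" value, e.g. "  4.58 ".
ROW = re.compile(r"^\s*\d+\.\d+\s")


def parse_strace_summary(text):
    """Hypothetical helper: return {syscall: (calls, errors)}.

    Assumes the column order "% time, seconds, usecs/call, calls, errors,
    syscall", with the errors column left blank when no call failed.
    Header, separator, blank and "total" lines are skipped.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not ROW.match(line) or fields[-1] == "total":
            continue
        calls = int(fields[3])
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows[fields[-1]] = (calls, errors)
    return rows


def calls_per_second(rows, wall_seconds):
    """Hypothetical helper: divide each call count by the trace duration."""
    return {name: calls / wall_seconds for name, (calls, _) in rows.items()}


if __name__ == "__main__":
    # Durations taken from the "real" lines of each capture (approximate).
    broken = calls_per_second(parse_strace_summary(BROKEN), 45.105)
    healthy = calls_per_second(parse_strace_summary(HEALTHY), 138.274)
    for name in sorted(broken):
        print(f"{name:14} broken {broken[name]:8.1f}/s  "
              f"healthy {healthy[name]:8.1f}/s")
```

With these rows, sendmsg comes out at roughly 233 calls/s in the broken capture and 231 calls/s in the healthy one, while gettimeofday and recvmsg show lower per-second rates in the broken capture. Given how short the traces were, this is only a rough comparison.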
If someone posts some clear instructions on what steps to take to gather
the needed information to help track down this bug I'll do my best to
follow them next time my bind boxes act up.
Is anyone having these problems on an OS besides Linux or FreeBSD? Maybe it's
a gcc/glibc or pthreads problem? Does ISC or anyone else recommend a
specific compiler/library version for compiling BIND? Perhaps versions not
to use?
--
Kelsey Cummings - kgc at sonic.net           sonic.net, inc.
System Architect                             2260 Apollo Way
707.522.1000 (Voice)                         Santa Rosa, CA 95407
707.547.2199 (Fax)                           http://www.sonic.net/
Fingerprint = D5F9 667F 5D32 7347 0B79 8DB7 2B42 86B6 4E2C 3896