x86 Linux bind9 - futex takes much sys CPU time and causes errors
郑中华
yakut at pku.edu.cn
Tue Jan 18 05:21:34 UTC 2005
Hi list,
I am using a 4-CPU x86 Linux machine for bind9 testing.
During a bind9 performance test I monitored one of the named threads with
oprofile and strace for a while, and got the output below:
CPU: P4 / Xeon with 2 hyper-threads, speed 2995.29 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        app name             symbol name
6228485  39.6674  libdns.so.16.0.0     (no symbols)
2569301  16.3631  tg3                  (no symbols)
1851630  11.7925  libisc.so.7.1.5      (no symbols)
1554621   9.9009  libpthread-2.3.4.so  pthread_mutex_lock
1481064   9.4325  named                (no symbols)
 902274   5.7463  libpthread-2.3.4.so  pthread_mutex_unlock
 281170   1.7907  ld-2.3.4.so          anonymous symbol from section .text
 188602   1.2012  oprofiled            (no symbols)
 148667   0.9468  libc-2.3.4.so        memcpy
  89062   0.5672  oprofile             (no symbols)
  88651   0.5646  libpthread-2.3.4.so  __lll_mutex_lock_wait
  47170   0.3004  libpthread-2.3.4.so  __pthread_disable_asynccancel
  43055   0.2742  libpthread-2.3.4.so  __pthread_enable_asynccancel
  30374   0.1934  libpthread-2.3.4.so  __errno_location
  27777   0.1769  libpthread-2.3.4.so  sendmsg
  25916   0.1651  libpthread-2.3.4.so  __i686.get_pc_thunk.bx
  19277   0.1228  libc-2.3.4.so        gettimeofday
# strace -p 1797 -c
Process 1797 attached - interrupt to quit
Process 1797 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 69.11    5.681319          13    442646    137491 futex
 19.30    1.586145          13    118467           sendmsg
 10.86    0.892565           8    118579        72 recvmsg
  0.72    0.059597           3     23725           gettimeofday
  0.01    0.000738          15        48           write
------ ----------- ----------- --------- --------- ----------------
100.00    8.220364                703465    137563 total
I traced down the futex errors: there are many calls like
'futex(0x818f1bc, FUTEX_WAIT, 585151, NULL) = -1 EAGAIN (Resource
temporarily unavailable)'.
At this time the CPU utilization is:
CPU states:  cpu    user   nice  system   irq  softirq  iowait   idle
           total   39.1%   0.0%   39.1%  1.0%    15.9%    0.0%   4.7%
           cpu00   30.5%   0.0%   20.3%  4.2%    44.9%    0.0%   0.0%
           cpu01   39.8%   0.0%   48.3%  0.0%     5.9%    0.0%   5.9%
           cpu02   45.1%   0.0%   41.7%  0.0%     5.9%    0.0%   7.2%
           cpu03   41.1%   0.0%   46.1%  0.0%     6.7%    0.0%   5.9%
Mem:  7997692k av,  298108k used,  7699584k free,  0k shrd,  11956k buff
      49004k active,  190420k inactive
Swap: 2096472k av,       0k used,  2096472k free,  227280k cached
I cannot figure out what the named threads are competing for. One
possibility is the data cache, but I have a 100k-record queryperf input
file and another 100k-record zone data file, and this might not make
sense, because for such read operations I don't believe one named
thread would lock the whole data cache. Another concern is the network:
named uses UDP for communication, and while the test is running, netstat
shows the following:
Proto Recv-Q Send-Q Local Address         Foreign Address
udp     4736    296 10.101.3.103:domain   *:*
udp     3552    296 10.101.2.103:domain   *:*
udp     4440      0 10.101.1.103:domain   *:*
udp     4440      0 10.100.0.19:domain    *:*
I've tried both Broadcom and Intel 1000M cards, nothing different. I've
also tried bind 9.2.4 on RHEL3, RHEL4, SLES9, and SLES8; nothing
different. Do you have any suggestions?
thx.
More information about the bind-users mailing list