debugging bind9 debian (1:9.5.1.dfsg.P3-1+lenny1) hangs

Sven Eschenberg sven at whgl.uni-frankfurt.de
Fri Jan 8 05:45:25 UTC 2010


Thanks Nick for the 'insight' of epoll.

Usually I'd expect an application to have a 'knowledge' of which FS were 
added to a set for watching. In such a case I would not really expect 
any return values indicating that one tried to remove an FD, that was 
never added.
Could this indicate a mismanagement of the FD sets?

Regards

-Sven

P.S.: I am aware this is not a fatal error in any means, but as you 
pointed out, it creates unnecessary stress on the system.

Nick Jones schrieb:
> On Wed, 2010-01-06 at 19:47 -0800, Mike wrote:
>> Can someone help me understand? Here is a snippet from the strace output:
>>
>>
>> 19120 <... futex resumed> )             = 1
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLIN, {u32=517, u64=517}}) = 0
> For EPOLL_CTL_DEL, the last argument, the epoll_event structure, is
> ignored, thus needn't be populated with the fd.  For safety it should be
> memset(..., 0, sizeof(struct epoll_event));  According to the man page,
> for kernels 2.6.9 and onward, this structure can be NULL, but for an
> application not tied to a particular platform there is no certainty that
> this will be run on such a kernel, so the bind developers have rightly
> used a valid structure, but again, probably should be memset 0.
> 
>> 19120 epoll_wait(8, {{EPOLLIN, {u32=5, u64=13240600445115695109}}}, 64, 
>> -1) = 1
> The fact that the u64 field is such a wild value suggests that the
> epoll_event structure used when doing epoll_ctl(..., EPOLL_CTL_ADD, ...)
> was not memset 0, it really should be.
> 
>> 19120 read(5, "\5\2\0\0\375\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLIN, {u32=517, u64=517}}) = 
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLOUT, {u32=517, u64=517}}) = 
>> -1 ENOENT (No such file or directory)
> We are calling EPOLL_CTL_DEL again on fd 517, we have already done that.
> In addition, we are calling it with differing event flags, which is not
> necessary, as EPOLL_CTL_DEL deletes the fd entirely.
> 
>> 19120 read(5, "\5\2\0\0\373\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLIN, {u32=517, u64=517}}) = 
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLOUT, {u32=517, u64=517}}) = 
>> -1 ENOENT (No such file or directory)
> And again... all these system calls add stress to the system
> 
>> 19120 close(517)                        = 0
> Now 517 is closed, return value of 0 implies it was a valid fd
> 
>> 19120 read(5, 0xb64cb28c, 8)            = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19120 epoll_wait(8, {{EPOLLIN, {u32=520, u64=2233382994440}}}, 64, -1) = 1
> Again, looks like the epoll_event structure wasn't memset 0 before fd
> 520 was EPOLL_CTL_ADD'd.  It's a bit scary that we are getting a read on
> fd 520, but the maxevents value is 64.  I've never tested this
> behaviour, but I would hope the kernel would round robin notifications
> on the set of fds for the process, not simply report events on the first
> 64 fds in numerical order.
> 
>> 19120 futex(0xb74d2048, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb74d2044, 
>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 futex(0xb74d2010, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19120 <... futex resumed> )             = 1
>> 19120 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 futex(0xb5b5ac70, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19120 <... futex resumed> )             = 1
>> 19120 futex(0xb5b5ac70, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 recvmsg(520, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), 
>> sin_addr=inet_addr("213.130.47.22")}, 
>> msg_iov(1)=[{"\345\23\204\20\0\1\0\1\0\5\0\v\5auth5\2ns\3gin\3ntt\3net"..., 
>> 4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, 
>> cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 402
>> 19118 futex(0xb5b5ac70, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 recvmsg(520, 0xb74ccf54, 0)       = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 write(7, "\10\2\0\0\375\377\377\377"..., 8) = 8
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 futex(0xb8e61e90, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19120 <... futex resumed> )             = 1
>> 19120 futex(0xb8e61e90, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 futex(0xb8e61e90, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 write(7, "\10\2\0\0\373\377\377\377"..., 8) = 8
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 futex(0xb74d2048, FUTEX_WAIT_PRIVATE, 85, NULL <unfinished ...>
>> 19120 <... futex resumed> )             = 1
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLIN, {u32=520, u64=520}}) = 0
>> 19120 epoll_wait(8, {{EPOLLIN, {u32=5, u64=13240600445115695109}}}, 64, 
>> -1) = 1
>> 19120 read(5, "\10\2\0\0\375\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLIN, {u32=520, u64=520}}) = 
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLOUT, {u32=520, u64=520}}) = 
>> -1 ENOENT (No such file or directory)
>> 19120 read(5, "\10\2\0\0\373\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLIN, {u32=520, u64=520}}) = 
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLOUT, {u32=520, u64=520}}) = 
>> -1 ENOENT (No such file or directory)
> The same pattern again of EPOLL_CTL_DEL mutiple times.  I wonder if the
> goal was to manage registration of read and write flags separately?
> With epoll, I've found that one has to maintain a record of what the
> user of the fd requests (read/write) and what has already been set in
> previous calls to epoll_ctl, if there is a difference, then use
> EPOLL_CTL_MOD or EPOLL_CTL_DEL
> 
>> 19120 close(520)                        = 0
>> 19120 read(5, 0xb64cb28c, 8)            = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19120 epoll_wait(8,  <unfinished ...>
>> 19119 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed 
>> out)
>> 19119 gettimeofday({1262638342, 857733}, NULL) = 0
>> 19119 futex(0xb74d2048, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb74d2044, 
>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 futex(0xb74d2010, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19119 <... futex resumed> )             = 1
>> 19119 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 gettimeofday({1262638342, 857733}, NULL) = 0
>> 19118 futex(0xb74d4010, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19119 <... futex resumed> )             = 1
>> 19119 futex(0xb74d4010, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 futex(0xb74d4040, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19119 <... futex resumed> )             = 1
>> 19119 clock_gettime(CLOCK_REALTIME, {1262638342, 857733237}) = 0
>> 19119 futex(0xb74d4040, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> )             = 0
>> 19118 futex(0xb74d4044, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb74d4040, 
>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 0
>> 19118 futex(0xb74d4010, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 gettimeofday({1262638342, 857733}, NULL) = 0
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40)           = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> 19118 socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = 6
>> 19118 fcntl64(6, F_DUPFD, 512)          = 515
> fd 515 begins its life here...
 > [cropped]
> > _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users




More information about the bind-users mailing list