debugging bind9 debian (1:9.5.1.dfsg.P3-1+lenny1) hangs
Sven Eschenberg
sven at whgl.uni-frankfurt.de
Fri Jan 8 05:45:25 UTC 2010
Thanks Nick for the 'insight' of epoll.
Usually I'd expect an application to have a 'knowledge' of which FS were
added to a set for watching. In such a case I would not really expect
any return values indicating that one tried to remove an FD, that was
never added.
Could this indicate a mismanagement of the FD sets?
Regards
-Sven
P.S.: I am aware this is not a fatal error in any means, but as you
pointed out, it creates unnecessary stress on the system.
Nick Jones schrieb:
> On Wed, 2010-01-06 at 19:47 -0800, Mike wrote:
>> Can someone help me understand? Here is a snippet from the strace output:
>>
>>
>> 19120 <... futex resumed> ) = 1
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLIN, {u32=517, u64=517}}) = 0
> For EPOLL_CTL_DEL, the last argument, the epoll_event structure, is
> ignored, thus needn't be populated with the fd. For safety it should be
> memset(..., 0, sizeof(struct epoll_event)); According to the man page,
> for kernels 2.6.9 and onward, this structure can be NULL, but for an
> application not tied to a particular platform there is no certainty that
> this will be run on such a kernel, so the bind developers have rightly
> used a valid structure, but again, probably should be memset 0.
>
>> 19120 epoll_wait(8, {{EPOLLIN, {u32=5, u64=13240600445115695109}}}, 64,
>> -1) = 1
> The fact that the u64 field is such a wild value suggests that the
> epoll_event structure used when doing epoll_ctl(..., EPOLL_CTL_ADD, ...)
> was not memset 0, it really should be.
>
>> 19120 read(5, "\5\2\0\0\375\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLIN, {u32=517, u64=517}}) =
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLOUT, {u32=517, u64=517}}) =
>> -1 ENOENT (No such file or directory)
> We are calling EPOLL_CTL_DEL again on fd 517, we have already done that.
> In addition, we are calling it with differing event flags, which is not
> necessary, as EPOLL_CTL_DEL deletes the fd entirely.
>
>> 19120 read(5, "\5\2\0\0\373\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLIN, {u32=517, u64=517}}) =
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 517, {EPOLLOUT, {u32=517, u64=517}}) =
>> -1 ENOENT (No such file or directory)
> And again... all these system calls add stress to the system
>
>> 19120 close(517) = 0
> Now 517 is closed, return value of 0 implies it was a valid fd
>
>> 19120 read(5, 0xb64cb28c, 8) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19120 epoll_wait(8, {{EPOLLIN, {u32=520, u64=2233382994440}}}, 64, -1) = 1
> Again, looks like the epoll_event structure wasn't memset 0 before fd
> 520 was EPOLL_CTL_ADD'd. It's a bit scary that we are getting a read on
> fd 520, but the maxevents value is 64. I've never tested this
> behaviour, but I would hope the kernel would round robin notifications
> on the set of fds for the process, not simply report events on the first
> 64 fds in numerical order.
>
>> 19120 futex(0xb74d2048, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb74d2044,
>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 futex(0xb74d2010, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19120 <... futex resumed> ) = 1
>> 19120 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 futex(0xb5b5ac70, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19120 <... futex resumed> ) = 1
>> 19120 futex(0xb5b5ac70, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 recvmsg(520, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53),
>> sin_addr=inet_addr("213.130.47.22")},
>> msg_iov(1)=[{"\345\23\204\20\0\1\0\1\0\5\0\v\5auth5\2ns\3gin\3ntt\3net"...,
>> 4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET,
>> cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 402
>> 19118 futex(0xb5b5ac70, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 recvmsg(520, 0xb74ccf54, 0) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 write(7, "\10\2\0\0\375\377\377\377"..., 8) = 8
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 futex(0xb8e61e90, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19120 <... futex resumed> ) = 1
>> 19120 futex(0xb8e61e90, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 futex(0xb8e61e90, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 write(7, "\10\2\0\0\373\377\377\377"..., 8) = 8
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 gettimeofday({1262638342, 657830}, NULL) = 0
>> 19118 futex(0xb74d2048, FUTEX_WAIT_PRIVATE, 85, NULL <unfinished ...>
>> 19120 <... futex resumed> ) = 1
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLIN, {u32=520, u64=520}}) = 0
>> 19120 epoll_wait(8, {{EPOLLIN, {u32=5, u64=13240600445115695109}}}, 64,
>> -1) = 1
>> 19120 read(5, "\10\2\0\0\375\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLIN, {u32=520, u64=520}}) =
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLOUT, {u32=520, u64=520}}) =
>> -1 ENOENT (No such file or directory)
>> 19120 read(5, "\10\2\0\0\373\377\377\377"..., 8) = 8
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLIN, {u32=520, u64=520}}) =
>> -1 ENOENT (No such file or directory)
>> 19120 epoll_ctl(8, EPOLL_CTL_DEL, 520, {EPOLLOUT, {u32=520, u64=520}}) =
>> -1 ENOENT (No such file or directory)
> The same pattern again of EPOLL_CTL_DEL mutiple times. I wonder if the
> goal was to manage registration of read and write flags separately?
> With epoll, I've found that one has to maintain a record of what the
> user of the fd requests (read/write) and what has already been set in
> previous calls to epoll_ctl, if there is a difference, then use
> EPOLL_CTL_MOD or EPOLL_CTL_DEL
>
>> 19120 close(520) = 0
>> 19120 read(5, 0xb64cb28c, 8) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19120 epoll_wait(8, <unfinished ...>
>> 19119 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>> out)
>> 19119 gettimeofday({1262638342, 857733}, NULL) = 0
>> 19119 futex(0xb74d2048, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb74d2044,
>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 futex(0xb74d2010, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19119 <... futex resumed> ) = 1
>> 19119 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 futex(0xb74d2010, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 gettimeofday({1262638342, 857733}, NULL) = 0
>> 19118 futex(0xb74d4010, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19119 <... futex resumed> ) = 1
>> 19119 futex(0xb74d4010, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 futex(0xb74d4040, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> 19119 <... futex resumed> ) = 1
>> 19119 clock_gettime(CLOCK_REALTIME, {1262638342, 857733237}) = 0
>> 19119 futex(0xb74d4040, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> 19118 <... futex resumed> ) = 0
>> 19118 futex(0xb74d4044, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb74d4040,
>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 0
>> 19118 futex(0xb74d4010, FUTEX_WAKE_PRIVATE, 1) = 0
>> 19118 gettimeofday({1262638342, 857733}, NULL) = 0
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 read(9, 0xb74cc958, 40) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> 19118 socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = 6
>> 19118 fcntl64(6, F_DUPFD, 512) = 515
> fd 515 begins its life here...
> [cropped]
> > _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
More information about the bind-users
mailing list