Bind server crashing (lots of EAGAIN, ENOENT, ...). With strace log.

K L kl.forwarder at gmail.com
Thu Nov 14 12:04:41 UTC 2013


Found the problem.

According to the strace log, named was sending its log output to syslog, and
those messages could not be delivered (I have not investigated why). When I
changed the default logging channel to a local file, named started working
properly again. Diff:

===================================================================
--- named.conf.erb (revision 2263)
+++ named.conf.erb (working copy)
@@ -50,11 +50,16 @@
 };

-logging {
-    channel queries_syslog {
-        syslog daemon;
+logging{
+    channel bindlog {
+        file "/var/log/named/bind.log" versions 3 size 5m;
         severity info;
+        print-time yes;
+        print-severity yes;
+        print-category yes;
     };
-    category queries { queries_syslog; };
+    category default{
+        bindlog;
+    };
 };

---------
It is working for me now.


On Tue, Nov 5, 2013 at 1:31 PM, K L <kl.forwarder at gmail.com> wrote:

> All,
>
> I am hoping you can help me. I had working DNS servers, but now my internal
> master server has stopped working. Restarting takes more than a minute. I
> have reinstalled it and rebooted the machine, but that did not help. The
> server has 3 (virtual) cores and does not swap when the 'crash' happens.
>
> What I mean by crash: the process is still running, but the server is not
> responding to queries. Even a `/etc/init.d/named status` takes 28 - 60
> seconds.
>
> Here is a strace log from when it happens:
> http://pastebin.com/raw.php?i=7i0PgALG . Example:
> 6500 recvmsg(518, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53),
> sin_addr=inet_addr("10.0.101.50")},
> msg_iov(1)=[{"~\223\201\200\0\1\0\1\0\5\0\6\3ns3\5cymru\3com\0\0\1\0\1\300"...,
> 4096}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET,
> cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 252
> 6500 recvmsg(518, 0x7fd4b6588900, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
>
> I am not a C programmer, but what I think I see here is a packet being
> delivered to named, and that delivery failing.
>
> What could the problem be? Is this a bind problem? OS/System problem maybe?
> I don't recall changing any (kernel) parameters since it last worked.
>
> Regards,
> kl
>
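
A note on reading the strace excerpt quoted above: assuming the socket is
non-blocking (typical for an event-driven server), the EAGAIN on the second
recvmsg() only means that no further datagram was queued at that moment. The
first call had already returned 252 bytes, so the packet itself was received.
The sketch below is not taken from named; it is a small stand-alone Python
illustration on a loopback UDP socket, and the names drain, receiver and
sender are made up for the example.

```python
import errno
import select
import socket


def drain(sock):
    """Read datagrams until the kernel reports that none are queued."""
    packets = []
    while True:
        try:
            data, _addr = sock.recvfrom(4096)
        except BlockingIOError as exc:
            # Nothing left to read; strace shows this as "-1 EAGAIN".
            return packets, exc.errno
        packets.append(data)


# Hypothetical stand-in for a server socket: non-blocking UDP on loopback.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
receiver.setblocking(False)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"query", receiver.getsockname())

# Wait (at most 1 second) until the datagram is readable, then drain.
select.select([receiver], [], [], 1.0)
packets, err = drain(receiver)

print(packets)                        # the datagrams that were queued
print(errno.errorcode.get(err, err))  # symbolic name of the final errno

sender.close()
receiver.close()
```

The read that succeeds is followed by one that fails with EAGAIN, the same
pair of calls as in the trace. Seen this way, the EAGAIN lines by themselves
do not point at a receive failure, which fits the fix described at the top of
this message.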