Bind 9.11/RHEL7 Server Freezes FUTEX_WAKE_PRIVATE

Mon Aug 1 22:21:04 UTC 2022

Hi Peter.
Off the top of my head, could it be this?

random-device

The source of entropy to be used by the server. Entropy is primarily needed
for DNSSEC operations, such as TKEY transactions and dynamic update of
signed zones. This option specifies the device (or file) from which to
read entropy. If this is a file, operations requiring entropy will fail
when the file has been exhausted. If not specified, the default value
is /dev/random (or equivalent) when present, and none otherwise. The
random-device option takes effect during the initial configuration load
at server startup time and is ignored on subsequent reloads.

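BIND needs a good source of randomness for crypto operations, and on the
3.10 kernel that RHEL 7 ships, reads from /dev/random block whenever the
kernel's entropy estimate runs low.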
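One way to test this theory during a freeze is to watch the kernel's entropy estimate, which is what a blocking /dev/random read on a 3.10 kernel depends on. This is a minimal Python sketch, assuming a Linux host that exposes /proc/sys/kernel/random/entropy_avail. The 1000-bit threshold and the helper names are hypothetical, chosen only for illustration, and are not BIND or kernel constants.

```python
"""Sketch: report the kernel entropy estimate that /dev/random relies on.

Assumes a Linux host exposing /proc/sys/kernel/random/entropy_avail.
LOW_WATERMARK_BITS is a hypothetical illustrative threshold, not a
value defined by BIND or by the kernel.
"""
from pathlib import Path

ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")
LOW_WATERMARK_BITS = 1000  # hypothetical threshold for illustration only


def read_entropy_avail(path: Path = ENTROPY_PATH) -> int:
    """Return the kernel's current entropy estimate, in bits."""
    return int(path.read_text().strip())


def is_entropy_low(bits: int, threshold: int = LOW_WATERMARK_BITS) -> bool:
    """True when the estimate is strictly below the illustrative threshold."""
    return bits < threshold


if __name__ == "__main__":
    if ENTROPY_PATH.exists():
        bits = read_entropy_avail()
        status = "LOW" if is_entropy_low(bits) else "ok"
        print(f"entropy_avail={bits} bits ({status})")
    else:
        print("entropy_avail is not available on this system")
```

Running it repeatedly while an rndc addzone hangs would show whether the estimate collapses at the same moment; if it stays high throughout, the random-device explanation becomes less likely.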

Cheers, Greg

On Mon, 1 Aug 2022 at 23:08, White, Peter <pwhite at penguinrandomhouse.com>
wrote:

> I’m running BIND 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.9 (Extended Support
> Version) on RHEL 7 in a chroot jail.
>
> Lately, running certain rndc commands sometimes causes my server to lock
> up. It’s usually an “rndc addzone” that triggers the issue. I’ll also
> mention that I have recently started signing some domains with DNSSEC, so I
> suspect it may be related.
>
> Here is an example of a command that frequently triggers the issue,
> although not every time:
>
> rndc addzone '"example.com" in external {type master; file "dnssec/example.com"; key-directory "keys"; auto-dnssec maintain; inline-signing yes;};'
>
> While this is happening, named does not respond to any rndc commands,
> nothing is written to the BIND logs (I’m running trace level 3), and it
> does not answer queries. Everything seems frozen in time. After a period
> varying from a few seconds to many minutes, the server picks back up and
> operates normally. Here are my observations so far.
>
> CPU and memory usage look fine.
>
> top - 17:57:37 up 33 min,  3 users,  load average: 0.00, 0.01, 0.05
> Tasks: 125 total,   2 running, 123 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.3 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.0 hi,  0.0 si,  1.0 s
> KiB Mem :  1842956 total,   439452 free,   665760 used,   737744 buff/cache
> KiB Swap:  8384508 total,  8384508 free,        0 used.  1013652 avail Mem
>
> Strace shows the following over and over again:
>
> strace -p 1156 -f
>
> [pid  1159] futex(0x7fc1c15a307c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 16657, {tv_sec=1659390139, tv_nsec=255860000}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
>
> Any pointers here would be greatly appreciated. I’m about at my wits’ end
> with this one, and rebuilding this server on a newer build of RHEL or
> recompiling BIND is not a journey I would like to take at the moment.
> --
> Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe
> from this list
>
> ISC funds the development of this software with paid support
> subscriptions. Contact us at https://www.isc.org/contact/ for more
> information.
>
>
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
>
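The freeze pattern described in the quoted message (no rndc replies, no log output, recovery after seconds to minutes) can be timed from a second shell, so that each stall can be lined up against other measurements such as the entropy estimate. This is a minimal Python sketch, assuming rndc is on PATH and already configured to reach this named. The names probe and watch, the 2-second timeout and the 5-second interval are hypothetical choices, not anything defined by BIND.

```python
"""Sketch: time how long named takes to answer a control command.

Assumes rndc is on PATH and configured (rndc.conf / rndc.key) for this
server. probe(), watch(), the 2 s timeout and 5 s interval are
hypothetical illustrative choices.
"""
import subprocess
import time

RNDC_CMD = ["rndc", "status"]


def probe(cmd=RNDC_CMD, timeout=2.0):
    """Run cmd once and return (responded, seconds_elapsed).

    responded is False if the command did not exit within timeout
    (subprocess.run kills it in that case) or exited non-zero.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        responded = result.returncode == 0
    except subprocess.TimeoutExpired:
        responded = False
    return responded, time.monotonic() - start


def watch(interval=5.0, cmd=RNDC_CMD, timeout=2.0):
    """Print a timestamped line whenever named stops or resumes answering."""
    frozen_since = None
    while True:
        ok, _ = probe(cmd, timeout)
        now = time.strftime("%Y-%m-%d %H:%M:%S")
        if not ok and frozen_since is None:
            frozen_since = time.monotonic()
            print(f"{now} no answer from named within {timeout}s")
        elif ok and frozen_since is not None:
            stalled = time.monotonic() - frozen_since
            print(f"{now} named answering again after ~{stalled:.0f}s")
            frozen_since = None
        time.sleep(interval)


if __name__ == "__main__":
    watch()
```

Starting watch() before issuing the rndc addzone and then reading the timestamps it prints would give the exact length of each stall, at a resolution of roughly the polling interval.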
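The futex line in the quoted strace output is easier to read once its timeout is decoded. With FUTEX_WAIT_BITSET and FUTEX_CLOCK_REALTIME the timespec is an absolute CLOCK_REALTIME deadline rather than a relative delay, so it maps directly to a wall-clock time, and ETIMEDOUT means the thread reached that deadline without being woken. This is a minimal Python sketch, assuming strace prints the timespec as {tv_sec=..., tv_nsec=...} as in the quoted output; futex_deadline is a hypothetical helper name.

```python
"""Sketch: decode the absolute deadline in a strace futex() line.

Assumes strace prints the timespec as {tv_sec=..., tv_nsec=...}.
With FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME the timeout is an
absolute CLOCK_REALTIME value, so it converts to a UTC wall-clock time.
"""
import re
from datetime import datetime, timezone

TIMESPEC_RE = re.compile(r"\{tv_sec=(\d+),\s*tv_nsec=(\d+)\}")


def futex_deadline(line):
    """Return the deadline as an aware UTC datetime, or None if absent."""
    match = TIMESPEC_RE.search(line)
    if match is None:
        return None
    sec, nsec = int(match.group(1)), int(match.group(2))
    base = datetime.fromtimestamp(sec, tz=timezone.utc)
    # datetime only holds microseconds, so drop the last 3 nanosecond digits.
    return base.replace(microsecond=nsec // 1000)


if __name__ == "__main__":
    sample = (
        "[pid  1159] futex(0x7fc1c15a307c, "
        "FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 16657, "
        "{tv_sec=1659390139, tv_nsec=255860000}, 0xffffffff) "
        "= -1 ETIMEDOUT (Connection timed out)"
    )
    print(futex_deadline(sample).isoformat())
```

For the quoted line it prints 2022-08-01T21:42:19.255860+00:00. A thread repeatedly hitting ETIMEDOUT on a timed wait may simply be an idle worker or timer thread waking up, so on its own this pattern may not point at where named is actually stuck.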