Bind 9.11/RHEL7 Server Freezes FUTEX_WAKE_PRIVATE

White, Peter pwhite at penguinrandomhouse.com
Tue Aug 2 00:41:39 UTC 2022


Greg, what other awesome stuff do you have off the top of your head? This makes sense, as it’s running on EC2 at AWS (i.e., VMs are a poor source of randomness).

And thanks to Grant for the haveged suggestion. Initial tests with haveged running look positive.
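One way to check whether haveged is actually helping is to watch the kernel's entropy estimate before and after starting it. The sketch below assumes a Linux host where /proc/sys/kernel/random/entropy_avail exists (it does on RHEL 7's kernel). The function names and the 200-bit watermark are hypothetical choices for illustration, not BIND or haveged settings.

```python
import time
from pathlib import Path
from typing import List

# Linux exposes the kernel's current entropy estimate, in bits, at this path.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy_avail(path: Path = ENTROPY_AVAIL) -> int:
    """Return the kernel's current entropy estimate in bits."""
    return int(path.read_text().strip())


def sample_entropy(samples: int = 5, interval: float = 1.0,
                   path: Path = ENTROPY_AVAIL) -> List[int]:
    """Read the entropy estimate `samples` times, `interval` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(read_entropy_avail(path))
        if i < samples - 1:  # no need to sleep after the last reading
            time.sleep(interval)
    return readings


def looks_starved(readings: List[int], low_watermark: int = 200) -> bool:
    """Hypothetical heuristic: flag the pool if any reading dips below the watermark."""
    return min(readings) < low_watermark
```

For example, calling sample_entropy() once with haveged stopped and again with it running should show whether the estimate stays consistently higher on the EC2 instance.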

I’ll report back here if the problem continues.

Thanks so much for your help!


From: Greg Choules <gregchoules+bindusers at googlemail.com>
Date: Monday, August 1, 2022 at 6:21 PM
To: White, Peter <pwhite at penguinrandomhouse.com>
Cc: bind-users at lists.isc.org <bind-users at lists.isc.org>
Subject: Re: Bind 9.11/RHEL7 Server Freezes FUTEX_WAKE_PRIVATE

Hi Peter.
Off the top of my head, could it be this?

random-device

The source of entropy to be used by the server. Entropy is primarily needed for DNSSEC operations, such as TKEY transactions and dynamic update of signed zones. This option specifies the device (or file) from which to read entropy. If this is a file, operations requiring entropy will fail when the file has been exhausted. If not specified, the default value is /dev/random (or equivalent) when present, and none otherwise. The random-device option takes effect during the initial configuration load at server startup time and is ignored on subsequent reloads.
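Since the default random-device is /dev/random, a quick way to see whether that device would block at a given moment is a non-blocking read. This is a minimal sketch; try_read_nonblocking is a hypothetical helper, not part of BIND. It is only informative on older kernels such as RHEL 7's, where /dev/random blocks when the entropy estimate runs low; much newer kernels no longer block once the pool is initialized. With named in a chroot jail, the device it actually opens lives under the chroot directory (typically /var/named/chroot on RHEL's bind-chroot layout), so that is the path worth probing.

```python
import os
from typing import Optional


def try_read_nonblocking(device: str = "/dev/random", nbytes: int = 16) -> Optional[bytes]:
    """Try to read up to `nbytes` from an entropy device without blocking.

    Returns the bytes read (possibly fewer than requested), or None if the
    read would have blocked. On kernels where /dev/random blocks when the
    entropy estimate runs low, None suggests that a blocking reader of the
    same device would be left waiting at this moment.
    """
    fd = os.open(device, os.O_RDONLY | os.O_NONBLOCK)
    try:
        return os.read(fd, nbytes)
    except BlockingIOError:  # EAGAIN: not enough entropy available right now
        return None
    finally:
        os.close(fd)
```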

BIND will need a good source of randomness for crypto operations.

Cheers, Greg

On Mon, 1 Aug 2022 at 23:08, White, Peter <pwhite at penguinrandomhouse.com> wrote:

I’m running BIND 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.9 (Extended Support Version) on RHEL 7 in a chroot jail.

Lately, running certain rndc commands sometimes causes my server to lock up. It’s usually an “rndc addzone” that triggers the issue. I’ll also mention that I recently started signing some domains with DNSSEC, so I suspect it may be related.

Here is an example of a command that frequently triggers the issue, although it doesn’t trigger it every time.

rndc addzone '"example.com" in external {type master; file "dnssec/example.com";key-directory "keys"; auto-dnssec maintain; inline-signing yes;};'
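When testing this repeatedly, it can help to script the command above so the zone name is substituted consistently and rndc is run with a timeout rather than hanging the caller. This is a minimal sketch; addzone_config, addzone_argv and run_addzone are hypothetical helpers, and the view name and key directory simply mirror the example. Passing an argv list to subprocess avoids the nested shell quoting.

```python
import subprocess
from typing import List


def addzone_config(zone: str, view: str = "external", key_dir: str = "keys") -> str:
    """Build the configuration argument for `rndc addzone`.

    Mirrors the example command: a master zone loaded from dnssec/<zone>,
    signed with inline-signing and auto-dnssec maintain.
    """
    return (
        '"{zone}" in {view} {{type master; file "dnssec/{zone}"; '
        'key-directory "{key_dir}"; auto-dnssec maintain; inline-signing yes;}};'
    ).format(zone=zone, view=view, key_dir=key_dir)


def addzone_argv(zone: str, **kwargs: str) -> List[str]:
    """Return an argv list for subprocess.run(); no shell quoting is involved."""
    return ["rndc", "addzone", addzone_config(zone, **kwargs)]


def run_addzone(zone: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    # Raises subprocess.TimeoutExpired if rndc has not returned in time,
    # which makes a freeze visible instead of blocking the caller forever.
    return subprocess.run(addzone_argv(zone), stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True,
                          timeout=timeout, check=False)
```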

During these episodes, named does not respond to any rndc commands, logs nothing to the BIND logs (I’m running at trace level 3), and does not answer queries. Everything seems frozen in time. After a while, anywhere from a few seconds to many minutes, the server picks back up and operates normally. The following are my observations so far.
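To put numbers on freezes like these, a small watchdog can run a cheap command such as rndc status with a timeout and log each failure. This is a minimal sketch; probe and watch are hypothetical helpers, and the defaults (roughly an hour of probes every five seconds) are arbitrary. A command that exits non-zero or does not finish within the timeout counts as a failed probe.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def probe(cmd: Sequence[str], timeout: float) -> Optional[float]:
    """Run `cmd` once; return elapsed seconds if it exited 0 within `timeout`, else None."""
    start = time.monotonic()
    try:
        result = subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:  # the child is killed before this is raised
        return None
    if result.returncode != 0:
        return None
    return time.monotonic() - start


def watch(cmd: Sequence[str] = ("rndc", "status"), rounds: int = 720,
          interval: float = 5.0, timeout: float = 10.0,
          log: Callable[[str], None] = print) -> int:
    """Probe `rounds` times, log each result, and return the number of failures."""
    failures = 0
    for _ in range(rounds):
        elapsed = probe(cmd, timeout)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")  # local time
        if elapsed is None:
            failures += 1
            log("{} no successful reply within {:.0f}s".format(stamp, timeout))
        else:
            log("{} replied in {:.2f}s".format(stamp, elapsed))
        time.sleep(interval)
    return failures
```

Passing something like ["dig", "@127.0.0.1", "example.com", "SOA", "+time=2", "+tries=1"] as cmd (with a zone the server actually serves) would probe the query path instead; dig exits non-zero when it gets no reply, so those probes are counted as failures too.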

CPU and memory look fine.

top - 17:57:37 up 33 min,  3 users,  load average: 0.00, 0.01, 0.05
Tasks: 125 total,   2 running, 123 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.3 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.0 hi,  0.0 si,  1.0 st
KiB Mem :  1842956 total,   439452 free,   665760 used,   737744 buff/cache
KiB Swap:  8384508 total,  8384508 free,        0 used.  1013652 avail Mem

strace shows the following line over and over again.

strace -p 1156 -f

[pid  1159] futex(0x7fc1c15a307c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 16657, {tv_sec=1659390139, tv_nsec=255860000}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
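In that futex call, FUTEX_WAIT_BITSET combined with FUTEX_CLOCK_REALTIME means the {tv_sec, tv_nsec} value is an absolute wall-clock deadline rather than a duration, and "= -1 ETIMEDOUT" only says that the deadline passed without the thread being woken. One plausible reading, not confirmed by the output itself, is that this repeating line comes from an idle thread waiting out a timed condition wait (glibc typically implements pthread_cond_timedwait() this way), while the thread that is actually stuck may be sitting in a single long system call that strace prints only once. The hypothetical helper below decodes such a line; for the line shown, the deadline is 2022-08-01 21:42:19.255860 UTC.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# Matches timed futex calls as strace prints them.
FUTEX_RE = re.compile(
    r"futex\(0x[0-9a-f]+, (?P<op>[A-Z_|]+), (?P<val>\d+), "
    r"\{tv_sec=(?P<sec>\d+), tv_nsec=(?P<nsec>\d+)\}"
)


class FutexWait(NamedTuple):
    op: str             # futex operation flags as printed by strace
    deadline: datetime  # absolute CLOCK_REALTIME deadline, expressed in UTC
    timed_out: bool     # True if the call returned ETIMEDOUT


def parse_futex_wait(line: str) -> FutexWait:
    m = FUTEX_RE.search(line)
    if m is None:
        raise ValueError("no timed futex call in line: " + line)
    op = m.group("op")
    # Only FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME carries an absolute
    # wall-clock deadline; plain FUTEX_WAIT timeouts are relative durations.
    if "FUTEX_WAIT_BITSET" not in op or "FUTEX_CLOCK_REALTIME" not in op:
        raise ValueError("timeout is not an absolute realtime deadline: " + op)
    deadline = datetime.fromtimestamp(int(m.group("sec")), tz=timezone.utc)
    deadline = deadline.replace(microsecond=int(m.group("nsec")) // 1000)
    return FutexWait(op, deadline, "ETIMEDOUT" in line)
```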


Any pointers here would be greatly appreciated. I’m about at my wits’ end with this one, and rebuilding this server on a newer release of RHEL or recompiling BIND is not a journey I would like to take at the moment.
--
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users at lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users