Increase in open files / sockets after version upgrade (9.16 to 9.18)

Thu Nov 3 09:15:17 UTC 2022

Hey, a few days ago I upgraded multiple DNS servers from version 9.16.1-0ubuntu2.11 to 9.18.1-1ubuntu1.2 (and from Ubuntu 20.04 to Ubuntu 22.04) and observed a change in behavior that I am not able to explain. There was no change in the bind configuration being used. While operating bind 9.16 and bind 9.18 in parallel, I see the increase in open files / sockets only on machines running bind 9.18. The number of open files / sockets with bind 9.16 stays consistent.

The issue can be seen in the attached graphs.

The graphs show the number of open file descriptors held by the bind process. While investigating the logs, I was able to correlate the growth in file descriptors with these log messages:

28-Oct-2022 03:19:49.384 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#53821: serial 1541576593
28-Oct-2022 03:20:08.428 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#45864: serial 1541576594
28-Oct-2022 03:20:17.389 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#45864: serial 1541576595
28-Oct-2022 03:21:16.257 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#54654: serial 1541576596
28-Oct-2022 03:24:30.641 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#36460: serial 1541576598
28-Oct-2022 03:24:35.641 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#36460: zone is up to date
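
To line the log up against a file-descriptor graph, it can help to bucket the notify messages by minute. The following is only a minimal sketch: the regular expression is written against the log format shown above, and the names `NOTIFY_RE` and `notifies_per_minute` are made up for this example.

```python
import re
from collections import Counter

# Matches lines like:
# 28-Oct-2022 03:19:49.384 general: info: zone <zone>: notify from <ip>#<port>: serial <n>
# (pattern assumed from the log excerpt above; other log formats will not match)
NOTIFY_RE = re.compile(
    r"^(?P<day>\S+) (?P<minute>\d{2}:\d{2}):\d{2}\.\d+ general: info: "
    r"zone (?P<zone>\S+): notify from (?P<src>[^#\s]+)#\d+: serial (?P<serial>\d+)"
)

def notifies_per_minute(log_text):
    """Count 'notify from ... serial N' messages per (day, HH:MM) bucket."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NOTIFY_RE.match(line.strip())
        if match:
            counts[(match["day"], match["minute"])] += 1
    return counts
```

Feeding it the contents of the named log file returns a `Counter` keyed by day and minute, which can then be compared with the timestamps at which the descriptor count jumps. Lines such as "zone is up to date" are ignored because they carry no serial.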

I checked the lsof output for the bind process and found thousands of entries like these:

named   3842408 bind 1882u     IPv4          959669053      0t0       UDP <myHost>:57462->172.18.117.2:domain
named   3842408 bind 1883u     IPv4          959669222      0t0       UDP <myHost>:45831->172.18.12.80:domain
named   3842408 bind 1884u     IPv4          959669224      0t0       UDP <myHost>:48081->172.18.12.81:domain
named   3842408 bind 1885u     IPv4          959669226      0t0       UDP <myHost>:50683->172.18.48.20:domain
named   3842408 bind 1886u     IPv4          959669228      0t0       UDP <myHost>:37361->172.18.48.40:domain
named   3842408 bind 1887u     IPv4          959669230      0t0       UDP <myHost>:45471->172.18.48.41:domain
named   3842408 bind 1888u     IPv4          959669367      0t0       UDP <myHost>:43025->172.19.2.2:domain
named   3842408 bind 1889u     IPv4          959669369      0t0       UDP <myHost>:41729->172.19.2.4:domain
named   3842408 bind 1890u     IPv4          959669539      0t0       UDP <myHost>:33132->172.20.1.1:domain
named   3842408 bind 1891u     IPv4          959669541      0t0       UDP <myHost>:33077->172.26.22.22:domain
named   3842408 bind 1892u     IPv4          959669375      0t0       UDP <myHost>:44034->172.19.96.4:domain
named   3842408 bind 1893u     IPv4          959669543      0t0       UDP <myHost>:35650->172.26.84.10:domain
named   3842408 bind 1894u     IPv4          959669545      0t0       UDP <myHost>:34926->172.26.110.10:domain
named   3842408 bind 1895u     IPv4          959669547      0t0       UDP <myHost>:39270->172.27.78.10:domain
named   3842408 bind 1896u     IPv4          959669549      0t0       UDP <myHost>:59812->172.27.78.20:domain
named   3842408 bind 1897u     IPv4          959669551      0t0       UDP <myHost>:41163->172.19.48.10:domain
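
With thousands of such entries, a tally per destination shows quickly which remote addresses hold the most sockets. This is a rough sketch that assumes the nine-column lsof layout shown above (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); the function name `count_udp_destinations` is invented for this example.

```python
from collections import Counter

def count_udp_destinations(lsof_text):
    """Tally connected UDP sockets per remote host from lsof text output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        if len(fields) < 9 or fields[7] != "UDP":
            continue
        name = fields[8]
        if "->" not in name:
            continue  # unconnected socket: no remote endpoint to count
        remote = name.split("->", 1)[1]
        # Strip the trailing ":port" (or service name such as ":domain")
        host = remote.rsplit(":", 1)[0]
        counts[host] += 1
    return counts
```

Saving the lsof output to a file and passing its contents to the function gives a `Counter`; `counts.most_common(10)` then lists the ten destinations with the most open sockets.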

Those destination IPs are listed in my configuration both a) to be notified on zone changes and b) to be allowed to perform zone transfers.
I found out that those IPs are old and no longer in use, so I am aware that I should remove them from my configuration completely. Connections to those IPs should therefore simply time out.

And it seems that in bind 9.16 they timed out quite quickly, as the sockets were not kept open for long, while in bind 9.18 it seems to take hours until they disappear again.
Before actually removing those IPs from my configuration, I would like to know whether I can set a proper timeout somewhere to prevent this from happening again.
The options part of my bind config currently looks like this:

options {
    check-names master ignore;
    check-names slave ignore;
    check-names response ignore;
    dnssec-validation no;
    directory "/var/cache/bind";
    auth-nxdomain no;
    zone-statistics yes;
    files 4096;
    allow-recursion {
        localnets;
        localhost;
        internal;
        myNetA;
        myNetB;
    };
    check-spf ignore;
    masterfile-format text;
    listen-on port 53 { any; };
    notify yes;
    notify-source <myIp>;
    query-source address <myIp>;
};

Was there a change to a default timeout that I missed in the change logs?
It would be amazing if you could help me prevent sockets from piling up when notify addresses can't be reached.

Thanks a lot
Marno
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Bildschirmfoto 2022-10-28 um 13.22.07.png
Type: image/png
Size: 110780 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/bind-users/attachments/20221103/43556d2d/attachment-0001.png>