Increase in open files / sockets after version upgrade (9.16 to 9.18)
Marno Krahmer
speedtouch92 at yahoo.de
Tue Nov 8 08:44:38 UTC 2022
Hey again,
I tried setting the following timeouts, hoping to achieve an improvement:
max-transfer-idle-in 5; max-transfer-idle-out 5; max-transfer-time-in 5; max-transfer-time-out 5;
But this neither decreased the number of open outgoing connections nor shortened the time they stay open.
Anything else I can try?
Cheers
Marno
On Thursday, 3 November 2022 at 10:16:12 CET, Marno Krahmer via bind-users <bind-users at lists.isc.org> wrote:
Hey, a few days ago I upgraded multiple DNS servers from version 9.16.1-0ubuntu2.11 to 9.18.1-1ubuntu1.2 (and from Ubuntu 20.04 to Ubuntu 22.04) and observed a change in behavior that I am not able to explain. There was no change in the bind configuration being used. While operating bind 9.16 and bind 9.18 in parallel, I see the increase in open files / sockets only on machines running bind 9.18. The number of open files / sockets with bind 9.16 stays consistent.
The issue can be seen in these graphs:
The graphs show the number of open file descriptors held by the bind process. While investigating the logs, I was able to correlate the growth in file descriptors with these log messages:
28-Oct-2022 03:19:49.384 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#53821: serial 1541576593
28-Oct-2022 03:20:08.428 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#45864: serial 1541576594
28-Oct-2022 03:20:17.389 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#45864: serial 1541576595
28-Oct-2022 03:21:16.257 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#54654: serial 1541576596
28-Oct-2022 03:24:30.641 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#36460: serial 1541576598
28-Oct-2022 03:24:35.641 general: info: zone sub.<mydomain.com>/IN/inside: notify from 10.12.34.52#36460: zone is up to date
I checked an lsof of the bind process and stumbled upon thousands of these:
named 3842408 bind 1882u IPv4 959669053 0t0 UDP <myHost>:57462->172.18.117.2:domain
named 3842408 bind 1883u IPv4 959669222 0t0 UDP <myHost>:45831->172.18.12.80:domain
named 3842408 bind 1884u IPv4 959669224 0t0 UDP <myHost>:48081->172.18.12.81:domain
named 3842408 bind 1885u IPv4 959669226 0t0 UDP <myHost>:50683->172.18.48.20:domain
named 3842408 bind 1886u IPv4 959669228 0t0 UDP <myHost>:37361->172.18.48.40:domain
named 3842408 bind 1887u IPv4 959669230 0t0 UDP <myHost>:45471->172.18.48.41:domain
named 3842408 bind 1888u IPv4 959669367 0t0 UDP <myHost>:43025->172.19.2.2:domain
named 3842408 bind 1889u IPv4 959669369 0t0 UDP <myHost>:41729->172.19.2.4:domain
named 3842408 bind 1890u IPv4 959669539 0t0 UDP <myHost>:33132->172.20.1.1:domain
named 3842408 bind 1891u IPv4 959669541 0t0 UDP <myHost>:33077->172.26.22.22:domain
named 3842408 bind 1892u IPv4 959669375 0t0 UDP <myHost>:44034->172.19.96.4:domain
named 3842408 bind 1893u IPv4 959669543 0t0 UDP <myHost>:35650->172.26.84.10:domain
named 3842408 bind 1894u IPv4 959669545 0t0 UDP <myHost>:34926->172.26.110.10:domain
named 3842408 bind 1895u IPv4 959669547 0t0 UDP <myHost>:39270->172.27.78.10:domain
named 3842408 bind 1896u IPv4 959669549 0t0 UDP <myHost>:59812->172.27.78.20:domain
named 3842408 bind 1897u IPv4 959669551 0t0 UDP <myHost>:41163->172.19.48.10:domain
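With thousands of such entries, a per-destination tally makes it easier to see which peers account for the open sockets. The following is a minimal Python sketch, assuming the nine-column lsof layout shown above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); the function name `count_udp_destinations` is hypothetical.

```python
from collections import Counter


def count_udp_destinations(lsof_text):
    """Count connected UDP sockets per remote host in lsof output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        if len(fields) < 9 or fields[7] != "UDP":
            continue
        name = fields[8]
        if "->" not in name:
            continue  # socket without a remote peer
        remote = name.split("->", 1)[1]
        # Drop the trailing ":port" (or ":service") part of the peer.
        host = remote.rsplit(":", 1)[0]
        counts[host] += 1
    return counts
```

Fed with the output of something like `lsof -n -p <pid>`, `count_udp_destinations(text).most_common(10)` lists the ten destinations holding the most sockets.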
Those destination IPs are set up in my configuration to a) be notified upon zone changes and b) be allowed to perform zone transfers.
I found out that those IPs are old and no longer in use, so I am aware that I should remove them from my configuration completely. Connections to those IPs should therefore simply time out.
In bind 9.16 they seem to have timed out quite quickly, as the sockets were not kept open for long, while in bind 9.18 it seems to take hours until they disappear again.
Before actually removing those IPs from my configuration, I would like to know whether I can set a proper timeout somewhere to prevent this from happening again.
The options-part of my bind config currently looks like this:
options {
    check-names master ignore;
    check-names slave ignore;
    check-names response ignore;
    dnssec-validation no;

    directory "/var/cache/bind";
    auth-nxdomain no;
    zone-statistics yes;
    files 4096;
    allow-recursion { localnets; localhost; internal; myNetA; myNetB; };
    check-spf ignore;
    masterfile-format text;

    listen-on port 53 { any; };

    notify yes;
    notify-source <myIp>;

    query-source address <myIp>;
};
Was there any change of a default timeout that I missed in the change logs?
It would be amazing if you could help me prevent these sockets from piling up when notify addresses can't be reached.
Thanks a lot
Marno
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Bildschirmfoto 2022-10-28 um 13.22.07.png
Type: image/png
Size: 110780 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/bind-users/attachments/20221108/36ace01e/attachment-0001.png>