bind-9.10.0-P2 memory leak?

lconrad at go2france.com lconrad at go2france.com
Tue Sep 9 09:05:01 UTC 2014






On Tuesday 09/09/2014 at 9:22 am, Mike Hoskins (michoski)  wrote:
> Do you guys have max-cache-size set?  I didn't see it in the 
> borderworlds
> named.conf.  I've seen similar growth problems when testing 9.x before
> setting that (experiment at the time just to see what would happen, 
> and
> confirmed this behavior).  Set sensible resource limits based on 
> available
> resources.
>
> -----Original Message-----
> From: Vinícius Ferrão <ferrao at if.ufrj.br>
> Date: Tuesday, September 9, 2014 at 10:17 AM
> To: Thomas Schulz <schulz at adi.com>
> Cc: "bind-users at isc.org" <bind-users at isc.org>
> Subject: Re: bind-9.10.0-P2 memory leak?
>
>>
>> I'm having the exact same issue. Take a look at my post
>> @ServerFault:
>> http://serverfault.com/questions/616752/bind-9-10-constantly-killed-on-fre
>> ebsd-10-0-with-out-of-swap-space
>>
>> Sent from my iPhone
>>
>> On 09/09/2014, at 11:15, "Thomas Schulz" <schulz at adi.com> wrote:
>>
>>>
>>>>
>>>> Hello
>>>>
>>>> I recently upgraded my authoritative nameservers to bind-9.10.0-P2 and
>>>> after a while one of them ended up using all its swap and the named
>>>> process got killed. The other servers are seeing similar behaviour,
>>>> but
>>>> I restarted named on all of them to postpone further crashes.
>>>>
>>>> I am using rate-limiting as well as DLZ with PostgreSQL. The server has
>>>> two
>>>> views. The operating system is FreeBSD 8.4.
>>>>
>>>> My configuration:
>>>> http://borderworlds.dk/~xi/named-leak/named.conf
>>>>
>>>> Log of the memory usage:
>>>> http://borderworlds.dk/~xi/named-leak/named-mem-usage.log
>>>>
>>>> As you can see, in less than a week, named has grown more than 900MB
>>>> in
>>>> size.
>>>>
>>>> Is anyone else experiencing something similar?
>>>>
>>>> If I need to provide more information, I will be happy to do so.
>>>>
>>>> --
>>>> Christian Laursen
>>>
>>> What version did you upgrade from? I am seeing bind 9.9.5 and 9.9.6
>>> grow without any evidence that it will ever stop. See my mail to this
>>> list with the subject "Re: Process size versus cache size." Mine is
>>> growing slower than yours, but it is now up to 548 MB.
>>>
>>> Tom Schulz
>>> Applied Dynamics Intl.
>>> schulz at adi.com

FreeBSD 10.0, bind-9.10.0-P2

Logging the RSS field for the named process (a rough sketch of such a sampler follows the log):


less /var/tmp/bind_rss_history.txt

2014-09-06  17:03:34     338224
2014-09-06  18:00:00     395828
2014-09-06  19:00:00     444008
2014-09-06  20:00:00     487236
2014-09-06  21:00:00     525892
2014-09-06  22:00:00     567940
2014-09-06  23:00:00     611120
2014-09-07  00:00:00     644772
2014-09-07  01:00:00     674904
2014-09-07  02:00:00     700492
2014-09-07  03:00:00     726364
2014-09-07  04:00:00     748328
2014-09-07  05:00:00     774316
2014-09-07  06:00:00     799064
2014-09-07  07:00:00     827808
2014-09-07  08:00:00     867444
2014-09-07  09:00:00     917444
2014-09-07  10:00:00     972268
2014-09-07  11:00:00    1029304
2014-09-07  12:00:00    1088408
2014-09-07  13:00:00    1142456
2014-09-07  14:00:00    1184344
2014-09-07  15:00:00    1226052
2014-09-07  16:00:00    1267760
2014-09-07  17:00:00    1309736
2014-09-07  18:00:00    1347532
2014-09-07  19:00:00    1383300
2014-09-07  20:00:00    1418932
2014-09-07  21:00:00    1459112
2014-09-07  22:00:00    1506108
2014-09-07  23:00:00    1544512
2014-09-08  00:00:00    1576344
2014-09-08  01:00:00    1600988
2014-09-08  02:00:00    1623128
2014-09-08  03:00:00    1644520
2014-09-08  04:00:00    1665716
2014-09-08  05:00:00    1688844
2014-09-08  06:00:00    1713836
2014-09-08  07:00:00    1748720
2014-09-08  08:00:00     240072
2014-09-08  09:00:00     371388
2014-09-08  10:00:00     456952
2014-09-08  11:00:00     530696
2014-09-08  12:00:00     599792
2014-09-08  13:00:00     666280
2014-09-08  14:00:00     727884
2014-09-08  15:00:00     789672
2014-09-08  16:00:00     853456
2014-09-08  17:00:00     916520
2014-09-08  18:00:00     967940
2014-09-08  19:00:00    1011616
2014-09-08  20:00:00    1051452
2014-09-08  21:00:00    1095352
2014-09-08  22:00:00    1146388
2014-09-08  23:00:00    1198776
2014-09-09  00:00:00    1241256
2014-09-09  01:00:00    1279640
2014-09-09  02:00:00    1312936
2014-09-09  03:00:00    1342592
2014-09-09  04:00:00    1372092
2014-09-09  05:00:00    1407444
2014-09-09  06:00:00    1441632
2014-09-09  07:00:00    1483464
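
For reference, here is a rough sketch of how one such hourly sample could be
taken on FreeBSD and appended to the log. The pgrep/ps invocation is an
assumption for illustration, not the exact script used here:

#!/bin/sh
# Append a timestamped RSS sample (in KB) for the named process.
# Run from cron, e.g. hourly.
LOG=/var/tmp/bind_rss_history.txt

PID=$(pgrep -x named | head -1)          # first named PID, if any
[ -n "$PID" ] || exit 0                  # nothing to log if named is down

RSS=$(ps -o rss= -p "$PID" | tr -d ' ')  # resident set size in kilobytes
printf '%s  %s\n' "$(date '+%Y-%m-%d  %H:%M:%S')" "$RSS" >> "$LOG"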

This never happened with earlier BIND 9 releases, and our mx1 uses this 
recursive BIND machine for all domain/PTR lookups.

I've never seen any BIND instance take more than 1 GB of RAM.

max-cache-size isn't the solution, only a band-aid.
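
For anyone who does want the band-aid, a minimal named.conf sketch is below;
the 512M figure is only an illustrative value, not a recommendation for this
box:

options {
        // Cap the resolver cache; named evicts older entries once the
        // limit is reached. Size this to the machine's available RAM.
        max-cache-size 512M;
};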

The sawtooth pattern above is from restarting named.

named has halted twice in the past couple of weeks. We suspected some kind 
of attack; the only trace was a syslog message with something like 
"swap space failed, named halted". But with a dedicated DNS box and 3 GB of 
RAM, there should never be any swapping. I set a watcher for 
"swap used > 1%"; when it alerted, the named RSS was 1.9 GB. I restarted 
BIND and wrote the named RSS logging script.
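
The swap watcher can be as simple as the sketch below. The swapinfo parsing,
the 1% threshold and the mail notification are assumptions for illustration,
not the exact watcher running here:

#!/bin/sh
# Alert when swap usage exceeds a threshold (percent).
# The last line of `swapinfo -k` carries the Capacity column (e.g. "3%").
THRESHOLD=1

USED=$(swapinfo -k | awk 'END { sub("%", "", $5); print $5 }')

if [ "$USED" -gt "$THRESHOLD" ]; then
    # Placeholder notification; replace with whatever alerting is in use.
    echo "swap used ${USED}% on $(hostname)" | mail -s "swap alert" root
fi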

Len