dnsperf and BIND memory consumption

JINMEI Tatuya / 神明達哉 Jinmei_Tatuya at isc.org
Thu Aug 7 07:56:08 UTC 2008


At Thu, 7 Aug 2008 00:58:23 -0400,
Vinny Abello <vinny at tellurian.com> wrote:

> OK. I've recompiled BIND 9.5.0-P2 (from ports) without threads
> enabled. I no longer see the memory leak at all. I'm running dnsperf
> and I see a constant of 18MB which is much more reasonable for what
> I am doing. For me it's easy to reproduce. Some more information
> that may help reproduce it:

> FreeBSD 7.0 STABLE AMD64 (cvsup'ed within the past week)
> BIND 9.5.0-P2 installed via ports with threads enabled
> Server is a Dell PowerEdge 2850 with 2 CPU's, Hyperthreading disabled, 4GB of RAM and a 36GB RAID1 array on a Perc4 controller (LSI MegaRAID chipset)
> Dnsperf run from a different server on the same network segment over Gig-E

This looks quite similar to a problem we heard about before.  I
suspect it is due to some bad interaction between BIND9 and FreeBSD's
thread library or kernel, rather than an application memory leak (if
it were a leak, you could confirm it by stopping named while its
memory is growing and seeing it crash).  Here is what I suggested at
that time to identify the memory eater; unfortunately we got no
feedback on it then.  Could you try it?

=======================================================================
- create a symbolic link from "/etc/malloc.conf" to "X":
 # ln -s X /etc/malloc.conf
- start named with a moderate limit on its virtual memory size, e.g.
 # /usr/bin/limits -v 384m $path_to_named/named <command line options>

Then the named process will eventually abort itself with a core dump
due to malloc failure.  Please show us the stack trace at that point.
Hopefully it will reveal the malloc call that keeps consuming memory.
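
As a rough sketch of how the stack trace could be pulled out of the
core file afterwards: the path to named, the core file name, and the
output file name below are all assumptions, so adjust them to your
setup.  The gdb commands go into a small command file because older
gdb versions may not accept commands on the command line.

```shell
#!/bin/sh
# Assumed locations; override via the environment if yours differ.
NAMED_BIN="${NAMED_BIN:-/usr/local/sbin/named}"
CORE_FILE="${CORE_FILE:-named.core}"

# Write the gdb commands into the file given as $1:
# a backtrace of the crashing thread, then of every thread.
write_gdb_cmds() {
    printf '%s\n' 'bt' 'thread apply all bt' > "$1"
}

# Only run gdb if the core file is actually there.
if [ -r "$CORE_FILE" ]; then
    cmds=$(mktemp /tmp/gdbcmds.XXXXXX)
    write_gdb_cmds "$cmds"
    # -batch: exit after running the commands; -x: read them from a file.
    gdb -batch -x "$cmds" "$NAMED_BIN" "$CORE_FILE" > named-backtrace.txt 2>&1
    rm -f "$cmds"
fi
```

The resulting named-backtrace.txt is what would be useful to post to
the list.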

Notes:
- of course, this is a very radical way of diagnosing; you need to
 keep watching the process because it's "guaranteed" to be aborted.
- the VM size must be carefully chosen so that malloc failure won't
 happen due to normal named processing.  I think 384MB is reasonable
 enough according to the statistics you provided so far, but I'm not
 100% sure about that.
- it's better to keep my latest patch to adb.c and to run named with
 '-n 1' so that the mutex_init in adb.c won't trigger the malloc
 failure.
- the global symbolic link at /etc/malloc.conf affects other
 processes.  So, if you're running a process other than named that
 can consume a lot of memory or can cause malloc failure, we should
 find an alternative approach (there are some, but they are more
 complicated, so let's discuss those only when they are really
 necessary).
=======================================================================
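
Since the process needs to be watched while this runs, something like
the following could log its virtual memory size until it aborts.  This
is only a sketch: the function names are made up for illustration, and
the pid file path in the usage comment is an assumption.

```shell
#!/bin/sh
# Print the virtual memory size (in KB) of the process with PID $1.
vsz_kb() {
    ps -o vsz= -p "$1" | tr -d ' '
}

# Print a timestamped VSZ sample every $2 seconds (default 10)
# for as long as the process with PID $1 exists.
watch_vsz() {
    pid=$1
    interval=${2:-10}
    while kill -0 "$pid" 2>/dev/null; do
        echo "$(date '+%H:%M:%S') $(vsz_kb "$pid")"
        sleep "$interval"
    done
}

# Usage (pid file location is an assumption; run as root or as the
# user that owns the named process):
#   watch_vsz "$(cat /var/run/named/pid)" 10
```

The loop ends by itself once named has aborted, so the last line it
prints shows roughly how large the process was just before the malloc
failure.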

BTW, you should be able to find the previous discussion on this matter
by searching the bind-users at isc.org list with the subject of
"max-cache-size doesn't work with 9.5.0b1".

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

p.s. I'm pretty sure it's different from the 'memory leak' issue of
BIND9/Windows.  Let's forget it in this context.

