dnsperf and BIND memory consumption

Thu Aug 7 14:33:25 UTC 2008

> -----Original Message-----
> From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org] On
> Behalf Of JINMEI Tatuya / 神明達哉
> Sent: Thursday, August 07, 2008 3:56 AM
> To: Vinny Abello
> Cc: bind-users at isc.org
> Subject: Re: dnsperf and BIND memory consumption
>
> At Thu, 7 Aug 2008 00:58:23 -0400,
> Vinny Abello <vinny at tellurian.com> wrote:
>
> > OK. I've recompiled BIND 9.5.0-P2 (from ports) without threads
> > enabled. I no longer see the memory leak at all. I'm running dnsperf
> > and I see a constant of 18MB which is much more reasonable for what
> > I am doing. For me it's easy to reproduce. Some more information
> > that may help reproduce it:
>
> > FreeBSD 7.0 STABLE AMD64 (cvsup'ed within the past week)
> > BIND 9.5.0-P2 installed via ports with threads enabled
> > Server is a Dell PowerEdge 2850 with 2 CPUs, Hyperthreading
> > disabled, 4GB of RAM and a 36GB RAID1 array on a Perc4 controller
> > (LSI MegaRAID chipset)
> > dnsperf run from a different server on the same network segment
> > over Gig-E
>
> This looks quite similar to a case we heard about before.  I suspect
> it is due to a bad interaction between BIND9 and FreeBSD's thread
> library or kernel, rather than an application memory leak (a leak
> you could confirm by stopping named while its memory is growing and
> seeing it crash).  Here is what I suggested at that time to identify
> the memory eater; unfortunately we got no feedback on it then.
> Could you try it?

Sure, I can give it a shot.

> =======================================================================
> - create a symbolic link from "/etc/malloc.conf" to "X":
>  # ln -s X /etc/malloc.conf

What exactly is this trying to accomplish here? JFYI, I don't have a file /etc/malloc.conf on my server. Did you mean /etc/make.conf? Where is X being referenced?
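
For reference, a minimal sketch of what that step does, assuming the malloc(3) behaviour documented for FreeBSD 7: the allocator reads its option flags from the name that /etc/malloc.conf points to, so "X" is an option string rather than a file, and the X option makes a failed allocation abort with a core dump instead of returning NULL. The function names below are hypothetical, and the path is passed as a parameter so the commands can be tried on a scratch path first.

```shell
#!/bin/sh
# Hypothetical helpers; the real target on the test box would be /etc/malloc.conf.

enable_malloc_abort() {
    # $1: path of the symlink to create.
    # The link target "X" is the option string itself; no file named X has to exist.
    ln -s X "$1"
}

show_malloc_options() {
    # $1: path of the symlink. Prints the option string the allocator would read.
    readlink "$1"
}

disable_malloc_abort() {
    # $1: path of the symlink. Removes the link once the test is over.
    rm -f "$1"
}

# Example (as root, on the test server only):
#   enable_malloc_abort /etc/malloc.conf
#   show_malloc_options /etc/malloc.conf
#   disable_malloc_abort /etc/malloc.conf
```

The link is global, so it should be removed after the test. malloc(3) also documents a MALLOC_OPTIONS environment variable that sets the same flags for a single process; whether that fits this setup is untested here.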

> - start named with a moderate limitation of virtual memory size, e.g.
>  # /usr/bin/limits -v 384m $path_to_named/named <command line options>
>
> Then the named process will eventually abort itself with a core dump
> due to malloc failure.  Please show us the stack trace at that point.
> Hopefully it will reveal the malloc call that keeps consuming memory.

How would I show the trace that you require once this happens?
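
One way to get that trace, sketched under assumptions: named leaves a core file when it aborts (by default named.core in its working directory on FreeBSD) and gdb is installed. The helper names are hypothetical and both paths are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers for dumping a stack trace from a core file with gdb.

write_gdb_commands() {
    # $1: file to write the gdb commands to.
    # "bt" shows the thread that aborted; "thread apply all bt" shows every thread.
    printf '%s\n' 'bt' 'thread apply all bt' > "$1"
}

print_backtrace() {
    # $1: path to the named binary, $2: path to the core file.
    cmds=$(mktemp /tmp/gdbcmds.XXXXXX) || return 1
    write_gdb_commands "$cmds"
    # -batch runs the commands from the file and exits without prompting.
    gdb -batch -x "$cmds" "$1" "$2"
    status=$?
    rm -f "$cmds"
    return $status
}

# Example (paths are placeholders):
#   print_backtrace $path_to_named/named /path/to/named.core > named-bt.txt
```

The resulting text file can be pasted into a reply. A binary built with debugging symbols gives a more readable trace, but even a stripped one should show which call chain ends in malloc.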

>
> Notes:
> - of course, this is a very radical way of diagnosing; you need to
>  keep watching the process because it's "guaranteed" to be aborted.
> - the VM size must be carefully chosen so that malloc failure won't
>  happen due to normal named processing.  I think 384MB is reasonable
>  enough according to the statistics you provided so far, but I'm not
>  100% sure about that.
> - it's better to keep my latest patch to adb.c and to run named with
>  '-n 1' so that the mutex_init in adb.c won't trigger the malloc
>  failure.
> - the global symbolic link from /etc/malloc.conf affects other
>  processes.  So, if you're running a different process than named
>  that can consume a lot of memory or can cause malloc failure, we
>  should find an alternative approach (there are some, but they are
>  more complicated so let's discuss those only when they are really
>  necessary).

Shouldn't be a problem here. Again, it's just being tested and this is the only thing the server is doing.

> =======================================================================
>
> BTW, you should be able to find the previous discussion on this matter
> by searching the bind-users at isc.org list with the subject of
> "max-cache-size doesn't work with 9.5.0b1".

I may have to go back and find this thread.

>
> ---
> JINMEI, Tatuya
> Internet Systems Consortium, Inc.
>
> p.s. I'm pretty sure it's different from the 'memory leak' issue of
> BIND9/Windows.  Let's forget it in this context.

Fair enough. I'll trust you on that.