Need to improve named performance
Kevin Darcy
kcd at chrysler.com
Sun Nov 11 20:48:26 UTC 2012
On 11/10/2012 1:39 PM, Ed LaFrance wrote:
> Hello all -
>
> First post to this list, hope I'm in the right place.
>
> Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quad-core Xeon
> server (3 GHz) with 2 GB RAM. Named is being used only for rDNS
> queries against our address space.
>
> The issue is that named is not keeping up with rdns requests. The
> nameserver is only doing rdns, and it's the only public process on the
> server (no webhosting, monitoring, etc).
>
> When I check the router above this server I'll see 200 - 500
> legitimate connections to this server at any given time. This is
> what's happening: named is not keeping up with the requests, so the
> network receive queue fills up - I can see this with netstat:
>
> netstat -tulpn | grep :53
> Proto Recv-Q Send-Q Local Address Foreign Address PID/Program name
> ...
> udp 110048 0 xxx.xxx.xxx.xxx:53 0.0.0.0:* 3918/named
> udp 110048 0 xxx.xxx.xxx.xxx:53 0.0.0.0:* 3918/named
>
> (two different IPs are on this machine to handle rDNS requests)
>
> Once the queue gets near the max value set by sysctl, udp packets
> start to drop - this can also be seen in netstat:
>
> netstat -su
> ...
> Udp:
> 5157567 packets received
> 9761 packets to unknown port received.
> 1164232 packet receive errors
> 5157554 packets sent
>
> The errors apparently correspond to drops; they only increase when the
> queue is full.
>
> Of course by this point dns queries are timing out. I've tried
> increasing the queue size with sysctl using this command:
>
> sysctl -w net.core.rmem_max=1048576 net.core.rmem_default=10485
>
> then restarting named; that did eliminate the drops, but the queue
> grows gigantic and I get pretty much 100% dns lookup timeouts at that
> point.
>
> The server load is about 2.0 - busy, but not overwhelmed. I can run
> a shell or even a GUI session on it with ease, so it's by no means
> maxed out. Here's the first slice of top output:
>
> top - 09:13:38 up 18:40, 1 user, load average: 2.09, 2.05, 2.00
> Tasks: 175 total, 1 running, 174 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 74.8%id, 24.7%wa, 0.0%hi, 0.2%si, 0.0%st
> Mem: 2074984k total, 1743584k used, 331400k free, 166588k buffers
> Swap: 4128760k total, 28k used, 4128732k free, 1270032k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4509 named 24 0 71004 4580 2036 S 1.3 0.2 0:46.74 named
> 6877 root 15 0 2428 1064 788 R 0.7 0.1 0:00.04 top
> 467 root 10 -5 0 0 0 D 0.3 0.0 2:59.13 kjournald
> 2460 root 18 0 1816 584 484 D 0.3 0.0 3:30.35 syslogd
> 1 root 15 0 2160 644 556 S 0.0 0.0 0:01.08 init
>
> The bottom line is: I need to improve named performance. Tcpdump only
> shows about 20 requests per second on average, I would estimate. This
> should be handled easily, but instead it's gagging on it and the
> requests are stacking up. If you have any ideas, I welcome your input.
> Here's named.conf, it's pretty basic for the global config, the data
> for each zone is stored separately elsewhere:
>
> options {
> directory "/var";
> auth-nxdomain no;
> pid-file "/var/run/named/named.pid";
> allow-recursion {
> localnets;
> };
>
> allow-transfer {
> "none";
> };
> };
>
> key "rndc-key" {
> algorithm hmac-md5;
> secret "xxxxxxxxxxxxxxxxxxxxxx";
> };
>
> controls {
> inet 127.0.0.1 port 953
> allow { 127.0.0.1; } keys { "rndc-key"; };
> };
>
> zone "." {
> type hint;
> file "named.root";
> };
>
> zone "0.0.127.IN-ADDR.ARPA" {
> type master;
> file "localhost.rev";
> };

I wouldn't expect a nameserver process on Linux, hosting only a few
reverse zones and doing nothing else, to be 71 megabytes in size. I just
checked one of ours, serving *all* of our internal zone data, forward
and reverse authoritative, plus some cached data for a significant
number of zones delegated to business partners, and it's less than 100
MB in size.

Verify from your query logs, or by dumping the cache, that it's *only*
doing what it is supposed to do, and no more. If you've got a bunch of
data in your cache, or a bunch of queries, unrelated to serving your
reverse DNS, then that's probably the root cause of your problem.
Consider turning off recursion, or severely limiting it, in order to
enforce that the nameserver is only serving its intended purpose. 2 GB
of memory is a little lean for a nameserver serving a *generic*
Internet-name-lookup role...
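
A minimal sketch of those two suggestions in named.conf terms, assuming
an authoritative-only server. The channel name query_log and the log
file path are hypothetical placeholders, not anything taken from the
posted configuration, and the size and versions limits are arbitrary:

```
options {
        directory "/var";
        // ... keep the other existing options here ...

        // Authoritative-only: do not perform recursive lookups for
        // anyone. With this set, allow-recursion no longer matters.
        recursion no;
};

logging {
        // Hypothetical channel name and path. The directory must
        // already exist and be writable by the user named runs as.
        channel query_log {
                file "/var/log/named/queries.log" versions 3 size 20m;
                print-time yes;
                severity info;
        };
        // Send the per-query log messages to that channel.
        category queries { query_log; };
};
```

On a running server, "rndc querylog" toggles query logging without a
restart, and "rndc dumpdb -cache" writes the cache to named_dump.db in
the working directory unless dump-file points elsewhere. Anything in
either output that isn't a PTR lookup for your own in-addr.arpa zones
is worth a closer look.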

I guess another possibility is that you've gone crazy with your reverse
zones (e.g. using $GENERATE willy-nilly), and thus are using up way more
memory than you really need, to serve your reverse-resolution needs.

- Kevin