Probably solved (Urgent Solaris/BIND issue)

James Noyes jnoyes-bind at retrogeeks.com
Wed Jul 13 13:35:53 UTC 2005


It looks like the issue may be solved, and Karl's comments about limiting
cache size pointed me to my mistake.

It appears I was bitten by a combination of enormous amounts of caching
and a leftover "datasize" configuration directive from when BIND 8 was in
place.  The BIND 9 documentation says of that directive:

   "This is a hard limit on server memory usage. If the server attempts to
   allocate memory in excess of this limit, the allocation will fail, which
   may in turn leave the server unable to perform DNS service."

And it certainly does just that.  I had a "datasize 20M" directive, and sure
enough, after 20M of memory allocation, BIND came to a screaming halt.

It appears that these external servers simply have far more data cached in
them than the internals ever do, and the internals have just never bumped
into the limit.  It also appears to be related to the rate of DNS traffic
flow, which is probably why it ran happily for 6 weeks, then failed.

I have yet to set a max-cache-size parameter and restart to see if I can
keep total memory usage down, but at least I'm not dishing out SERVFAILs
any more.
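
For anyone finding this in the archive, the change described above would
look roughly like the following in named.conf.  This is only a sketch: the
32M figure is a placeholder rather than a tested value, and the rest of the
options block is omitted.

   options {
       // Leftover BIND 8 era hard limit; allocations fail once it is hit:
       // datasize 20M;
       datasize default;       // use the limit in force when named started
       max-cache-size 32M;     // placeholder; cache is trimmed near this size
   };

Running named-checkconf against the file before restarting should catch any
syntax slips.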

As far as the other input I received:

- I'll be installing BIND 9.3.x today
  (I asked about stability because initially the 9.3.x branch was tagged as
  "not yet ready for production - use 9.2.x instead")
- I will probably not switch on the internal malloc() option just yet
- I plan to set a max-cache-size option and see if the total heap usage
  stays at a reasonable level that way.  I'm currently at 35 and 28 megs of
  heap in use on these machines.
- I'll probably stick with these same Solaris machines
  (I understood that the suggestion to try a different OS was simply to
  eliminate the OS as a possible problem source.  Unfortunately, these
  machines do provide more than DNS service on their IPs, so I would have
  had to wait and make significant network-related changes to do this.
  The shared system is one of the reasons that BIND runs chrooted on them.)

Thanks for all the input, and I'll probably check in one more time after
experimenting with limiting the cache size.
-- 
James Noyes
(jnoyes42 at retrogeeks.com)


