8.2.2-P5 Solaris2.6 db_load problems

Heikki Hannikainen hessu at hes.iki.fi
Fri Apr 28 07:59:18 UTC 2000


On Wed, 26 Apr 2000, Jim Reid wrote:

> >>>>> "Heikki" == Heikki Hannikainen <hessu at hes.iki.fi> writes:
> 
>     Heikki> 26-Apr-2000 14:49:28.840 load: db_load could not open: domain.fi.le: Resource temporarily unavailable
> 
> Looks like your 2.6 box is running out of file descriptors. Check the
> man page for open() and find out why it returns EAGAIN: "Resource
> temporarily unavailable". You probably have to fix some kernel value
> to increase the number of file descriptors. Or you could use lsof to
> find out where they've all gone.

  Okay, I found the problem. It's not the kernel limits (they've been
increased over the defaults anyway), but the fact that db_load uses
stdio's fopen() to open the db files on disk. And Solaris stdio (2.5.1 and
2.6 at least) has a limit of 255 files (see stdio(3S)):

     Note that no more than  255  files  may  be
     opened  using  fopen(),  and only file descriptors 0 through
     255 can be used in a stream.

  Running truss -p <named-pid> 2>&1 | grep open and comparing its output
against the log file revealed that the zones which failed to load were the
ones for which open() returned an fd > 255 (the open() succeeds, but
fopen() then returns the error).

open("zone1.com", O_RDONLY)                     = 272
open("zone2.fi", O_RDONLY)                      = 272
open("1.2.4", O_RDONLY)                         = 74
open("3.4.5", O_RDONLY)                         = 75
open("zone", O_RDONLY)                          = 289
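
  Since the failure depends on which descriptor numbers are free rather
than on how many files are open in total, it can help to count how many
descriptors below 256 a process is holding. The following is only a minimal
sketch of such a check, not anything taken from BIND; the function name
count_open_below() is made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Count how many descriptors in the range 0 .. limit-1 are currently
 * open in this process. fcntl(fd, F_GETFD) fails with EBADF when fd
 * is not an open descriptor. (Hypothetical helper, for illustration.)
 */
int count_open_below(int limit)
{
    int fd;
    int n = 0;

    for (fd = 0; fd < limit; fd++) {
        if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
            n++;
    }
    return n;
}
```

  If count_open_below(256) returns 256, every descriptor that stdio could
use is taken, and the next open() will hand back a number above 255.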

  Now, the strange thing is that lsof shows a LARGE number of TCP
connections (established and SYN_SENT) to the domain port of a single
other DNS server, for which I can find NO slave entries in my
configuration.

root at host:~#> grep named lsof.out|wc -l
   2118
root at host:~#> grep named-xf lsof.out|wc -l
   1346

  That leaves 772 fds for named itself.

root at host:~#> grep named lsof.out|grep TCP|wc -l
   2021

  Of which quite a lot are TCP. And almost all of the TCP connections are
to the domain port of a single DNS server which does not appear in my
configuration. Both named and a single named-xfer make connections to that
server, and that single named-xfer, according to its argument list, is
supposed to be getting a zone from a _different_ server at a different
organisation.

  So, two problems. First, db_load would need to stop using stdio, or
perhaps named could be linked against something like the sfio library
(http://www.research.att.com/sw/tools/sfio/). Second, why are all these
TCP connections being made in the first place...
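
  A third option, not mentioned above and not something BIND is known to
do, would be to keep the low descriptors free for fopen() by moving
sockets above 255 as soon as they are created. A rough sketch using the
standard fcntl(F_DUPFD) call follows; move_fd_high() and STDIO_FD_LIMIT
are names invented for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Lowest descriptor number the stdio described above cannot use
 * (assumed from the stdio(3S) note about descriptors 0 through 255). */
#define STDIO_FD_LIMIT 256

/*
 * Move fd to a descriptor >= min_fd and close the original.
 * Returns the new descriptor, or the unchanged fd if it is already
 * high enough or if duplication fails (for example when the
 * per-process descriptor limit is lower than min_fd).
 * Note: the duplicate does not inherit the close-on-exec flag.
 * (Hypothetical helper, for illustration.)
 */
int move_fd_high(int fd, int min_fd)
{
    int high;

    assert(min_fd >= 0);
    if (fd < 0 || fd >= min_fd)
        return fd;

    high = fcntl(fd, F_DUPFD, min_fd);
    if (high < 0)
        return fd;      /* keep using the low descriptor */

    close(fd);
    return high;
}
```

  A caller would wrap each new socket, e.g.
sock = move_fd_high(sock, STDIO_FD_LIMIT), right after socket() or
accept() returns, so that zone files opened later still get a descriptor
stdio can handle.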

  - Hessu




