8.2.2-P5 Solaris2.6 db_load problems
Heikki Hannikainen
hessu at hes.iki.fi
Fri Apr 28 07:59:18 UTC 2000
On Wed, 26 Apr 2000, Jim Reid wrote:
> >>>>> "Heikki" == Heikki Hannikainen <hessu at hes.iki.fi> writes:
>
> Heikki> 26-Apr-2000 14:49:28.840 load: db_load could not open: domain.fi.le: Resource temporarily unavailable
>
> Looks like your 2.6 box is running out of file descriptors. Check the
> man page for open() and find out why it returns EAGAIN: "Resource
> temporarily unavailable". You probably have to fix some kernel value
> to increase the number of file descriptors. Or you could use lsof to
> find out where they've all gone.
Okay, I found the problem. It's not the kernel limits (they had already
been raised above the defaults), but the fact that db_load uses
stdio's fopen() to open the db files on disk. And Solaris stdio (2.5.1 and
2.6 at least) has a limit of 255 files (see stdio(3S)):
    Note that no more than 255 files may be
    opened using fopen(), and only file descriptors 0 through
    255 can be used in a stream.
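A likely reason for the limit, stated here as an assumption and not something the man page excerpt says: the 32-bit Solaris FILE structure keeps the descriptor in a single unsigned char, so a value above 255 cannot be stored. The sketch below only models that constraint. STDIO_FD_MAX, fd_fits_stdio() and stdio_store_fd() are hypothetical names for illustration, not part of Solaris or BIND.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: stdio stores the descriptor in one unsigned char,
 * so the largest usable value is UCHAR_MAX (255 with 8-bit chars). */
#define STDIO_FD_MAX UCHAR_MAX

/* Return 1 if fd could be attached to a stream under that assumption. */
int fd_fits_stdio(int fd)
{
    return fd >= 0 && fd <= STDIO_FD_MAX;
}

/* Store fd the way the assumed FILE layout would.
 * Callers must check fd_fits_stdio() first. */
unsigned char stdio_store_fd(int fd)
{
    assert(fd_fits_stdio(fd));
    return (unsigned char)fd;
}
```

Under this model a descriptor such as 74 is usable, while one such as 272 is rejected even though open() itself succeeded.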
Running truss -p <named-pid> 2>&1 | grep open while following the log file
revealed that the zones that failed were the ones for which open() returned
an fd > 255 (the open() succeeds, but fopen() returns the error):
open("zone1.com", O_RDONLY) = 272
open("zone2.fi", O_RDONLY) = 272
open("1.2.4", O_RDONLY) = 74
open("3.4.5", O_RDONLY) = 75
open("zone", O_RDONLY) = 289
Now, the strange thing is that lsof shows a LARGE number of TCP
connections (established and SYN_SENT) to the domain port of a single
other DNS server, for which I can find NO slave entries in my
configuration.
root@host:~#> grep named lsof.out|wc -l
2118
root@host:~#> grep named-xf lsof.out|wc -l
1346
That leaves 772 fds for named itself.
root@host:~#> grep named lsof.out|grep TCP|wc -l
2021
Of which quite a lot are TCP. And almost all of the TCP connections are
to the domain port of a single DNS server which does not appear in my
configuration. Both named and a single named-xfer make connections to that
server, and that single named-xfer, according to its argument list, is
supposed to be getting a zone from a _different_ server at a different
organisation.
So, two problems. First, db_load would need to avoid stdio, or perhaps be
linked against something like the sfio library
(http://www.research.att.com/sw/tools/sfio/). Second, why are all these TCP
connections being made?
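To illustrate the first option, here is a minimal sketch of reading a file with open() and read() only, so the descriptor number never has to fit into a stdio stream. It is not BIND's db_load code: count_lines_raw() is a hypothetical helper that just counts newlines, and a real zone parser would need its own buffering and line handling on top of this.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Count newline characters in a file without touching stdio.
 * Works for any descriptor value open() returns, including > 255.
 * Returns the count, or -1 if the file cannot be opened or read. */
long count_lines_raw(const char *path)
{
    char buf[4096];
    long lines = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                lines++;
        }
    }

    close(fd);
    return n < 0 ? -1 : lines;
}
```

The cost of this approach is that every fgets()/getc() style call in the loader has to be replaced by hand-written buffering, which is why a drop-in stdio replacement is the other option mentioned.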
- Hessu