BIND 8.2.3-REL Problem "db_load could not open..."

Kevin Darcy kcd at daimlerchrysler.com
Thu Apr 19 20:31:22 UTC 2001


Barry Finkel wrote:

> I wrote (in part):
>
> >> Apr 13 09:20:53 titania.ctd.anl.gov named[2559]:
> >>    Lame server on '14.249.26.209.in-addr.arpa'
> >>    (in '249.26.209.in-addr.arpa'?): [205.160.188.2].53 'dns2.utelfla.com'
> >> Apr 13 09:20:58 titania.ctd.anl.gov named-xfer[19924]:
> >>    send AXFR query 0 to 146.137.96.48
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named._sites.bio: No such file or directory
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named._sites.bio: No such file or directory
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named.bio: No such file or directory
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named.bio: No such file or directory
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named.anlgov: No such file or directory
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named.anlgov: No such file or directory
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named.root: No such file or directory
> >> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
> >>    named.root: No such file or directory
> >> Apr 13 09:20:59 titania.ctd.anl.gov named[2559]: No root nameservers for class IN
> >> Apr 13 09:20:59 titania.ctd.anl.gov named[2559]: sysquery:
> >>    findns error (SERVFAIL) on dns1.anl.gov?
> >> ----------------------------------------------------------------------
> >> MANY lines of messages have been deleted here.
> >> ----------------------------------------------------------------------
> >> Apr 13 09:59:50 titania.ctd.anl.gov named[2559]: sysquery:
> >>    findns error (SERVFAIL) on ns2.es.net?
> >> Apr 13 09:59:50 titania.ctd.anl.gov named[2559]: sysquery: nlookup error on ?
> >> Apr 13 09:59:50 titania.ctd.anl.gov last message repeated 10 times
> >>
> >> <<Here we entered "ndc reload" to reload the zones.>>
> >>
> >> Apr 13 09:59:51 titania.ctd.anl.gov named[2559]: reloading nameserver
> >> Apr 13 09:59:52 titania.ctd.anl.gov named[2559]: hint zone ""
> >>    (in) loaded (serial 0)
> >> Apr 13 09:59:56 titania.ctd.anl.gov named[2559]:
> >>    slave zone "anl.gov" (in) loaded (serial 2001041200)
> >> Apr 13 09:59:57 titania.ctd.anl.gov named[2559]:
> >>    slave zone "bio.anl.gov" (in) loaded (serial 2001041100)
>
> Kevin Darcy replied:
>
> >Barry, please learn to strip out the "noise" from these logs. Anything to do with
> >"lame"ness is probably not your problem, and NOTIFY- or AXFR-related messages are
> >basically just informational. After all that junk is ignored, basically what we're
> >left with is a bunch of db_load errors, followed by loss of hints information and
> >catatonia, at approximately 9:20:58. Did you reload the nameserver at that
> >time? Looks like the files "disappeared" during the reload. Are you NFS-mounting
> >your DNS directory? Having disk problems? Everything here points to some sort of
> >filesystem or disk problem, not a DNS problem, _per_se_. Without access to any
> >zonefiles on a reload, named understandably went berserk...
>
> And I now reply:
>
> I left the records in the log to show that BIND 8.2.3-REL appeared to
> be operating normally until the db_load errors.  The server was not
> reloaded manually at that time.  We had another occurrence Tuesday
> morning at 6:20AM (when no one was around).  The DNS directory is
> not NFS-mounted, and there are no disk problems that are evident
> anywhere else in the logs.  The OS is
>
>      SunOS 5.6 Generic_105181-19 sun4m sparc SUNW,SPARCstation-5
>
> If there had been a disk problem, then I would have expected the
>
>      ndc reload
>
> to have failed also, as that command tells BIND to re-read its
> configuration and all of its zone files on disk.  In the Tuesday
> failure, I see similar log lines at the time of failure:
>
>      Apr 17 06:21:06 titania.ctd.anl.gov named-xfer[3552]:
>        send AXFR query 0 to 146.137.96.48
>      Apr 17 06:21:06 titania.ctd.anl.gov named[2559]:
>        db_load could not open: _sites.anl.gov: No such file or directory
>      Apr 17 06:21:06 titania.ctd.anl.gov named[2559]:
>        db_load could not open: _sites.anl.gov: No such file or directory
>      Apr 17 06:21:06 titania.ctd.anl.gov named[2559]:
>        db_load could not open: named.anlgov: No such file or directory
>      Apr 17 06:21:06 titania.ctd.anl.gov named[2559]:
>        db_load could not open: named.anlgov: No such file or directory
>      Apr 17 06:21:06 titania.ctd.anl.gov named[2559]:
>        db_load could not open: named.root: No such file or directory
>      Apr 17 06:21:06 titania.ctd.anl.gov named[2559]:
>        db_load could not open: named.root: No such file or directory
>      Apr 17 06:21:07 titania.ctd.anl.gov named[2559]:
>        sysquery: nlookup error on ?
>
> I have looked at the named-xfer.c source for the message
>
>      send AXFR query 0 to 146.137.96.48
>
> and I am not sure what the "0" signifies.

It signifies whether the transfer is compressed (ZXFR) or not. 0, the default, means
that it is not compressed. I doubt that this has any bearing on your problem.

> The IP address in the message
> refers to our Win2k DNS box, and I am assuming (from the next message
> in the syslog) that the AXFR query was for the zone
>
>      Fri:  _sites.bio.anl.gov
>      Tue:  _sites.anl.gov
>
> I am wondering if this strange BIND behavior is due to the response that
> was returned to named-xfer from the Win2k DNS box.  We have eight
> "_" zones on that Win2k DNS box, and there are any number of successful
> zone transfers for those eight zones to this BIND DNS server.  We have
> not seen any failure on our other BIND 8.2.3-REL slave server, but with
> only two failures on one server, I do not have enough data to conclude
> that the problem will not occur on the other server.
>
> What I have done is write a shell script that is run every five minutes
> via cron.  If it sees a SERVFAIL line in the last 100 lines of the
> syslog, then it issues the "ndc reload" command and sends me a page.
> I need to add one more step to get a core dump before the reload,
> but I am not sure exactly what command would give the desired info.

It seems that named-xfer is returning a success code to its parent process (i.e.,
named) without actually creating the zone file. Very strange. I doubt that forcing a
core dump of named is going to help here, since the problem appears to be in
named-xfer rather than in named. I have some ideas about how I'd go about trying to
debug this problem, but perhaps you'd be better off reporting it to bind-bugs. Maybe
they've already heard about something like this, or maybe they would have better
ideas on how to debug it...


- Kevin
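
For readers who want to try the watchdog Barry describes above (look for a SERVFAIL
line in the last 100 lines of the syslog and, if one is found, issue "ndc reload"),
here is a minimal sketch. It is not Barry's actual script, which was not posted. The
log path /var/adm/messages and the names tail, needs_reload and check_and_reload are
assumptions made for illustration, and the paging step is left out.

```python
import subprocess
from collections import deque

# Assumed syslog location (not stated in the thread); adjust for your system.
LOG_PATH = "/var/adm/messages"
TAIL_LINES = 100  # Barry's description: "the last 100 lines of the syslog"


def tail(lines, n=TAIL_LINES):
    """Return the last n items of an iterable of lines."""
    return list(deque(lines, maxlen=n))


def needs_reload(lines, n=TAIL_LINES):
    """Return True if any of the last n lines mentions SERVFAIL."""
    return any("SERVFAIL" in line for line in tail(lines, n))


def check_and_reload(log_path=LOG_PATH, run=subprocess.run):
    """Scan the log tail and run "ndc reload" if a SERVFAIL line is present.

    Returns True if a reload was requested, False otherwise.
    """
    with open(log_path, errors="replace") as log:
        if not needs_reload(log):
            return False
    run(["ndc", "reload"], check=False)
    return True
```

From cron, every five minutes, this would be invoked as check_and_reload(). Note that
old SERVFAIL lines can stay within the last 100 lines for a while after a reload, so
a real script would probably also need to remember when it last acted.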



