BIND 8.2.3-REL Problem "db_load could not open..."

Barry Finkel b19141 at achilles.ctd.anl.gov
Thu Apr 19 12:51:30 UTC 2001


I wrote (in part):

>> Apr 13 09:20:53 titania.ctd.anl.gov named[2559]:
>>    Lame server on '14.249.26.209.in-addr.arpa'
>>    (in '249.26.209.in-addr.arpa'?): [205.160.188.2].53 'dns2.utelfla.com'
>> Apr 13 09:20:58 titania.ctd.anl.gov named-xfer[19924]:
>>    send AXFR query 0 to 146.137.96.48
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named._sites.bio: No such file or directory
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named._sites.bio: No such file or directory
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named.bio: No such file or directory
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named.bio: No such file or directory
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named.anlgov: No such file or directory
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named.anlgov: No such file or directory
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named.root: No such file or directory
>> Apr 13 09:20:58 titania.ctd.anl.gov named[2559]: db_load could not open:
>>    named.root: No such file or directory
>> Apr 13 09:20:59 titania.ctd.anl.gov named[2559]: No root nameservers for class IN
>> Apr 13 09:20:59 titania.ctd.anl.gov named[2559]: sysquery:
>>    findns error (SERVFAIL) on dns1.anl.gov?
>> ----------------------------------------------------------------------
>> MANY lines of messages have been deleted here.
>> ----------------------------------------------------------------------
>> Apr 13 09:59:50 titania.ctd.anl.gov named[2559]: sysquery:
>>    findns error (SERVFAIL) on ns2.es.net?
>> Apr 13 09:59:50 titania.ctd.anl.gov named[2559]: sysquery: nlookup error on ?
>> Apr 13 09:59:50 titania.ctd.anl.gov last message repeated 10 times
>>
>> <<Here we entered "ndc reload" to reload the zones.>>
>>
>> Apr 13 09:59:51 titania.ctd.anl.gov named[2559]: reloading nameserver
>> Apr 13 09:59:52 titania.ctd.anl.gov named[2559]: hint zone ""
>>    (in) loaded (serial 0)
>> Apr 13 09:59:56 titania.ctd.anl.gov named[2559]:
>>    slave zone "anl.gov" (in) loaded (serial 2001041200)
>> Apr 13 09:59:57 titania.ctd.anl.gov named[2559]:
>>    slave zone "bio.anl.gov" (in) loaded (serial 2001041100)

Kevin Darcy replied:

>Barry, please learn to strip out the "noise" from these logs. Anything to do with
>"lame"ness is probably not your problem, and NOTIFY- or AXFR-related messages are
>basically just informational. After all that junk is ignored, basically what we're
>left with is a bunch of db_load errors, followed by loss of hints information and
>catatonia, at approximately 9:20:58. Did you reload the nameserver at that
>time? Looks like the files "disappeared" during the reload. Are you NFS-mounting
>your DNS directory? Having disk problems? Everything here points to some sort of
>filesystem or disk problem, not a DNS problem, _per_se_. Without access to any
>zonefiles on a reload, named understandably went berserk...

And I now reply:

I left the records in the log to show that BIND 8.2.3-REL appeared to
be operating normally until the db_load errors.  The server was not
reloaded manually at that time.  We had a second occurrence on Tuesday
morning at about 6:20 AM (when no one was around).  The DNS directory
is not NFS-mounted, and no disk problems are evident anywhere else in
the logs.  The OS is

     SunOS 5.6 Generic_105181-19 sun4m sparc SUNW,SPARCstation-5

If there had been a disk problem, then I would have expected the 

     ndc reload

to have failed also, as that command tells BIND to re-read its 
configuration and all of its zone files on disk.  In the Tuesday
failure, I see similar log lines at the time of failure:

     Apr 17 06:21:06 titania.ctd.anl.gov named-xfer[3552]:
       send AXFR query 0 to 146.137.96.48
     Apr 17 06:21:06 titania.ctd.anl.gov named[2559]: 
       db_load could not open: _sites.anl.gov: No such file or directory
     Apr 17 06:21:06 titania.ctd.anl.gov named[2559]: 
       db_load could not open: _sites.anl.gov: No such file or directory
     Apr 17 06:21:06 titania.ctd.anl.gov named[2559]: 
       db_load could not open: named.anlgov: No such file or directory
     Apr 17 06:21:06 titania.ctd.anl.gov named[2559]: 
       db_load could not open: named.anlgov: No such file or directory
     Apr 17 06:21:06 titania.ctd.anl.gov named[2559]: 
       db_load could not open: named.root: No such file or directory
     Apr 17 06:21:06 titania.ctd.anl.gov named[2559]: 
       db_load could not open: named.root: No such file or directory
     Apr 17 06:21:07 titania.ctd.anl.gov named[2559]:
       sysquery: nlookup error on ?
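
To compare the two failures without the surrounding noise, the db_load
lines can be pulled out of the syslog and counted per file.  The
following is only a minimal sketch: the function name db_load_failures
is made up for illustration, it assumes each message sits on one
physical line in the syslog (the excerpts in this message are wrapped
for mail), and it assumes a POSIX awk (on Solaris that would be nawk or
/usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
# Summarize "db_load could not open" messages from a syslog stream on stdin.
# Prints one line per file: "<count> <file name>", sorted by file name.
# Assumption: each syslog message is on a single physical line.
db_load_failures() {
    awk '/db_load could not open:/ {
        # The word after "open:" is the file name with a trailing colon.
        for (i = 1; i <= NF; i++)
            if ($i == "open:") {
                name = $(i + 1)
                sub(/:$/, "", name)
                count[name]++
            }
    }
    END { for (name in count) print count[name], name }' | sort -k2
}
```

It would be run as "db_load_failures < /var/adm/messages", where the
syslog path is likewise an assumption about this system.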

I have looked at the named-xfer.c source for the message

     send AXFR query 0 to 146.137.96.48

and I am not sure what the "0" signifies.  The IP address in the message
refers to our Win2k DNS box, and I am assuming (from the next message
in the syslog) that the AXFR query was for the zone

     Fri:  _sites.bio.anl.gov
     Tue:  _sites.anl.gov

I am wondering whether this strange BIND behavior is due to the response
that the Win2k DNS box returned to named-xfer.  We have eight "_" zones
on that Win2k DNS box, and there have been many successful zone
transfers of those eight zones to this BIND DNS server.  We have not
seen any failure on our other BIND 8.2.3-REL slave server, but with
only two failures on one server, I do not have enough data to conclude
that the problem will not occur on the other server.

As a workaround, I have written a shell script that cron runs every
five minutes.  If it sees a SERVFAIL line in the last 100 lines of the
syslog, it issues the "ndc reload" command and sends me a page.
I still need to add a step that captures a core dump before the reload,
but I am not sure exactly which command would give the desired info.
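
A script of this kind might look roughly like the sketch below.  This
is not the actual script: the syslog path, the pager address, the use
of mailx to send the page, and the function names are all assumptions
made for illustration, and the core-dump step is left as a comment
because the right command is still an open question.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; intended to be run from cron every five minutes.
# Assumptions: syslog location, pager address, and mailx as the paging method.
SYSLOG=${SYSLOG:-/var/adm/messages}
PAGER_ADDR=${PAGER_ADDR:-pager@example.invalid}

# Succeeds if "SERVFAIL" appears in the last 100 lines of the given file.
servfail_seen() {
    tail -n 100 "$1" 2>/dev/null | grep SERVFAIL >/dev/null
}

# Reload named and send a page if a SERVFAIL line was seen recently.
watchdog() {
    if servfail_seen "$SYSLOG"; then
        # A core-dump step would go here, before the reload (command not yet chosen).
        ndc reload
        echo "named reloaded after SERVFAIL on `hostname`" |
            mailx -s "named reload" "$PAGER_ADDR"
    fi
}
```

The script would end with a call to "watchdog".  One thing to note
about this design: as long as an old SERVFAIL line remains within the
last 100 lines of the syslog, each cron run will reload and page again.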
----------------------------------------------------------------------
Barry S. Finkel
Electronics and Computing Technologies Division
Argonne National Laboratory          Phone:    +1 (630) 252-7277
9700 South Cass Avenue               Facsimile:+1 (630) 252-9689
Building 221, Room B236              Internet: BSFinkel at anl.gov
Argonne, IL   60439-4844             IBMMAIL:  I1004994


