Name resolution failure on a caching server -- many '; pending-answer' records in the cache

TPCbind at mklab.ph.rhul.ac.uk
Wed Jan 27 01:01:46 UTC 2016


Dear All,
     I run a caching server on a section of the departmental LAN.
Occasionally network congestion results in timeouts & name resolution
failures.  Lookups performed on name servers outside my LAN section
fail with NXDOMAIN.  Querying my name server for items not in its
cache gets the same result.

My problem is that long after the congestion has subsided, queries to
my name server still result in NXDOMAIN failures.  AFAICT this
situation persists indefinitely, until the cache is flushed with 'rndc
flush' or named is restarted.  When it is in this state, dumping the
cache with 'rndc dumpdb' shows numerous entries like this,

--------------------------------------------------------------------------------------------
; pending-additional
thdow.bbc.co.uk.        76632   NS      ns3.bbc.net.uk.
                        76632   NS      ns4.bbc.co.uk.
                        76632   NS      ns4.bbc.net.uk.
                        76632   NS      ns3.bbc.co.uk.
; pending-answer
ns0.thdow.bbc.co.uk.    2082    \-AAAA  ;-$NXRRSET
; thdow.bbc.co.uk. SOA ns.bbc.co.uk. hostmaster.bbc.co.uk. 2015122100 1800 600 864000 86400
; pending-answer
                        76632   A       212.58.240.162
; pending-answer
www.bbc.co.uk.          30      CNAME   www.bbc.net.uk.
; glue
--------------------------------------------------------------------------------------------

and attempts to look up e.g. www.bbc.co.uk result in NXDOMAIN.
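
To gauge how widespread such entries are in a dump, the marker comments
can be tallied.  The following is only a rough sketch: it assumes the
markers always appear on their own line in the form '; pending-answer',
'; glue' etc., as in the excerpt above, and the function name
count_trust_markers is made up for illustration.

```python
import re
from collections import Counter

# Matches comment lines that hold a single lowercase, hyphenated word,
# e.g. "; pending-answer" or "; glue".  Assumption: this is how the
# markers look throughout the dump, as in the excerpt above.
MARKER = re.compile(r"^; ([a-z]+(?:-[a-z]+)*)\s*$")


def count_trust_markers(lines):
    """Count marker comments in an iterable of dump-file lines."""
    counts = Counter()
    for line in lines:
        match = MARKER.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example use (the path is an assumption; it depends on the 'dump-file'
# setting in named.conf):
#
#     with open("/var/named/data/cache_dump.db") as dump:
#         for marker, n in count_trust_markers(dump).most_common():
#             print(n, marker)
```

Comparing the counts from a dump taken in the failed state with one
taken after 'rndc flush' might show whether the pending-answer entries
are really peculiar to the failure.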

Browsing the documentation I noticed the parameter 'max-ncache-ttl',
which is unset in my named.conf and apparently defaults to 3 hours.
However, the problem persists long after 3 hours have elapsed following
incidents of network congestion.

I could set up a cron job to check name resolution on external domains
and flush the cache when it fails.  I assume there must be a better
solution!  Should I set max-ncache-ttl to something fairly short in my
named.conf, in case the default value is for some reason actually
much greater than 3 hours?
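
For reference, the cron-job workaround could look roughly like the
sketch below.  It is a stopgap under stated assumptions, not a fix:
it assumes 'dig' and 'rndc' are on the PATH, that named listens on
127.0.0.1, and that Python 3.7 or later is available.  The function
names and the probe names are hypothetical.

```python
import subprocess


def lookup_status(name, server="127.0.0.1", run=subprocess.run):
    """Return the status field (NOERROR, NXDOMAIN, ...) from dig's header."""
    cmd = ["dig", "@" + server, name, "A", "+time=3", "+tries=2"]
    result = run(cmd, capture_output=True, text=True)
    for line in result.stdout.splitlines():
        # dig prints e.g. ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 1"
        if "status:" in line:
            return line.split("status:", 1)[1].split(",", 1)[0].strip()
    return "NO-RESPONSE"  # no header seen, e.g. the query timed out


def check_and_flush(names, server="127.0.0.1", run=subprocess.run):
    """Flush the cache only if every probe name fails to resolve."""
    statuses = [lookup_status(name, server, run) for name in names]
    if statuses and all(status != "NOERROR" for status in statuses):
        run(["rndc", "flush"], capture_output=True, text=True)
        return True
    return False


# Example call from a script run by cron (probe names are an assumption;
# pick several unrelated domains that are known to exist):
#
#     check_and_flush(["www.bbc.co.uk", "www.isc.org"])
```

Requiring every probe to fail before flushing is deliberate, so that a
single domain's outage does not empty the whole cache.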

BTW, is there a way to dump out all the parameters from a running named
-- just to see all their values?


Any ideas on how to solve or further diagnose the problem?

Many thanks
Tom Crane

System details:
OS:    Scientific Linux CERN SLC release 6.7 (Carbon) [NB: SLC is a derivative of RHEL]
BIND:  bind-9.8.2-0.37.rc1.el6_7.5.x86_64

PS. I originally posted to the Usenet NG comp.protocols.dns.bind but
got no followups and then noticed all messages in that NG had this 
ML's fields 'NNTP-Posting-Host: lists.isc.org' and 'X-Original-To: 
bind-users at lists.isc.org' etc. in their headers.  Is c.p.d.b 
actually a moderated group now or exclusively tied to this ML via 
a mail2news gateway?

-- 
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email:  T dot Crane at rhul dot ac dot uk


