Bind 8.2.3, query-restart on expired NS and A record

Wed Oct 10 13:31:30 UTC 2001

[Mark.Andrews at isc.org: Wed, Oct 10, 2001 at 10:19:13AM +1000]
> 
> > does a bind 8.2.3 stub resolver ever overwrite existing cache entries
> > with records received from an additional section for the same record?
> 
> 	Well given that the stub resolver doesn't have a cache this
> 	does not make sense the way it was written.

You and Kevin are right here: I meant a recursive server. Sorry for
the confusion.

>	The nameserver does not refresh TTL based on answers it
> 	receives (though earlier versions did creating server lock).

I think this is the source of my problem. BIND 8 does not refresh
TTLs even for expired records: an expired record must first be
explicitly deleted before a record with a fresh TTL can take its
place. I think this can be problematic with respect to glue
records. I'll back that up with a BIND trace in a moment.
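
To make the rule concrete, here is a minimal sketch in Python of the
update behaviour as I understand it. It is not BIND's code: `Cache`,
`update`, `delete` and `is_stale` are hypothetical names, and the
rrset comparison is reduced to comparing rdata sets, which is an
assumption about what the server checks.

```python
class Cache:
    """Toy model: maps (name, rtype) -> (rdata set, absolute expiry time)."""

    def __init__(self):
        self.db = {}

    def update(self, name, rtype, rdata, ttl, now):
        """Store an rrset unless an identical one is already present.

        Assumption: the comparison looks at rdata only, never at the
        expiry time, so a matching but expired entry blocks the refresh.
        """
        key = (name, rtype)
        existing = self.db.get(key)
        if existing is not None and existing[0] == frozenset(rdata):
            return False  # rrsets matched: old entry and old expiry are kept
        self.db[key] = (frozenset(rdata), now + ttl)
        return True

    def delete(self, name, rtype):
        self.db.pop((name, rtype), None)

    def is_stale(self, name, rtype, now):
        return self.db[(name, rtype)][1] <= now


now = 1000
cache = Cache()
cache.update("px.limey.net", "A", ["204.168.16.17"], ttl=100, now=now - 200)

# The entry expired 100 seconds ago, yet an identical answer does not refresh it.
refreshed = cache.update("px.limey.net", "A", ["204.168.16.17"], ttl=100, now=now)
still_stale = cache.is_stale("px.limey.net", "A", now)

# Only after an explicit delete does the fresh TTL take effect.
cache.delete("px.limey.net", "A")
stored = cache.update("px.limey.net", "A", ["204.168.16.17"], ttl=100, now=now)
fresh_now = not cache.is_stale("px.limey.net", "A", now)
```

In this model the second `update` is refused because the rdata
matches, so the expired entry stays in place until something deletes
it.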

The setting is this: the cache has a stale A record for
gluetest.limey.net, a stale NS record for gluetest.limey.net (which
delegates to px.limey.net), and a stale A record for px.limey.net. It
has two fresh NS records for limey.net (delegating to
sidehack.gweep.net and ayup.limey.net) as well as fresh A records for
sidehack and ayup.

[** First we find the A for gluetest.limey.net, figure out that it's
    stale and delete it. **]
req: found 'gluetest.limey.net' as 'gluetest.limey.net' (cname=0)
stale: ttl 1002571133 -7 (x2)
delete_all(0x80ef0e0:"gluetest" IN A)
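
The `stale:` line appears to print the record's absolute expiry time
followed by the seconds remaining, so `-7` would mean the record
expired seven seconds before the lookup. A small sketch of that check
under this reading of the log; the helper name and the lookup time
are assumptions:

```python
def seconds_remaining(expiry, now):
    """Seconds until expiry; a negative value means the record is stale."""
    return expiry - now


expiry = 1002571133   # first number on the "stale:" line
now = expiry + 7      # assumed lookup time, chosen to reproduce the -7
remaining = seconds_remaining(expiry, now)
is_stale = remaining < 0
```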

[** The best we can do with what is fresh is to contact the limey.net
     nameservers. They reply with an NS record and a glue record **]
nslookup(nsp=0xbfbfeaf8, qp=0x810f000, "gluetest.limey.net")
nslookup: NS "AYUP.limey.net" c=1 t=2 (flags 0x2)
nslookup: NS "SIDEHACK.GWEEP.net" c=1 t=2 (flags 0x2)
nslookup: 2 ns addrs total
forw: forw -> [65.105.101.18].53 ds=4 nsid=1531 id=37409 18ms retry 4sec
Response (USER NORMAL -) nsid=1531 id=37409
gluetest.limey.net.     1m41s IN NS     px.limey.net.
px.limey.net.           4m1s IN A       204.168.16.17
rrextract: dname gluetest.limey.net type 2 class 1 ttl 100
rrextract: dname px.limey.net type 1 class 1 ttl 100

[** Now, as I understand it, we check the extracted records against
  the existing cache. The NS record doesn't match anything (we just
  deleted it), but px.limey.net matches an existing record. That
  record is stale, but we pay no heed. **]
rrsetupdate: gluetest.limey.net
rrsetcmp: no records in database
rrsetupdate: gluetest.limey.net 0
rrsetupdate: px.limey.net
rrsetcmp: rrsets matched

[** We now write the NS to the cache, but not the A. When I run this
    same trace on a clean cache it writes both the NS and the A here. **]
db_update(gluetest.limey.net, 0x810c1f8, 0x810c1f8, 0, 031, 0x80feca0)
db_update: adding 0x810c1f8

[** We now return to the business of following that delegation, but
    the server can't find the px A record: "wanted!" **]
resp: nlookup(gluetest.limey.net) qtype=1
resp: found 'gluetest.limey.net' as 'gluetest.limey.net' (cname=0)
wanted(0x810c1f8, IN A) [IN NS]

We now need to start a query for px.limey.net. That triggers the
query-restart behavior and the timeouts which started all of this in
the first place. The problem seems to be that for a TTL to be updated
the record has to be explicitly deleted first, and deletions only
happen when a lookup has already hit the stale data. Because glue
records are really hints for future queries, they are not
pre-emptively deleted.
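
The whole sequence can be replayed against a toy cache. This is a
sketch under my reading of the trace, not BIND's logic: `lookup` and
`store` are hypothetical helpers, the address for gluetest.limey.net
is a placeholder, and the limey.net NS set is reduced to one server.

```python
def lookup(cache, key, now):
    """Return the cached entry if fresh; delete it and return None if stale."""
    entry = cache.get(key)
    if entry is None:
        return None
    if entry[1] <= now:
        del cache[key]  # deletion only happens when a lookup hits stale data
        return None
    return entry


def store(cache, key, rdata, ttl, now):
    """Add an rrset unless one with the same rdata exists, fresh or not."""
    existing = cache.get(key)
    if existing is not None and existing[0] == rdata:
        return False
    cache[key] = (rdata, now + ttl)
    return True


now = 1000
stale, fresh = now - 7, now + 3600
cache = {
    ("gluetest.limey.net", "A"): (("192.0.2.1",), stale),  # placeholder address
    ("gluetest.limey.net", "NS"): (("px.limey.net",), stale),
    ("px.limey.net", "A"): (("204.168.16.17",), stale),
    ("limey.net", "NS"): (("ayup.limey.net",), fresh),
}
queries = []

# Step 1: the answer itself is stale, so the lookup deletes it.
answer = lookup(cache, ("gluetest.limey.net", "A"), now)

# Step 2: the closest delegation is stale too, so fall back to limey.net.
delegation = lookup(cache, ("gluetest.limey.net", "NS"), now)
parent = lookup(cache, ("limey.net", "NS"), now)
queries.append("gluetest.limey.net A -> limey.net servers")

# Step 3: the referral carries the NS plus glue; only the NS is stored,
# because the glue matches the stale px.limey.net A still in the cache.
stored_ns = store(cache, ("gluetest.limey.net", "NS"), ("px.limey.net",), 100, now)
stored_glue = store(cache, ("px.limey.net", "A"), ("204.168.16.17",), 100, now)

# Step 4: following the delegation hits the stale glue and deletes it,
# so a second query for px.limey.net is needed.
glue = lookup(cache, ("px.limey.net", "A"), now)
if glue is None:
    queries.append("px.limey.net A -> restart")
```

In this model the fresh glue from the referral is thrown away in
step 3 and the stale copy is only removed in step 4, which is why a
second query is needed.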

-P