Bind 8.2.3, query-restart on expired NS and A record

Patrick McManus mcmanus at appliedtheory.com
Mon Oct 8 22:28:13 UTC 2001


Folks,

I'm a bit perplexed. I'm ready to attribute this to a bind 8.2.3 stub
resolver limitation, but given that there are so many of them out
there, I'm really looking for at least a way around the problem rather
than just upgrading. My responsibility is on the server side, so I
can't upgrade everybody's stub. Towards the bottom of the post I've
noted where I think the problem is, but I haven't found any mention of
it in the archives.

I'm desperately trying *not* to trigger the bind8 query restart limitation.

I'm trying to publish an A record. For my example I've set up
gluetest.limey.net: an A record with a 10 second TTL (again, just for
example purposes; a rough sketch of the relevant zone data follows the
list below).

starting from scratch (empty stub resolver) everything is fine..

1: [a-m].root-servers.net delegates .net with glue to
   [a-m].gtld-servers.net.. big TTL

2: [a-m].gtld-servers.net delegates limey.net with glue to
   ayup.limey.net and sidehack.gweep.net.. big TTL

3: the .limey.net nameservers delegate gluetest.limey.net to
   px.limey.net.. smaller TTLs (for illustrative purposes)..  the NS
   has a 101 second TTL and the A for px.limey.net has a 241 second
   TTL

4: the gluetest.limey.net nameserver (px.limey.net) responds with an
   A record for gluetest.limey.net - the A record has a 10 second TTL.
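
For concreteness, here's roughly what the relevant data looks like in
zone-file form (the address on the gluetest record below is just a
placeholder; the px.limey.net address is the one that shows up in the
logs further down):

; in the limey.net zone (served by ayup.limey.net and sidehack.gweep.net)
gluetest.limey.net.    101     IN NS   px.limey.net.   ; delegation, 101s TTL
px.limey.net.          241     IN A    204.168.16.17   ; A for the delegated-to server, 241s TTL

; in the gluetest.limey.net zone (served by px.limey.net)
gluetest.limey.net.    10      IN A    192.0.2.1       ; the record being published, 10s TTL (placeholder address)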

As I said, on the first query everything is fine: it takes 4 RTTs
(ouch), but it all completes in a straightforward fashion.

If we try a query after 15 seconds, the stub resolver still has a
fresh NS record for gluetest.limey.net and does the right thing: it
asks px.limey.net, which happily responds.

If, instead of querying after 15s, we query after 115s, everything is
still OK: the stub queries the limey.net nameservers for
gluetest.limey.net, gets a fresh NS delegation to px.limey.net, still
has a valid A record for px.limey.net, asks px.limey.net, and the A is
resolved.

If, instead of either of the above scenarios, we requery at t=150 and
t=300 seconds, we are still OK. t=150 proceeds exactly like the last
paragraph. At t=300 the roles are reversed: there is a fresh NS record
delegating gluetest.limey.net to px.limey.net, but px.limey.net's A
has gone stale. The resolver fixes that by querying limey.net's
nameservers, and off we go. No problems yet.
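
For reference, here's how long each record in the chain stays fresh
after it is cached or refreshed:

  gluetest.limey.net    A    10s
  gluetest.limey.net    NS   101s
  px.limey.net          A    241s
  limey.net NS and glue      big TTLs (never stale on this timescale)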

However, if we don't make any requery until t=300, I somehow trigger
bind 8's query-restart limitation; exactly how, I don't fully
understand.

Under this scenario the following RRs are all stale:

gluetest.limey.net   A    some IP
gluetest.limey.net   NS   px.limey.net
px.limey.net         A    some other IP

but the following RRs are fresh:

limey.net            NS   ayup.limey.net
limey.net            NS   sidehack.gweep.net
ayup.limey.net       A    some IP
sidehack.gweep.net   A    some IP

According to tcpdump traces, when the stub resolver gets a request to
look up the A of gluetest.limey.net, it sends an A? query to sidehack
or ayup. That makes sense, as they are authoritative for limey.net,
which is still a fresh RR. They respond with a fresh
"gluetest.limey.net is delegated to px.limey.net" authoritative
response, plus the px.limey.net A record, which they are of course
authoritative for (as it's in limey.net).

Here's the weird part: the stub resolver then issues an A? query for
px.limey.net. It never did this when starting from scratch. This
looks to me exactly like the 'poison detection' behavior you'd see if,
for example, the delegation were to px.example.com, but I wouldn't
think this qualifies as poison (you're querying this server because it
is authoritative for limey.net and it gave you a host in limey.net;
not really a shocker), not to mention that _when the cache was empty,
it agreed with me - it never sent an explicit px.limey.net resolution
query_.
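
For reference, the referral that comes back from ayup/sidehack in this
state carries both records, roughly like this in dig-style
presentation (empty answer section, since gluetest.limey.net itself is
delegated away):

;; AUTHORITY SECTION:
gluetest.limey.net.    101     IN NS   px.limey.net.

;; ADDITIONAL SECTION:
px.limey.net.          241     IN A    204.168.16.17

Both names sit inside limey.net, so the additional A record is exactly
the sort of data these servers ought to be believed about.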

In any event, this is of course the bind query-restart behavior at
this point (where it has to issue queries to find the A records of NS
targets), and that's a well-known bind8 limitation. (If you don't
know: essentially it drops the gluetest.limey.net request on the floor
in favor of the pursuit of px.limey.net. The assumption is that the
client will ask about gluetest.limey.net again soon, and by then an A
for px.limey.net will be in cache; but "soon" is 5 seconds - which is
exactly the delay I'm trying to avoid.)

some bind logs:

req: found 'gluetest.limey.net' as 'gluetest.limey.net' (cname=0)
stale: ttl 1002571133 -7 (x2)
delete_all(0x80ef0e0:"gluetest" IN A)

nslookup(nsp=0xbfbfeaf8, qp=0x810f000, "gluetest.limey.net")
nslookup: NS "AYUP.limey.net" c=1 t=2 (flags 0x2)
nslookup: NS "SIDEHACK.GWEEP.net" c=1 t=2 (flags 0x2)
nslookup: 2 ns addrs total

forw: forw -> [65.105.101.18].53 ds=4 nsid=1531 id=37409 18ms retry 4sec
Response (USER NORMAL -) nsid=1531 id=37409
gluetest.limey.net.     1m41s IN NS     px.limey.net.
px.limey.net.		4m1s IN A	204.168.16.17

rrextract: dname gluetest.limey.net type 2 class 1 ttl 100
rrextract: dname px.limey.net type 1 class 1 ttl 100

rrsetupdate: gluetest.limey.net
rrsetcmp: no records in database
rrsetupdate: gluetest.limey.net 0
rrsetupdate: gluetest.limey.net 0

rrsetupdate: px.limey.net
rrsetcmp: rrsets matched
[ ** I think this is an issue ** ]

db_update(gluetest.limey.net, 0x810c1f8, 0x810c1f8, 0, 031, 0x80feca0)
db_update: adding 0x810c1f8
[** so we write the NS record, but not the included A for
    px.limey.net.. I think this is because of that "rrsets matched"
    message - we've already got a db entry for it and don't want to
    overwrite it.. however, that stored entry is stale, as we'll see **]

resp: nlookup(gluetest.limey.net) qtype=1
resp: found 'gluetest.limey.net' as 'gluetest.limey.net' (cname=0)
wanted(0x810c1f8, IN A) [IN NS]
findns: 1 NS's added for 'gluetest'
ns_resp: ns AYUP.limey.net rcnt 1 (busy)
ns_resp: nsdata 65.105.101.18 rcnt 1 (busy)
ns_resp: ns SIDEHACK.GWEEP.net rcnt 1 (busy)
ns_resp: nsdata 204.145.148.154 rcnt 1 (busy)
nslookup(nsp=0xbfbff7d0, qp=0x810f000, "gluetest.limey.net")
nslookup: NS "px.limey.net" c=1 t=2 (flags 0x2)
stale: ttl 1002571126 -15 (x2)
[** we can't contact the NS because our stored A is stale.. so we have
    to look it up, which causes a query-restart timeout.. but we
    shouldn't have to look it up at all, except that we just declined
    to add the fresh copy to our cache! **]

delete_all(0x80e5364:"px" IN A)

----------------------------------------------------------------

Anybody got anything to share here? (Even if it's a "right on, we all
knew that but you've diagnosed it correctly".)

The net effect is that any requery after an idle period greater than
the larger of those two TTLs (the gluetest.limey.net NS and the
px.limey.net A) will time out, while initial queries will not.

Production will of course have bigger TTLs, which will mitigate this
somewhat, but they will still be on the order of hours/days, and idle
periods of that length are common. I realize I can set the TTL on px's
A record to be gigantic, and if I ever need to renumber I can just
point the NS record at a different name with a different A, but that
will cause cache pollution if nothing else (it might be the best that
can be done).
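
In zone-file terms that workaround might look something like the
sketch below (the week-long TTL and the px2 name/address are made up
for illustration); a renumber then means repointing the NS at a new
name rather than changing an A record that's already widely cached:

; limey.net zone
px.limey.net.          604800  IN A    204.168.16.17   ; gigantic TTL on the nameserver's A
px2.limey.net.         604800  IN A    192.0.2.53      ; hypothetical replacement host
gluetest.limey.net.    101     IN NS   px.limey.net.   ; later: repoint to px2.limey.net.

The old px.limey.net A would then keep floating around caches for up
to a week after a renumber, which is the cache pollution mentioned
above.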

thoughts?

-Pat






