forwarder cache

Tue Nov 29 17:46:39 UTC 2022

Hi there,

I am running two instances of named on the same server (BIND 9.16.33 on Alpine 3.16). They use completely separate config directories, working directories, and control ports. Let's call them NS1 and NS2.

NS1 is a forwarding instance. It listens on any:53 and forwards all requests to 127.0.0.1:153.
NS2 is a normal BIND 9 instance. It has one zone (test.com) and listens on 127.0.0.1:153.

My understanding is that when NS1 receives a request for "test.com", it initially forwards that query to NS2 for resolution and then caches the result in memory for the TTL of that record. The next request for "test.com" should then be served from NS1's in-memory cache, with NS2 out of the picture.
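
To make that expectation concrete, here is a minimal toy model of it in Python. It is only a sketch of the behaviour I expect, not a description of how named implements its cache; the class name, the injected clock, and the 86400-second TTL in the usage lines are all made up for illustration (the zone's real TTL is not shown in this post).

```python
import time
from typing import Callable, Dict, Tuple


class ToyForwarderCache:
    """Toy model of the expected behaviour; NOT how named works internally."""

    def __init__(
        self,
        upstream: Callable[[str], Tuple[str, int]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upstream = upstream      # stands in for NS2: name -> (answer, ttl)
        self._clock = clock            # injectable so time can be faked
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0      # how often "NS2" was actually asked

    def resolve(self, name: str) -> str:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now < hit[1]:
            # Entry is still within its TTL: answer without touching upstream.
            return hit[0]
        # Missing or expired: forward upstream and cache for the record's TTL.
        self.upstream_queries += 1
        answer, ttl = self._upstream(name)
        self._cache[name] = (answer, now + ttl)
        return answer


# Hypothetical upstream returning the A record with an assumed TTL of 86400.
fake_now = [0.0]
ns1 = ToyForwarderCache(lambda name: ("10.10.10.10", 86400),
                        clock=lambda: fake_now[0])
ns1.resolve("test.com")        # first query goes upstream
fake_now[0] = 202.0            # some time later, still within the TTL
ns1.resolve("test.com")        # expected to be served from cache
print(ns1.upstream_queries)    # 1
```

In this model the second query never reaches the upstream, which is the behaviour I expected from NS1 once NS2 is gone.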

Based on that, I am running some tests. The initial dump of NS1's cache shows that it is empty:
/ # cat /var/cache/ns1/named_dump.db
;
; Start view _default
;
;
; Cache dump of view '_default' (cache _default)
;
; using a 86400 second stale ttl
$DATE 20221129172701
;
; Address database dump

Next, I send an A record request for test.com to NS1, which returns the correct result. Dumping the cache:
;
; Start view _default
;
;
; Cache dump of view '_default' (cache _default)
;
; using a 86400 second stale ttl
$DATE 20221129172835
; authanswer
; stale
test.com. 86390 IN A 10.10.10.10
;
; Address database dump

This shows that NS1 has cached the A record at this point, and it should be valid for the next 86390 seconds.
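
For anyone who wants to check a dump like this mechanically, the following Python sketch pulls out each record together with the comment lines printed directly above it. It treats the dump as plain text and assumes that comments immediately preceding a record belong to that record; that is my reading of the output above, not something taken from the BIND documentation. It also only handles fully written-out single-line records like the one shown here.

```python
from typing import List, NamedTuple


class CachedRecord(NamedTuple):
    name: str
    ttl: int
    rtype: str
    rdata: str
    markers: List[str]   # comment lines found directly above the record


def parse_cache_dump(text: str) -> List[CachedRecord]:
    records: List[CachedRecord] = []
    pending: List[str] = []          # comments seen since the last separator
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):
            pending = []             # blank lines and $DATE break the run
            continue
        if line.startswith(";"):
            comment = line[1:].strip()
            if comment:
                pending.append(comment)
            else:
                pending = []         # a bare ";" is treated as a separator
            continue
        fields = line.split(None, 4)  # name, ttl, class, type, rdata
        if len(fields) == 5 and fields[1].isdigit():
            records.append(CachedRecord(fields[0], int(fields[1]),
                                        fields[3], fields[4], pending))
        pending = []
    return records


# The second dump from this post, pasted verbatim.
DUMP = """;
; Start view _default
;
;
; Cache dump of view '_default' (cache _default)
;
; using a 86400 second stale ttl
$DATE 20221129172835
; authanswer
; stale
test.com. 86390 IN A 10.10.10.10
;
; Address database dump
"""

for rec in parse_cache_dump(DUMP):
    print(rec.name, rec.ttl, rec.markers)
# test.com. 86390 ['authanswer', 'stale']
```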
The next test is to kill NS2 and query the record again. The desired outcome is for NS1 to answer the query without needing NS2.
After killing NS2, however, NS1 fails to resolve the query. Looking at NS1's cache:
;
; Start view _default
;
;
; Cache dump of view '_default' (cache _default)
;
; using a 86400 second stale ttl
$DATE 20221129173157
; authanswer
; stale
test.com. 86188 IN A 10.10.10.10
;
; Address database dump

This shows me that the cache entry still exists and is valid. Looking at the logs:
29-Nov-2022 17:31:52.014 serve-stale: info: test.com resolver failure, stale answer unavailable
29-Nov-2022 17:31:52.014 query-errors: info: client @0x7feeb7f1b308 192.168.56.1#59506 (test.com): query failed (SERVFAIL) for test.com/IN/A at query.c:5871

This tells me the query fails because the stale answer is unavailable.
In NS1's config, I have:
options {
    listen-on port 53 { any; };
    listen-on-v6 { none; };

    directory "/var/cache/ns1";

    recursion yes;
    allow-transfer { none; };
    allow-query { any; };

    forwarders {
        127.0.0.1 port 153;
    };
    forward only;

    stale-answer-enable yes;
    stale-answer-ttl 300;

    dnssec-validation yes;

    statistics-file "/var/run/named.ns1.stats";

    auth-nxdomain no;
};

Two questions about this situation:
1. Why would the test.com entry in the cache be marked stale if the TTL has not expired yet? The ideal scenario would be for the forwarder not to reach out to NS2 unless necessary. Am I misunderstanding the stale record concept?
2. Why is the stale answer not available in this scenario, even though stale answers are enabled and the cache entry exists and is valid? Am I missing part of the config?
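
To put numbers on question 1, the short Python sketch below compares the $DATE stamps and TTL values from the second and third dumps above. It assumes the $DATE value is a YYYYMMDDHHMMSS timestamp. It also prints how far the first TTL sits below the 86400-second stale TTL from the dump header; I am not assuming what that gap means.

```python
from datetime import datetime


def parse_dump_date(stamp: str) -> datetime:
    """Parse a $DATE value, assumed to be in YYYYMMDDHHMMSS form."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


# Values copied from the second and third dumps in this post.
first_date, first_ttl = parse_dump_date("20221129172835"), 86390
second_date, second_ttl = parse_dump_date("20221129173157"), 86188
stale_ttl = 86400  # from the "using a 86400 second stale ttl" header line

elapsed = int((second_date - first_date).total_seconds())
ttl_drop = first_ttl - second_ttl
gap_to_stale_ttl = stale_ttl - first_ttl

print(elapsed)           # 202
print(ttl_drop)          # 202
print(gap_to_stale_ttl)  # 10
```

So the TTL column counts down in step with wall-clock time between the two dumps, which is why I read the entry as still being present in the cache when the query failed.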

Any help would be appreciated.

Regards
Hamid Maadani