forwarder cache

Tue Nov 29 21:05:29 UTC 2022

I have a somewhat similar configuration on my home network.  I have
two recursive servers and two "authoritative" servers (for a domain I
call "mylocal", which has a forward zone and also an in-addr.arpa zone
for my inside network).  These are all running on one Intel NUC.  One
difference is that my "authoritative" servers are not running at
127.0.0.1 but rather at 192.168.40.142 and 192.168.40.182.  The
recursive servers are at 192.168.40.42 and 192.168.40.82.  I had to
use the "forwarders" statement to send lookups for the local domain
and reverse lookups for local IPs.  Another difference is that these
are all running in chroot jails with separate directories.  The last
difference is that I don't have any settings regarding stale entries.
I can shut off the "authoritative" servers and the recursive servers
will still answer questions about "mylocal" hosts and in-addr.arpa
queries, as long as they had previously looked up such answers.  I'm
not sure why it isn't working for you.  I do have my forwarders set up
differently (i.e., I have them only at a per-domain level instead of
at the options level).  Example:

zone "40.168.192.in-addr.arpa" {
  type forward;
  // this defines the addresses of the resolvers to
  // which queries for this zone will be forwarded
  forwarders {
    192.168.40.142;
    192.168.40.182;
  };
  // indicates all queries for this zone will be forwarded
  forward only;
};
zone "mylocal" {
  type forward;
  // this defines the addresses of the resolvers to
  // which queries for this zone will be forwarded
  forwarders {
    192.168.40.142;
    192.168.40.182;
  };
  // indicates all queries for this zone will be forwarded
  forward only;
};

The reason I did it that way was that I didn't think it would make
sense to send other recursive queries to the "authoritative" servers,
which won't have an answer and have no way to get one for
"www.microsoft.com", for example.  I'm not sure how that would make a
difference for the problem you are having, however.
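
For the setup described in the quoted message below, the same per-zone
approach might look like the following sketch.  It assumes NS2 stays
at 127.0.0.1 port 153, that test.com is the only zone NS1 needs to
forward, and that the global "forwarders" and "forward only" lines are
removed from NS1's options block.  It is untested against that setup,
and whether it changes the stale-answer behavior is not known.

```
// Sketch only: per-zone forwarding for NS1, in place of the
// options-level "forwarders" / "forward only" statements.
zone "test.com" {
  type forward;
  // assumption: NS2 still listens on 127.0.0.1 port 153
  forwarders {
    127.0.0.1 port 153;
  };
  // all queries for this zone go to the forwarder
  forward only;
};
```

With this layout, only queries under test.com are sent to NS2; other
names would be handled by NS1's normal recursion, since "recursion
yes" is already set there.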

On Tue, Nov 29, 2022 at 12:47 PM Hamid Maadani <hamid at dexo.tech> wrote:
>
> Hi there,
>
> I am running two instances of named on the same server (BIND 9.16.33 on alpine 3.16). They are running using completely separate config directories, and they have separate work directories as well as control ports. Let's call them NS1 and NS2.
>
> NS1 is a forwarding instance. It listens on any:53 and forwards all requests to 127.0.0.1:153
> NS2 is a normal bind9 instance. It has one zone (test.com), and listens to 127.0.0.1:153
>
> My understanding is, when NS1 receives a request for "test.com", it will initially forward that query to NS2 for resolution, and then cache the result in memory for TTL of that record. The next request coming in for "test.com", should be served from in-memory cache of NS1, and NS2 should be out of the picture.
>
> Based on that, I am running some tests. Initial dump of NS1's memory shows an empty cache:
> / # cat /var/cache/ns1/named_dump.db
> ;
> ; Start view _default
> ;
> ;
> ; Cache dump of view '_default' (cache _default)
> ;
> ; using a 86400 second stale ttl
> $DATE 20221129172701
> ;
> ; Address database dump
>
> Next, I send an A record request for test.com to NS1, which returns the correct result. Dumping the cache:
> ;
> ; Start view _default
> ;
> ;
> ; Cache dump of view '_default' (cache _default)
> ;
> ; using a 86400 second stale ttl
> $DATE 20221129172835
> ; authanswer
> ; stale
> test.com. 86390 IN A 10.10.10.10
> ;
> ; Address database dump
>
> Which shows that the A record is cached by NS1 at this point, and should be valid for the next 86390 seconds.
> The next test would be to kill NS2, and query the record. Desired outcome would be NS1 resolving the query, without the need for NS2.
> After killing NS2 however, NS1 fails to resolve the query. Looking at NS1 cache:
> ;
> ; Start view _default
> ;
> ;
> ; Cache dump of view '_default' (cache _default)
> ;
> ; using a 86400 second stale ttl
> $DATE 20221129173157
> ; authanswer
> ; stale
> test.com. 86188 IN A 10.10.10.10
> ;
> ; Address database dump
>
> Which shows me that the cache still exists and is valid. Looking at the logs:
> 29-Nov-2022 17:31:52.014 serve-stale: info: test.com resolver failure, stale answer unavailable
> 29-Nov-2022 17:31:52.014 query-errors: info: client @0x7feeb7f1b308 192.168.56.1#59506 (test.com): query failed (SERVFAIL) for test.com/IN/A at query.c:5871
>
> This tells me the query fails because the stale result is unavailable.
> In NS1's config, I have:
> options {
>   listen-on port 53 { any; };
>   listen-on-v6 { none; };
>
>   directory "/var/cache/ns1";
>
>   recursion yes;
>   allow-transfer { none; };
>   allow-query { any; };
>
>   forwarders {
>     127.0.0.1 port 153;
>   };
>   forward only;
>
>   stale-answer-enable yes;
>   stale-answer-ttl 300;
>
>   dnssec-validation yes;
>
>   statistics-file "/var/run/named.ns1.stats";
>
>   auth-nxdomain no;
> };
>
> Two questions about this situation:
> 1. Why would the test.com entry in cache be stale, if the TTL has not expired yet? The ideal scenario would be for the forwarder not to reach out to NS2 unless necessary. Am I not understanding the stale record concept correctly?
> 2. Why is the stale answer not available in this scenario, even though stale answers are enabled and the cache exists and is valid? Am I missing some config part?
>
> Any help would be appreciated.
>
> Regards
> Hamid Maadani