BIND9 doesn't query a specific nameserver with its IPv4 address.

JINMEI Tatuya / 神明達哉 jinmei at isl.rdc.toshiba.co.jp
Wed Jun 23 15:37:33 UTC 2004


>>>>> On Wed, 23 Jun 2004 19:45:52 +0900 (JST), 
>>>>> Daisuke Koike <daisukek at tkd.att.ne.jp> said:

> I have a problem when using BIND-9.2.3 as a cache server.

> When I resolve RRs of a specific domain, it sometimes seems that BIND9
> queries that nameserver only over IPv6, even though the nameserver has
> both IPv4 and IPv6 addresses.
> # I checked with tcpdump and trace logs
> The cache server doesn't have IPv6 reachability, so the query fails.

> The domain is "sm.sony.co.jp" and the problem can be reproduced on my
> box as follows.

I guess there are no meaningful IPv6 routes (in particular, no default
route) on the server machine.

Based on that assumption, it seems the following steps happened:

1. the cache server gets the A and AAAA RRs for widefw.csl.sony.co.jp.
2. the first time the server tries to send queries to widefw, it sets
   up ADB entries for both the IPv4 and IPv6 addresses.  RTTs for the
   addresses are randomly initialized.
3. a query is sent to the address that has a smaller initial RTT.  In
   the problematic case, it's the IPv6 address.
4. since the server does not have a route to the destination, sending
   the UDP query immediately fails (at least on a BSD box), and so the
   server records the error in the corresponding socket event.
5. the server then immediately cancels the query session (see line 689
   of lib/dns/resolver.c)
6. however, the ADB entries are left intact by the logic of
   resolver.c:fctx_cancelquery() in this context.  Thus, the
   preference between the two ADB entries does not change despite the
   failure.
7. eventually, a timeout at the fctx level occurs, and steps 3 to 6
   are repeated.
8. there are actually several chances to reset the ADB entries
   according to the log.  Unfortunately, however, the IPv6 address
   gets the smaller initial RTT every time in this session.
9. finally, dig gives up waiting for a response.

This is definitely bad behavior and can happen for many other users,
so we need to provide at least a way to mitigate the problem.

An easy workaround would be to rebuild the server with --disable-ipv6,
which seems acceptable in your case (except for the overhead of
rebuilding and restarting the server).

Perhaps it's better to adjust the RTT when we fail to send a query.
The patch below will do this, and I confirmed that it worked in that
we only needed a single timeout (per unreachable address) even in the
worst case.  However, I'm not sure whether this is the correct fix in
terms of the BIND9 architecture.  Perhaps the original code
deliberately avoided modifying the status of a remote end due to a
purely internal error, which might be a temporary one.  If so, I don't
have an idea for fixing the problem at the implementation level right
now.

					JINMEI, Tatuya
					Communication Platform Lab.
					Corporate R&D Center, Toshiba Corp.
					jinmei at isl.rdc.toshiba.co.jp

p.s. as far as I can see (and according to some test results) the same
problem can happen for 9.2.4rc5 and 9.3.0rc1 as well.

--- /tmp/bind-9.2.3/lib/dns/resolver.c	Mon Sep 22 09:32:39 2003
+++ resolver.c	Thu Jun 24 00:27:19 2004
@@ -686,7 +686,7 @@
 			resquery_destroy(&query);
 		}
 	} else if (sevent->result != ISC_R_SUCCESS)
-		fctx_cancelquery(&query, NULL, NULL, ISC_FALSE);
+		fctx_cancelquery(&query, NULL, NULL, ISC_TRUE);
 
 	isc_event_free(&event);
 }

