bind 9.10 fallback to tcp
Graham Clinch
g.clinch at lancaster.ac.uk
Thu Jul 9 00:21:47 UTC 2015
Hi Carl,
> I have a client with 9.10.2-P1-RedHat-9.10.2-2.P1.fc22 on Fedora 22, on
> a machine with a pppoe link with an mtu of 1492. The routers seem to be
> properly fragmenting udp - it can receive large packets such as
>
> dig www.byington.org +dnssec +bufsiz=4000 +notcp @205.147.40.34
>
> which says:
>
> ;; MSG SIZE rcvd: 3790
>
> However, a tcpdump for tcp port 53 shows a lot of traffic. In
> particular,
>
> rndc flushtree novell.com
> dig www.novell.com @localhost
>
> shows some tcp traffic to the .com servers. How does one isolate the
> query or server that is causing that fallback to tcp?
We saw a similar jump in TCP traffic with 'cold' (not much in the cache)
resolvers after switching from 9.9 to 9.10. The cause seems to be a
change to the way edns sizes are advertised to unknown servers. The
gory details are in the ARM for the 'edns-udp-size' option, but here's a
simplified version:
In 9.9, edns-udp-size is advertised initially, and only after problems
is it reduced to 512 bytes.
In 9.10, edns-udp-size sets the *maximum* size that can be advertised;
the first query to an unknown server uses 512 bytes, and the advertised
size then grows as successes occur.
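For reference, the option lives in the options block of named.conf (it
can also be set per-server in a server clause); a minimal sketch, with
the default 4096 shown as an illustrative value:

```
options {
    // In 9.10 this is only the *maximum* EDNS buffer size named
    // will advertise; queries to unknown servers still start at
    // 512 bytes and ramp up.
    edns-udp-size 4096;
};
```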
The 'Address database dump' section of a cache dump (rndc dumpdb -cache)
has 'udpsize' notes along with edns success rates:
; [edns success/4096 timeout/1432 timeout/1232 timeout/512 timeout]
; [plain success/timeout]
; 148.88.65.105 [srtt 1489] [flags 00006000] [edns 50/0/0/0/0] [plain 0/0] [udpsize 1757] [ttl 173]
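If you want to pick the advertised sizes out of the dump in bulk, a
short sed pipeline will do it; a sketch, using a sample line in place of
the real named_dump.db that rndc dumpdb -cache writes:

```shell
# Extract the udpsize value from an address-database line of a cache dump.
line='; 148.88.65.105 [srtt 1489] [flags 00006000] [edns 50/0/0/0/0] [plain 0/0] [udpsize 1757] [ttl 173]'

# sed captures the number following "[udpsize " and prints only that.
udpsize=$(printf '%s\n' "$line" | sed -n 's/.*\[udpsize \([0-9]*\)\].*/\1/p')
echo "$udpsize"   # prints 1757
```

Against a real dump you would feed the whole file to sed instead of a
single line.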
though I'm not clear on what udpsize really reflects here, since it
takes many different values (not just the 512, 1232, 1432 & 4096 I would
expect from the ARM).
We see a freshly restarted (validating) 9.10 resolver need to make many
TCP connections before returning its first answer, but things settle
down once it has become comfortable using larger edns sizes with the
root & tld servers.
Graham