Happy holidays from ISC!
ISC is fortunate to have staff members in so many different countries around the world: our software development benefits from all the different perspectives - and we benefit personally!
Read postRecursive DNS Servers administrators have for many years been advised to ensure that both the servers that they are running and the network environments wherein those servers reside are RFC-compliant. This is to ensure the best possible outcome when handling client queries.
While some older DNS implementations and/or mis-configured servers still fail to adhere to current standards, there are two situations in particular where load-balancers, firewalls, routers and other network infrastructure devices sometimes fail to allow uninterrupted transit of DNS queries and query responses.
BIND recursive servers operating in broken environments as detailed above will still be able to provide adequate answers to clients most of the time, although there will be some failures or resolution delays, particularly if neither TCP nor EDNS is possible. Clients of such servers will not be able to perform DNSSEC-validation locally.
DNS servers should be able to make and respond to queries using both TCP and large UDP packet sizes (with EDNS)
Clients using DNS servers that themselves don’t fully support TCP or EDNS or which are running within environments that don’t permit transit of DNS traffic using TCP and EDNS will potentially have a degraded experience. There are a few BIND configuration workarounds that can be deployed to temporarily mitigate some of the impact of broken deployments like these, but the long term recommendation is that equipment that cannot be reconfigured to support modern DNS protocols should be replaced or upgraded.
When operating recursively, BIND 9 has a “fallback” process that it uses to test the capabilities of remote authoritative servers that it queries. When it “learns” what the path to those servers supports, it caches this information.
By default, all currently-supported versions of BIND request DNSSEC signatures when querying authoritative servers using TCP or UDP with EDNS. DNSSEC signatures cannot be requested using UDP without EDNS0.
For currently (and recently) supported versions of BIND up to and including BIND 9.9, the fallback algorithm for a “new” authoritative server operates as follows:
BIND 9.10 uses a slightly different process of tries and retries for EDNS-capable servers to determine the maximum size of UDP responses that it should request from them, but similar logic applies to whether or not queries will be tried without EDNS, and when TCP should be used.
BIND will resend a query over TCP immediately if it receives a response to that same query sent over UDP that has the TC (Truncated) bit set in the DNS response header. This mechanism is designed so that a server can flag to a client that it was unable to fit all of the necessary records into the maximum packet size that it was able to use for that response, therefore the client is receiving an incomplete answer. Although this is optional (per DNS standards), most clients receiving a response that is flagged as truncated will re-send the query using TCP in order to obtain the full response.
In these new versions of BIND, we reworked the fallback process so that if a server had previously responded using EDNS, then we would flag it as EDNS-capable, and would not subsequently retry queries without EDNS.
Unfortunately there is one rare scenario where this can cause new problems (although in that scenario, the server would already be experiencing other difficulties).
The requirements for there to be a new problem after upgrading are that all of the following are true:
In the situation above, because there was an answer that used EDNS with a buffer size of 512, the server is presumed to be EDNS-capable; therefore there will be no retry of the query without EDNS. Because TCP fails, and plain DNS is never attempted, it is impossible to obtain a non-truncated response from this server. The outcome will be that a SERVFAIL response is sent back to the requesting client.
Older versions of BIND would have retried the query without EDNS, and would have received a smaller, unsigned, non-truncated response from the server, which they would then have sent back to the requesting client instead of SERVFAIL.
The scenario above is a very unusual corner case that should only affect DNS server administrators whose network infrastructure is already badly broken.
Check your network infrastructure. Below are some of the problems that you might find:
All DNS queries and responses using TCP are being blocked/dropped.
If your network infrastructure is doing this, fix it now - it will be causing you other problems, not just this one. There are many authoritative servers today that respond to clients with the TC bit set for reasons other than response truncation, including those deploying Response Rate Limiting as part of their mitigation strategy for Denial of Service (DoS) attacks. All modern DNS recursive servers need to be able to use TCP when querying authoritative servers.
All DNS queries/responses using EDNS are being blocked/dropped.
If your network infrastructure is doing this, plan to fix it as soon as possible. Responses that are too large to fit in 512-byte UDP packets are increasingly common nowadays, even from zones that are not DNSSEC-signed. If your infrastructure doesn’t support DNS using EDNS then your servers will be falling back to TCP more often than necessary. As a short-term workaround (since you can’t use it anyway), you can disable the use of EDNS for all servers in your named.conf file:
server 0.0.0.0/0 { edns no; };
server ::/0 { edns no; };
server { edns no; };
server 0.0.0.0/0 { edns-udp-size 1432; };
server ::/0 { edns-udp-size 1232; };
Note that the settings suggested above are using values expected to work in most situations when UDP fragmentation is not supported. You may find that there are other factors at play in your network infrastructure that mean that you can use larger settings, or that instead you need to set smaller limits.
If you are unable to correct your network infrastructure or to use the configuration workarounds suggested above, patches are attached to this article for BIND 9.9.6-P1 and 9.10.1-P1. The patches add a final query without EDNS, as long as the resolver is not configured for DNSSEC validation.
One problem with the BIND fallback algorithm in prior versions of BIND 9 is that when BIND learns that a server doesn’t support EDNS or TCP, then it saves this information for next time it needs to query the same server.
In the case of intermittent network connectivity or packet losses, it can happen that a server that previously supported queries using EDNS, temporarily does not appear to respond properly unless queried without EDNS. When that happens, the earlier versions of BIND would flag the server as unable to support EDNS and would no longer use EDNS. Any queries thereafter sent to that server wouldn’t have the DO (DNSSEC OK) flag, so for DNSSEC-signed zones, the responses from that server would not include DNSSEC signatures. A DNSSEC-validating server receiving unsigned responses for a signed zone would regard them as spoofed, and answer with a SERVFAIL to the client. The domain would then effectively become unreachable for DNSSEC-validating clients.
In response to this problem, BIND changed the way it handles caching the capabilities of authoritative servers, so that if a server had previously demonstrated that it was EDNS-capable, BIND would store that information, and never query that server without EDNS.
A future change to BIND will reintroduce the query without EDNS as a last-ditch effort to obtain an answer to a query, so long as the resolver is not configured for DNSSEC validation. Subsequent queries to the same authoritative server would always be tried first with EDNS.
The proposed future change to BIND is being made purely to accommodate broken server environments in the short term - administrators are strongly urged to fix their configurations and infrastructure. The process of retries adds delays into resolution of queries for clients. Where there are many servers authoritative for a domain, named will progress through all of them and in some situations may fail to complete its fallback to no-EDNS before the client query times out. Recursive servers that do not support TCP are broken and will be having a negative impact on the experience of their clients. Recursive servers that do not support EDNS are limiting their own performance. Recursive servers that support limited EDNS sizes but which have not been appropriately configured are also limiting their own performance.
This article is a reprint of an article previously published in ISC’s Knowledgebase.
What's New from ISC