External resolution timeouts

Thu Aug 5 02:42:18 UTC 2004

Is your name server behind a firewall, especially a CheckPoint FW-1?

BIND 9 sends its first DNS query with the EDNS0 option attached and the
CD flag set in the DNS message header.
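To make these two settings concrete, here is a minimal Python sketch of
what such a query looks like on the wire, built with only the standard
library. It is not BIND's code: the function names are hypothetical, and
it assumes a non-recursive A query (RD clear, as a resolver sends to
authoritative servers) advertising a 1024-byte UDP payload, the same size
as in the dig command used for testing further down.

```python
import struct

# DNS header flag bits (16-bit flags word after the query ID)
FLAG_RD = 0x0100  # Recursion Desired
FLAG_CD = 0x0010  # Checking Disabled (DNSSEC)

TYPE_A = 1
TYPE_OPT = 41     # EDNS0 OPT pseudo-RR
CLASS_IN = 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name, qid=0x1234, edns=True, cd=True, rd=False, bufsize=1024):
    """Build an A query for `name`; optionally set CD and append an OPT RR.

    Assumption for illustration: bufsize=1024 is the advertised UDP size.
    """
    flags = (FLAG_RD if rd else 0) | (FLAG_CD if cd else 0)
    arcount = 1 if edns else 0
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", TYPE_A, CLASS_IN)
    msg = header + question
    if edns:
        # OPT RR: root owner name, TYPE=41, CLASS=UDP payload size,
        # TTL=extended RCODE/version/flags (all zero), RDLENGTH=0
        msg += b"\x00" + struct.pack("!HHIH", TYPE_OPT, bufsize, 0, 0)
    return msg


first_try = build_query("example.com")                       # EDNS0 + CD
fallback = build_query("example.com", edns=False, cd=False)  # plain query
```

The only differences between the two messages are the CD bit in the
flags word, the ARCOUNT field, and the 11-byte OPT record at the end;
those are the parts a flag-inspecting firewall would look at.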

If the SmartDefense feature is enabled on CheckPoint FW-1, the firewall
drops DNS query messages with these settings.

Because of this, your name server gets a timeout instead of an
immediate DNS response carrying an error code from the authoritative
name servers.

The log data below shows several timeouts, after which the name server
removes the EDNS0 option from its DNS queries ("too many timeouts,
disabling EDNS0"). The fallback is triggered not by the RCODE of a DNS
response but by the number of timeouts.

So the first thing to check is whether there is a firewall such as
CheckPoint FW-1 in the path, and whether it inspects DNS message flags.

In the normal case, an authoritative server that does not accept the
first query (with EDNS0 and the CD flag) responds immediately with a
DNS response whose RCODE is "FORMERR". The name server then removes
the EDNS0 option, clears the CD flag, and sends a new DNS query, which
succeeds.
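The difference between the FORMERR case and the firewall case can be
modeled with a small Python simulation of a resolver's retry loop. This
is a hypothetical sketch, not BIND's actual algorithm: the threshold
MAX_EDNS0_TIMEOUTS = 3 is an assumption chosen for illustration, only the
EDNS0 option is tracked (CD handling is omitted for brevity), and the two
server functions are made-up stand-ins.

```python
# Hypothetical: timeouts tolerated before EDNS0 is abandoned.
# Illustrative value only; not taken from BIND's source.
MAX_EDNS0_TIMEOUTS = 3


def resolve(server, max_tries=10):
    """Simulate a resolver's retry loop and return (edns, outcome) pairs.

    `server(edns)` must return "answer", "formerr", or "timeout".
    """
    edns = True  # the first attempt always carries EDNS0
    timeouts = 0
    attempts = []
    for _ in range(max_tries):
        outcome = server(edns)
        attempts.append((edns, outcome))
        if outcome == "answer":
            break
        if outcome == "formerr":
            if not edns:
                break  # error even without EDNS0: give up
            edns = False  # explicit error reply: fall back at once
        elif outcome == "timeout":
            timeouts += 1
            if edns and timeouts >= MAX_EDNS0_TIMEOUTS:
                # no reply at all: fall back only after repeated timeouts
                edns = False
    return attempts


def old_server(edns):
    """Authoritative server that rejects EDNS0 queries with FORMERR."""
    return "formerr" if edns else "answer"


def firewalled_server(edns):
    """Server behind a firewall that silently drops EDNS0 queries."""
    return "timeout" if edns else "answer"


print([o for _, o in resolve(old_server)])
# ['formerr', 'answer']
print([o for _, o in resolve(firewalled_server)])
# ['timeout', 'timeout', 'timeout', 'answer']
```

With a server that answers FORMERR, the resolver needs only one extra
round trip; with a firewall that silently drops the EDNS0 query, it has
to sit through several timeouts first, which matches the pattern in the
log.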

By default, dig sends DNS queries without EDNS0 and without the CD
flag.

To reproduce this situation with dig, run the following command on the
name server:

  dig @<target_nameserver> <domain_name> A +cdflag +bufsize=1024

  * +cdflag : sets the CD flag in the DNS query message header
  * +bufsize=1024 : attaches an EDNS0 option (advertising a 1024-byte
    UDP buffer) to the DNS query message

If my guess is right (your name server is behind a firewall that drops
such DNS queries), the command above will also time out instead of
receiving a DNS response with FORMERR.
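If you need to repeat the check against several servers, a small Python
helper can run the same dig command and sort the result into the two
cases above. This is a sketch under assumptions: it requires dig on the
PATH, and it relies on dig's usual output (a ";; connection timed out"
line on timeout, a "status: ..." field in the ->>HEADER<<- line
otherwise), whose exact wording may differ between BIND versions. The
function names are hypothetical.

```python
import re
import subprocess


def run_edns_cd_check(server, name, bufsize=1024):
    """Run dig with the CD flag and an EDNS0 option; return its stdout."""
    cmd = ["dig", f"@{server}", name, "A", "+cdflag", f"+bufsize={bufsize}"]
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def classify(dig_output):
    """Return "TIMEOUT", the response status (e.g. "FORMERR"), or "UNKNOWN".

    Assumes dig's usual output wording; other versions may phrase it differently.
    """
    if "connection timed out" in dig_output:
        return "TIMEOUT"
    match = re.search(r"status: ([A-Z]+)", dig_output)
    return match.group(1) if match else "UNKNOWN"
```

For example, classify(run_edns_cd_check("192.0.2.53", "example.com")),
with a placeholder server address: "TIMEOUT" points to a firewall
dropping the query, while "FORMERR" is the normal fallback case.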

Good Luck...

Justin.

"Jason L. Cook" <jason at siliconashes.net> wrote in message news:<cepaul$1vs9$1 at sf1.isc.org>...
> Hey everybody!
> 
> I'm having weird timeout issues when resolving names for which my local
> nameserver is not authoritative. If I try to resolve an external name using dig
> or nslookup, my query will almost always fail the first time. However, it will
> succeed either on the first or second successive try.
> 
> Debugging output on tracelevel 3 shows lots of:
> 
> resquery 0x810fb90 (fctx 0x80fb378): send
> resquery 0x810fb90 (fctx 0x80fb378): sent
> resquery 0x810fb90 (fctx 0x80fb378): senddone
> fctx 0x80f0e90: timeout
> fctx 0x80f0e90: try
> fctx 0x80f0e90: query
> resquery 0x810fe08 (fctx 0x80f0e90): send
> resquery 0x810fe08 (fctx 0x80f0e90): sent
> resquery 0x810fe08 (fctx 0x80f0e90): senddone
> fctx 0x80eb8d8: timeout
> fctx 0x80eb8d8: try
> fctx 0x80eb8d8: query
> resquery 0x8110080 (fctx 0x80eb8d8): send
> fctx 0x80eb8d8: too many timeouts, disabling EDNS0
> resquery 0x8110080 (fctx 0x80eb8d8): sent
> fctx 0x80edb70: timeout
> 
> I'm running BIND 9.2.2rc1 on Linux 2.4.20. Any ideas, hints, or nudges in the
> right direction would be appreciated. Thanks!