External Name Server Timeouts

Sun Mar 5 19:25:26 UTC 2006

On 03 Mar 2006, at 17:29 PST, Barry Margolin wrote:

> In article <du9v4v$2irf$1 at sf1.isc.org>,
>  Merton Campbell Crockett <m.c.crockett at adelphia.net> wrote:
>
>> I have an external master name server running BIND 9.3.1 on a SuSE
>> Linux 9.3 system.  Periodically, the name server stops responding to
>> DNS queries from the Internet.  In some instances when this occurs,
>> all external name servers will become unresponsive.
>>
>> During these incidents, the external name server appears to remain
>> responsive to DNS requests forwarded from internal name servers.
>> However, it is not clear if the responses to the internal DNS
>> requests are from the external name server's cache or not.  A quick
>> check using tcpdump seems to indicate that the name server is sending
>> DNS requests to the Internet but may not be receiving any responses.
>>
>> A "weird" element of these incidents is that they don't appear to
>> impact the ability of internal users to access the Internet, i.e.
>> there are no problem tickets being opened by users claiming to be
>> unable to access external systems.  The problem tickets that are
>> opened are from users on the road, at home, or at customer sites that
>> are unable to establish VPN connections.
>
> This suggests to me that something is blocking incoming packets with
> destination port 53, but allowing incoming packets to the high port
> that's used when BIND sends out recursive queries.
>
> If you run tcpdump during this, do you see any of the normal queries
> reaching the server?

One problem is that I don't have ready access to the system console  
and must access the system using SSH.  The additional SSH traffic  
results in tcpdump dropping more packets than it might otherwise.

Essentially, all that I see with tcpdump are UDP packets from the
remote system that is attempting to update the zone file.  I do know
that other DNS traffic is getting through to the name server, as I can
log into remote external systems that are configured with TCP Wrappers'
PARANOID feature enabled.

Using "tcpdump -n udp and port 53 and net not <internal>", I do see  
some queries from other systems but "99.95 percent" of them appear to  
come from the system attempting to perform the update.  All packets  
appear to have the same source port number.
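
A rough figure such as "99.95 percent" and the same-source-port
observation can be checked by tallying saved tcpdump output instead of
reading it live over SSH.  The following is a minimal sketch only.  It
assumes the text output of "tcpdump -n" contains IPv4 lines of the form
"IP src.port > dst.port:", and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the "IP src.port > dst.port:" part of a `tcpdump -n` text line.
# Assumption: IPv4 only, and the line layout of common tcpdump releases.
PACKET_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def tally_sources(lines):
    """Count packets per (source address, source port) sent to port 53."""
    counts = Counter()
    for line in lines:
        m = PACKET_RE.search(line)
        # Ignore unparseable lines and anything not addressed to port 53,
        # such as the name server's own replies.
        if m and m.group(4) == "53":
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


def share_of_top_source(counts):
    """Return (address, fraction of counted packets) for the busiest source."""
    per_addr = Counter()
    for (addr, _port), n in counts.items():
        per_addr[addr] += n
    total = sum(per_addr.values())
    if total == 0:
        return None, 0.0
    addr, n = per_addr.most_common(1)[0]
    return addr, n / total
```

With the tcpdump output redirected to a file (say dns.txt, a made-up
name), tally_sources(open("dns.txt")) gives the per-source counts.  A
single (address, port) key holding nearly all of the count would
confirm both observations.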

>> During the last incident, I noticed that the external name server was
>> being inundated with DNS requests to update the external zone file
>> from an IP address, 196.25.255.194, assigned to an Internet Service
>> Provider in Zaire.  Dynamic updates are not permitted but the name
>> server appears to be going through all the steps needed to perform a
>> dynamic update before rejecting the request.  Log entries indicate
>> that the check for no existing RRset entries succeeded before
>> reporting that the update request was denied.
>
> I suspect this is because it has to determine which zone is being
> updated, so that it can check that zone's "allow-update" setting.  It
> would probably be nice if the code special-cased the situation where
> NO zones allow updating, and skipped all these unnecessary checks.
>
>>
>> I am attempting to eliminate BIND 9 as the cause of the DNS
>> timeouts.  My suspicion is that the DNS timeouts are being caused by
>> the SideWinder G2 firewall that was installed in November.
>
> Maybe some kind of IDS/IPS function of the firewall, throttling
> incoming DNS queries when they get too frequent?

A curious fact that lends credence to this involves how the incident  
can be cleared.

The version of rndc included in the SuSE distribution claims that the  
"restart" option is not implemented.  As a result, I use SuSE's YaST  
module to stop and start BIND.  This appears to be significantly  
slower than the "ndc restart" that I use on my older BSD/OS systems  
running BIND 8.

After stopping and starting named, tcpdump no longer reports any UDP  
traffic coming from the remote system.  It almost appears that the  
firewall might be "wedged" sending a small group of packets over and  
over again and that the only way to stop this behaviour is to stop  
listening on port 53 long enough for a timeout event to occur.
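
Whether the server is answering queries from the Internet again after
such a restart can be checked from an outside host with a small UDP
probe.  This is a sketch under assumptions, not a substitute for dig:
it builds a bare RFC 1035 query by hand, the function names are
hypothetical, and the server address and query name must be supplied by
the caller.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query (RFC 1035); qtype 1 is an A record."""
    # Header: ID, flags (all zero, no recursion desired), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT and ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def probe(server, name, timeout=3.0, port=53):
    """Return True if `server` answers a UDP query within `timeout` seconds."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _addr = sock.recvfrom(512)
        except OSError:  # covers timeouts and ICMP-induced errors
            return False
    # A genuine reply echoes the query ID and has the QR bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling probe() in a loop from a host outside the firewall, with the
external server's address and a name in the external zone, would show
when answers stop and when they resume.  A False result does not say
whether named or the firewall dropped the packet.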

Merton Campbell Crockett
m.c.crockett at adelphia.net