FOLLOWUP- DNS MX timeouts

Vernon A. Fort vfort at provident-solutions.com
Tue Jul 7 22:42:18 UTC 2009


Mark Andrews wrote:
> In message <4A452428.9020701 at provident-solutions.com>, "Vernon A. Fort" writes:
>   
>> I've run into a problem with named and timeouts, primarily with MX
>> lookups.  When an MX query fails the first time, I have to restart the
>> named process before it will return a successful query.  Again, it's
>> mainly with MX lookups, but it also happens with A records.  The
>> problem subsides for 1-2 hours and then starts happening again; basically I
>> look in the mailq for deferred messages with MX lookup failures.
>>
>> This box is a Gentoo install running a medium volume (500K per day) mail
>> server - lots of DNS queries due to RBLs, SpamAssassin, etc.  This
>> problem started showing up around mid-May.  Since then, I have
>> re-installed bind and bind-tools several times, updated the kernel and
>> linux headers to 2.6.29, recompiled glibc, etc.
>>
>> I just updated to 9.6.0-P1 from 9.4.3-P2 - same problem exists.  When
>> doing a manual MX lookup (dig MX isc.org), it takes around 45 seconds
>> on the first attempt.  If it fails the first time, it will never return
>> a positive query, just "connection timed out; no servers could be
>> reached" until I restart named.  I can't say for sure, but the bind
>> application was updated around the time I noticed this problem.  All
>> versions of bind I have tried (in Gentoo portage) have the same problem.
>>
>> Can anyone help me find where this problem might be?  I've google'd 
>> until my eyes are red and throbbing.
>>
>> Thanks
>>
>> Vernon
>
> I suggest that you fix your firewalls to allow 4096 byte EDNS
> responses though.  Both ORG and ISC.ORG are signed zones so their
> responses are larger than with unsigned zones.  Named is having to
> retry with different options to get a response through your firewall
> and this takes time.
>
> An EDNS/UDP MX response is 1999 bytes for isc.org.
>
> ;; Query time: 872 msec
> ;; SERVER: 2001:4f8:0:2::19#53(2001:4f8:0:2::19)
> ;; WHEN: Sat Jun 27 09:39:34 2009
> ;; MSG SIZE  rcvd: 1999
>   
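For anyone wanting to probe this by hand, the query that triggers such a large answer can be sketched in a few lines of Python.  This is only an illustration of the EDNS0 wire layout (RFC 2671, with the DO bit from RFC 3225), not what named does internally; the function name build_edns_query and its parameters are made up for this example.

```python
import struct

def build_edns_query(name, qtype=15, udp_size=4096, dnssec_ok=True, query_id=0x1234):
    """Build a raw DNS query carrying an EDNS0 OPT record (hypothetical helper)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # Question name: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE (15 = MX), QCLASS = IN
    # OPT pseudo-record: root name, TYPE=41, CLASS = advertised UDP buffer size,
    # TTL = extended rcode (8 bits) | version (8 bits) | flags (DO = 0x8000),
    # RDLENGTH = 0 (no options)
    ttl = 0x8000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, ttl, 0)
    return header + question + opt
```

Sending the result of build_edns_query("isc.org") over a plain UDP socket to one of the zone's servers, and again with udp_size=512, should show whether the large reply gets through: a reply near the 1999 bytes quoted above suggests the path is clean, while a timeout on the 4096 query alone would point at something in between dropping big packets.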
I now have two servers running behind Check Point firewalls which are
failing to resolve MX records.  One of the IT guys called Check Point, and
support suggested they disable the SmartDefense DNS UDP check.  This
did correct the problem, but queries are still sluggish from time to time.

I have three questions related to this:

1.  On both servers, the DNS version (and glibc) were updated in
mid-January from bind-9.4.1 to 9.4.3.  The SmartDefense DNS check had been
enabled on both firewalls long before the last updates were applied.
Why did the issues only start showing up in late May to early June?

2.  When an email is deferred in the mailq, it will stay deferred until
named is restarted.  I just tested this on a mail message that sat in
the queue for just about three days.  I kept trying to dig MX domain.com
during this time period and NOTHING would resolve (including any A
records) until I restarted named.  Why?

3.  In both network environments, I switched the resolution to internal
Windows 2003 DNS servers.  NO problems occurred during the week we used
the Windows DNS servers.  Why would SmartDefense not have the same effect
on Windows-based name servers?

Updating to bind-9.6.1 and updating the root.zone file made little if any
difference.  Basically, it appears that SOMETHING has changed somewhere,
because we have only just now had to alter the Cisco PIX rules to increase
the UDP packet size due to timeouts in these environments.  I have seen posts
related to my problems from as far back as 2-3 years ago.  So again, I'm
scratching my head wondering what the heck I missed: why did these
problems only start showing up now?
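One rough way to tell whether a firewall or PIX rule is still clipping answers is to look at the header of whatever reply does come back.  The sketch below is just an illustration, assuming the raw UDP payload is already in hand; the name summarize_response and the 512-byte constant (the classic pre-EDNS DNS/UDP limit) are my own choices for the example.

```python
import struct

CLASSIC_UDP_LIMIT = 512  # maximum DNS/UDP message size without EDNS

def summarize_response(data):
    """Decode the fixed 12-byte DNS header of a raw response (hypothetical helper)."""
    if len(data) < 12:
        raise ValueError("shorter than a DNS header")
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!HHHHHH", data[:12]
    )
    return {
        "id": msg_id,
        "is_response": bool(flags & 0x8000),  # QR bit
        "truncated": bool(flags & 0x0200),    # TC bit: server asks for a TCP retry
        "rcode": flags & 0x000F,              # 0 = NOERROR, 2 = SERVFAIL, ...
        "answers": ancount,
        "size": len(data),
        "exceeds_512": len(data) > CLASSIC_UDP_LIMIT,
    }
```

If replies with exceeds_512 set never arrive while small ones do, the size limit on the path is the likely suspect; replies with truncated set mean the server wants the query retried over TCP, so port 53/tcp has to be open as well.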

Any pointers or additional reading would be greatly appreciated.  I'm
just trying to understand from a 1,000-foot view, but whatever view anyone
suggests is fine.

Vernon



