FOLLOWUP- DNS MX timeouts
Mark Andrews
marka at isc.org
Wed Jul 8 04:29:03 UTC 2009
In message <4A53CF4A.8050600 at provident-solutions.com>, "Vernon A. Fort" writes:
> Mark Andrews wrote:
> > In message <4A452428.9020701 at provident-solutions.com>, "Vernon A. Fort" writes:
> >
> >> I've run into a problem with named and timeouts, primarily with MX
> >> lookups. When an MX query fails the first time, I have to restart the
> >> named process before it will return a successful answer. Again, it's
> >> mainly with MX lookups, but it also happens with A records. The
> >> problem subsides for 1-2 hours and then starts happening again; basically I
> >> look in the mailq for deferred messages with MX lookup failures.
> >>
> >> This box is a Gentoo install running a medium-volume (500K per day) mail
> >> server with lots of DNS queries due to RBLs, SpamAssassin, etc. This
> >> problem started showing up around mid-May. Since then, I have
> >> re-installed bind and bind-tools several times, updated the kernel and
> >> Linux headers to 2.6.29, recompiled glibc, etc.
> >>
> >> I just updated to 9.6.0-P1 from 9.4.3-P2; the same problem exists. When
> >> doing a manual MX lookup (dig MX isc.org), it takes around 45 seconds
> >> on the first attempt. If it fails the first time, it will never return
> >> a positive answer, just "connection timed out; no servers could be
> >> reached", until I restart named. I can't say for sure, but the bind
> >> package was updated around the time I noticed this problem. All
> >> versions of bind I have tried (in Gentoo Portage) have the same problem.
> >>
> >> Can anyone help me find where this problem might be? I've Googled
> >> until my eyes are red and throbbing.
> >>
> >> Thanks
> >>
> >> Vernon
> >> _______________________________________________
> >> bind-users mailing list
> >> bind-users at lists.isc.org
> >> https://lists.isc.org/mailman/listinfo/bind-users
> >>
> >
> > I suggest that you fix your firewalls to allow 4096-byte EDNS
> > responses through. Both ORG and ISC.ORG are signed zones, so their
> > responses are larger than with unsigned zones. Named is having to
> > retry with different options to get a response through your firewall,
> > and this takes time.
> >
> > An EDNS/UDP MX response for isc.org is 1999 bytes.
> >
> > ;; Query time: 872 msec
> > ;; SERVER: 2001:4f8:0:2::19#53(2001:4f8:0:2::19)
> > ;; WHEN: Sat Jun 27 09:39:34 2009
> > ;; MSG SIZE rcvd: 1999
> >
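To put the 1999-byte figure in context, here is a rough Python sketch of the size arithmetic. It assumes a 1500-byte Ethernet MTU and no IP options or IPv6 extension headers; the constant and function names are made up for illustration. Without EDNS a UDP response is capped at 512 bytes (RFC 1035), so a 1999-byte answer comes back truncated and must be retried over TCP. With a 4096-byte EDNS buffer it fits, but it is larger than one Ethernet frame can carry, so it arrives as IP fragments that the firewall must also let through.

```python
from typing import Optional

# Assumed link parameters: a 1500-byte Ethernet MTU, no IP options or
# IPv6 extension headers.
ETHERNET_MTU = 1500
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
PLAIN_DNS_UDP_LIMIT = 512  # RFC 1035 limit when the query carries no EDNS OPT record


def max_unfragmented_payload(ipv6: bool = False) -> int:
    """Largest DNS message that fits in one UDP datagram without IP fragmentation."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return ETHERNET_MTU - ip_header - UDP_HEADER


def describe(response_size: int, edns_bufsize: Optional[int], ipv6: bool = False) -> str:
    """Classify how a response of response_size bytes would travel over UDP."""
    limit = edns_bufsize if edns_bufsize is not None else PLAIN_DNS_UDP_LIMIT
    if response_size > limit:
        return "truncated (TC set), retry over TCP"
    if response_size > max_unfragmented_payload(ipv6):
        return "fits the EDNS buffer, sent as IP fragments"
    return "single unfragmented UDP datagram"


for bufsize in (None, 4096):
    print(bufsize, describe(1999, bufsize))
```

With these assumptions the largest unfragmented DNS payload is 1472 bytes over IPv4 and 1452 over IPv6, so the 1999-byte isc.org answer is fragmented either way once EDNS lets it travel over UDP.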
> I now have two servers running behind Check Point firewalls which are
> failing to resolve MX records. One of our IT guys called Check Point, and
> support suggested they disable the SmartDefense DNS UDP check. This
> did correct the problem, but queries are still sluggish from time to time.
>
> I have three questions related to this:
>
> 1. On both servers, the DNS version (and glibc) were updated in
> mid-January, from bind-9.4.1 to 9.4.3. The SmartDefense DNS check had been
> enabled on both firewalls long before the last updates were applied.
> Why did the issues only start showing up now (late May to early June)?
The ORG zone went from unsigned to signed using NSEC3 in
that period. I suspect SmartDefense doesn't yet know about
NSEC3 records.
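The failure mode suspected here can be modeled with a small, purely hypothetical Python sketch: an inspector that only accepts resource-record types on a fixed allow-list drawn up before NSEC3 (type 50, RFC 5155) was defined. Nothing here describes how SmartDefense actually works; the allow-list and function names are assumptions for illustration. Only the numeric type codes are the real IANA-assigned values.

```python
# Hypothetical DNS-inspecting firewall that only passes responses whose
# resource-record types appear on a fixed allow-list. This is an
# illustrative model, not a description of any real product.

# IANA-assigned RR type codes.
RR_TYPES = {
    "A": 1, "NS": 2, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28,
    "OPT": 41, "DS": 43, "RRSIG": 46, "NSEC": 47, "DNSKEY": 48,
    "NSEC3": 50, "NSEC3PARAM": 51,
}

# Assumed allow-list written before NSEC3 was defined (it lacks types 50 and 51).
OLD_ALLOW_LIST = frozenset(
    RR_TYPES[name]
    for name in ("A", "NS", "SOA", "MX", "TXT", "AAAA",
                 "OPT", "DS", "RRSIG", "NSEC", "DNSKEY")
)


def unknown_types(response_types, allow_list=OLD_ALLOW_LIST):
    """Return the sorted RR type codes in a response that are not on the allow-list."""
    return sorted(set(response_types) - set(allow_list))


def inspector_passes(response_types, allow_list=OLD_ALLOW_LIST) -> bool:
    """A strict inspector drops any response containing an unknown type."""
    return not unknown_types(response_types, allow_list)


# A response from an NSEC3-signed zone can carry NSEC3 and RRSIG records
# in its authority section alongside ordinary NS records.
nsec3_response = [RR_TYPES["NS"], RR_TYPES["NSEC3"], RR_TYPES["RRSIG"]]
print(unknown_types(nsec3_response))      # [50]
print(inspector_passes(nsec3_response))   # False
```

Under this model nothing changed on the firewall or the resolver; the responses simply started containing a record type the inspector had never been taught, which would match problems appearing only after ORG was signed.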
> 2. When an email is deferred in the mailq, it stays deferred until
> named is restarted. I just tested this on a mail message that sat in
> the queue for nearly three days. I kept trying to dig MX domain.com
> during this time period and NOTHING would resolve (including any A
> records) until I restarted named. Why?
Did you look at the nameserver logs?
> 3. In both network environments, I switched resolution to internal
> Windows 2003 DNS servers. NO problems occurred during the week we used
> the Windows DNS servers. Why would SmartDefense not have the same effect
> on Windows-based name servers?
Windows 2003 DNS servers don't speak EDNS or DNSSEC, so
firewalls don't interfere with their responses.
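The difference shows up in the queries themselves. Below is a minimal Python sketch, standard library only, that builds a raw DNS query with and without an EDNS0 OPT record (RFC 2671 at the time, now RFC 6891). Setting the DO bit (RFC 3225) asks the server to include DNSSEC records, which is what makes answers from signed zones like ORG large. The function names and the fixed query ID are illustrative assumptions, and the sketch only builds packets; it does not send them.

```python
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a domain name in DNS wire format (length-prefixed labels)."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:  # skip empty labels, e.g. for the root name "."
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"  # terminating root label


def build_query(qname: str, qtype: int = 15, edns_payload: Optional[int] = None,
                dnssec_ok: bool = False, query_id: int = 0x1234,
                recursion_desired: bool = True) -> bytes:
    """Build a DNS query; append an EDNS0 OPT record when edns_payload is set."""
    flags = 0x0100 if recursion_desired else 0  # RD bit
    arcount = 1 if edns_payload is not None else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, arcount)
    question = encode_name(qname) + struct.pack("!HH", qtype, 1)  # QCLASS=IN
    message = header + question
    if edns_payload is not None:
        # OPT RR: owner is the root name, TYPE=41, CLASS carries the advertised
        # UDP payload size, TTL packs extended RCODE (8 bits), version (8 bits)
        # and flags (16 bits); DO is the top flag bit. RDLENGTH=0 (no options).
        ttl = 0x00008000 if dnssec_ok else 0
        message += b"\x00" + struct.pack("!HHIH", 41, edns_payload, ttl, 0)
    return message


plain = build_query("isc.org")
edns = build_query("isc.org", edns_payload=4096, dnssec_ok=True)
print(len(plain), len(edns))  # 25 36
```

The plain MX query for isc.org is 25 bytes; the EDNS version is 11 bytes longer, all of it the OPT record that advertises a 4096-byte buffer and requests DNSSEC data. A server that never adds that record never triggers the large signed responses.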
> Updating to bind-9.6.1 and updating the root.zone file made little, if any,
> difference. Basically, it appears that SOMETHING has changed somewhere,
> because we have just now altered the Cisco PIX rules to increase the UDP
> packet size due to timeouts in these environments. I have seen posts
> related to my problems from as far back as 2-3 years ago. So again, I'm
> scratching my head wondering what the heck I missed: why did these
> problems only start showing up now?
>
> Any pointers or additional reading would be greatly appreciated. I'm
> just trying to understand this from a 1000-foot view, but whatever view anyone
> suggests is fine.
>
> Vernon
>
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka at isc.org