SERVFAIL and lame server errors for google MX servers

Chris Thompson cet1 at cam.ac.uk
Mon Oct 20 11:55:11 UTC 2008


Andys,

You wrote:

>  I am seeing SERVFAIL errors from our FreeBSD BIND 9.5.0.2 server when 
>looking up MX records for google.com.

But your later example involves MX records for googlemail.com, 
not google.com.

>We have a customer complaining they are having mail delivery errors as a 
>result which have a serious impact to this customer, so I need to know if 
>its a problem with our server. 

That shouldn't be happening even if you are getting SERVFAIL, unless it
is persistent. SERVFAIL indicates a (possibly) temporary error, and MTAs
should retry delivery later, up to some limit (usually days).

>So far the symptoms have been that in the past couple of weeks we have seen 
>errors like: 

Debugging using nslookup(1) is *so* painful. Try using dig(1) instead.
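
For example, something along these lines (a sketch only: the server
address is the one from your transcript, and dns_status is just a
throwaway helper name of mine for pulling the status out of dig's
header line):

```shell
# Print only the response status (NOERROR, SERVFAIL, ...) from dig's
# header line, reading dig output on standard input.
# dns_status is an illustrative helper, not part of BIND.
dns_status() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Usage (needs network access, so left commented out here):
#   dig @85.159.60.194 googlemail.com MX | dns_status
#   dig @85.159.60.194 googlemail.com MX +norecurse   # answer from cache only
#   dig googlemail.com MX +trace                      # follow the delegations
```

The status field in dig's header shows NOERROR, SERVFAIL and so on
directly. +norecurse asks the server to answer only from what it
already holds, and +trace makes dig walk the delegation chain from
the root itself, which helps separate a local problem from a remote one.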

>> Default Server:  ns1.ukgrid.net 
>> 
>> Address:  85.159.60.194 

This appears to be providing an open recursive lookup service, which
among other things won't improve your chances of avoiding cache poisoning.

>>> set type=mx
>> 
>>> googlemail.com
>> 
>> Server:  ns1.ukgrid.net 
>> 
>> Address:  85.159.60.194 
>> 
>> *** ns1.ukgrid.net can't find googlemail.com: Server failed
>
>then a few moments later if you try again the query succeeds. 
>
>Now on the face of it this looks like a problem with Google's DNS to me (but 
>I'm no BIND expert), but the thing that worries me is that restarting named 
>on our server seems to completely clear the problem, at least for some days.

Never restart named unless you absolutely have to. Learn how to use
rndc (assuming you haven't already). If you can catch it in the 
situation when it is giving SERVFAIL, then "rndc dumpdb -cache" 
may give useful information. Also, it would be interesting to know
whether "rndc flushname googlemail.com" takes it out of the state
immediately.
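
Roughly, and only as a sketch (the function names are mine, the server
address is the one from your transcript, and where the dump file ends
up depends on the dump-file setting in your named.conf):

```shell
SERVER=85.159.60.194    # address taken from the quoted transcript

# Step 1: while named is returning SERVFAIL, write its cache to the
# dump file, then inspect that file for the failing name before
# going any further.
dump_cache() {
    rndc dumpdb -cache
}

# Step 2: remove just the one name from the cache and query again,
# to see whether that alone clears the SERVFAIL.
flush_and_retry() {
    name=$1
    rndc flushname "$name"
    dig @"$SERVER" "$name" MX
}

# Usage, on the name server itself:
#   dump_cache
#   flush_and_retry googlemail.com
```

If the flush alone makes the lookup succeed, that points at bad
cached data for that name rather than at named as a whole.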

>I have also tried upgrading BIND, we were previously running on the FreeBSD 
>ports version 9.5.0.1 and now on 9.5.0.2 but the same issue is still being 
>seen. 

If these are ports of 9.5.0-P1 and/or -P2, then you may be running out
of UDP sockets, i.e. the problem could be entirely local. (But then I
would expect you to see SERVFAILs for several names.)

>Any advice on whether we have an issue or it's just a problem with Google's 
>servers would be appreciated. 

More debugging is needed, clearly. I certainly don't see any problem
with the MX records for googlemail.com (or those for google.com) at
the moment.

-- 
Chris Thompson
Email: cet1 at cam.ac.uk


