bind-users Digest, Vol 2489, Issue 2

Weekes, Curtis Curtis.Weekes at td.com
Sun Sep 11 19:08:53 UTC 2016



Send bind-users mailing list submissions to
        bind-users at lists.isc.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.isc.org/mailman/listinfo/bind-users
or, via email, send a message with subject or body 'help' to
        bind-users-request at lists.isc.org

You can reach the person managing the list at
        bind-users-owner at lists.isc.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of bind-users digest..."


Today's Topics:

   1. Fwd: why this query cause ServFail (Hillary Nelson)
   2. Re: why this query cause ServFail (John Miller)
   3. Re: why this query cause ServFail (Hillary Nelson)


----------------------------------------------------------------------

Message: 1
Date: Sat, 10 Sep 2016 13:39:45 -0400
From: Hillary Nelson <nelsonhillary8 at gmail.com>
To: bind-users at lists.isc.org
Subject: Fwd: why this query cause ServFail
Message-ID:
        <CAJS9+Ybp7mrp8PP+PtR7otTR34eL879_P6rX79cvqCbCSYXMag at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thanks John. I've changed resolver-query-timeout from the default 10 seconds
to 30 seconds, thinking my nameserver would then have enough time to query at
least one of the other nameservers for production.tacc.utexas.edu before
timing out. But it still sticks with the one that isn't working instead of
trying the other nameservers. In the tcpdump below you can see my nameserver,
192.168.1.100, querying 129.114.13.17 four times within the 30 seconds.
Shouldn't it try one of the other nameservers?

22:24:32.594680 IP 10.79.1.6.42064 > 192.168.1.100.53: 25767+ [1au] A? web1.production.tacc.utexas.edu. (60)
22:24:32.595029 IP 192.168.1.100.65437 > 129.114.13.17.53: 27989% [1au] A? web1.production.tacc.utexas.edu. (60)
22:24:37.594642 IP 10.79.1.6.42064 > 192.168.1.100.53: 25767+ [1au] A? web1.production.tacc.utexas.edu. (60)
22:24:41.595312 IP 192.168.1.100.19764 > 129.114.13.17.53: 8074% [1au] A? web1.production.tacc.utexas.edu. (60)
22:24:42.594873 IP 10.79.1.6.42064 > 192.168.1.100.53: 25767+ [1au] A? web1.production.tacc.utexas.edu. (60)
22:24:50.595523 IP 192.168.1.100.62364 > 129.114.13.17.53: 18009 A? web1.production.tacc.utexas.edu. (49)
22:24:59.595825 IP 192.168.1.100.58124 > 129.114.13.17.53: 57314 A? web1.production.tacc.utexas.edu. (49)
22:25:02.595236 IP 192.168.1.100.53 > 10.79.1.6.42064: 25767 ServFail 0/0/1 (60)
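
For anyone who wants to tally a capture like this rather than read it by
eye, here is a minimal Python sketch. It is only an illustration: the
function name upstream_query_counts and the regular expression are
assumptions made for this example, not part of tcpdump or BIND, and it
expects one packet per line (rejoin any lines that email wrapping has split).

```python
import re
from collections import Counter

# Assumed pattern for "tcpdump -nn" DNS query lines such as:
#   HH:MM:SS.ffffff IP src.port > dst.port: id[flags] [1au] A? name. (len)
# Responses (e.g. "ServFail 0/0/1") contain no "TYPE?" token and do not match.
QUERY_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}\.\d+ IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r"\d+[+%]* .*?(?P<qtype>[A-Z]+)\? (?P<qname>\S+)"
)

# The capture from this message, one packet per line.
SAMPLE = [
    "22:24:32.594680 IP 10.79.1.6.42064 > 192.168.1.100.53: 25767+ [1au] A? web1.production.tacc.utexas.edu. (60)",
    "22:24:32.595029 IP 192.168.1.100.65437 > 129.114.13.17.53: 27989% [1au] A? web1.production.tacc.utexas.edu. (60)",
    "22:24:37.594642 IP 10.79.1.6.42064 > 192.168.1.100.53: 25767+ [1au] A? web1.production.tacc.utexas.edu. (60)",
    "22:24:41.595312 IP 192.168.1.100.19764 > 129.114.13.17.53: 8074% [1au] A? web1.production.tacc.utexas.edu. (60)",
    "22:24:42.594873 IP 10.79.1.6.42064 > 192.168.1.100.53: 25767+ [1au] A? web1.production.tacc.utexas.edu. (60)",
    "22:24:50.595523 IP 192.168.1.100.62364 > 129.114.13.17.53: 18009 A? web1.production.tacc.utexas.edu. (49)",
    "22:24:59.595825 IP 192.168.1.100.58124 > 129.114.13.17.53: 57314 A? web1.production.tacc.utexas.edu. (49)",
    "22:25:02.595236 IP 192.168.1.100.53 > 10.79.1.6.42064: 25767 ServFail 0/0/1 (60)",
]


def upstream_query_counts(lines, sender_ip):
    """Count DNS queries sent by sender_ip to port 53, keyed by destination IP."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.match(line.strip())
        if match and match.group("src") == sender_ip and match.group("dport") == "53":
            counts[match.group("dst")] += 1
    return counts


if __name__ == "__main__":
    for server, count in upstream_query_counts(SAMPLE, "192.168.1.100").items():
        print(server, count)
```

Run against the eight packets in this capture with 192.168.1.100 as the
resolver, it counts four queries to 129.114.13.17 and none to any other
server.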

I'll contact the admin for the domain to get the broken nameserver fixed,
but it seems to me there is also a problem with how named handles the NS set
for this domain, or else there is some other parameter that tells named to
cycle through the other nameservers when one fails.



On Fri, Sep 9, 2016 at 7:20 PM, John Miller <johnmill at brandeis.edu> wrote:

> Hi Hillary,
>
> By default, BIND will return SERVFAIL to the client if it can't
> complete the full iteration process within 10 seconds.  This is
> controllable by the "resolver-query-timeout" parameter.  As for why
> your recursive server doesn't just try elsewhere, it _will_, but it
> assumes that it's querying a valid nameserver, so the original query
> needs to time out first.  It takes several queries for BIND to get its
> round-trip time cache in order.  With six authoritative NSs, it'll
> take longer than if you only had three.
>
> As for 129.114.13.18 being lame - it's hard to be lame if you aren't
> getting responses.  Lame just means that responses from the nameserver
> aren't authoritative, even though it's listed in your NS records.
>
> Your best option is to fix the non-responding nameservers or remove
> them from your NS records if they aren't supposed to respond to
> queries - name resolution isn't just broken for you, it's broken for
> everyone who wants to find web1.production.tacc.utexas.edu.
>
> John
>
> On Fri, Sep 9, 2016 at 5:23 PM, Hillary Nelson <nelsonhillary8 at gmail.com>
> wrote:
> > I should also mention that our BIND is 9.9.8-P4. What confuses me here is
> > that the listed nameserver (129.114.13.18) is lame and our nameserver
> > (192.168.1.100) can't get any response from it (see tcpdump above). Why
> > doesn't our nameserver try the other listed NS servers instead of sending
> > 'ServFail' to the client (10.79.1.6)?
> > Any help will be greatly appreciated!
> >
> > On Fri, Sep 9, 2016 at 1:07 PM, Hillary Nelson <nelsonhillary8 at gmail.com>
> > wrote:
> >>
> >> We've been seeing sporadic failures resolving the name
> >> web1.production.tacc.utexas.edu from our nameserver.
> >>
> >> There are 6 NS records listed for the domain production.tacc.utexas.edu;
> >> two of the six don't seem to work (dc1.production.tacc.utexas.edu
> >> 129.114.13.17 and dc2.production.tacc.utexas.edu 129.114.13.18).
> >>
> >> If our nameserver hits one of those two and doesn't get any response, it
> >> sends 'ServFail' to the client. Shouldn't our nameserver keep trying the
> >> other four working nameservers listed for the domain?
> >>
> >> Here is the tcpdump:
> >>
> >> 12:33:38.593146 IP 10.79.1.6.51980 > 192.168.1.100.53: 60950+ [1au] A? tas.tacc.utexas.edu. (48)
> >> 12:33:38.593573 IP 192.168.1.100.54985 > 129.114.13.18.53: 40455% [1au] A? web1.production.tacc.utexas.edu. (60)
> >> 12:33:43.593131 IP 10.79.1.6.51980 > 192.168.1.100.53: 60950+ [1au] A? tas.tacc.utexas.edu. (48)
> >> 12:33:47.593796 IP 192.168.1.100.49009 > 129.114.13.18.53: 38559% [1au] A? web1.production.tacc.utexas.edu. (60)
> >> 12:33:48.593234 IP 10.79.1.6.51980 > 192.168.1.100.53: 60950+ [1au] A? tas.tacc.utexas.edu. (48)
> >> 12:33:48.593583 IP 192.168.1.100.53 > 10.79.1.6.51980: 60950 ServFail 0/0/1 (48)
> >>
> >>
> >> Thanks in advance for your help!
> >>

------------------------------

Message: 2
Date: Sat, 10 Sep 2016 14:35:19 -0400
From: John Miller <johnmill at brandeis.edu>
Cc: Bind Users Mailing List <bind-users at lists.isc.org>
Subject: Re: why this query cause ServFail
Message-ID:
        <CAGYMsbvqJ__oCbh=o6s7ewvVrM_tsCMR7hNrQUCgsLKZ=p-qAg at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hillary,

I suspect there's more going on behind the scenes than just what your
tcpdump shows here.  Can you please post your named.conf file so we
can all see if there are any forwarders, stub zones, etc. involved
here?

Second thing: after you flush your cache, does the same behavior
persist, or does BIND try a different nameserver?

Finally, can you post the tcpdump command you're using?
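
To test each listed nameserver directly, bypassing named and its cache, a
rough Python sketch along these lines may help. It is an illustration under
stated assumptions, not a BIND tool: build_query and probe are hypothetical
helper names, only the standard library is used, the query is sent without
EDNS and with recursion not requested, and the 3-second timeout is an
arbitrary choice.

```python
import socket
import struct


def build_query(qname, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (no EDNS, RD bit clear).

    qtype=1 is an A record; query_id is an arbitrary fixed value
    chosen for this sketch.
    """
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    # QTYPE followed by QCLASS 1 (IN)
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def probe(server_ip, qname, timeout=3.0):
    """Send one UDP query to server_ip and summarise what came back."""
    packet = build_query(qname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server_ip, 53))
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
    if len(data) < 12 or data[:2] != packet[:2]:
        return "unexpected reply"
    flags = struct.unpack("!H", data[2:4])[0]
    rcode = flags & 0x000F          # 0 = NOERROR, 2 = SERVFAIL, ...
    authoritative = bool(flags & 0x0400)  # AA bit
    return f"response rcode={rcode} aa={authoritative}"
```

Calling probe("129.114.13.17", "web1.production.tacc.utexas.edu") and then
repeating it for each of the other listed nameserver addresses shows which
ones answer at all, and whether a reply has the authoritative-answer (AA)
bit set. The packet that build_query produces for this name is 49 bytes
long.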

John


--
John Miller
Systems Engineer
Brandeis University
johnmill at brandeis.edu
(781) 736-4619


------------------------------

Message: 3
Date: Sat, 10 Sep 2016 18:03:33 -0400
From: Hillary Nelson <nelsonhillary8 at gmail.com>
To: bind-users at lists.isc.org
Subject: Re: why this query cause ServFail
Message-ID:
        <CAJS9+YaiYiYaWJYnexrmykFtBbd2gk-52LSF4Va+9=naLXebEg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I've double-checked our nameserver config. No stub zones should be involved
in resolving this domain, and we don't have any forwarders configured.

After flushing the cache, or once the cached entry expires on its own (the
TTL is short), BIND almost always hits another server and resolution works.
We have 9 named resolvers, and whenever I check, one or two of them (not
always the same ones) have a problem with this domain.

The nameserver is dedicated and runs RHEL 6.8. The tcpdump command is:
tcpdump -i any -nn port 53

Here is named.conf; please let me know if anything else is needed:

include "/etc/rndc.key";
include "/named/acl";
controls {
         inet 127.0.0.1 allow { 127.0.0.1; } keys { localkey; };
};

options {
        listen-on-v6 { any; };
        listen-on { any; };
        directory "/named";
        dump-file "/var/run/named_dump.db";
        pid-file "/var/run/named.pid";
        recursing-file "/var/run/named.recursing";
        statistics-file "/var/run/named.stats";
        transfer-format many-answers;
        max-transfer-time-in 60;
        resolver-query-timeout 30;
        check-names master ignore;
        check-names slave ignore;
        check-names response ignore;
        datasize default;
        stacksize default;
        coresize default;
        files unlimited;
        recursion yes;
        notify no;
        auth-nxdomain no;
        version "unknown";
        response-policy { zone "dns-policy.rpz.zone"; };
        allow-transfer { xfer; };
        allow-query { all-allowed; };
        allow-query-cache { all-allowed; };
        allow-recursion { all-allowed; };
        blackhole { bogon; };
        include "validate";
        include "anycast.server";

};

server fe80::/16 { bogus yes; };
server ::/0 { bogus yes; };

include "logging.conf";
include "trusted-keys.conf";
include "gen.conf";
include "rpz.conf";
include "Secondary.conf";


Thanks!!

------------------------------

End of bind-users Digest, Vol 2489, Issue 2
*******************************************
