Strange problem with a query deleting a record...

Fri Aug 23 23:49:51 UTC 2013

In message <52177D81.8020206 at chrysler.com>, Kevin Darcy writes:
> On 8/22/2013 12:55 PM, johnh at primebuchholz.com wrote:
> > Greetings All,
> >
> > First of all, I apologize if this is out of place - I'm having a very
> > strange issue that is either a problem with bind itself, or at least,
> > affecting it.  Summary:
> >
> > For only ONE address, whenever I attempt to access it through my squid
> > proxy, the record disappears from DNS, and the retry time changes too.
> > Essentially, accessing www.thisdomain.com works, but a link to a portal on
> > that page to the subdomain login.thisdomain.com causes the problem.  I'm
> > willing to bet the problem lies with squid, but as to how it could
> > possibly change a record in bind... Well, I'm stumped.  If you don't go
> > through squid, everything works.  All other requests to bind for the
> > address of the host in question work fine. Here's the output of dig from
> > before accessing the page through squid:
> >
> > ; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
> > ;; global options:  printcmd
> > ;; Got answer:
> > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45037
> > ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
> >
> > ;; QUESTION SECTION:
> > ;login.thisdomain.com.            IN      A
> >
> > ;; ANSWER SECTION:
> > login.thisdomain.com.     17      IN      A       111.222.333.123
> >
> > ;; AUTHORITY SECTION:
> > thisdomain.com.         168319  IN      NS      ns1.thisdomain.com.
> > thisdomain.com.         168319  IN      NS      ns2.thisdomain.com.
> >
> > ;; Query time: 0 msec
> > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > ;; WHEN: Thu Aug 22 12:29:57 2013
> > ;; MSG SIZE  rcvd: 88
> >
> > You can do anything to request the address from bind and it works,
> > *except* try to access it through squid.  Bypassing squid and going
> > directly through the firewall works fine.
> >
> > Now, immediately after you try to access it through squid:
> >
> > ; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
> > ;; global options:  printcmd
> > ;; Got answer:
> > ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 43943
> > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
> >
> > ;; QUESTION SECTION:
> > ;login.thisdomain.com.            IN      A
> >
> > ;; AUTHORITY SECTION:
> > thisdomain.com.         298     IN      SOA     ns1.thisdomain.com.
> > serv.anotherdomain.com. 2006062510 3600 3600 2592000 300
> >
> > ;; Query time: 0 msec
> > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > ;; WHEN: Thu Aug 22 12:30:06 2013
> > ;; MSG SIZE  rcvd: 95
> >
> > After the 5-minute retry shown above expires, the original record
> > reappears.
> >
> > Ideas?  I'm stumped.  It seems like squid is somehow able to corrupt
> > bind's info, but I can't imagine how.
> I have a theory. If this is a name that's hosted on a stupid 
> load-balancer, and that load-balancer doesn't understand non-A-record 
> query types, then if Squid is sending a non-A query type (e.g. SRV, 
> possibly even AAAA, if it's *really* stupid), then the load-balancer may 
> be erroneously "poisoning" your cache with an NXDOMAIN response.
> 
> We ran into this many years ago with Cisco GSSes (Global Site Selectors) 
> and worked around it by having a "shadow" version of the zone, which the 
> GSSes proxy to for QTYPEs they don't handle. That "shadow" version of 
> the zone has a wildcard entry in it which forces responses to be NODATA 
> instead of NXDOMAIN, and this prevents the cache poisoning.
> 
>                                                              - Kevin

The load balancer should be able to correct for such misconfigurations
by changing the rcode of the response from NXDOMAIN to NOERROR.  It
knows what names it is answering for, so it can know that the NXDOMAIN
is an erroneous response.
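
A minimal sketch of that correction, in Python and not taken from any
real load balancer.  BALANCED_NAMES, Response and correct_rcode are
hypothetical names, and a response is modelled as a bare rcode plus an
answer tuple rather than a full DNS message:

```python
from dataclasses import dataclass, replace

NOERROR = 0    # RCODE values as defined in RFC 1035
NXDOMAIN = 3

# Hypothetical: the names this load balancer answers A queries for.
BALANCED_NAMES = {"login.thisdomain.com."}


@dataclass(frozen=True)
class Response:
    qname: str
    qtype: str
    rcode: int
    answers: tuple = ()


def correct_rcode(resp):
    """Turn an NXDOMAIN for a name we know exists into an empty NOERROR."""
    name = resp.qname.lower()
    if not name.endswith("."):
        name += "."
    if resp.rcode == NXDOMAIN and name in BALANCED_NAMES:
        # The name exists (we serve A records for it), so the honest
        # answer for an unhandled QTYPE is NODATA: NOERROR, no answers.
        return replace(resp, rcode=NOERROR, answers=())
    return resp
```

An NXDOMAIN for a name outside BALANCED_NAMES is passed through
unchanged, since the load balancer has no basis for overriding it.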

Obviously this hack doesn't work for signed zones.  There the shadow
zone needs to have dummy records for the queries being answered by
the load balancer, so that all the DNSSEC records are properly
consistent with the records being returned by the load balancer.

In either case the load balancer should be logging an error or, in
the DNSSEC case, not answering any more queries for the name until
the error is corrected.
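
That policy might be sketched as follows, again in Python and purely
as an illustration.  NxdomainGuard and its method names are
hypothetical, and how an operator signals that the error has been
corrected is an assumption here:

```python
import logging

log = logging.getLogger("lb.dns")


class NxdomainGuard:
    """Tracks balanced names whose shadow zone wrongly returned NXDOMAIN."""

    def __init__(self):
        self.suspended = set()

    def on_backend_nxdomain(self, name, zone_is_signed):
        """Log the error; return True if the name may still be answered."""
        log.error("shadow zone returned NXDOMAIN for balanced name %s", name)
        if zone_is_signed:
            # The rcode cannot be patched without breaking DNSSEC
            # consistency, so stop answering for this name.
            self.suspended.add(name)
            return False
        return True

    def may_answer(self, name):
        return name not in self.suspended

    def mark_corrected(self, name):
        """Called once the shadow zone has been fixed (assumed manual step)."""
        self.suspended.discard(name)
```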

Mark

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org