Strange problem with a query deleting a record...

Sat Aug 24 04:46:58 UTC 2013

In article <mailman.1159.1377301811.20661.bind-users at lists.isc.org>,
 Mark Andrews <marka at isc.org> wrote:

> In message <52177D81.8020206 at chrysler.com>, Kevin Darcy writes:
> > On 8/22/2013 12:55 PM, johnh at primebuchholz.com wrote:
> > > Greetings All,
> > >
> > > First of all, I apologize if this is out of place - I'm having a very
> > > strange issue that is either a problem with bind itself, or at least,
> > > affecting it.  Summary:
> > >
> > > For only ONE address, whenever I attempt to access it through my squid
> > > proxy, the record disappears from DNS, and the retry time changes too.
> > > Essentially, accessing www.thisdomain.com works, but a link to a portal on
> > > that page to the subdomain login.thisdomain.com causes the problem.  I'm
> > > willing to bet the problem lies with squid, but as to how it could
> > > possibly change a record in bind... Well, I'm stumped.  If you don't go
> > > through squid, everything works.  All other requests to bind for the
> > > address of the host in question work fine. Here's the output of dig from
> > > before accessing the page through squid:
> > >
> > > ; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
> > > ;; global options:  printcmd
> > > ;; Got answer:
> > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45037
> > > ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
> > >
> > > ;; QUESTION SECTION:
> > > ;login.thisdomain.com.            IN      A
> > >
> > > ;; ANSWER SECTION:
> > > login.thisdomain.com.     17      IN      A       111.222.333.123
> > >
> > > ;; AUTHORITY SECTION:
> > > thisdomain.com.         168319  IN      NS      ns1.thisdomain.com.
> > > thisdomain.com.         168319  IN      NS      ns2.thisdomain.com.
> > >
> > > ;; Query time: 0 msec
> > > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > > ;; WHEN: Thu Aug 22 12:29:57 2013
> > > ;; MSG SIZE  rcvd: 88
> > >
> > > You can do anything to request the address from bind and it works,
> > > *except* try to access it through squid.  Bypassing squid and going
> > > directly through the firewall works fine.
> > >
> > > Now, immediately after you try to access it through squid:
> > >
> > > ; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
> > > ;; global options:  printcmd
> > > ;; Got answer:
> > > ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 43943
> > > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
> > >
> > > ;; QUESTION SECTION:
> > > ;login.thisdomain.com.            IN      A
> > >
> > > ;; AUTHORITY SECTION:
> > > thisdomain.com.         298     IN      SOA     ns1.thisdomain.com. serv.anotherdomain.com. 2006062510 3600 3600 2592000 300
> > >
> > > ;; Query time: 0 msec
> > > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > > ;; WHEN: Thu Aug 22 12:30:06 2013
> > > ;; MSG SIZE  rcvd: 95
> > >
> > > After the 5-minute retry shown above expires, the original record
> > > reappears.
> > >
> > > Ideas?  I'm stumped.  It seems like squid is somehow able to corrupt
> > > bind's info, but I can't imagine how.
> > I have a theory. If this is a name that's hosted on a stupid 
> > load-balancer, and that load-balancer doesn't understand non-A-record 
> > query types, then if Squid is sending a non-A query type (e.g. SRV, 
> > possibly even AAAA, if it's *really* stupid), then the load-balancer may 
> > be erroneously "poisoning" your cache with an NXDOMAIN response.
> > 
> > We ran into this many years ago with Cisco GSSes (Global Site Selectors) 
> > and work around it by having a "shadow" version of the zone, which the 
> > GSSes proxy to for QTYPEs they don't handle. That "shadow" version of 
> > the zone has a wildcard entry in it which forces responses to be NODATA 
> > instead of NXDOMAIN, and this prevents the cache poisoning.
> > 
> >                                                              - Kevin
> 
> The load balancer should be able to correct for such misconfigurations
> by changing the rcode of the response from NXDOMAIN to NOERROR.  It
> knows what names it is answering for so it can know that the NXDOMAIN
> is an erroneous response.

If I understand what Kevin was saying, the load balancer IS the DNS 
server. If you ask it for the A record it's responsible for, it sends a 
reasonable reply. If you ask it for some other record type for that 
name, it sends NXDOMAIN instead of NOERROR.

It's a design flaw in these load balancers.
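
To make the difference concrete, here is a rough Python sketch of the two behaviours and of what a caching resolver does with them. It is a toy model under stated assumptions, not how BIND or any particular load balancer is implemented: the names RECORDS, correct_answer, broken_answer and ToyCache are made up for illustration, 192.0.2.10 is a documentation address standing in for the real one, and TTLs are left out entirely.

```python
NOERROR = "NOERROR"
NXDOMAIN = "NXDOMAIN"

# Hypothetical data: the single A record the load balancer is responsible for.
RECORDS = {("login.thisdomain.com.", "A"): ["192.0.2.10"]}


def correct_answer(name, qtype):
    """What an authoritative server should return as (rcode, answers)."""
    if (name, qtype) in RECORDS:
        return NOERROR, RECORDS[(name, qtype)]
    if any(owner == name for owner, _ in RECORDS):
        # NODATA: the name exists, it just has no records of this type.
        return NOERROR, []
    return NXDOMAIN, []


def broken_answer(name, qtype):
    """The flawed behaviour: any type it does not serve becomes NXDOMAIN."""
    if (name, qtype) in RECORDS:
        return NOERROR, RECORDS[(name, qtype)]
    return NXDOMAIN, []


class ToyCache:
    """Simplified resolver cache (no TTL handling).

    Assumption: a cached NXDOMAIN covers the whole name, for every type,
    which matches the symptom reported in this thread.
    """

    def __init__(self, upstream):
        self.upstream = upstream
        self.nxdomain = set()   # names cached as nonexistent
        self.positive = {}      # (name, qtype) -> list of answers

    def lookup(self, name, qtype):
        if name in self.nxdomain:
            return NXDOMAIN, []
        if (name, qtype) in self.positive:
            return NOERROR, self.positive[(name, qtype)]
        rcode, answers = self.upstream(name, qtype)
        if rcode == NXDOMAIN:
            self.nxdomain.add(name)
        elif answers:
            self.positive[(name, qtype)] = answers
        return rcode, answers
```

With broken_answer as the upstream, looking up A, then AAAA, then A again makes the third lookup come back NXDOMAIN, which is the same sequence as the two dig outputs quoted above. With correct_answer as the upstream, the AAAA lookup returns NOERROR with an empty answer and the A record stays usable.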

-- 
Barry Margolin
Arlington, MA