Strange / Frustrating Caching Problems

Fri Jul 14 16:16:35 UTC 2006

On 14 Jul 2006, at 08:29 , Kevin Darcy wrote:

> Merton Campbell Crockett wrote:
>> On 13 Jul 2006, at 11:43 , Smith, William E. (Bill), Jr. wrote:
>>
>>
>>> -----Original Message-----
>>> From: Mark_Andrews at isc.org [mailto:Mark_Andrews at isc.org]
>>> Sent: Thursday, July 13, 2006 1:55 PM
>>> To: Smith, William E. (Bill), Jr.
>>> Cc: bind-users at isc.org
>>> Subject: Re: Strange / Frustrating Caching Problems
>>>
>>>
>>>
>>>> For the past few months, I have been trying to resolve a problem
>>>> (unsuccessfully to this point) with a trio of caching only name
>>>> servers that we have in place.  The general nature of the problem is
>>>> as follows.  A DHCP client originally gets an IP address on subnet A
>>>> but at some point prior to lease expiration moves to subnet B, where
>>>> it obtains a new IP address successfully.  The problem that I am
>>>> seeing is that after the move to subnet B, one or more of our caching
>>>> only name servers are still returning the old IP address when a
>>>> lookup of the hostname occurs.  This behavior seems reasonable at
>>>> first glance since caching only servers should retain the information
>>>> they have in cache until the TTL expires and/or the cache is flushed.
>>>> After digging into this further, I'm finding that the TTLs for the
>>>> hosts whose forward lookups are returning the wrong IP are set to
>>>> 604800 seconds or 168 hours.  I've determined this by dumping /
>>>> viewing the cache.  In addition, I've also discovered that the TTL
>>>> for the reverse record for the same client is also set to this high
>>>> value.  This behavior would seem reasonable if this high value was
>>>> the TTL value configured for the domain, which is not the case here.
>>>> We have the default TTL in our environment set for 10800 seconds or
>>>> 3 hours.  Thus, I'm a little baffled as to why the TTLs for some of
>>>> these DHCP clients are being set to such a high value when other
>>>> clients have their TTLs set to the 10800 value configured at the
>>>> domain level.  I've checked the registration at the object level (in
>>>> our IP management application) and the TTL field is blank, thus
>>>> implying the default TTL is in place.
>>>
>>>> Aside from the above details, I can also note that the problematic
>>>> lookups seem to involve the same DHCP clients.  The only reason I
>>>> know about these clients is that they are unable to SSH to some Unix
>>>> boxes in a DMZ that restrict access to hosts that they can perform
>>>> both forward and reverse lookups for.
>>>
>>>> In this scenario, the forward lookup is failing since it's returning
>>>> the old IP address of the client.  When this problem occurs, it tends
>>>> to affect one or two of the caching servers but not all three.
>>>> Furthermore, it is somewhat random as to which of the 3 servers are
>>>> affected.
>>>
>>>> The caching servers in question are all Solaris 9 running BIND 9.3.2.
>>>>
>>>> If anyone can provide some insight here, it would be much appreciated.
>>>>
>>>> I can provide additional information and/or elaborate on something as
>>>> needed.
>>>
>>>> Bill Smith
>>>> <mailto:bill.smith at jhuapl.edu>
>>>> ISS Server Systems Group
>>>> Johns Hopkins University Applied Physics Laboratory
>>>> 11100 Johns Hopkins Road
>>>> Laurel, MD 20723
>>>> Phone:  443-778-5523
>>>> Web:  http://www.jhuapl.edu/
>>>>
>>> 	Nameservers do what the dhcp servers tell them to do.  The TTL
>>> 	is set by the DHCP server.  Try lowering the dhcp lease time as
>>> 	that influences the DNS TTL.
>>>
>>
>>
>> In an environment where people can wander with their laptops from
>> subnet to subnet, why do you have caching only name servers?
>>
>> These name servers should, at least, have the local zones defined as
>> forward or stub zones to minimize the amount of erroneous data being
>> returned in a volatile environment.
>>
> Uh, how will that help? Caching still occurs -- and TTLs are honored --
> even for names in "forward" or "stub" zones.
>
> The only way I can think of to speed up this propagation, short of
> reducing the TTLs that are set by the DHCP server, or running a modified
> version of BIND (e.g. QIP's version, in which secondaries can receive
> Dynamic Updates), or an out-of-band replication mechanism, is to set up
> all of the servers as stealth slaves enumerated in the relevant
> also-notify(s), so that the changes should replicate fairly quickly.

Right you are. A momentary brain-fade.  You can, however, configure a  
name server to override the TTL value received in a query response by  
defining max-cache-ttl in the global options of the configuration file.

If max-cache-ttl is not defined in the configuration file, BIND sets  
the value of max-cache-ttl to its default value of 7 days.  Any query  
response with a TTL greater than max-cache-ttl will have the TTL  
replaced with max-cache-ttl.
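
To make the capping rule concrete, here is a minimal sketch of the arithmetic in Python.  It is only an illustration of the behavior described above, not BIND's actual code; the function name cached_ttl and the constant DEFAULT_MAX_CACHE_TTL are hypothetical names chosen for this example.

```python
# Illustration only: models the "cap the TTL at max-cache-ttl" rule.
# The names below are hypothetical; they are not BIND identifiers.

DEFAULT_MAX_CACHE_TTL = 7 * 24 * 60 * 60  # 7 days = 604800 seconds


def cached_ttl(received_ttl, max_cache_ttl=DEFAULT_MAX_CACHE_TTL):
    """Return the TTL a cache would keep: the received TTL, capped at max_cache_ttl."""
    return min(received_ttl, max_cache_ttl)


# A 14-day TTL is cut down to the 7-day default cap.
print(cached_ttl(14 * 24 * 60 * 60))           # 604800

# A TTL below the cap is kept unchanged.
print(cached_ttl(10800))                       # 10800

# With a cap of 1800 seconds, the same record is kept for 30 minutes at most.
print(cached_ttl(10800, max_cache_ttl=1800))   # 1800
```

Note that the cap only ever lowers a TTL.  A record received with a TTL shorter than max-cache-ttl keeps its own TTL.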

This is why Bill Smith was seeing a TTL of 7 days in the caching only  
name server for a system with a TTL of 14 days.  Depending upon the  
volume of DNS queries in the DMZs, it might be reasonable to define  
"max-cache-ttl 1800;" to force the name server to perform a new query  
after 30 minutes.

Merton Campbell Crockett
m.c.crockett at adelphia.net