Strange / Frustrating Caching Problems
Merton Campbell Crockett
m.c.crockett at adelphia.net
Fri Jul 14 16:16:35 UTC 2006
On 14 Jul 2006, at 08:29 , Kevin Darcy wrote:
> Merton Campbell Crockett wrote:
>> On 13 Jul 2006, at 11:43 , Smith, William E. (Bill), Jr. wrote:
>>
>>
>>> -----Original Message-----
>>> From: Mark_Andrews at isc.org [mailto:Mark_Andrews at isc.org]
>>> Sent: Thursday, July 13, 2006 1:55 PM
>>> To: Smith, William E. (Bill), Jr.
>>> Cc: bind-users at isc.org
>>> Subject: Re: Strange / Frustrating Caching Problems
>>>
>>>
>>>
>>>> For the past few months, I have been trying to resolve
>>>> (unsuccessfully to this point) a problem with a trio of caching
>>>> only name servers that we have in place. The general nature of the
>>>> problem is as follows. A DHCP client originally gets an IP address
>>>> on subnet A but at some point prior to lease expiration moves to
>>>> subnet B, where it obtains a new IP address successfully. The
>>>> problem that I am seeing is that after the move to subnet B, one or
>>>> more of our caching only name servers are still returning the old
>>>> IP address when a lookup of the hostname occurs. This behavior
>>>> seems reasonable at first glance since caching only servers should
>>>> retain the information they have in cache until the TTL expires
>>>> and/or the cache is flushed. After digging into this further, I'm
>>>> finding that the TTL for the hosts whose forward lookups are
>>>> returning the wrong IP is set to 604800 seconds or 168 hours. I've
>>>> determined this by dumping / viewing the cache. In addition, I've
>>>> also discovered that the TTL for the reverse record for the same
>>>> client is also set to this high value. This behavior would seem
>>>> reasonable if this high value was the TTL value configured for the
>>>> domain, which is not the case here. We have the default TTL
>>>> in our environment set for 10800 seconds or 3 hours. Thus, I'm a
>>>> little baffled as to why the TTL for some of these DHCP clients is
>>>> being set to such a high value when other clients have their TTLs
>>>> set to the 10800 value configured at the domain level. I've checked
>>>> the registration at the object level (in our IP management
>>>> application) and the TTL field is blank, thus implying the default
>>>> TTL is in place.
>>>
>>>> Aside from the above details, I can also note that the problematic
>>>> lookups seem to involve the same DHCP clients. The only reason I
>>>> know about these clients is that they are unable to SSH to some
>>>> Unix boxes in a DMZ that restrict access to hosts that they can
>>>> perform both forward and reverse lookups for.
>>>
>>>> In this scenario, the forward lookup is failing since it's
>>>> returning the old IP address of the client. When this problem
>>>> occurs, it tends to affect one or two of the caching servers but
>>>> not all three. Furthermore, it is somewhat random as to which of
>>>> the 3 servers are affected.
>>>
>>>> The caching servers in question are all Solaris 9 running BIND
>>>> 9.3.2.
>>>>
>>>> If anyone can provide some insight here, it would be much
>>>> appreciated.
>>>>
>>>> I can provide additional information and/or elaborate on
>>>> something as needed.
>>>
>>>> Bill Smith
>>>> <mailto:bill.smith at jhuapl.edu>
>>>> ISS Server Systems Group
>>>> Johns Hopkins University Applied Physics Laboratory 11100 Johns
>>>> Hopkins Road Laurel, MD 20723
>>>> Phone: 443-778-5523
>>>> Web: http://www.jhuapl.edu <http://www.jhuapl.edu/>
>>>>
>>> Nameservers do what the dhcp servers tell them to do. The TTL
>>> is set by the DHCP server. Try lowering the dhcp lease time as
>>> that influences the DNS TTL.
>>>
>>
>>
>> In an environment where people can wander with their laptops from
>> subnet to subnet, why do you have caching only name servers?
>>
>> These name servers should, at least, have the local zones defined as
>> forward or stub zones to minimize the amount of erroneous data being
>> returned in a volatile environment.
>>
> Uh, how will that help? Caching still occurs -- and TTLs are
> honored -- even for names in "forward" or "stub" zones.
>
> The only way I can think of to speed up this propagation, short of
> reducing the TTLs that are set by the DHCP server, or running a
> modified version of BIND (e.g. QIP's version, in which secondaries
> can receive Dynamic Updates), or an out-of-band replication
> mechanism, is to set up all of the servers as stealth slaves
> enumerated in the relevant also-notify(s), so that the changes
> should replicate fairly quickly.

Right you are. A momentary brain-fade. You can, however, configure a
name server to cap the TTL received in a query response by defining
max-cache-ttl in the global options of the configuration file. If
max-cache-ttl is not defined in the configuration file, BIND uses its
default value of 7 days (604800 seconds). Any query response with a
TTL greater than max-cache-ttl is cached with the TTL replaced by
max-cache-ttl.

This is why Bill Smith was seeing a TTL of 7 days in the caching only
name server for a system with a TTL of 14 days. Depending upon the
volume of DNS queries in the DMZs, it might be reasonable to define
"max-cache-ttl 1800;" to force the name server to perform a new query
after 30 minutes.
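
A minimal sketch of what that might look like in named.conf, assuming
the statement is added to the existing global options block (all other
options are omitted here, and 1800 is only the example value mentioned
above, not a recommendation for every site):

```
options {
    // Cap the TTL of any cached answer at 30 minutes (value in seconds).
    // Without this statement BIND falls back to its default of
    // 604800 seconds (7 days).
    max-cache-ttl 1800;
};
```

The configuration has to be reloaded for the change to take effect
(for example with "rndc reconfig"). Records already in the cache may
keep their stored TTLs until they expire or the cache is flushed with
"rndc flush".
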
Merton Campbell Crockett
m.c.crockett at adelphia.net