Strange / Frustrating Caching Problems
Kevin Darcy
kcd at daimlerchrysler.com
Fri Jul 14 15:29:32 UTC 2006
Merton Campbell Crockett wrote:
> On 13 Jul 2006, at 11:43, Smith, William E. (Bill), Jr. wrote:
>
>
>> -----Original Message-----
>> From: Mark_Andrews at isc.org [mailto:Mark_Andrews at isc.org]
>> Sent: Thursday, July 13, 2006 1:55 PM
>> To: Smith, William E. (Bill), Jr.
>> Cc: bind-users at isc.org
>> Subject: Re: Strange / Frustrating Caching Problems
>>
>>
>>
>>> For the past few months, I have been trying (unsuccessfully to this
>>> point) to resolve a problem with a trio of caching-only name servers
>>> that we have in place. The general nature of the problem is as follows. A DHCP
>>> client originally gets an IP address on subnet A but at some point
>>> prior to lease expiration moves to subnet B, where they obtain a new
>>> IP address successfully. The problem that I am seeing is that after
>>> the move to subnet B, one or more of our caching only name servers
>>> are still returning the old IP address when a lookup of the hostname
>>> occurs. This behavior seems reasonable at first glance since caching
>>> only servers should retain the information they have in cache until
>>> the TTL expires and/or the cache is flushed. After digging into this
>>> further, I'm finding that the TTL for the hosts whose forward
>>> lookups are returning the wrong IP is set to 604800 seconds or 168
>>> hours. I've determined this by dumping / viewing the cache. In
>>> addition, I've also discovered that the TTL for the reverse record
>>> for the same client is also set to this high value. This behavior
>>> would seem reasonable if this high value was the TTL value configured
>>> for the domain, which is not the case here. We have the default TTL
>>> in our environment set for 10800 seconds or 3 hours. Thus, I'm a
>>> little baffled as to why the TTLs for some of these DHCP clients are
>>> being set to such a high value when other clients have their TTLs
>>> set to the 10800 value configured at the domain level. I've checked
>>> the registration at the object level (in our IP management
>>> application) and the TTL field is blank, thus implying the default
>>> TTL is in place.
>>
>>> Aside from the above details, I can also note that the problematic
>>> lookups seem to involve the same DHCP clients. The only reason I
>>> know about these clients is that they are unable to SSH to some Unix
>>> boxes in a DMZ that restrict access to hosts that they can perform
>>> both forward and reverse lookups for.
>>
>>> In this scenario, the forward lookup is failing since it's returning
>>> the old IP address of the client. When this problem occurs, it tends
>>> to affect one or two of the caching servers but not all three.
>>> Furthermore, it is somewhat random as to which of the 3 servers are
>>> affected.
>>
>>> The caching servers in question are all Solaris 9 running BIND 9.3.2.
>>>
>>> If anyone can provide some insight here, it would be much
>>> appreciated.
>>>
>>> I can provide additional information and/or elaborate on something
>>> as needed.
>>
>>> Bill Smith
>>> <mailto:bill.smith at jhuapl.edu>
>>> ISS Server Systems Group
>>> Johns Hopkins University Applied Physics Laboratory 11100 Johns
>>> Hopkins Road Laurel, MD 20723
>>> Phone: 443-778-5523
>>> Web: http://www.jhuapl.edu <http://www.jhuapl.edu/>
>>>
>> Nameservers do what the dhcp servers tell them to do. The TTL
>> is set by the DHCP server. Try lowering the dhcp lease time as
>> that influences the DNS TTL.
>>
>
>
> In an environment where people can wander with their laptops from
> subnet to subnet, why do you have caching only name servers?
>
> These name servers should, at least, have the local zones defined as
> forward or stub zones to minimize the amount of erroneous data being
> returned in a volatile environment.
>
Uh, how will that help? Caching still occurs -- and TTLs are honored --
even for names in "forward" or "stub" zones.
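
To illustrate why the cached answer sticks around, here is a minimal sketch in Python. It is a toy model only, not how named implements its cache: the ToyCache class, the host name, and the addresses are all made up for the example. The 604800-second TTL is the value reported from the cache dump.

```python
class ToyCache:
    """Toy TTL cache (hypothetical model, not BIND's implementation)."""

    def __init__(self, upstream):
        self.upstream = upstream   # dict: name -> (address, ttl_seconds)
        self.entries = {}          # dict: name -> (address, expires_at)

    def lookup(self, name, now):
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]       # still within the TTL: answer from cache
        # Not cached, or TTL has run out: ask upstream and cache the answer.
        address, ttl = self.upstream[name]
        self.entries[name] = (address, now + ttl)
        return address


# Authoritative data as registered for the client (made-up name and addresses).
authoritative = {"laptop.example.com": ("192.0.2.10", 604800)}
cache = ToyCache(authoritative)

first = cache.lookup("laptop.example.com", now=0)

# The client moves to subnet B and the authoritative record changes.
authoritative["laptop.example.com"] = ("198.51.100.20", 604800)

stale = cache.lookup("laptop.example.com", now=3600)     # one hour later
fresh = cache.lookup("laptop.example.com", now=604800)   # TTL has run out
print(first, stale, fresh)
```

In this model, declaring the zone as "forward" or "stub" would only change where the upstream lookup goes. The answer is still held until its TTL runs out, so the old address keeps being returned.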
The only way I can think of to speed up this propagation, short of
reducing the TTLs that are set by the DHCP server, or running a modified
version of BIND (e.g. QIP's version, in which secondaries can receive
Dynamic Updates), or an out-of-band replication mechanism, is to set up
all of the servers as stealth slaves enumerated in the relevant
also-notify(s), so that the changes should replicate fairly quickly.
- Kevin