Bind 9.2.1 cached glue records not updating

Mark_Andrews at isc.org
Wed Feb 19 04:58:44 UTC 2003


> I have an odd problem with Bind 9.2.1 on some fairly busy servers, and I 
> was hoping someone would have an idea of a fix.
> 
> It seems that on my busiest servers, some would say almost overloaded 
> servers, the NS records for cached entries don't update if the glue 
> record changes.  It only happens to entries already cached, and I only 
> see the issue when someone changes nameserver providers and we don't 
> pick up the change, even after several times the TTL of the glue record.
> 
> An example:  domain.com is hosted at site1.com, and changes providers to 
> site2.com.  Usually, after the TTL for the glue record expires, the 
> server will look up the NS record from the root servers again and find 
> that it should go to site2.com and get the new DNS data.
> 
> This works fine on my less loaded servers that have the exact same 
> hardware setup and software compile.  The busy servers, however, run 
> down their TTL and refresh the glue record from the same ISP (site1.com) 
> that they had before, even though the registrar shows it to be different 
> and the rest of the Internet seems to have caught the change.
> 
> I wouldn't worry about it, except that the busy servers never update, 
> while the underutilized servers get the updates in a timely manner.  
> ("Overloaded" here means about 90% CPU and roughly 1500 queries per 
> second.)  The only way I have found to update the cached glue records 
> is to flush the cache, and with 750MB of cache, that is really 
> annoying.
> 
> I have looked at the Bind 9.2.2rc1 code and think that maybe it has 
> something to do with the RTT not aging properly in 9.2.1, but I cannot 
> be sure that is the issue being seen here.  My engineering and QA 
> departments can take quite a while to get bugfix code out to the field, 
> so I need to be sure this is the (best) fix before I push it through.  
> Given the nature of what has been observed, it is also a bug that we 
> cannot recreate in-house.
> 
> Any ideas?  Does anyone know if the problem seen is actually the RTT 
> aging issue?  And if not, what might it be?
> 
> Any suggestions would be appreciated.
> 
> -Steve

	The NS/A/AAAA records are being refreshed by the old servers
	for the zone, which still serve the old content.  The old servers
	should either be turned off, or be made to serve the new content
	until all the cached NS/A/AAAA RRsets have expired and only then
	be turned off.  Essentially you are seeing the results of zone
	mis-management: multiple versions of the zone have been allowed
	to exist.

	The fix is to not refresh potential glue (NS/A/AAAA) if it is
	the same as the existing glue but rather let it expire and
	be relearnt.

	Mark
--
Mark Andrews, Internet Software Consortium
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark.Andrews at isc.org
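
The refresh behaviour described in the reply can be illustrated with a
toy simulation.  This is only a sketch under stated assumptions, not
BIND's actual code: the TTL, the server names, the one-query-per-interval
traffic model, and the refresh_identical switch are all hypothetical.
It assumes that every answer from the currently cached servers carries
their own copy of the NS RRset, and that "busy" simply means queries for
the zone arrive more often than the TTL.

```python
from dataclasses import dataclass

TTL = 100  # seconds; hypothetical TTL on the NS RRset

# Hypothetical data: the parent zone now delegates to site2.com, while the
# old provider's servers still answer with their stale NS RRset.
PARENT_NS = ("ns.site2.com",)
OLD_SERVER_NS = ("ns.site1.com",)


@dataclass
class CachedNS:
    names: tuple
    expires: int


def simulate(query_interval, duration, refresh_identical):
    """Return the NS names left in cache after `duration` seconds.

    One query for the zone arrives every `query_interval` seconds.
    `refresh_identical` selects whether an identical NS RRset seen in a
    response resets the cached entry's expiry time.
    """
    cache = CachedNS(OLD_SERVER_NS, expires=TTL)  # learnt before the move
    for now in range(0, duration + 1, query_interval):
        if now >= cache.expires:
            # Entry expired: relearn the delegation from the parent zone.
            cache = CachedNS(PARENT_NS, expires=now + TTL)
            continue
        # Entry still valid: the cached servers are asked, and their answer
        # carries their own (identical) NS RRset in the authority section.
        if refresh_identical:
            cache.expires = now + TTL
    return cache.names


if __name__ == "__main__":
    print("busy, refreshing:    ", simulate(1, 1000, True))
    print("idle, refreshing:    ", simulate(150, 1000, True))
    print("busy, not refreshing:", simulate(1, 1000, False))
```

In this model the busy cache that refreshes identical records keeps
pointing at the old servers indefinitely, the lightly loaded cache lets
the entry expire and picks up the new delegation, and the busy cache
that declines to refresh identical records also picks it up once the
original TTL runs out.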