BIND 9.7 Serial Number Decrease Problem

Mark Andrews marka at isc.org
Mon Jun 6 00:52:52 UTC 2011


In message <4DE9045C.2050509 at anl.gov>, Barry Finkel writes:
> I have a problem with BIND 9.7.x on Ubuntu.
> I have two servers that are running 9.7.3.
> They slave 332 zones, and they also master 213,750
> malware/spyware zones that we have defined to reroute these
> domains to a local machine.
> 
> When I was upgrading the BIND to 9.7.3-P1 yesterday, an
> 
>       ./rndc stop
> 
> command ran over 8 minutes, and named did not stop.
> A "kill" command did not work; I had to revert to a
> "kill -9" command.  What was BIND doing?  Gracefully
> closing all of the zones?

Most probably.  "rndc stop" ensures that master files are up to date
before exiting.  "rndc halt" does not try to flush master files
before exiting.

There could also have been a reference leak causing named to not
stop.

>  BIND 9.7.3-P1 came up fine, but there are two things that concern me:
> 
> 1) After BIND began responding to queries, it was using
>     100% of the CPU for about three minutes.  I am not sure what
>     BIND was doing.  This is not major because BIND was handling
>     customer queries, and after the three minutes the CPU usage
>     dropped to a normal 1%.
> 
> 2) Two zones reported serial number decreases.  This is bad.
> 
> I did some research on the two zones - both Microsoft
> Active Directory zones (one _tcp and one _udp) that are mastered
> on a Windows Domain Controller and slaved on my BIND boxes.
> I have around 44 AD zones I slave, and only these two reported
> problems - on my two internal Ubuntu slaves and my two Solaris 10
> slaves.  The two Solaris 10 slaves do not run the spyware zones,
> so I had no problem with "./rndc stop".  I therefore am not sure
> that the serial number problems are due to the "kill -9".

They shouldn't be.  The handling of master files and journals is
designed to survive having the power pulled at any time, provided the
filesystem supports atomic replacement of files.
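
To illustrate what "atomic replacement" means here, a minimal sketch
in Python.  This is not named's code; the function name atomic_write
and the ".tmp-" prefix are made up for the example.  The idea is to
write the new contents to a temporary file in the same directory,
force it to disk, and then rename it over the old file, so that a
crash leaves either the complete old file or the complete new one.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via write-to-temp then rename.

    Hypothetical helper for illustration only; it is not taken from BIND.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk first
        # os.replace() is an atomic rename on POSIX filesystems.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temporary file and leave the old
        # target untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A killed process can leave a stray temporary file behind, but the
target itself is never seen half-written.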

> I looked at the serial number issue on these two zones in detail;
> I capture the serial numbers on all the AD zones each morning at
> 6:10.  Here is information for the _tcp zone:
> 
>       Date        Zone  Mast Slav Slav
>       20 Oct 2010 _tcp. 1233 1233 1233
>       21 Oct 2010 _tcp. 1239 1239 1239 The master incremented the serial.
>       ...
>       09 Nov 2010 _tcp. 1239 1239 1239
>       10 Nov 2010 _tcp. 1238 1239 1239 Master decreased due to MS patch
>       11 Nov 2010 _tcp. 1238 1238 1238
>       ...
>       03 Dec 2010 _tcp. 1238 1238 1238
>       04 Dec 2010 _tcp. 1238 1238 1239 ??
>       05 Dec 2010 _tcp. 1238 1239 1238 ??
>       06 Dec 2010 _tcp. 1238 1238 1238
>       ...
>       09 Dec 2010 _tcp. 1238 1238 1238
>       10 Dec 2010 _tcp. 1238 1238 1239 ??
>       11 Dec 2010 _tcp. 1238 1239 1238 ??
>       12 Dec 2010 _tcp. 1238 1238 1238
>       ...
>       05 Jan 2011 _tcp. 1238 1238 1238
>       06 Jan 2011 _tcp. 1238 1239 1239 ??
>       07 Jan 2011 _tcp. 1238 1238 1238
>       ...
>       02 Mar 2011 _tcp. 1238 1238 1238 Upgrade 9.7.2-P3 to 9.7.3
>       03 Mar 2011 _tcp. 1238 1239 1239
>       04 Mar 2011 _tcp. 1238 1238 1238
>       ...
>       16 Apr 2011 _tcp. 1238 1238 1238
>       17 Apr 2011 _tcp. 1238 1238 1238 1238 1238 Two Sol10 slaves added.
>       ...
>       02 Jun 2011 _tcp. 1238 1238 1238 1238 1238 Upgrade 9.7.3 to 9.7.3-P1
>       03 Jun 2011 _tcp. 1238 1239 1239 1239 1239
> 
> Both Ubuntu slaves have been up for 149 days (reboot around Jan 15).
> The zone serial was 1239 until a MS patch run on the Domain
> Controller decreased the serial by one on the evening of Nov 9.
> I did nothing to correct the problem; I waited for the two zones
> to expire, and then new zones were transferred from the Windows
> master server.  The serial number was 1238 on the master and
> slaves.  On a few days, the serial on the slaves increased
> by one, and I am not sure what happened on those days.
> 
> On Mar 02 I upgraded BIND from 9.7.2-P3 to 9.7.3, and the
> serial numbers on the two upgraded BIND slaves reverted to the
> higher 1239 serial.  Again, I did no fixup, and on Mar 04
> the serials were the same at the lower value.  I think that the
> serial number decrease was temporary during the patch run.
> On Apr 17 I added the two Solaris 10 slaves to my morning report, and
> all five serials were constant at 1238 until I upgraded BIND Tuesday (on
> the Solaris 10 boxes) and yesterday (on the Ubuntu boxes).  Immediately
> after the upgrade BIND reported the serial number problem on these two
> zones.  The other AD zones have had no serial number problems.
> 
> I have no idea why BIND would remember the increased 1239
> serial number, when the serial number for the zone has been constant
> at 1238 since Mar 04.  I have to assume that between Mar 04 and
> Jun 03 BIND would have written the zone to disk, either in the
> base zone file or a .jnl file.
> 
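
As background on why a slave holding 1239 does not simply pick up
1238 from the master: SOA serials are compared using the serial
number arithmetic of RFC 1982, and a slave only transfers when the
master's serial is "greater".  Below is a minimal sketch in Python.
The function names are made up for the example, and
slave_would_transfer is a simplified model of the refresh check, not
named's implementation.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # serials wrap at 2**32
HALF = 1 << (SERIAL_BITS - 1)   # half the serial space


def serial_gt(a, b):
    """Return True if serial a is greater than b under RFC 1982."""
    a %= MOD
    b %= MOD
    return (a < b and b - a > HALF) or (a > b and a - b < HALF)


def slave_would_transfer(master_serial, slave_serial):
    """Simplified model: transfer only when the master's serial is newer."""
    return serial_gt(master_serial, slave_serial)


if __name__ == "__main__":
    # Master went back to 1238 while the slave still holds 1239.
    print(slave_would_transfer(1238, 1239))
    # Master moved forward from 1238 to 1239.
    print(slave_would_transfer(1239, 1238))
```

Under this model a master at 1238 looks older than a slave copy at
1239, so the slave keeps its copy until the zone expires, which
matches the Nov 10 and Nov 11 rows in the table above.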
> -- 
> ----------------------------------------------------------------------
> Barry S. Finkel
> Computing and Information Systems Division
> Argonne National Laboratory          Phone:    +1 (630) 252-7277
> 9700 South Cass Avenue               Facsimile:+1 (630) 252-4601
> Building 240, Room 5.B.8             Internet: BSFinkel at anl.gov
> Argonne, IL   60439-4828             IBMMAIL:  I1004994
> _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org
