BIND not loading into memory on first transfer

Fri Mar 27 14:54:56 UTC 2015

On Thu, Mar 26, 2015 at 11:34:42AM -0700, Frank Even wrote:
> The subject is about the only way I can think to describe a 
> situation we've run into recently.  First here is the system:
> 
> [root@dns]# cat /etc/redhat-release
> CentOS release 6.6 (Final)
> [root@dns]# rpm -q bind
> bind-9.8.2-0.30.rc1.el6_6.2.x86_64
> 
> So, we got bitten by a chroot permissions issue (unsure exactly how 
> it got introduced): the chroot was owned by root, with named as the 
> group owner.  Perms were 750 on the dir (rwxr-x---).
> 
> Zone files were in place for the necessary domains, but were 
> outdated, on average about 3 months old (presumably one of our 
> updates broke something somewhere).
> 
> We updated some of the boxes, and on restart, named started.
> 
> It initially loaded the 3-month-old zone (a frequently updated one,
> I might add).  The boxes then did a zone transfer from the master.
> Unable to write the tmp file to the working directory, named moved
> on.

Slave and other dynamic zones do require write privilege in the 
working directory.  Have you fixed this problem yet?

If you're running as user "named", that's the user which must have 
write privilege.  If running as root, root must have explicit 
privilege to write, because named drops superuser capabilities.
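
A rough sketch of the plain Unix permission logic, to show why 750 
with root:named ownership leaves named without write access.  The 
function name and arguments are made up for illustration; it ignores 
ACLs, SELinux and root's usual override (which named gives up when it 
drops capabilities):

```shell
# Hypothetical helper, not part of BIND: succeed if USER may write to a
# directory with the given octal MODE, OWNER and GROUP.
# Usage: can_write MODE OWNER GROUP USER "SPACE-SEPARATED USER GROUPS"
can_write() {
    mode=$((0$1)); owner=$2; group=$3; user=$4; user_groups=$5
    if [ "$user" = "$owner" ]; then
        shift_by=7                        # owner write bit (0200)
    else
        case " $user_groups " in
            *" $group "*) shift_by=4 ;;   # group write bit (0020)
            *)            shift_by=1 ;;   # other write bit (0002)
        esac
    fi
    [ $(( (mode >> shift_by) & 1 )) -eq 1 ]
}

# The situation described: mode 750, owner root, group named, user named.
if can_write 750 root named named "named"; then
    echo "named can write"
else
    echo "named cannot write"
fi
```

The real values can be read with "stat -c '%a %U %G' DIR" and 
"id -Gn named", where DIR stands for whatever the working directory 
is on your system.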

I suspect the problem might be SELinux.  Check "getenforce", and 
perhaps restore the context of the working directory (see "man 
restorecon") or disable SELinux if you prefer.
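
Something along these lines might do for the SELinux check.  It is 
only a sketch: the selinux_mode wrapper and the WORKDIR variable are 
placeholders of mine, and the last two commands are left commented 
out because the right path depends on your chroot layout:

```shell
# Print the SELinux mode, or "unavailable" if the SELinux tools are absent.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce    # prints Enforcing, Permissive, or Disabled
    else
        echo "unavailable"
    fi
}

selinux_mode

# WORKDIR is a placeholder (assumption): set it to named's real
# working directory before uncommenting.
# ls -dZ "$WORKDIR"          # show the directory's current security context
# restorecon -Rv "$WORKDIR"  # reset contexts to the policy defaults
```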

> Here is where the issue is.  I've generally found that if BIND 
> fails to write the zone, it still loads it into memory (masking the 
> fact that there is an issue here for an extended period of time).

named makes a best effort to get up and running, which it ought to 
do, IMO.  It's not masking anything; the inability to write to the 
working directory has been logged.

> In this particular instance, the masters ended up under maintenance 
> shortly after these boxes rebooted, so they were unable to transfer 
> the zone from them for another 2 hours.  These boxes were serving 
> stale data after boot despite being able to accomplish one zone 
> transfer after boot.  It seems that failing the first zone transfer 
> did NOT load the zone into memory (but subsequent zone transfers 
> while still failing to write the tmp file DID load the zone into 
> memory).
> 
> I guess the question really is, is this expected behavior or a bug?

The bug is a misconfiguration bug, where contrary to documented 
requirements, you have not given named write privilege in its 
directory.

I think you're saying named should fail to load the stale zones, 
which at startup it cannot know are stale.  That does not sound 
correct to me.

Perhaps you're suggesting that named should SERVFAIL the zone once 
the first zone transfer has happened.  That sounds more reasonable, 
but I am not sure that the zone transfer actually takes place if 
named is unable to open a temporary file to write.  (What would be 
the point in talking to the master when you know you are unable to 
handle the data?)
-- 
  http://rob0.nodns4.us/
  Offlist GMX mail is seen only if "/dev/rob0" is in the Subject: