SELinux Struggles with BIND Startup

We have recently identified several SELinux-specific issues with our BIND RPM packages. This (rather technical) blog post describes the issues we found, how we got down to the bottom of those issues, and the decisions we made based on our findings.

RPMs, SCLs, LSMs…

For a while now, ISC has been publishing BIND RPM packages for supported versions of Red Hat Enterprise Linux and CentOS. Since these operating systems come with a default SELinux policy which covers the named process, from the outset we wanted our packages to comply with that policy.

Every SELinux policy contains rules which assign specific SELinux contexts to various files, directories, processes, sockets, etc. These contexts, which can be thought of as labels assigned to different entities, are then used for defining allowed interactions.

SELinux rules defining contexts for files and directories are path-based. We package BIND in the form of Software Collections (SCLs), which (among other things) means that our binaries are installed in different locations than those used by stock RHEL/CentOS packages. As a result, the binaries we ship in BIND SCLs are not automatically assigned proper SELinux contexts. Fortunately, it is possible to set up so-called file context equivalency rules, which allow SELinux rules specified for one part of the filesystem to be reused for a different part of the filesystem. That is what we do in the post-installation scriptlet of the isc-bind metapackage in order to apply the rules from the stock SELinux policy to the files and directories installed by our SCLs.
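For readers unfamiliar with equivalency rules, here is a minimal sketch of what setting one up can look like. It is not the actual isc-bind scriptlet: the helper names (run, setup_equivalency) and the APPLY switch are made up for illustration, and the SCL root path is simply the install prefix that appears in the audit log later in this post.

```shell
# Install prefix of the SCL (as seen in the audit log excerpt in this post).
SCL_ROOT="/opt/isc/isc-bind/root"

# Hypothetical helper: print each command by default; run it only when APPLY=1
# (actually applying the rules requires root and the SELinux userland tools).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

setup_equivalency() {
    # Label paths under $SCL_ROOT as if they were rooted at "/".
    run semanage fcontext -a -e / "$SCL_ROOT"
    # Relabel files which are already installed so that the rule takes effect.
    run restorecon -R "$SCL_ROOT"
}
```

Calling setup_equivalency prints the two commands; APPLY=1 setup_equivalency executes them.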

The BIND 9.15.6 Mystery

Until recently, this worked like a charm. However, when we released BIND 9.15.6, it turned out that on CentOS 6, running /etc/init.d/isc-bind-named start no longer allowed named to be started… unless SELinux policy enforcement was disabled (e.g. using setenforce 0). At the same time, everything was working fine on CentOS 7 and CentOS 8. This was something we had not seen before, so we had to find out what was happening.

When SELinux is a suspected culprit of something not working, the first step is to… no, not to disable it! The first step is to look at the contents of /var/log/audit/audit.log to see what is not working. In the case of BIND 9.15.6 on CentOS 6, the following AVC denial was telling:

497 fsgid=497 tty=(none) ses=2 comm="isc-worker0000" exe="/opt/isc/isc-bind/root/usr/sbin/named" subj=unconfined_u:system_r:named_t:s0 key=(null)
type=AVC msg=audit(1577961466.346:33): avc:  denied  { write } for  pid=1831 comm="isc-worker0000" path="[eventfd]" dev=anon_inodefs ino=3853 scontext=unconfined_u:system_r:named_t:s0 tcontext=system_u:object_r:anon_inodefs_t:s0 tclass=file
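AVC records are dense, so it can help to reduce them to the fields that matter for policy debugging. The following is a rough sketch (the function name avc_fields is ours, not part of any SELinux tool) which reads audit records on standard input and, for each denial, prints the denied permissions, the source context, the target context, and the object class:

```shell
# Summarize AVC denials read from stdin as:
#   <permissions> <scontext> <tcontext> <tclass>
avc_fields() {
    awk '/type=AVC/ && /denied/ {
        perms = ""; s = ""; t = ""; c = ""
        # The denied permissions are listed between curly braces.
        if (match($0, /\{[^}]*\}/)) {
            perms = substr($0, RSTART + 1, RLENGTH - 2)
            gsub(/^ +| +$/, "", perms)
        }
        # The remaining fields of interest are key=value pairs.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^scontext=/) s = substr($i, 10)
            else if ($i ~ /^tcontext=/) t = substr($i, 10)
            else if ($i ~ /^tclass=/) c = substr($i, 8)
        }
        print perms, s, t, c
    }'
}
```

It can be fed the whole log, e.g. avc_fields < /var/log/audit/audit.log; lines which are not AVC denials are ignored.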

The key part of the above message was [eventfd] because it immediately rang a bell: the big change in BIND 9.15.6 was the introduction of the new network manager which adds a dependency on libuv and uses the latter instead of custom networking code. The purpose of this move was to simplify the source code while retaining portability and also to allow us to more conveniently implement new networking-related features, like DNS-over-HTTPS support. Anyway, on Linux, libuv’s event loop uses eventfds internally, so it is not surprising that named tried to use an eventfd upon startup. But why was that attempt blocked by SELinux only on CentOS 6?

First, we verified that named_t processes are indeed not allowed to write to anon_inodefs_t files:

# sesearch --allow --source named_t --target anon_inodefs_t
Found 2 semantic av rules:
   allow named_t file_type : filesystem getattr ; 
   allow named_t filesystem_type : filesystem getattr ; 

Since SELinux works on a deny-by-default basis and this denial is not triggered on CentOS 7 and later, our first guess was that something in the default SELinux policy changed between CentOS 6 and 7. Thus, we ran the same command on CentOS 7, but the result was surprising:

# sesearch --allow --source named_t --target anon_inodefs_t
Found 6 semantic av rules:
   allow system_bus_type filesystem_type : dir { getattr search open } ; 
   allow named_t file_type : filesystem getattr ; 
   allow named_t filesystem_type : filesystem getattr ; 
   allow domain file_type : file map ; 
   allow domain file_type : chr_file map ; 
   allow domain file_type : blk_file map ; 

While there are indeed some differences in this part of the SELinux policy, none of the above rules allows named to write to an eventfd. Yet, everything was working as intended. So what was making it possible? Since SELinux policy enforcement happens in the kernel, another theory we had was that some kernel change was responsible for “fixing” the issue on CentOS 7+. To minimize the time needed to verify that theory, we used ELRepo to install kernel 4.4 on CentOS 6 while still using the stock SELinux policy shipped with that operating system. It turned out that rebooting with kernel 4.4 in use enabled /etc/init.d/isc-bind-named start to work again on CentOS 6 with SELinux in enforcing mode.

Since 4.4 is much newer than 3.10 (the default CentOS 7 kernel), we could not be 100% sure that a change in kernel code was the root cause of our named issue being magically fixed in CentOS 7. Curiosity pushed us forward and thus we performed a git bisect session on the kernel source tree, trying to figure out which specific change in the kernel code was the culprit for the phenomenon we were observing. It turned out it was commit 3836a03d978e68b0ae00d3589089343c998cd4ff, a one-line change merged into Linux 2.6.33 (just one minor version later than 2.6.32, on which the CentOS 6 kernel is based):

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 9f0bf13291e5..2de009565d8e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -209,6 +209,7 @@ static struct inode *anon_inode_mkinode(void)
 	inode->i_mode = S_IRUSR | S_IWUSR;
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
+	inode->i_flags |= S_PRIVATE;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	return inode;
 }

The commit log message does a good job of explaining why this change was introduced:

Inotify was switched to use anon_inode instead of its own private filesystem which only had one inode in commit c44dcc56d2b5c7 “switch inotify_user to anon_inode”

The problem with this is that now the inotify inode is not a distinct inode which can be managed by LSMs. userspace tools which use inotify were allowed to use the inotify inode but may not have had permission to do read/write type operations on the anon_inode. After looking at the anon_inode and its users it looks like the best solution is to just mark the anon_inode as S_PRIVATE so the security system will ignore it.

It all made sense now. Previously, the SELinux policy allowed specific tools to access the inotify inode, which was a distinct inode type. After the inotify code had been updated to use a more “common” inode type, it was decided that instead of updating existing SELinux policies so that they matched this code change, the kernel should be modified to treat inodes of type anon_inode as “transparent” to Linux Security Modules like SELinux. In other words, no SELinux policy rule was required anymore to allow a confined process to access an anon_inode inode (like an eventfd). This is why BIND 9.15.6 is allowed to start on CentOS 7+, but not on CentOS 6.
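As a rule of thumb, then, an upstream kernel needs to be at version 2.6.33 or later for anon_inode inodes to be invisible to SELinux. The sketch below turns that into a quick check on a kernel release string. Treat it as a heuristic only: the function name is made up, and distribution kernels may carry backported patches, so the version number alone is not proof either way.

```shell
# Succeed if the kernel release string in $1 (e.g. the output of `uname -r`)
# is at least 2.6.33, the first upstream version containing the S_PRIVATE change.
kernel_has_private_anon_inodes() {
    printf '%s\n' "$1" | awk -F'[.-]' '{
        # Fold "major.minor.patch" into a single comparable number.
        v = $1 * 1000000 + $2 * 1000 + $3
        exit !(v >= 2006033)
    }'
}
```

For example, kernel_has_private_anon_inodes "$(uname -r)" && echo "eventfds bypass SELinux checks" would print nothing on a stock 2.6.32-based kernel.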

How to Make BIND 9.15.6+ Work With SELinux on CentOS 6

Since CentOS 6 will reach end-of-life in less than a year, we decided not to add any workaround for this issue to our RPM packages. We believe that the number of users who want to run a very recent version of BIND, with SELinux in enforcing mode, on an operating system released almost nine years ago is very low. If you are such a user, you have a few options:

  • install a custom SELinux policy module (see below),
  • update to a more recent kernel,
  • rebuild the stock CentOS 6 kernel with commit 3836a03d978e68b0ae00d3589089343c998cd4ff applied.

To build a custom SELinux policy module containing the change required for BIND 9.15.6+ to run on a stock CentOS 6 kernel, start with putting the following contents in a file called isc-bind-named-centos6.te:

module isc-bind-named-centos6 1.0;

require {
	type anon_inodefs_t;
	type named_t;
	class file { read write };
}

allow named_t anon_inodefs_t : file { read write };

Then build and install the module by running the following commands:

checkmodule -M -m -o isc-bind-named-centos6.mod isc-bind-named-centos6.te
semodule_package -o isc-bind-named-centos6.pp -m isc-bind-named-centos6.mod
semodule -i isc-bind-named-centos6.pp

To verify that the module has been successfully installed, run:

# semodule -l | grep isc-bind-named-centos6
isc-bind-named-centos6	1.0

You should now be able to start named using /etc/init.d/isc-bind-named start on a stock CentOS 6 kernel with SELinux in enforcing mode.

The “SCL + systemd + SELinux” Pitfall

While investigating the issue described above, we made another unsettling discovery. Here is what we observed on CentOS 7 after running systemctl start isc-bind-named:

# ps -eZ | grep named
system_u:system_r:unconfined_service_t:s0 4966 ? 00:00:00 named

Compare this with CentOS 6 output after running /etc/init.d/isc-bind-named start:

# ps -eZ | grep named
unconfined_u:system_r:named_t:s0 2371 ?        00:00:00 named

But we did everything according to the SELinux-related sections of the Software Collection Packaging Guide! So what was happening? This issue was caused by the way SCL-provided service binaries are started. When a user needs to start an SCL-provided application directly, they should prefix the command line with scl enable <scl-name> --. This was what we used in our systemd unit file for named and things seemed to work fine. However, we never checked whether the named process transitions to the expected SELinux domain (named_t).

The default RHEL/CentOS SELinux policy includes a set of so-called domain transition rules which specify which file contexts are entrypoints for which SELinux domains. The list of such rules for the named_exec_t file context (which is what the named binary is labeled with) that are present in the default CentOS 6 policy is:

# sesearch --type --target named_exec_t
Found 10 semantic te rules:
   type_transition system_dbusd_t named_exec_t : process named_t; 
   type_transition glusterd_t named_exec_t : process named_t; 
   type_transition cluster_t named_exec_t : process named_t; 
   type_transition condor_startd_t named_exec_t : process named_t; 
   type_transition initrc_t named_exec_t : process named_t; 
   type_transition cobblerd_t named_exec_t : process named_t; 
   type_transition NetworkManager_t named_exec_t : process named_t; 
   type_transition openshift_initrc_t named_exec_t : process named_t; 
   type_transition piranha_pulse_t named_exec_t : process named_t; 
   type_transition init_t named_exec_t : process named_t; 

Let’s look at their CentOS 7 counterparts:

# sesearch --type --target named_exec_t
Found 15 semantic te rules:
   type_transition system_dbusd_t named_exec_t : process named_t; 
   type_transition kdumpctl_t named_exec_t : process named_t; 
   type_transition system_cronjob_t named_exec_t : process named_t; 
   type_transition ipsec_mgmt_t named_exec_t : process named_t; 
   type_transition crond_t named_exec_t : process named_t; 
   type_transition initrc_t named_exec_t : process named_t; 
   type_transition init_t named_exec_t : process named_t; 
   type_transition condor_startd_t named_exec_t : process named_t; 
   type_transition NetworkManager_t named_exec_t : process named_t; 
   type_transition cluster_t named_exec_t : process named_t; 
   type_transition piranha_pulse_t named_exec_t : process named_t; 
   type_transition glusterd_t named_exec_t : process named_t; 
   type_transition cobblerd_t named_exec_t : process named_t; 
   type_transition dnssec_trigger_t named_exec_t : process named_t; 
   type_transition openshift_initrc_t named_exec_t : process named_t; 

Even though these look similar to CentOS 6 rules, the SELinux aspect of starting services on CentOS 7 (with systemd) differs a lot from CentOS 6 (with its SysVinit scripts). When systemctl start <service> is run, the systemd process (PID 1) is told to spawn a process by running the command provided in the relevant unit file. PID 1 runs in SELinux domain init_t:

# ps -eZ | grep systemd$
system_u:system_r:init_t:s0         1 ?        00:00:01 systemd

Thus, since this rule is present in the default SELinux policy:

   type_transition init_t named_exec_t : process named_t; 

when a process running in the init_t domain executes a binary labeled with the named_exec_t context, it should transition to the named_t domain, right?

Well, yes - but the catch is that when the /usr/bin/scl enable ... construct is used in a systemd unit file, the binary which PID 1 executes is /usr/bin/scl - and that binary is labeled with:

# ls -Z /usr/bin/scl
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/scl

As you can see above, there is no transition rule which specifies bin_t as an entrypoint for the named_t domain. There is, however, a rule which specifies which domain a process should transition to if it is currently running in the init_t domain and it executes a bin_t binary:

# sesearch --type --source init_t --target bin_t
Found 1 semantic te rules:
   type_transition init_t bin_t : process unconfined_service_t; 

Found 1 named file transition filename_trans:
type_transition init_t bin_t : dir cupsd_rw_etc_t "inf"; 

This is why using scl enable ... in a systemd unit file causes the service to be run as an unconfined one. A post from Dan Walsh’s Blog explains why this approach was chosen from CentOS 7 onward.

This is different from what happens on CentOS 6. When init (PID 1), which runs in SELinux domain init_t, executes an init script (most of which - including /etc/init.d/isc-bind-named - are labeled with the initrc_exec_t context), it first transitions to the initrc_t domain:

# sesearch --type --source init_t --target initrc_exec_t
Found 1 semantic te rules:
   type_transition init_t initrc_exec_t : process initrc_t;

When the init script invokes /usr/bin/scl (labeled with bin_t), no domain transition happens:

# sesearch --type --source initrc_t --target bin_t

Thus, /usr/bin/scl still runs in the initrc_t domain, which allows named to transition to the named_t domain when it is executed:

# sesearch --type --source initrc_t --target named_exec_t
Found 1 semantic te rules:
   type_transition initrc_t named_exec_t : process named_t; 

This is why the problem is not triggered on CentOS 6.
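The two execution chains can be replayed with a small sketch that walks through the type_transition rules quoted above. This is a deliberate simplification: the function names are ours, only the process-class rules shown in this post are included, and a real domain transition additionally depends on allow rules which are ignored here.

```shell
# type_transition rules quoted from the sesearch output in this post.
RULES_EL7='type_transition init_t bin_t : process unconfined_service_t;
type_transition init_t named_exec_t : process named_t;
type_transition initrc_t named_exec_t : process named_t;'

RULES_EL6='type_transition init_t initrc_exec_t : process initrc_t;
type_transition init_t named_exec_t : process named_t;
type_transition initrc_t named_exec_t : process named_t;'

# Print the domain entered when a process in domain $1 executes a file of
# type $2, using the rules on stdin. No matching rule means no transition.
next_domain() {
    awk -v src="$1" -v tgt="$2" '
        $1 == "type_transition" && $2 == src && $3 == tgt && $5 == "process" {
            sub(/;$/, "", $6); print $6; found = 1; exit
        }
        END { if (!found) print src }'
}

# walk RULES START_DOMAIN EXEC_TYPE...: follow a chain of executed file types.
walk() {
    rules=$1; domain=$2; shift 2
    for exec_type in "$@"; do
        domain=$(printf '%s\n' "$rules" | next_domain "$domain" "$exec_type")
    done
    printf '%s\n' "$domain"
}
```

With these definitions, walk "$RULES_EL7" init_t bin_t named_exec_t (systemd running scl, which runs named) ends in unconfined_service_t, while walk "$RULES_EL6" init_t initrc_exec_t bin_t named_exec_t (init running the init script, then scl, then named) ends in named_t.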

Fortunately for us, named does not need any special environment to be set up in order to start; the scl enable ... prefix can simply be removed from the ExecStart line of the systemd unit file and the service will still be able to start, even though the shared libraries it is linked against are installed in a non-standard location. This is possible thanks to the use of -rpath in the build process:

# objdump -x /opt/isc/isc-bind/root/usr/sbin/named | grep RPATH
  RPATH                /opt/isc/isc-bind/root/usr/lib64

Thus, we removed the scl enable ... prefix from the unit file shipped with BIND SCLs and this is no longer an issue with the latest versions of our packages. We also prepared a pull request for the Software Collection Packaging Guide so that the latter briefly mentions the issue at hand. Hopefully this will prevent other packagers from encountering the same pitfall. We also added new checks to our CI pipelines to ensure the problem does not silently reoccur.
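A check of this kind does not need to be complicated. The following sketch is not our actual CI code and its function names are made up for illustration; it parses ps -eZ output and fails unless every process with the given name runs in the expected SELinux domain:

```shell
# Given `ps -eZ` output on stdin, print the SELinux domain (the third
# colon-separated part of the context) of every process named $1.
selinux_domain_of() {
    awk -v name="$1" '$NF == name { split($1, ctx, ":"); print ctx[3] }'
}

# Succeed only if at least one process named $1 is listed on stdin
# and all such processes run in domain $2.
check_domain() {
    domains=$(selinux_domain_of "$1")
    [ -n "$domains" ] || return 1
    for d in $domains; do
        [ "$d" = "$2" ] || return 1
    done
}
```

In a test job, this could be used as ps -eZ | check_domain named named_t right after starting the service.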

Third Time’s a Charm

Shortly after releasing the next BIND version, 9.15.7, we received a report of a permissions-related issue which we had missed in our testing. On CentOS 6, its symptoms were similar to those of the BIND 9.15.6 eventfd issue discussed above; on CentOS 7, it was not triggered because of the unconfined domain issue also discussed above. What happens is that named attempts to create Unix domain sockets in /tmp for the purpose of passing TCP sockets between threads. While cumbersome, using uv_pipe structures (which are implemented using Unix domain sockets on Unix platforms) is apparently what libuv needs to pass sockets between threads (not processes!) ever since functions allowing a simpler approach were removed. Unfortunately, named connecting to Unix domain sockets causes trouble when certain security mechanisms are in place. To avoid this problem, BIND 9.15.8 uses internal libuv functions in order to pass TCP sockets between threads without employing an IPC channel for that purpose.

As the above examples show, SELinux can be tricky to get right, but that does not mean your knee-jerk reaction to the issues it causes should be to disable it. As software evolves, SELinux rules confining it may need to be tweaked. Unfortunately, debugging differences in SELinux behavior between various operating systems may prove challenging due to the number of components involved (kernel code, policy in effect, labels used, etc.). Despite these obstacles, ISC will keep on trying to support SELinux in the BIND packages we publish. If you run into any further problems with our packages, make sure to let us know by opening a GitLab issue.
