We have recently identified several SELinux-specific issues with our BIND RPM packages. This (rather technical) blog post describes the issues we found, how we got to the bottom of them, and the decisions we made based on our findings.
For a while now, ISC has been publishing BIND RPM packages for supported
versions of Red Hat Enterprise Linux and CentOS. Since these operating systems
come with a default SELinux policy which relates to the named process, from the
outset we wanted our packages to comply with that policy. Every SELinux policy
contains rules which assign specific SELinux contexts to various files,
directories, processes, sockets, etc. These contexts, which can be thought of
as labels assigned to different entities, are then used for defining allowed
interactions. SELinux rules defining contexts for files and directories are
path-based, and we package BIND in the form of Software Collections, which
(among other things) means that our binaries are installed in different
locations than those used by stock RHEL/CentOS packages. As a result, the
binaries we ship in BIND SCLs are not automatically assigned proper SELinux
contexts. Fortunately,
it is possible to set up so-called file context equivalency rules, which
allow SELinux rules specified for a certain part of the filesystem to be reused
for a different part of the filesystem. That is what we do in the
post-installation scriptlet of the isc-bind metapackage in order to apply the
SELinux rules specified in the stock SELinux policy to the files and directories
installed by our SCLs.
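As a rough sketch of what such a scriptlet can contain: an equivalency rule maps the SCL root onto / and the files are then relabeled. The exact commands used in the isc-bind scriptlet are not reproduced in this post, so the following is an assumption rather than a copy of it. The SCL root path is taken from the audit record quoted later in this post; the run helper and the DRY_RUN variable are hypothetical and only make the example safe to try on a machine without SELinux tooling.

```shell
#!/bin/sh
# Sketch only: map the SCL root onto "/" so that stock file-context rules
# (such as the one labeling /usr/sbin/named) also apply below the SCL root.
SCL_ROOT=/opt/isc/isc-bind/root   # path as seen in the audit record; adjust as needed
DRY_RUN=${DRY_RUN:-1}             # hypothetical switch: 1 = print commands, 0 = execute

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_equivalency() {
    # "-e" adds an equivalency rule: label $SCL_ROOT/<path> like /<path>.
    run semanage fcontext -a -e / "$SCL_ROOT"
    # Relabel files already on disk according to the updated rules.
    run restorecon -R "$SCL_ROOT"
}

apply_equivalency
```

With DRY_RUN left at its default, the script only prints the two commands; setting DRY_RUN=0 as root on a system with the SELinux management tools installed would execute them.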
Until recently, this worked like a charm. However, when we released BIND
9.15.6, it turned out that on CentOS 6, running /etc/init.d/isc-bind-named start
no longer allowed named to be started… unless SELinux policy enforcement was
disabled (e.g. using setenforce 0). At the same time,
everything was working fine on CentOS 7 and CentOS 8. This was something we
had not seen before, so we had to find out what was happening.
When SELinux is a suspected culprit of something not working, the first step is
to… no, not to disable it! The first step is to look at the contents of
/var/log/audit/audit.log to see what is not working. In the case of BIND
9.15.6 on CentOS 6, the following AVC denial was telling:
497 fsgid=497 tty=(none) ses=2 comm="isc-worker0000" exe="/opt/isc/isc-bind/root/usr/sbin/named" subj=unconfined_u:system_r:named_t:s0 key=(null)
type=AVC msg=audit(1577961466.346:33): avc: denied { write } for pid=1831 comm="isc-worker0000" path="[eventfd]" dev=anon_inodefs ino=3853 scontext=unconfined_u:system_r:named_t:s0 tcontext=system_u:object_r:anon_inodefs_t:s0 tclass=file
The key part of the above message was [eventfd] because it immediately rang a
bell: the big change in BIND 9.15.6 was the introduction of the new network
manager which adds a dependency on libuv and uses the latter instead of
custom networking code. The purpose of this move was to simplify the source code
while retaining portability and also to allow us to more conveniently implement
new networking-related features, like DNS-over-HTTPS support. Anyway, on Linux,
libuv’s event loop uses eventfds internally, so it is not surprising that
named tried to use an eventfd upon startup. But why was that attempt blocked
by SELinux only on CentOS 6?
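When reading a denial like the one quoted above, four pieces of information matter most: the source type, the target type, the object class, and the denied permission. The sketch below pulls them out of a single AVC record with sed and cut. The helper names (avc_field, avc_perms, context_type) are made up for this example; on a live system, records of this kind can be retrieved with ausearch -m avc.

```shell
#!/bin/sh
# avc_field RECORD KEY -> value of " KEY=..." (e.g. scontext, tcontext, tclass)
avc_field() {
    printf '%s\n' "$1" | sed -n "s/.* $2=\([^ ]*\).*/\1/p"
}

# avc_perms RECORD -> the denied permissions listed between "{" and "}"
avc_perms() {
    printf '%s\n' "$1" | sed -n 's/.*{ \([^}]*\) }.*/\1/p'
}

# context_type CONTEXT -> third field of user:role:type:level
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# The AVC record quoted in this post.
record='type=AVC msg=audit(1577961466.346:33): avc: denied { write } for pid=1831 comm="isc-worker0000" path="[eventfd]" dev=anon_inodefs ino=3853 scontext=unconfined_u:system_r:named_t:s0 tcontext=system_u:object_r:anon_inodefs_t:s0 tclass=file'

echo "source type: $(context_type "$(avc_field "$record" scontext)")"
echo "target type: $(context_type "$(avc_field "$record" tcontext)")"
echo "class:       $(avc_field "$record" tclass)"
echo "denied:      $(avc_perms "$record")"
```

Read this way, the record says that a named_t process was denied write access to a file labeled anon_inodefs_t.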
First, we verified that named_t processes are indeed not allowed to write to
anon_inodefs_t files:
# sesearch --allow --source named_t --target anon_inodefs_t
Found 2 semantic av rules:
allow named_t file_type : filesystem getattr ;
allow named_t filesystem_type : filesystem getattr ;
Since SELinux works on a deny-by-default basis and this denial is not triggered on CentOS 7 and later, our first guess was that something in the default SELinux policy changed between CentOS 6 and 7. Thus, we ran the same command on CentOS 7, but the result was surprising:
# sesearch --allow --source named_t --target anon_inodefs_t
Found 6 semantic av rules:
allow system_bus_type filesystem_type : dir { getattr search open } ;
allow named_t file_type : filesystem getattr ;
allow named_t filesystem_type : filesystem getattr ;
allow domain file_type : file map ;
allow domain file_type : chr_file map ;
allow domain file_type : blk_file map ;
While there are indeed some differences in this part of the SELinux policy, none
of the above rules allows named to write to an eventfd. Yet, everything was
working as intended. So what was making it possible? Since SELinux policy
enforcement happens in the kernel, another theory we had was that some kernel
change was responsible for “fixing” the issue on CentOS 7+. To minimize the
time needed to verify that theory, we used ELRepo to install kernel 4.4 on
CentOS 6 while still using the stock SELinux policy shipped with that operating
system. It turned out that rebooting with kernel 4.4 in use enabled
/etc/init.d/isc-bind-named start to work again on CentOS 6 with SELinux in
enforcing mode.
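The observation that no rule in the CentOS 7 policy grants write access can also be checked mechanically instead of by eye. The following sketch filters sesearch --allow output for rules that grant a given permission on a given object class. The rules_granting helper is hypothetical, and the rule list is the CentOS 7 output quoted above.

```shell
#!/bin/sh
# rules_granting CLASS PERM
# Reads "sesearch --allow" output on stdin and prints every allow rule that
# grants PERM on object class CLASS.
rules_granting() {
    awk -v class="$1" -v perm="$2" '
        $1 == "allow" {
            n = index($0, " : ")              # "allow SRC TGT : CLASS PERMS ;"
            if (n == 0) next
            rest = substr($0, n + 3)          # "CLASS PERMS ;"
            gsub(/[{};]/, " ", rest)          # drop braces and the semicolon
            m = split(rest, f, " ")
            if (f[1] != class) next
            for (i = 2; i <= m; i++)
                if (f[i] == perm) { print; next }
        }'
}

# CentOS 7 rules as quoted in this post.
centos7_rules='allow system_bus_type filesystem_type : dir { getattr search open } ;
allow named_t file_type : filesystem getattr ;
allow named_t filesystem_type : filesystem getattr ;
allow domain file_type : file map ;
allow domain file_type : chr_file map ;
allow domain file_type : blk_file map ;'

if printf '%s\n' "$centos7_rules" | rules_granting file write | grep -q .; then
    echo "policy grants file write"
else
    echo "no rule grants file write"
fi
```

On a live system the same filter could be fed directly, for example with sesearch --allow --source named_t --target anon_inodefs_t | rules_granting file write.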
Since 4.4 is much newer than 3.10 (the default CentOS 7 kernel), we could not
be 100% sure that a change in kernel code was the root cause of our named
issue being magically fixed in CentOS 7. Curiosity pushed us forward and thus
we performed a git bisect session on the kernel source tree, trying to figure
out which specific change in the kernel code was responsible for the behavior
we were observing. It turned out to be commit
3836a03d978e68b0ae00d3589089343c998cd4ff, a one-line change merged into
Linux 2.6.33 (just one minor version later than 2.6.32, on which the CentOS 6
kernel is based):
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 9f0bf13291e5..2de009565d8e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -209,6 +209,7 @@ static struct inode *anon_inode_mkinode(void)
 	inode->i_mode = S_IRUSR | S_IWUSR;
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
+	inode->i_flags |= S_PRIVATE;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	return inode;
 }
The commit log message does a good job of explaining why this change was introduced:
Inotify was switched to use anon_inode instead of its own private filesystem which only had one inode in commit c44dcc56d2b5c7 “switch inotify_user to anon_inode”
The problem with this is that now the inotify inode is not a distinct inode which can be managed by LSMs. userspace tools which use inotify were allowed to use the inotify inode but may not have had permission to do read/write type operations on the anon_inode. After looking at the anon_inode and its users it looks like the best solution is to just mark the anon_inode as S_PRIVATE so the security system will ignore it.
It all made sense now. Previously, the SELinux policy allowed specific tools to access the inotify inode, which was a distinct inode type. After the inotify code had been updated to use a more “common” inode type, it was decided that instead of updating existing SELinux policies so that they matched this code change, the kernel should be modified to treat inodes of type anon_inode as “transparent” to Linux Security Modules like SELinux. In other words, no SELinux policy rule was required anymore to allow a confined process to access an anon_inode inode (like an eventfd). This is why BIND 9.15.6 is allowed to start on CentOS 7+, but not on CentOS 6.
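Since the change landed in Linux 2.6.33, a quick heuristic for whether a given machine is affected is to compare its kernel version against that release. The sketch below does this with sort -V (available in GNU coreutils). The kernel_at_least helper is hypothetical, and a version comparison is only a heuristic: distribution kernels can backport or omit individual commits.

```shell
#!/bin/sh
# kernel_at_least VERSION MINIMUM
# Succeeds if the upstream part of VERSION (everything before the first "-")
# is greater than or equal to MINIMUM. Relies on "sort -V".
kernel_at_least() {
    v=${1%%-*}
    [ "$(printf '%s\n%s\n' "$2" "$v" | sort -V | head -n 1)" = "$2" ]
}

if kernel_at_least "$(uname -r)" 2.6.33; then
    echo "kernel is 2.6.33 or newer: anon_inodes should be marked S_PRIVATE"
else
    echo "kernel is older than 2.6.33: anon_inode access is subject to SELinux policy"
fi
```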
Since CentOS 6 will reach end-of-life in less than a year, we decided not to add any workaround for this issue to our RPM packages. We believe that very few users want to run a very recent version of BIND, with SELinux in enforcing mode, on an operating system released almost nine years ago. If you are such a user, you have a few options:
To build a custom SELinux policy module containing the change required for BIND
9.15.6+ to run on a stock CentOS 6 kernel, start by putting the following
contents in a file called isc-bind-named-centos6.te:
module isc-bind-named-centos6 1.0;
require {
    type anon_inodefs_t;
    type named_t;
    class file { read write };
}
allow named_t anon_inodefs_t : file { read write };
Then build and install the module by running the following commands:
checkmodule -M -m -o isc-bind-named-centos6.mod isc-bind-named-centos6.te
semodule_package -o isc-bind-named-centos6.pp -m isc-bind-named-centos6.mod
semodule -i isc-bind-named-centos6.pp
To verify that the module has been successfully installed, run:
# semodule -l | grep isc-bind-named-centos6
isc-bind-named-centos6 1.0
You should now be able to start named using /etc/init.d/isc-bind-named start
on a stock CentOS 6 kernel with SELinux in enforcing mode.
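If a similar denial ever needs a module of its own, the source file follows the same pattern every time, so it can be generated from the four fields of the denial. The emit_te_module helper below is a hypothetical illustration of that pattern, not something shipped with our packages; the audit2allow tool can produce comparable modules directly from audit records.

```shell
#!/bin/sh
# emit_te_module NAME SOURCE_TYPE TARGET_TYPE CLASS PERM...
# Prints a minimal policy module source granting PERM... on CLASS.
emit_te_module() {
    name=$1 src=$2 tgt=$3 class=$4
    shift 4
    perms=$*
    cat <<EOF
module $name 1.0;
require {
    type $tgt;
    type $src;
    class $class { $perms };
}
allow $src $tgt : $class { $perms };
EOF
}

# Reproduces the module shown above.
emit_te_module isc-bind-named-centos6 named_t anon_inodefs_t file read write
```

Redirecting the output to isc-bind-named-centos6.te gives a file that can be fed to the checkmodule, semodule_package, and semodule commands listed above.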
While investigating the issue described above, we made another unsettling
discovery. Here is what we observed on CentOS 7 after running
systemctl start isc-bind-named:
# ps -eZ | grep named
system_u:system_r:unconfined_service_t:s0 4966 ? 00:00:00 named
Compare this with CentOS 6 output after running /etc/init.d/isc-bind-named start:
# ps -eZ | grep named
unconfined_u:system_r:named_t:s0 2371 ? 00:00:00 named
But we did everything according to the SELinux-related sections of the Software
Collection Packaging Guide! So what was happening? This issue was caused by
the way SCL-provided service binaries are started. When a user needs to start an
SCL-provided application directly, they should prefix the command line with
scl enable <scl-name> --. This was what we used in our systemd unit file
for named and things seemed to work fine. However, we never checked whether
the named process transitions to the expected SELinux domain (named_t).
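A check of that kind takes only a few lines. The sketch below reads ps -eZ output and compares the SELinux type of a process with the expected one. The helper names are made up for this example, and the two sample lines are the ps -eZ outputs quoted above.

```shell
#!/bin/sh
# selinux_types_of COMMAND
# Reads "ps -eZ" output on stdin and prints the SELinux type (third field of
# the context) of every process whose command name is COMMAND.
selinux_types_of() {
    awk -v cmd="$1" '$NF == cmd { split($1, ctx, ":"); print ctx[3] }' | sort -u
}

# expect_domain COMMAND EXPECTED_TYPE ("ps -eZ" output on stdin)
expect_domain() {
    actual=$(selinux_types_of "$1")
    if [ "$actual" = "$2" ]; then
        echo "OK: $1 runs as $2"
    else
        echo "FAIL: $1 runs as '${actual:-<not running>}', expected $2"
        return 1
    fi
}

# Sample lines taken from the outputs quoted in this post.
centos6='unconfined_u:system_r:named_t:s0 2371 ? 00:00:00 named'
centos7='system_u:system_r:unconfined_service_t:s0 4966 ? 00:00:00 named'

printf '%s\n' "$centos6" | expect_domain named named_t
printf '%s\n' "$centos7" | expect_domain named named_t || echo "named is not confined"
```

On a live system the input would come from the process table itself: ps -eZ | expect_domain named named_t.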
The default RHEL/CentOS SELinux policy includes a set of so-called domain
transition rules which specify which file contexts are entrypoints for which
SELinux domains. The list of such rules for the named_exec_t file context
(which is what the named binary is labeled with) that are present in the
default CentOS 6 policy is:
# sesearch --type --target named_exec_t
Found 10 semantic te rules:
type_transition system_dbusd_t named_exec_t : process named_t;
type_transition glusterd_t named_exec_t : process named_t;
type_transition cluster_t named_exec_t : process named_t;
type_transition condor_startd_t named_exec_t : process named_t;
type_transition initrc_t named_exec_t : process named_t;
type_transition cobblerd_t named_exec_t : process named_t;
type_transition NetworkManager_t named_exec_t : process named_t;
type_transition openshift_initrc_t named_exec_t : process named_t;
type_transition piranha_pulse_t named_exec_t : process named_t;
type_transition init_t named_exec_t : process named_t;
Let’s look at their CentOS 7 counterparts:
# sesearch --type --target named_exec_t
Found 15 semantic te rules:
type_transition system_dbusd_t named_exec_t : process named_t;
type_transition kdumpctl_t named_exec_t : process named_t;
type_transition system_cronjob_t named_exec_t : process named_t;
type_transition ipsec_mgmt_t named_exec_t : process named_t;
type_transition crond_t named_exec_t : process named_t;
type_transition initrc_t named_exec_t : process named_t;
type_transition init_t named_exec_t : process named_t;
type_transition condor_startd_t named_exec_t : process named_t;
type_transition NetworkManager_t named_exec_t : process named_t;
type_transition cluster_t named_exec_t : process named_t;
type_transition piranha_pulse_t named_exec_t : process named_t;
type_transition glusterd_t named_exec_t : process named_t;
type_transition cobblerd_t named_exec_t : process named_t;
type_transition dnssec_trigger_t named_exec_t : process named_t;
type_transition openshift_initrc_t named_exec_t : process named_t;
Even though these look similar to CentOS 6 rules, the SELinux aspect of starting
services on CentOS 7 (with systemd) differs a lot from CentOS 6 (with its
SysVinit scripts). When systemctl start <service> is run, the systemd
process (PID 1) is told to spawn a process by running the command provided in
the relevant unit file. PID 1 runs in SELinux domain init_t:
# ps -eZ | grep systemd$
system_u:system_r:init_t:s0 1 ? 00:00:01 systemd
Thus, since this rule is present in the default SELinux policy:
type_transition init_t named_exec_t : process named_t;
when a process running in the init_t domain executes a binary labeled with the
named_exec_t context, it should transition to the named_t domain, right?
Well, yes - but the catch is that when the /usr/bin/scl enable ... construct
is used in a systemd unit file, the binary which PID 1 executes is
/usr/bin/scl - and that binary is labeled with:
# ls -Z /usr/bin/scl
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/scl
As you can see above, there is no transition rule which specifies bin_t as an
entrypoint for the named_t domain. There is, however, a rule which specifies
which domain a process should transition to if it is currently running in the
init_t domain and it executes a bin_t binary:
# sesearch --type --source init_t --target bin_t
Found 1 semantic te rules:
type_transition init_t bin_t : process unconfined_service_t;
Found 1 named file transition filename_trans:
type_transition init_t bin_t : dir cupsd_rw_etc_t "inf";
This is why using scl enable ... in a systemd unit file causes the service to
be run as an unconfined one. A post from Dan Walsh’s Blog explains
why this approach was chosen from CentOS 7 onward.
This is different from what happens on CentOS 6. When init (PID 1), which
runs in SELinux domain init_t, executes an init script (most of which -
including /etc/init.d/isc-bind-named - are labeled with the initrc_exec_t
context), it first transitions to the initrc_t domain:
# sesearch --type --source init_t --target initrc_exec_t
Found 1 semantic te rules:
type_transition init_t initrc_exec_t : process initrc_t;
When the init script invokes /usr/bin/scl (labeled with bin_t), no domain
transition happens:
# sesearch --type --source initrc_t --target bin_t
Thus, /usr/bin/scl still runs in the initrc_t domain, which allows named
to transition to the named_t domain when it is executed:
# sesearch --type --source initrc_t --target named_exec_t
Found 1 semantic te rules:
type_transition initrc_t named_exec_t : process named_t;
This is why the problem is not triggered on CentOS 6.
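The two chains of executions can be replayed with a small model that looks up type_transition rules for each step and keeps the current domain when no rule matches. This is a deliberately simplified sketch: real SELinux additionally requires allow rules (execute, entrypoint, transition) for a domain change to succeed, which are ignored here. The helper names are hypothetical, and the rule sets contain only the rules quoted in this post.

```shell
#!/bin/sh
# next_domain RULES CURRENT_DOMAIN EXEC_TYPE
# Prints the domain after executing a file labeled EXEC_TYPE: the result of a
# matching "process" type_transition rule, or CURRENT_DOMAIN if none matches.
next_domain() {
    printf '%s\n' "$1" | awk -v src="$2" -v tgt="$3" '
        $1 == "type_transition" && $2 == src && $3 == tgt && $5 == "process" {
            sub(/;$/, "", $6); print $6; found = 1; exit
        }
        END { if (!found) print src }'
}

# walk RULES START_DOMAIN EXEC_TYPE... -> domain after executing each in turn
walk() {
    rules=$1 domain=$2
    shift 2
    for exec_type in "$@"; do
        domain=$(next_domain "$rules" "$domain" "$exec_type")
    done
    echo "$domain"
}

centos7_rules='type_transition init_t bin_t : process unconfined_service_t;
type_transition init_t named_exec_t : process named_t;
type_transition initrc_t named_exec_t : process named_t;'

centos6_rules='type_transition init_t initrc_exec_t : process initrc_t;
type_transition init_t named_exec_t : process named_t;
type_transition initrc_t named_exec_t : process named_t;'

echo "CentOS 7, systemd -> scl -> named:        $(walk "$centos7_rules" init_t bin_t named_exec_t)"
echo "CentOS 7, systemd -> named:               $(walk "$centos7_rules" init_t named_exec_t)"
echo "CentOS 6, init -> initscript -> scl -> named: $(walk "$centos6_rules" init_t initrc_exec_t bin_t named_exec_t)"
```

In this model, the first chain ends in unconfined_service_t because no rule takes unconfined_service_t to named_t, while the other two end in named_t.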
Fortunately for us, named does not need any special environment to be
set up in order to start; the scl enable ... prefix can simply be
removed from the ExecStart line of the systemd unit file and the service will
still be able to start, even though the shared libraries it is linked against
are installed in a non-standard location. This is possible thanks to the use of
-rpath in the build process:
# objdump -x /opt/isc/isc-bind/root/usr/sbin/named | grep RPATH
RPATH /opt/isc/isc-bind/root/usr/lib64
Thus, we removed the scl enable ... prefix from the unit file shipped
with BIND SCLs and this is no longer an issue with the latest versions of our
packages. We also prepared a pull request for the Software Collection
Packaging Guide so that the latter briefly mentions the issue at hand.
Hopefully this will prevent other packagers from encountering the same pitfall.
We also added new checks to our CI pipelines to ensure the problem does not
silently reoccur.
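One possible shape for such a check (an illustration only, not the check we actually run) is a lint step that fails whenever an Exec line in a unit file goes through the scl wrapper. The function names, the sample unit file contents, the SCL name isc-bind, and the -f flag in the sample are assumptions made for this sketch.

```shell
#!/bin/sh
# exec_lines_using_scl
# Reads a systemd unit file on stdin and prints every Exec* line whose command
# is the scl wrapper (optionally preceded by systemd prefix characters).
exec_lines_using_scl() {
    grep -E '^Exec[A-Za-z]*=[-@+!:]*(/usr/bin/)?scl[[:space:]]+enable[[:space:]]'
}

# lint_unit: fails if the unit file on stdin starts anything through scl.
lint_unit() {
    if offending=$(exec_lines_using_scl); then
        echo "FAIL: service is started through scl: $offending"
        return 1
    fi
    echo "OK: no scl wrapper in Exec lines"
}

# Hypothetical unit file fragments, before and after the fix.
unit_before='[Service]
ExecStart=/usr/bin/scl enable isc-bind -- /opt/isc/isc-bind/root/usr/sbin/named -f'
unit_after='[Service]
ExecStart=/opt/isc/isc-bind/root/usr/sbin/named -f'

printf '%s\n' "$unit_after" | lint_unit
printf '%s\n' "$unit_before" | lint_unit || true
```

A static check like this complements, rather than replaces, verifying the domain of the running process, since it only catches one specific way of ending up unconfined.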
Shortly after releasing the next BIND version, 9.15.7, we received a
report of a permissions-related issue which we had missed in our testing. On
CentOS 6, its symptoms were similar to those of the BIND 9.15.6 eventfd issue
discussed above; on CentOS 7, it was not triggered at all because of the
unconfined domain issue, also discussed above. What happens is that named
attempts to create Unix domain sockets in /tmp for the purpose of passing TCP
sockets between threads. While cumbersome, using uv_pipe structures (which are
implemented using Unix domain sockets on Unix platforms) is apparently what
libuv needs to pass sockets between threads (not processes!) ever since
functions allowing a simpler approach were removed. Unfortunately, named
connecting to Unix domain sockets causes trouble when certain security
mechanisms are in place. To avoid this problem, BIND 9.15.8 uses internal
libuv functions in order to pass TCP sockets between threads without
employing an IPC channel for that purpose.
As the above examples show, SELinux can be tricky to get right, but that does not mean your knee-jerk reaction to the issues it causes should be to disable it. As software evolves, SELinux rules confining it may need to be tweaked. Unfortunately, debugging differences in SELinux behavior between various operating systems may prove challenging due to the number of components involved (kernel code, policy in effect, labels used, etc.). Despite these obstacles, ISC will keep on trying to support SELinux in the BIND packages we publish. If you run into any further problems with our packages, make sure to let us know by opening a GitLab issue.