Enable systemd hardening options for named

Wed Jan 31 14:37:24 UTC 2018

On 31.01.2018 at 15:18, Petr Menšík wrote:
> As a Fedora maintainer of the BIND package, I can only say that SELinux in
> enforcing mode will provide better hardening than most of the suggested
> changes. That does not mean they are not useful, but most of them are
> irrelevant with SELinux in enforcing mode. We want all Fedora users to
> run in enforcing mode, especially on servers.
> 
> Restricting path access in particular does not make sense with SELinux,
> which is much more powerful and is already in use.

that is completely irrelevant: as soon as you switch SELinux to
"permissive" because you need to debug something, that protection is gone,
hence layered security is always the way to go

the same goes for service configuration even if you have iptables running -
some years ago, when i tried to enable SELinux on my personal machine, i
noticed failed logins in samba *because* SELinux, for whatever reason,
caused iptables to fail at boot

> On 16.1.2018 at 13:52, Daniel Stirnimann wrote:
>> Hello all,
>>
>> Just wondering, if one is already using selinux in enforcing mode, does
>> systemd hardening provide any additional benefit?
>>
>> Daniel
>>
>> On 16.01.18 12:21, Ludovic Gasc wrote:
>>> Hi,
>>>
>>> I have merged config files from Tony, Robert, and me.
>>> I have tried to be as generic as possible; the result is below.
>>>
>>> It seems to work here without regression, except a warning:
>>> managed-keys-zone: Unable to fetch DNSKEY set '.': operation canceled
>>>
>>> But only at the first boot, I don't see the message anymore when I
>>> restart the daemon.
>>> Any clue?
>>>
>>> Thanks for your feedback.
>>>
>>> [Unit]
>>> After=network-online.target
>>>
>>> [Service]
>>> Type=simple
>>> TimeoutSec=25
>>> Restart=always
>>> RestartSec=1
>>> User=bind
>>> Group=bind
>>> CapabilityBoundingSet=CAP_NET_BIND_SERVICE
>>> AmbientCapabilities=CAP_NET_BIND_SERVICE
>>> SystemCallFilter=~@mount @debug acct modify_ldt add_key adjtimex \
>>>     clock_adjtime delete_module fanotify_init finit_module get_mempolicy \
>>>     init_module io_destroy io_getevents iopl ioperm io_setup io_submit \
>>>     io_cancel kcmp kexec_load keyctl lookup_dcookie migrate_pages move_pages \
>>>     open_by_handle_at perf_event_open process_vm_readv process_vm_writev \
>>>     ptrace remap_file_pages request_key set_mempolicy swapoff swapon uselib \
>>>     vmsplice
>>>
>>> NoNewPrivileges=true
>>> PrivateDevices=true
>>> PrivateTmp=true
>>> ProtectHome=true
>>> ProtectSystem=strict
>>> ProtectKernelModules=true
>>> ProtectKernelTunables=true
>>> ProtectControlGroups=true
>>> InaccessiblePaths=/home
>>> InaccessiblePaths=/opt
>>> InaccessiblePaths=/root
>>> ReadWritePaths=/run/named
>>> ReadWritePaths=/var/cache/bind
>>> ReadWritePaths=/var/lib/bind
>>>
>>>
>>> --
>>> Ludovic Gasc (GMLudo)
>>>
>>> 2018-01-15 21:14 GMT+01:00 Robert Edmonds <edmonds at mycre.ws>:
>>>
>>>      Tony Finch wrote:
>>>      > Ludovic Gasc <gmludo at gmail.com> wrote:
>>>      > >
>>>      > > 1. The list of minimal capabilities needed for bind to run correctly:
>>>      > > http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>      >
>>>      > named already drops capabilities - have a look at the code around here:
>>>      > https://source.isc.org/cgi-bin/gitweb.cgi?p=bind9.git;a=blob;f=bin/named/unix/os.c;hb=v9_11_2#l234
>>>      >
>>>      > Note that it's a bit clever - the privileges are dropped in two stages,
>>>      > right at the start, and after the server has been configured.
>>>
>>>      I checked just now to see what that code actually ends up doing, and on
>>>      my system I ended up with:
>>>
>>>          $ grep -h ^Cap /proc/$(pidof named)/**/status | sort | uniq -c
>>>                6 CapAmb:     0000000000000000
>>>                6 CapBnd:     0000003fffffffff
>>>                6 CapEff:     0000000001000400
>>>                6 CapInh:     0000000000000000
>>>                6 CapPrm:     0000000001000400
>>>          $
>>>
>>>      That decodes to:
>>>
>>>       - The effective and permitted capabilities sets were reduced to
>>>         CAP_NET_BIND_SERVICE and CAP_SYS_RESOURCE.
>>>
>>>       - The ambient and inheritable capabilities sets were cleared.
>>>
>>>       - The capability bounding set was left completely open-ended.
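
For anyone who wants to reproduce that decoding, here is a minimal Python
sketch. The bit numbers come from <linux/capability.h>; the table is
deliberately partial, and decode_caps is only an illustrative helper name,
not something from BIND or systemd.

```python
# Partial table of Linux capability bit numbers (from <linux/capability.h>).
# Only entries relevant to this thread are listed; others print as cap_<n>.
CAP_NAMES = {
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    8: "CAP_SETPCAP",
    10: "CAP_NET_BIND_SERVICE",
    18: "CAP_SYS_CHROOT",
    24: "CAP_SYS_RESOURCE",
}


def decode_caps(hex_mask):
    """Return the capability names whose bits are set in a /proc Cap* mask."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            names.append(CAP_NAMES.get(bit, "cap_%d" % bit))
        mask >>= 1
        bit += 1
    return names


if __name__ == "__main__":
    # Masks copied from the grep output quoted above.
    print(decode_caps("0000000001000400"))       # CapEff / CapPrm
    print(len(decode_caps("0000003fffffffff")))  # number of bits in CapBnd
```

The CapEff/CapPrm mask has exactly bits 10 and 24 set, which matches the
two capabilities named in the list above.
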
>>>
>>>      It's not clear why CAP_SYS_RESOURCE needs to be retained past startup:
>>>
>>>              /*
>>>               * XXX  We might want to add CAP_SYS_RESOURCE, though it's not
>>>               *      clear it would work right given the way linuxthreads work.
>>>               * XXXDCL But since we need to be able to set the maximum number
>>>               * of files, the stack size, data size, and core dump size to
>>>               * support named.conf options, this is now being added to test.
>>>               */
>>>              SET_CAP(CAP_SYS_RESOURCE);
>>>
>>>      See commits 5e4b7294d88ab58371d8c98e05ea80086dcb67cd,
>>>      108490a7f8529aff50a0ac7897580b59a73d9845. "[T]o test"?
>>>
>>>      CAP_SYS_RESOURCE is documented as permitting:
>>>
>>>         CAP_SYS_RESOURCE
>>>                * Use reserved space on ext2 filesystems;
>>>                * make ioctl(2) calls controlling ext3 journaling;
>>>                * override disk quota limits;
>>>                * increase resource limits (see setrlimit(2));
>>>                * override RLIMIT_NPROC resource limit;
>>>                * override maximum number of consoles on console allocation;
>>>                * override maximum number of keymaps;
>>>                * allow more than 64hz interrupts from the real-time clock;
>>>                * raise msg_qbytes limit for a System V message queue above the
>>>                  limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
>>>                * allow the RLIMIT_NOFILE resource limit on the number of
>>>                  "in-flight" file descriptors to be bypassed when passing file
>>>                  descriptors to another process via a UNIX domain socket (see
>>>                  unix(7));
>>>                * override the /proc/sys/fs/pipe-size-max limit when setting the
>>>                  capacity of a pipe using the F_SETPIPE_SZ fcntl(2) command.
>>>                * use F_SETPIPE_SZ to increase the capacity of a pipe above the
>>>                  limit specified by /proc/sys/fs/pipe-max-size;
>>>                * override /proc/sys/fs/mqueue/queues_max limit when creating
>>>                  POSIX message queues (see mq_overview(7));
>>>                * employ the prctl(2) PR_SET_MM operation;
>>>                * set /proc/[pid]/oom_score_adj to a value lower than the value
>>>                  last set by a process with CAP_SYS_RESOURCE.
>>>
>>>      I would guess that retaining CAP_NET_BIND_SERVICE and CAP_SYS_RESOURCE
>>>      during the process runtime permits open-ended reloading of the config at
>>>      runtime (e.g., binding to a new IP address on port 53 without needing to
>>>      restart the daemon). So even though BIND drops some capabilities, it's
>>>      still running with elevated privileges compared to a traditional
>>>      non-root user.
>>>
>>>      systemd permits a nice pattern for network daemons that want to run as
>>>      an unprivileged user, but bind to a privileged port (and without using
>>>      socket activation), without starting the process as root. Basically, you
>>>      put something like this in the unit file:
>>>
>>>          [Service]
>>>          User=…
>>>          Group=…
>>>          CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_SETPCAP
>>>          AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_SETPCAP
>>>          …
>>>
>>>      Any needed filesystem directories and permissions need to be set up
>>>      correctly beforehand. The service is started by the init system as the
>>>      unprivileged User/Group specified in the unit file, so there's no need
>>>      to change UID/GID. CAP_NET_BIND_SERVICE is then used to bind to a
>>>      privileged port, CAP_SYS_CHROOT is used to perform the chroot, and
>>>      CAP_SETPCAP is used to drop all remaining capabilities from the
>>>      capability sets and the capability bounding set, so you end up with a
>>>      completely unprivileged process at runtime. (Alternatively you could
>>>      keep CAP_NET_BIND_SERVICE and drop CAP_SYS_CHROOT and CAP_SETPCAP, if
>>>      you wanted to retain the capability to perform privileged binds at
>>>      runtime. Or you could eliminate CAP_SYS_CHROOT and use other systemd
>>>      functionality to make parts of the filesystem inaccessible, etc.) This
>>>      pattern might be a bit hard to retrofit into BIND at this point, though,
>>>      other than by adding more knobs.
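
One way to check whether a daemon really ends up "completely unprivileged"
after such a drop is to look at the Cap* lines of /proc/<pid>/status. A
rough Python sketch, assuming the /proc field layout shown in the grep
output earlier in this thread; parse_cap_sets, is_unprivileged and SAMPLE
are made-up names for illustration:

```python
# Sample Cap* lines, using the mask values reported for named in this thread.
SAMPLE = (
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000000001000400\n"
    "CapEff:\t0000000001000400\n"
    "CapBnd:\t0000003fffffffff\n"
    "CapAmb:\t0000000000000000\n"
)


def parse_cap_sets(status_text):
    """Map each Cap* field of a /proc/<pid>/status dump to its integer mask."""
    sets = {}
    for line in status_text.splitlines():
        if line.startswith("Cap"):
            name, _, value = line.partition(":")
            sets[name.strip()] = int(value.strip(), 16)
    return sets


def is_unprivileged(status_text):
    """True only if Cap* fields are present and every set, including the
    bounding set, is empty."""
    sets = parse_cap_sets(status_text)
    return bool(sets) and all(mask == 0 for mask in sets.values())


if __name__ == "__main__":
    for name, mask in sorted(parse_cap_sets(SAMPLE).items()):
        print("%s: %016x" % (name, mask))
    print("unprivileged:", is_unprivileged(SAMPLE))
```

To check a live process, pass the contents of /proc/<pid>/status instead
of SAMPLE. With the values reported for named above, the check fails, since
two capabilities are retained and the bounding set is left open.
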