Restarting DHCP safely whilst avoiding partner-down state

Fri May 13 14:37:45 UTC 2016

I just tested this and it seemed to work for me.

#dhcpd4.service
[Unit]
Description=IPv4 DHCP server
After=network.target

[Service]
Type=forking
PIDFile=/run/dhcpd4.pid
ExecStart=/usr/bin/dhcpd -4 -q -cf /etc/dhcpd.conf -pf /run/dhcpd4.pid
ExecStop=/path/to/shutdown/script.sh

[Install]
WantedBy=multi-user.target

#/path/to/shutdown/script.sh
#copy-pasted from https://kb.isc.org/article/AA-00475/0/Sending-a-Server-Shutdown-Message-Via-OMAPI.html
#
#!/bin/sh

#  uses omshell to connect to a dhcp server on the
#  local machine, create a control object, set the
#  state of the control object, and update the
#  running server to cause that server to shut down
#  gracefully.
#
#  per dhcpd man page, server shutdown can take
#  several seconds as the server waits for close
#  on all OMAPI connections.  Watching log files
#  for shutdown messages is recommended.

omshell << END_OF_INPUT > /dev/null 2> /dev/null
server localhost
port 7911
key omapi_key Ofakekeyfakekeyfakekey==
connect
new control
open
set state=2
update
END_OF_INPUT

echo "done sending shutdown instruction to dhcp server.."
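
One caveat, given the comment above about shutdown taking several seconds: as far as I understand systemd, once ExecStop returns it treats the service as stopped and terminates whatever is left according to KillMode, so the script probably ought to wait for dhcpd to actually exit. A minimal sketch of that, not something I have run in production. The helper name wait_for_exit and the 10 second timeout are made up for illustration, and the PID file path is assumed from the unit above:

```shell
#!/bin/sh
# Hypothetical helper (not part of ISC DHCP): poll until the process with
# the given PID is gone, or give up after a timeout in seconds.
# Returns 0 if the process exited, 1 if it was still running at the timeout.
wait_for_exit() {
    pid=$1
    timeout=${2:-10}
    waited=0
    # kill -0 sends no signal; it only reports whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Intended use in the stop script (path assumed from the unit above):
#   pid=$(cat /run/dhcpd4.pid)     # read the PID *before* the omshell call
#   ... omshell shutdown as above ...
#   wait_for_exit "$pid" 10 || echo "dhcpd still running after 10s" >&2
```

The PID is read before sending the shutdown because the PID file may no longer be there once the server has gone.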

Matt Pallissard

On 05/13/2016 09:33 AM, Terry Burton wrote:
> On 13 May 2016 at 15:10, Steve van der Burg <steve.vanderburg at lhsc.on.ca> wrote:
>> Here we push out new configs to a partner pair from a central server.  The config for one of the partners contains an extra file (dhcpd.i.am.secondary).  Each of the partners runs this every minute (perl script):
>>
>>   if ( -e "$spath/dhcpd.i.am.secondary" ) {
>>      exit if (localtime)[1] % 2 == 0;
>>   }
>>   else {
>>      exit if (localtime)[1] % 2 == 1;
>>   }
>>
>>   ... continue (test new config, kill running server, start new one, etc)
>>
>> So the config change, stop, start, etc, can only happen on odd minutes for one server and even minutes for the other.  As long as startup time is less than a minute (and it's much, much less than that) it all works smoothly.
>
> Thanks Steve. We've also been pushing configs around then
> synchronously restarting servers back-to-back (without sleeping) for
> several years without incident.
>
> It makes me a little suspicious about whether just killing the process
> is indeed unsafe... But then maybe we've been lucky.
>
> As mentioned I want to improve on what distributions are currently
> doing so I'm deliberately setting the bar high and it would be great
> if ISC could provide a single, approved, safe shutdown/restart
> mechanism or describe what is required to develop such a mechanism.
> Unfortunately the detail of Bug #36066 (retracting support for gentle
> shutdown) isn't available as it would be interesting to see what
> issues were encountered with the previous approach.
>
>
>> Chuck Anderson <cra at WPI.EDU> wrote:
>>> FWIW, we've been using the "kill" method for over a decade without any
>>> noticeable side-effects (the default init.d scripts from RHEL 6
>>> (actually Scientific Linux 6) dhcp package).  We've never had to
>>> manually clean up a corrupted lease file.  We restart the services
>>> automatically on a 20 minute cycle, as needed.  We do one, then
>>> immediately do the other.  We do not wait to restart the other, and we
>>> do not monitor to see if failover has reconnected and rebalanced
>>> before restarting the other, but since we are SSH-ing into each server
>>> to do the restart, there might be enough of a built-in delay between
>>> restarting each server.
>>>
>>> I don't know if a corrupted lease file would cause a failure to start
>>> the dhcp server, or if it would just go unnoticed, perhaps with a log
>>> message.  But like I said, we've never had a failure to start the
>>> server that was caused by a lease file issue.
>>>
>>> Our script does test the config file before doing the restart:
>>>
>>> #!/bin/bash
>>> echo -n "Testing DHCP configuration: "
>>> if sudo /etc/rc.d/init.d/dhcpd configtest; then
>>>         echo "Restarting DHCP"
>>>         sudo /etc/rc.d/init.d/dhcpd restart
>>> else
>>>         echo "FAIL: Not restarting DHCP"
>>> fi
>>>
>>> which in CentOS 6 does the following:
>>>
>>> exec=/usr/sbin/dhcpd
>>> configtest() {
>>>     [ -x $exec ] || return 5
>>>     [ -f $config ] || return 6
>>>     $exec -q -t -cf $config
>>>     RETVAL=$?
>>>     if [ $RETVAL -eq 1 ]; then
>>>         $exec -t -cf $config
>>>     else
>>>         echo "Syntax: OK" >&2
>>>     fi
>>>     return $RETVAL
>>> }
>>>
>>>
>>> On Fri, May 13, 2016 at 02:00:03PM +0100, Terry Burton wrote:
>>>> Hi,
>>>>
>>>> I'm attempting to write a systemd .service file for my own uses of ISC
>>>> DHCP. However, if it can be made sufficiently generic then I would
>>>> intend to push this upstream or at least into distributions.
>>>>
>>>> It needs to be suitable for managing failover pairs and I'm struggling
>>>> with the age-old problem of restarting a dhcpd instance. From reading
>>>> around there does not currently appear to be a method for restarting
>>>> dhcpd that is both *safe* and *useful* in such a setup.
>>>>
>>>>
>>>> Restarting with signals:
>>>>
>>>> From AA-01043 (Last Updated: 2015-03-18): "kill is the recommended
>>>> option, except where there is a high turnover of leases and the
>>>> production environment requires a high degree of reliability from
>>>> DHCP. In that case, we'd suggest that administrators consider using
>>>> OMAPI to control the daemon instead and to request a graceful
>>>> shutdown. The reason for this is that there is the slight possibility
>>>> that by using kill, administrators may stop dhcpd in the middle of
>>>> appending a lease to the leases file (in which case it may become
>>>> corrupted). This risk, while tiny, may be significant enough for some
>>>> administrators to prefer to use OMAPI instead."
>>>>
>>>> In other words this is recommending that casual users take the risk
>>>> that their service might not recover after restarting. This may be
>>>> unlikely but it's still dangerous advice! The documentation does
>>>> indicate that a feature for "gentle shutdown" in response to a signal
>>>> was added in the 4.2 time frame and then subsequently removed:
>>>>
>>>> "Added support for gentle shutdown after signal is received. [ISC-Bugs
>>>> #32692] [ISC-Bugs 34945]"
>>>> "Disable the gentle shutdown functionality until we can determine the
>>>> best way to present it to remove or reduce the side effects. [ISC-Bugs
>>>> #36066]"
>>>>
>>>> Is it still the case that kill isn't suitable for production purposes?
>>>>
>>>>
>>>> With OMAPI:
>>>>
>>>> You can cleanly shut down via OMAPI ("set state=2", etc.); however, the
>>>> effect on the failover protocol is less ideal than with signals.
>>>>
>>>> OMAPI shutdown will place the partner into "partner-down" state, making
>>>> it become active for all leases in the failover pools, which isn't
>>>> ideal when briefly restarting an instance. Contrast this with the effect
>>>> of restarting an instance with kill, which is to briefly place the
>>>> partner into "communications-interrupted" state, from which it
>>>> immediately reverts to "normal" once the restarted instance is available
>>>> (with auto-partner-down taking care of things if the instance does
>>>> not recover.)
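
To see which of these states a partner is actually in before and after a restart, the failover-state object can be queried over OMAPI. A rough sketch only, under several assumptions that should be checked against your version: that omshell prints attributes as lines like "partner-state = 00:00:00:02", that the state numbering below matches the table in dhcpd(8) for 4.x, and that the peer is named "my-failover-peer" (a placeholder). The function names are my own.

```shell
#!/bin/sh
# Map a failover state number to a name.
# ASSUMPTION: numbering as listed in dhcpd(8) for ISC DHCP 4.x; names are
# hyphenated here to match the spelling used in this thread.
failover_state_name() {
    case "$1" in
        1) echo startup ;;
        2) echo normal ;;
        3) echo communications-interrupted ;;
        4) echo partner-down ;;
        5) echo potential-conflict ;;
        6) echo recover ;;
        7) echo paused ;;
        8) echo shutdown ;;
        9) echo recover-done ;;
        10) echo resolution-interrupted ;;
        11) echo conflict-done ;;
        254) echo recover-wait ;;
        *) echo unknown ;;
    esac
}

# Read omshell output on stdin, find the named attribute (for example
# partner-state) and print its value as a decimal number.
# ASSUMPTION: values appear as colon-separated hex bytes, e.g. 00:00:00:02.
parse_state() {
    hex=$(sed -n "s/^.*$1 = //p" | tr -d ':' | tail -n 1)
    [ -n "$hex" ] || return 1
    printf '%d\n' "0x$hex"
}

# Intended use (server, port and key as in the shutdown script; the peer
# name is a placeholder for the name in your failover peer declaration):
#   omshell <<EOF | parse_state partner-state
#   server localhost
#   port 7911
#   key omapi_key Ofakekeyfakekeyfakekey==
#   connect
#   new failover-state
#   set name = "my-failover-peer"
#   open
#   EOF
```

Running this against the surviving partner after each style of restart should show whether it went to partner-down or only to communications-interrupted.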
>>>>
>>>>
>>>> Is there a safe way to restart DHCP that has minimal impact on the
>>>> failover protocol?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Terry