Restarting DHCP safely whilst avoiding partner-down state

Terry Burton tez at terryburton.co.uk
Sun May 15 02:18:43 UTC 2016


On 13 May 2016 at 20:30, Terry Burton <tez at terryburton.co.uk> wrote:
> On 13 May 2016 at 20:06, Terry Burton <tez at terryburton.co.uk> wrote:
>> On 13 May 2016 at 19:25, dave c <dhcp at gvtc.drakkar.org> wrote:
>>> Are folks forgetting that the default action of the kill command is to send
>>> the TERM signal? That signal should tell the daemon to do an orderly
>>> shutdown, close the leases file cleanly, send whatever signals to the
>>> partner that are required and then exit when everything is ready.
>>>
>>> All the concern I am seeing below would be true if folks were issuing a kill
>>> -9 to stop the service. At which point the leases file would get potentially
>>> corrupted.
>> <...snip...>
>>> So it sounds like a lot of angst over nothing... a TERM signal is defined as
>>> closing all processes and threads cleanly, writing out the last bits of data
>>> and stopping things in an orderly fashion. So seems that issuing kill {dhcpd
>>> pid} would be perfectly acceptable to close things down even in a partner
>>> scenario.
>>
>> Where do you get the definition of a SIGTERM causing a graceful
>> shutdown (other than by convention) and if this were the case for ISC
>> DHCP then why the warning about truncated leases given in AA-01043?
>>
>> The effect of receiving a handleable signal is to immediately jump
>> into the trap handler if one is configured for that signal, otherwise
>> to die.
>>
>> Unless a handler takes care to ensure that everything is consistent
>> and then exit then SIGTERM, SIGINT, etc. are potentially dangerous.
>>
>> The release notes indicate that a "gentle shutdown" feature was added
>> in the past and then subsequently removed because the semantics chosen
>> caused operational issues - but what these were isn't known because
>> the associated bug report isn't publicly available.
>>
>> I need to find time to understand the current codebase, but what I'd
>> like to know the intended semantics and what issues are encountered
>> with implementing these in the way that Simon Hobson suggests.
>
> So currently there are no trap handlers for SIGTERM or SIGINT and
> therefore no cleanup whatsoever at exit.
>
> There is a compiled-out option ENABLE_GENTLE_SHUTDOWN which installs
> handlers for these signals but when this was activated it implemented
> the harmful semantics of putting the server through a
> recovery+partner-down transition which isn't useful for a quick
> configuration reload:
>
> /* Enable the gentle shutdown signal handling.  Currently this
>    means that on SIGINT or SIGTERM a client will release its
>    address and a server in a failover pair will go through
>    partner down.  Both of which can be undesireable in some
>    situations.  We plan to revisit this feature and may
>    make non-backwards compatible changes including the
>    removal of this define.  Use at your own risk.  */
> /* #define ENABLE_GENTLE_SHUTDOWN */
>
> #if defined(ENABLE_GENTLE_SHUTDOWN)
>         /* no signal handlers until we deal with the side effects */
>         /* install signal handlers */
>         signal(SIGINT, dhcp_signal_handler);   /* control-c */
>         signal(SIGTERM, dhcp_signal_handler);  /* kill */
> #endif
>
> Having a more basic signal handler that defers the exit in order to
> continue to write out an outstanding lease seems better. Perhaps once
> could even differentiate these exit semantics based on SIGINT vs
> SIGTERM.
>
> If someone who can speak for ISC is able to indicate whether this
> would be a sensible approach then I am happy to work up a patch.

As a rough proof of concept this simple amendment appears to provide
rapid shutdown without partner down interference for a basic reload.

The part of dhcpd.c that drives the current -> shutdown -> recover
state transition can be guarded by a variable that determines whether
or not to place the server into long-term shutdown (possibly based on
the received signal or a runtime configurable.)

...
#define RECOVER_ON_STARTUP 1!=1   /* To be a variable derived at runtime */
...
#if defined (FAILOVER_PROTOCOL)
        if (RECOVER_ON_STARTUP) {
            /* Set all failover peers into the shutdown state. */
            if (shutdown_state == shutdown_dhcp) {
                for (state = failover_states; state; state = state -> next) {
                    if (state -> me.state == normal) {
                       dhcp_failover_set_state (state, shut_down);
                       failover_connection_count++;
                    }
                    if (state -> me.state == shut_down &&
                        state -> partner.state != partner_down)
                            failover_connection_count++;
                }
            }
            if (shutdown_state == shutdown_done) {
                for (state = failover_states; state; state = state -> next) {
                    if (state -> me.state == shut_down) {
                        if (state -> link_to_peer)
                            dhcp_failover_link_dereference (&state ->
link_to_peer,
                                                            MDL);
                        dhcp_failover_set_state (state, recover);
                    }
                }
            }
        }
#endif
        if (shutdown_state == shutdown_done) {
#if defined (DEBUG_MEMORY_LEAKAGE) && \
                defined (DEBUG_MEMORY_LEAKAGE_ON_EXIT)
                free_everything ();
                omapi_print_dmalloc_usage_by_caller ();
#endif
                if (no_pid_file == ISC_FALSE)
                        (void) unlink(path_dhcpd_pid);
                exit (0);
        }
...


More information about the dhcp-users mailing list