Secondary server in failover fails to come out of recover state

Wed May 15 22:31:21 UTC 2013

On 15 May 2013 22:53, Paul B. Henson <henson at acm.org> wrote:
> Just as a reference point, we store our dhcp configuration in subversion,
> and have a job that extracts it whenever it changes and deploys it on the
> underlying servers. They both restart at about the exact same time, and
> we've never had an issue.
>
> Here's what it looks like:
>
> May 15 14:40:05 mercury dhcpd: failover peer cpp: I move from normal to
> startup
> May 15 14:40:06 mercury dhcpd: failover peer cpp: peer moves from normal to
> communications-interrupted
> May 15 14:40:06 mercury dhcpd: failover peer cpp: I move from startup to
> normal
>
>
> May 15 14:40:06 gemini dhcpd: failover peer cpp: I move from
> communications-interrupted to startup
> May 15 14:40:06 gemini dhcpd: failover peer cpp: I move from startup to
> communications-interrupted
> May 15 14:40:06 gemini dhcpd: failover peer cpp: peer moves from normal to
> normal
> May 15 14:40:06 gemini dhcpd: failover peer cpp: I move from
> communications-interrupted to normal

Yep, that's what I would expect to see: communications are interrupted
between the peers. But the omshell script is flagging to each peer that
its partner is down when it isn't, which is causing the recovery problem.
My guess is that one of the peers still thinks the other is down, or
they both think each other is down.
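
One rough way to check which peer still believes the other is down is to
pull the last logged state out of syslog on each box. The Python below is
only a sketch of that idea, not anything shipped with dhcpd. It assumes
each log entry is on a single line in the same format as the excerpts
quoted above, and the function name last_failover_states is made up for
illustration.

```python
import re

# Matches dhcpd failover transition lines such as:
#   May 15 14:40:05 mercury dhcpd: failover peer cpp: I move from normal to startup
# Assumption: each syslog entry is on one line (not wrapped as in the email).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+dhcpd(?:\[\d+\])?:\s+"
    r"failover peer (?P<peer>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def last_failover_states(lines):
    """Return {(host, peer_name): {"local": state, "partner": state}}.

    "local" is the last state the host logged for itself ("I move ..."),
    "partner" is the last state it logged for its peer ("peer moves ...").
    Either value stays None if no matching line was seen.
    """
    states = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a failover transition line
        entry = states.setdefault(
            (m["host"], m["peer"]), {"local": None, "partner": None}
        )
        side = "local" if m["who"] == "I move" else "partner"
        entry[side] = m["new"]
    return states


if __name__ == "__main__":
    import sys

    # Usage: python failover_states.py < /var/log/messages  (path is an example)
    for (host, peer), st in last_failover_states(sys.stdin).items():
        print(f"{host} peer {peer}: local={st['local']} partner={st['partner']}")
```

If either host reports partner-down or one of the recover states as its
last state while the other side reports normal, that would point at the
omshell script rather than at the simultaneous restart.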