Secondary server in failover fails to come out of recover state

Oscar Ricardo Silva osilva at scuff.cc.utexas.edu
Wed May 15 22:46:11 UTC 2013


On 05/15/2013 05:31 PM, Steven Carr wrote:
> On 15 May 2013 22:53, Paul B. Henson <henson at acm.org> wrote:
>> Just as a reference point, we store our dhcp configuration in subversion,
>> and have a job that extracts it whenever it changes and deploys it on the
>> underlying servers. They both restart at about the exact same time, and
>> we've never had an issue.
>>
>> Here's what it looks like:
>>
>> May 15 14:40:05 mercury dhcpd: failover peer cpp: I move from normal to
>> startup
>> May 15 14:40:06 mercury dhcpd: failover peer cpp: peer moves from normal to
>> communications-interrupted
>> May 15 14:40:06 mercury dhcpd: failover peer cpp: I move from startup to
>> normal
>>
>>
>> May 15 14:40:06 gemini dhcpd: failover peer cpp: I move from
>> communications-interrupted to startup
>> May 15 14:40:06 gemini dhcpd: failover peer cpp: I move from startup to
>> communications-interrupted
>> May 15 14:40:06 gemini dhcpd: failover peer cpp: peer moves from normal to
>> normal
>> May 15 14:40:06 gemini dhcpd: failover peer cpp: I move from
>> communications-interrupted to normal
>
> Yep, that's what I would expect to see: communications are briefly
> interrupted between the peers. But the omshell script is telling the
> peers that the partner is down when it is not, and that is what causes
> the recovery problem. My guess is that one of the peers still thinks the
> other is down, or they both think each other is down.


One of the things we see when this happens is that the primary gets
"stuck" as far as partner-state and local-state are concerned. When the
secondary is restarted, the primary reports its local state as
"partner-down" and the partner state as "recover". It stays that way even
when dhcpd is killed on the secondary.

I'll test a variation of the init script that kills the process. I would
still like both servers to return to the normal state before the
secondary is restarted.
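
As a rough illustration of checking for that normal state before a restart, here is a minimal Python sketch that reads the failover transition messages dhcpd writes to syslog (the format shown in Paul's excerpt) and reports the last state logged for each side. The function names last_states and safe_to_restart are made up for this example, and it assumes each log entry is on a single line rather than wrapped as in the mail above. It only reflects what one server last logged, so it is a sanity check and not a substitute for querying the servers directly.

```python
import re

# Matches dhcpd failover transitions as they appear in syslog, e.g.
# "... dhcpd: failover peer cpp: I move from normal to startup"
# "... dhcpd: failover peer cpp: peer moves from normal to normal"
TRANSITION = re.compile(
    r"failover peer (?P<peer>\S+): (?P<who>I|peer) moves? "
    r"from (?P<old>[a-z-]+) to (?P<new>[a-z-]+)"
)


def last_states(lines):
    """Return {(peer_name, 'I' or 'peer'): most recently logged state}."""
    states = {}
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            # Later lines overwrite earlier ones, so the final value
            # is the last transition seen for that side.
            states[(m["peer"], m["who"])] = m["new"]
    return states


def safe_to_restart(lines, peer_name):
    """True only if both the local and partner state were last logged as normal."""
    states = last_states(lines)
    return (
        states.get((peer_name, "I")) == "normal"
        and states.get((peer_name, "peer")) == "normal"
    )
```

Fed the gemini lines from the excerpt above (unwrapped), safe_to_restart(lines, "cpp") returns True; fed the mercury lines it returns False, because the last partner state mercury logged was communications-interrupted.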


Oscar


