eth0: not responding (recovering)

Thu Oct 10 18:21:08 UTC 2019

I wouldn't change the state manually.

For me, after changing the primary configuration to failover, I initially start it with "mclt 60;" to speed
recovery:
mclt                3600;    # not for secondary
#mclt                60;    # use this when deploying a replacement server
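For reference, a primary-side failover declaration with that mclt setting might look roughly like the sketch below. It assumes ISC dhcpd 4.x syntax and mirrors the example in the dhcpd.conf(5) man page. The peer name "dhcp-failover" is borrowed from the lease excerpt later in this thread; the 192.0.2.x addresses and the pool range are placeholders, not values from this setup.

```
# Primary server (sketch; addresses and range are placeholders)
failover peer "dhcp-failover" {
  primary;                      # this server is the primary
  address 192.0.2.10;           # this server's failover address
  port 647;                     # IANA-assigned failover port
  peer address 192.0.2.11;      # the secondary's failover address
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
  mclt 3600;                    # primary only; start with 60 while bringing up a replacement
  split 128;                    # primary only; 128 = 50/50 load balancing
  load balance max seconds 3;
}

subnet 192.0.2.0 netmask 255.255.255.0 {
  pool {
    failover peer "dhcp-failover";
    deny dynamic bootp clients;  # dynamic BOOTP is not compatible with failover
    range 192.0.2.100 192.0.2.199;
  }
}
```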

If the primary is working, then start the secondary.  When they're both "normal", change
the configuration for the primary to the desired mclt time and restart the primary, then the
secondary.
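The matching secondary-side declaration might look like the sketch below, again with placeholder addresses. It swaps the address roles and omits mclt and split (per the "not for secondary" note above); the subnet and pool definitions are assumed to be identical to the primary's.

```
# Secondary server (sketch; addresses are placeholders)
failover peer "dhcp-failover" {
  secondary;                    # this server is the secondary
  address 192.0.2.11;           # this server's failover address
  port 647;
  peer address 192.0.2.10;      # the primary's failover address
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
  load balance max seconds 3;
  # no mclt or split here: those are set only on the primary
}
```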

Bill

On 10/10/2019 1:23 PM, Surya Teja wrote:
> Hi Bill
> A non-failover DHCP server doesn't have any "failover peer" stanza in the /var/log/dhcpd/dhcpd.leases file ---->
> Sorry for the typo in my previous email.
> The environment became very problematic: none of the servers are granting leases to devices. On the primary, the lease file says
> my state partner-down
> peer state recovery
> ----> It's not "peer", it is "partner state".
>
>  A working failover lease file has at the top ----->
> I can see these states in multiple places in the dhcpd.leases file, each with a timestamp, like this:
> failover peer "dhcp-peer-workspace1" state {
>   my state recover at 3 2019/10/09 14:23:41;
>   partner state unknown-state at 3 2019/10/09 14:23:41;
> }
>
> failover peer "dhcp-peer-workspace1" state {
>   my state recover at 3 2019/10/09 14:23:41;
>   partner state unknown-state at 3 2019/10/09 14:23:41;
> }
> server-duid "\000\001\000\001%0\251\355\000PV\207D\342";
> failover peer "dhcp-peer-workspace1" state {
>   my state recover at 3 2019/10/09 14:23:41;
>   partner state unknown-state at 3 2019/10/09 14:23:41;
> }
>  Try shutting down both the primary and secondary servers, remove the "failover peer" stanza---->
> Yes, I tried this. I shut down the failover server, and on the primary I removed the failover configuration entirely from the config file.
> I stopped dhcpd, deleted the lease file, touched a new empty lease file, and restarted dhcpd; it worked as expected.
>
> The moment I bring the failover appliance up, add the failover section to the primary config file, and restart dhcpd
> on the primary, the issue starts.
> At first the failover server's logs say it is in recovery mode, which seemed OK: I assumed it was in recovery because it has to sync with the primary.
> After some
> time, the primary appliance's lease file shows
> my state partner-down
> partner state recovery
> That is when the issues start, and they persist indefinitely: the states never get updated in either appliance's
> lease file.
> Is it OK if I edit the lease file manually and set the state to normal?
>
>
> On Thu, Oct 10, 2019 at 9:35 PM Bill Shirley <bill at c3po.polymerindustries.biz> wrote:
>
>     A non-failover DHCP server doesn't have any "failover peer" stanza in the /var/log/dhcpd/dhcpd.leases
>     file.  A working failover lease file has at the top:
>     failover peer "dhcp-failover" state {
>       my state normal at 4 2019/10/10 05:05:04;
>       partner state normal at 5 2011/09/02 23:51:25;
>     }
>
>     Try shutting down both the primary and secondary servers, remove the "failover peer" stanza from
>     both of their lease files, and then bring up the primary *with the failover configuration*.  Ensure it is
>     handing out leases, then bring up the secondary *with the failover configuration*.  Then check that
>     all is working correctly.
>
>     Bill
>
>     On 10/9/2019 1:32 PM, Surya Teja wrote:
>>     I am facing a weird situation with the failover setup in my lab environment. The issue occurs when the failover DHCP
>>     appliance is added to my existing server.
>>     The first time I added failover to the primary appliance, the primary's lease file showed the partner state as
>>     unknown and the failover server kept logging "not responding (recovering)", so I shut down the failover appliance,
>>     removed the failover config section from the primary, and restarted the primary; then it worked fine.
>>     On a second attempt, I increased the mclt value to 3600, added the failover section back to the primary config, and
>>     brought up the failover server.
>>     The environment became very problematic: none of the servers are granting leases to devices. On the primary, the lease file says
>>     my state partner-down
>>     peer state recovery
>>
>>     Why do we get these recovery, partner-down, and unknown states when I add failover to my environment?
>>     Or are there best-practice steps for adding failover to an existing server without causing an outage?
>>
>>     Any help would be appreciated
>>     Thanks
>>
>>
>>
>>     On Sun, 6 Oct 2019, 21:04 Surya Teja, <suryateja042 at gmail.com> wrote:
>>
>>         Hi Bill, thanks for your reply.
>>         Yes, I see traffic on the peer ports that I specified in the failover section of my configuration file.
>>         My mclt value is 1800 (30 min).
>>         I am seeing these issues on the failover server, and sometimes the logs say "peer holds all free leases",
>>         even though that scope is not completely full of active entries in that server's dhcpd.leases file.
>>
>>         One more strange thing I observed in the lease file: it contains "my state" and "partner state"
>>         statements, and the partner state says unknown.
>>         When does this happen? From what I found online it should normally be "normal", but the state is not
>>         getting updated in the lease file.
>>
>>         On Sat, 5 Oct 2019, 21:16 Bill Shirley, <bill at c3po.polymerindustries.biz> wrote:
>>
>>             Assuming you're referring to DHCP failover, is there any traffic flow on the
>>             port and peer port in the failover stanza?
>>
>>             What is your value for mclt?
>>
>>             Which server, primary or secondary, is giving the recovering message?
>>
>>             Bill
>>
>>             On 10/5/2019 9:33 AM, Surya Teja wrote:
>>>             Hi, I have an issue with lease handling in the ISC DHCP service. The logs keep printing "eth0: not responding
>>>             (recovering)".
>>>             My local setup runs in active-active mode (split set for 50/50), and for some reason one of the
>>>             appliances went down for nearly 15 hours. When I noticed this, I brought it back up.
>>>             Since then the logs keep saying "not responding (recovering)". It has been more than two hours and
>>>             I am still getting the same messages.
>>>             Does anyone have any idea about this scenario and how to get the environment stable?
>>>
>>>             _______________________________________________
>>>             dhcp-users mailing list
>>>             dhcp-users at lists.isc.org  <mailto:dhcp-users at lists.isc.org>
>>>             https://lists.isc.org/mailman/listinfo/dhcp-users
>>
>>
>
>