Frustrated DHCP failover not working.. :(

Wed Feb 10 20:51:59 UTC 2016

> On Feb 10, 2016, at 12:35 PM, Gregory Sloop <gregs at sloop.net> wrote:
> 
> 
> 
> SR> On the original question.  Setting your MCLT to 30 minutes while
> SR> your default lease time is 20 minutes
> SR> seems a bit strange to me.
> 
> I certainly don't want to derail the search for a solution for the original poster, but MCLT and its proper setting aren't much discussed, and when I brought it up some time back, Cathy Almond pointed at several FAQs that were not very useful. There was/is very little discussion about what a "reasonable" formula might be to determine an optimal setting. [Cathy pointed to FAQs that _strongly_ suggested leaving it at the default 30m, which may be where/why the OP left it at 30m. I believe this is quite wrong, but I'm certainly no guru on dhcpd, so I hesitate to make authoritative pronouncements about the subject. :) ]
> 
> In short, either here or in a new thread, I think a fuller discussion about MCLT and fail-over operation would be useful. [Especially when a partner is down, either in "communications-interrupted" mode or in "partner-down" mode.] I'd even perhaps be willing to spend some time writing up a more informative FAQ on the issue, provided I'm able to get a good understanding of how it works. If I've missed some docs somewhere, I'd be more than glad to be pointed at them.

It's difficult to cover all the different possibilities, as people use the servers in different ways.
I don't think we have a great description of it.

> 
> ---
> More to the point of this thread.
> 1) I don't recall my peer servers waiting the MCLT time to go from "waiting" to "normal" - ever. [I could certainly be wrong about this, but I don't think waiting the MCLT time would have made the servers go from waiting to normal. I'd guess there's some kind of communication problem between the two, e.g. an intermittent link, too little bandwidth, etc.]

There are some wait states that last for MCLT.

> 
> 2) In a fail-over situation, I'm not sure it's clear to Rob that both machines will be splitting the pool and both will be responding to lease requests. 
> 
> i.e. Fail-over isn't really a "fail-over" server. Which machine is primary and which is secondary is, IMO, immaterial. [Other than one will have the "master" config file, while the second will have the "peer".]

There is at least one major difference between primary and secondary: the transition
from active to free or backup.  In normal operation only the primary can transition a lease from active
to free (available on the primary) and then to backup (available on the secondary).  The rules change for
partner-down, and we added an optimization for use in communications-interrupted.
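
As a rough illustration of the normal-operation rule above, here is a minimal
Python sketch. It is a simplified model, not how dhcpd implements it: the names
(LeaseState, Role, can_transition, may_allocate) are made up for the example,
any intermediate lease states are left out, and the partner-down and
communications-interrupted rules are deliberately not modeled.

```python
from enum import Enum


class LeaseState(Enum):
    ACTIVE = "active"
    FREE = "free"      # available for the primary to hand out
    BACKUP = "backup"  # available for the secondary to hand out


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


# Transitions described above for NORMAL operation only (assumption: this
# sketch ignores partner-down and communications-interrupted entirely).
ALLOWED_IN_NORMAL = {
    (Role.PRIMARY, LeaseState.ACTIVE, LeaseState.FREE),
    (Role.PRIMARY, LeaseState.FREE, LeaseState.BACKUP),
}


def can_transition(role: Role, current: LeaseState, target: LeaseState) -> bool:
    """True if this server may move a lease from current to target."""
    return (role, current, target) in ALLOWED_IN_NORMAL


def may_allocate(role: Role, state: LeaseState) -> bool:
    """True if this server may hand out a lease that is in the given state."""
    if role is Role.PRIMARY:
        return state is LeaseState.FREE
    return state is LeaseState.BACKUP
```

In this model the secondary never frees an expired lease itself; it only
allocates from leases the primary has already moved to backup.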

> Personally, I think it ought to be called load-balancing with fail-over features. [Because in normal operation, it's load balancing between the two servers, and when one is down, it has features to continue operating on a single server with minimal disruption (or if you're somewhat lucky, none).]

One can approximate failover by setting the split level to 0 or 256, in which case one server
will serve everything and the other will serve nothing until the load balance max seconds value is
exceeded.
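
To make the effect of the split value concrete, here is a small Python sketch
of the decision each server makes. It rests on some assumptions: clients are
hashed into 256 buckets and the primary owns the first "split" of them; the
hash used here (CRC32 modulo 256) is only a stand-in and is not the hash dhcpd
uses; and the function names are invented for the example.

```python
import zlib


def bucket(client_id: bytes) -> int:
    """Stand-in hash mapping a client identifier to a bucket in 0..255."""
    return zlib.crc32(client_id) % 256


def primary_is_responsible(client_id: bytes, split: int) -> bool:
    """The primary owns buckets 0..split-1; the secondary owns the rest."""
    return bucket(client_id) < split


def should_answer(is_primary: bool, client_id: bytes, split: int,
                  secs: int, max_secs: int) -> bool:
    """Decide whether this server answers a client's request.

    secs is how long the client says it has been trying; once it exceeds
    max_secs (the load balance max seconds value), either server answers.
    """
    if secs > max_secs:
        return True
    return primary_is_responsible(client_id, split) == is_primary
```

With split set to 256 every bucket belongs to the primary, and with 0 every
bucket belongs to the secondary, so the other server stays silent until a
client's secs value exceeds the configured maximum.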

> 
> My apologies if I'm wrong and #2 is clear to Rob.
> 
> -Greg
