Multi-subnet/vlan and failover

Tue May 14 07:16:45 UTC 2013

Yep, that's correct. I usually go with 1 hour for MCLT: not too long,
not too short. Clients renew at half the lease time, so an MCLT-length
lease is renewed about every 30 minutes, which hopefully won't cause
too much of a loading issue (for either the network or DHCP server).
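
For reference, a minimal sketch of where these settings live on the
primary. The peer name, port and timing values are the ones from Greg's
posted config; the server addresses are the 10.1.1.11/10.1.1.12 pair he
mentioned later; "split 128" is the balanced value suggested earlier in
the thread; and the range start of .10 is only my assumption, chosen to
keep the router address out of the pool. Treat it as an illustration,
not a tested config.

```
# Primary server (sketch). mclt and split belong on the primary only.
failover peer "dhcp-failover" {
  primary;
  address 10.1.1.11;          # this server (address assumed from the thread)
  port 647;
  peer address 10.1.1.12;     # the failover partner
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
  mclt 3600;                  # 1 hour; clients renew after about 30 minutes
  split 128;                  # both servers answer requests
}

subnet 10.1.2.0 netmask 255.255.255.0 {
  option routers 10.1.2.1;
  option subnet-mask 255.255.255.0;
  pool {
    # The failover reference goes inside the pool, next to the range.
    failover peer "dhcp-failover";
    range 10.1.2.10 10.1.2.254;   # assumed range, skips the router
  }
}
```

The secondary would carry a matching declaration with "secondary;" in
place of "primary;", the two addresses swapped, and no mclt or split
lines.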

On 14 May 2013 00:25, Gregory Sloop <gregs at sloop.net> wrote:
> Top posting... [Sorry for anyone it offends. :) ]
>
> So, essentially if I understand you correctly, just trust that
> balancing the pools should work fine with the rational defaults in the
> system, and don't worry about it too much.
>
> As far as MCLT - there is this text in the manpage
> "The longer you set this, the longer it will take for the running
> server to recover IP addresses after moving into PARTNER-DOWN state"
>
> In further reading, I think this is an inelegant way of saying that
> the number of seconds set as MCLT must pass after the master and peer
> re-establish connection, before the master starts acting as a master
> again.
>
> ---
> So, if this was a *really* long time - say 3 weeks, it would take
> three weeks for a master to come back on-line and if the peer also
> failed in that three weeks, you'd have no dhcp servers at all.
>
> So, set it long enough that the remaining server can keep up with the
> expected load during a failure, but short enough not to incur
> excessive risk for both the peer and master failing even though the
> master [or peer] is back up, but hasn't yet been up longer than the
> MCLT.
> ---
>
> Is that correct?
>
> And thanks so much for the discussion. While I don't have any more
> control than before - I at least think I understand the moving parts a
> bit better.
>
> -Greg
>
> SC> So the servers will rebalance the pools on their own, it's in the
> SC> code, it's not user configurable. MCLT has nothing whatsoever to do
> SC> with the rebalancing process
>
> SC> MCLT is the maximum client lead time that will be used by the server
> SC> in a failover situation. e.g. if the failover is enacted and the
> SC> secondary has to respond to a lease request on behalf of the primary
> SC> (which is down) then the lease time will be MCLT. Additionally the
> SC> first time any lease is issued by either server to a new client it
> SC> will be issued as MCLT, this is to allow the background updates to
> SC> take place between the failover peers. When the failover is
> SC> restored the server which was down will wait MCLT before starting to
> SC> issue new leases, this is to allow the servers time to resync.
>
> SC> Beware putting the system into partner down when the partner isn't
> SC> actually down, I've seen this halt both servers from issuing leases
> SC> for MCLT, panic mode then ensues as you can't get on the network at
> SC> all.
>
> SC> There is no situation that I'm aware of that the system wouldn't
> SC> automatically rebalance (though I don't know how it would handle only
> SC> having 1 remaining lease, I would assume one system would get it and
> SC> the other would then have no free leases).
>
>
> SC> On 11 May 2013 01:05, Gregory Sloop <gregs at sloop.net> wrote:
>>> So, yes, I did have a VLAN leak. [eek - not enough sleep, too little
>>> thinking!]
>>>
>>> But that's resolved now - thanks for the tip.
>>> So, now I have failover working, as well as VLAN/Multi-segment. [Very
>>> nice.]
>>>
>>> I must say "Thanks!!" for all those who do the work on this product.
>>> It's a core piece of virtually every network and like most IT work,
>>> you never get credit when it works and does so unobtrusively without a
>>> bunch of babying etc. But you can always guarantee when it doesn't
>>> work, they haven't forgotten where to whine to either!
>>>
>>> ---
>>> But the discussion about the split values and lease-balancing is one
>>> I'd like to discuss...
>>>
>>> I'm happy to start a new thread, but since we started discussing here,
>>> I thought it might make sense to continue. Google should find it in a
>>> search in any case...
>>>
>>> ---
>>> So the relevant params for address recovery etc seem to be:
>>> mclt - which is only _somewhat_ comprehensible to me.
>>> [I see it's the maximum lease time for any lease when in partner-down
>>> state - but I don't understand what it has to do with recovery of
>>> leases in PDS.]
>>>
>>> But if I thought that was bad, I really don't grok:
>>> max-lease-misbalance
>>> max-lease-ownership
>>> min-balance
>>> max-balance
>>>
>>> At least not really.
>>>
>>> ---
>>> Is there some layman, dumb-oaf version of what happens when one of the
>>> partner servers runs out of leases? [Like Thag just stumbled into
>>> your data center and was looking for a job configuring DHCP servers!?
>>> :) ]
>>>
>>> I've read the section several times, and really get fairly lost.
>>>
>>> Here's how I understand it.
>>> In short, as the master/peer hand out addresses, they split the
>>> addresses 50/50. [with a few exceptions]
>>> They then hand out addresses and try to balance the free address pool
>>> on master/peer so they remain equivalent to each other.
>>>
>>> When the system detects that it may run out of addresses on either the
>>> master or the peer [over X time-frame], it tries to re-balance the
>>> free leases again to meet a 50/50 split [again with some exceptions
>>> too complicated to finish explaining in the next few hours or so.]
>>>
>>> Does this generally sound right?
>>> ---
>>>
>>> But does mclt have anything to do with lease re-balancing? [The
>>> description seems to indicate it does, but after reading it multiple
>>> times, I don't really think it does.]
>>>
>>> ---
>>> So, as a final thought: what kinds of situations would put you at risk
>>> of having a wildly mis-balanced pool and running out of addresses on a
>>> master/peer - where the system wouldn't "automagically" re-balance to
>>> save itself?
>>>
>>> What settings would help in this regard, and what values might one
>>> pick?
>>>
>>> I'd guess this discussion has occurred before, so I'm more than glad to
>>> be pointed at a thread somewhere and do the slog to read it and see if
>>> that helps.
>>>
>>> Sorry for the long post and thanks in advance for your help!
>>>
>>> -Greg
>>>
>>>
>>>
>>>
>>> SC> No, regardless of the split the leases will still be shared 50/50 with
>>> SC> both servers, so you could still run into an issue where the secondary
>>> SC> runs out of addresses. When both servers are online and one is running
>>> SC> low on leases they will rebalance the lease pool and share the
>>> SC> remaining leases 50/50. (This bit really needs to be documented better
>>> SC> as lots of people fall into that trap)
>>>
>>> SC> 255 would make the primary respond to all requests when both systems
>>> SC> are online. When the primary goes offline you will have a limited
>>> SC> amount of time before the leases will be depleted, at which point you
>>> SC> will need to tell the secondary that its partner is down and the
>>> SC> secondary will then assume control of the full lease pools.
>>>
>>> SC> My general advice to anyone using DHCP failover is if either of the
>>> SC> systems is going to be out for longer than the period of your smallest
>>> SC> lease time then set the partner to be down as once that minimum lease
>>> SC> time is up you will already have started eating into additional
>>> SC> leases.
>>>
>>>
>>>
>>> SC> On 10 May 2013 08:58, Gregory Sloop <gregs at sloop.net> wrote:
>>>>> It might be, it is a test environment - but I didn't think I had
>>>>> anything that whacked.
>>>>>
>>>>> I'll do some more testing the next chance I get. Any other ideas are
>>>>> more than welcome.
>>>>>
>>>>> ---
>>>>> As for split - I generally intend for all requests to be handled by
>>>>> the primary and only fail to the peer. [Fail-over only, no
>>>>> load-balance]
>>>>>
>>>>> I'm not sure if that's the best idea - but it seems more
>>>>> straightforward. (Essentially my worry is if the blocks are split and
>>>>> a peer goes down, could we run out of addresses in the block for the
>>>>> "up" server before reclaiming them from the "down" server. I suspect
>>>>> this worry is mostly because I don't fully grasp how it is handling
>>>>> things, despite reading the docs - but not as carefully as I probably
>>>>> need to do.)
>>>>>
>>>>> [So, I assume a split of 255 would then make it do what I want, having
>>>>> all requests served by the primary - instead of load-balance, right?]
>>>>>
>>>>>
>>>>> -Greg
>>>>>
>>>>>
>>>>> SC> Sounds like you have a leak in your network and broadcast packets are
>>>>> SC> leaking from one VLAN into another.
>>>>>
>>>>> SC> One other thing, is there a reason you are using "split 0;"? This
>>>>> SC> would mean the secondary peer will answer all lease requests. For a
>>>>> SC> balanced approach you should use 128 which will allow both DHCP
>>>>> SC> servers to respond to lease requests.
>>>>>
>>>>> SC> On 10 May 2013 08:19, Gregory Sloop <gregs at sloop.net> wrote:
>>>>>>> As a follow-up, because it may well impact the answer to my duplicate
>>>>>>> DHCPOFFER issue, let me describe how the DHCP servers are connected in
>>>>>>> relation to VLANS etc.
>>>>>>>
>>>>>>> The DHCP Servers are on VLAN1, say 10.1.1.11/10.1.1.12 [master/peer]
>>>>>>>
>>>>>>> The L3 switch is configured to forward dhcp sessions to 10.1.1.11 and
>>>>>>> 10.1.1.12
>>>>>>>
>>>>>>> ---
>>>>>>> The duplicate messages are seen on DHCP negotiations from VLAN3 [and, I assume VLAN2]
>>>>>>>
>>>>>>> But I have not tested VLAN1 or VLAN2 attached clients to see what
>>>>>>> happens on those VLANs.
>>>>>>>
>>>>>>> TIA for any assistance!
>>>>>>>
>>>>>>> -Greg
>>>>>>>
>>>>>>> GS> @Kyle
>>>>>>> GS> Yes, that's it exactly. Thanks!
>>>>>>>
>>>>>>> GS> ---
>>>>>>> GS> I did find a post about putting it in a pool block after posting
>>>>>>> GS> my query, just about the time you posted your response - but
>>>>>>> GS> hadn't had a chance to test it - so that's great. It now works.
>>>>>>>
>>>>>>> GS> BUT...
>>>>>>> GS> When I run it, I see odd stuff [running dhcpd in -d -f
>>>>>>> GS> debug/foreground mode]...
>>>>>>>
>>>>>>> GS> ---
>>>>>>> GS> I see a pair of DHCPDISCOVERs
>>>>>>>
>>>>>>> GS> One from ETH0 and the other from the IP/DHCP helper on the L3 switch.
>>>>>>> GS> i.e.
>>>>>>> GS> DHCPDISCOVER from so:me:ma:ca:dd:rs on eth0
>>>>>>> GS> DHCPDISCOVER from so:me:ma:ca:dd:rs on 10.1.2.1
>>>>>>> GS> [This second one is the layer 3 switch, which is forwarding the DHCP session to the DHCP server]
>>>>>>>
>>>>>>> GS> Then dhcpd makes two offers - one on 10.1.1.X and one on 10.1.2.X
>>>>>>> GS> Since the station isn't on the 10.1.1.X VLAN and *is* on the 10.1.2.X
>>>>>>> GS> VLAN it "accepts" the 10.1.2.X address and it "works."
>>>>>>>
>>>>>>> GS> But I'm sure it's not supposed to be this way.
>>>>>>> GS> [And I'm pretty sure I'm doing something obvious and perhaps
>>>>>>> GS> stupid, but I just don't know where to look or what to try.]
>>>>>>>
>>>>>>> GS> How do I go about making it only see the forwarded DHCP session
>>>>>>> GS> and not the one on eth0 [or some other option I'm simply not aware of...]
>>>>>>>
>>>>>>> GS> ---
>>>>>>>
>>>>>>> GS> -Greg
>>>>>>>
>>>>>>>
>>>>>>> GS> Are you looking for something like this?
>>>>>>>
>>>>>>> GS> subnet 172.21.27.0 netmask 255.255.255.0 {
>>>>>>> GS>   option subnet-mask 255.255.255.0;
>>>>>>> GS>   option broadcast-address 172.21.27.255;
>>>>>>> GS>   option routers 172.21.27.1;
>>>>>>> GS>   ddns-domainname "example.com.";
>>>>>>> GS>   option domain-search "example.com";
>>>>>>> GS>   pool {
>>>>>>> GS>     failover peer "dhcp-failover";
>>>>>>> GS>     range 172.21.27.5 172.21.27.254;
>>>>>>> GS>   }
>>>>>>> GS> }
>>>>>>>
>>>>>>>
>>>>>>> GS> On Thu, May 9, 2013 at 8:08 PM, Gregory Sloop <gregs at sloop.net> wrote:
>>>>>>> GS> So, I've done a fair bit of reading and searching - and this general
>>>>>>> GS> template is what I thought would work, but it doesn't.
>>>>>>>
>>>>>>> GS> Let me post the dhcpd.conf file and then discuss what's wrong and ask
>>>>>>> GS> for pointers.
>>>>>>>
>>>>>>> GS> ---
>>>>>>> GS> authoritative;
>>>>>>> GS> #ddns-update-style interim;
>>>>>>> GS> ignore client-updates;
>>>>>>> GS> #option host-name = config-option server.ddns-hostname;
>>>>>>>
>>>>>>> GS> #include "/etc/rndc.key";
>>>>>>>
>>>>>>> GS> option domain-name              "somedom.local";
>>>>>>> GS> option domain-name-servers      10.1.1.190,10.1.2.1,10.1.1.17;
>>>>>>> GS> option time-offset              -28800; # Pacific Standard Time (UTC-8)
>>>>>>> GS> option ntp-servers              10.1.1.14;
>>>>>>> GS> one-lease-per-client off;
>>>>>>>
>>>>>>> GS> #4 hour lease
>>>>>>> GS> default-lease-time 14400;
>>>>>>> GS> max-lease-time 14400;
>>>>>>> GS> option ip-forwarding off;
>>>>>>>
>>>>>>> GS> failover peer "dhcp-failover" {
>>>>>>> GS>   primary; # declare this to be the primary server
>>>>>>> GS>   # Address of THIS dhcp server, or what address to listen ON
>>>>>>> GS>   address 10.1.1.1;
>>>>>>> GS>   port 647;
>>>>>>> GS>   # Address of the DHCP fail-over peer.
>>>>>>> GS>   peer address 10.1.1.2;
>>>>>>> GS>   peer port 647;
>>>>>>> GS>   max-response-delay 60;
>>>>>>> GS>   max-unacked-updates 10;
>>>>>>> GS>   #load balance max seconds 3;
>>>>>>> GS>   mclt 3600;
>>>>>>> GS>   split 0;
>>>>>>> GS> }
>>>>>>>
>>>>>>> GS>     subnet 10.1.1.0 netmask 255.255.255.0 {
>>>>>>> GS>         range 10.1.1.1 10.1.1.254;
>>>>>>> GS>         option routers                  10.1.1.1;
>>>>>>> GS>         option subnet-mask              255.255.255.0;
>>>>>>> GS>         failover peer "dhcp-failover";
>>>>>>> GS>     }
>>>>>>>
>>>>>>> GS>     subnet 10.1.2.0 netmask 255.255.255.0 {
>>>>>>> GS>         range 10.1.2.1 10.1.2.254;
>>>>>>> GS>         option routers                  10.1.2.1;
>>>>>>> GS>         option subnet-mask              255.255.255.0;
>>>>>>> GS>         failover peer "dhcp-failover";
>>>>>>> GS>     }
>>>>>>>
>>>>>>> GS>     subnet 10.1.3.0 netmask 255.255.255.0 {
>>>>>>> GS>         range 10.1.3.1 10.1.3.254;
>>>>>>> GS>         option routers                  10.1.3.1;
>>>>>>> GS>         option subnet-mask              255.255.255.0;
>>>>>>> GS>         failover peer "dhcp-failover";
>>>>>>> GS>     }
>>>>>>>
>>>>>>> GS> ---
>>>>>>> GS> Now, I've disabled DDNS updates for simplicity's sake. Once I get the
>>>>>>> GS> multi-subnet/VLAN setup and failover working I'll add that back.
>>>>>>>
>>>>>>> GS> Perhaps that impacts things somehow, so if you'll keep that in mind,
>>>>>>> GS> I'd appreciate it.
>>>>>>>
>>>>>>> GS> So, when I try this config I get an error saying that a failover needs
>>>>>>> GS> to be inside a shared network block.
>>>>>>>
>>>>>>> GS> But if I do that, I've been told [read] that the DHCP server won't
>>>>>>> GS> know how to assign the different subnets. [This would apply to a
>>>>>>> GS> network where I wanted to share all the 10.1.1.1-10.1.3.254 as a
>>>>>>> GS> single pool/block and assign any station any IP in the whole block.]
>>>>>>>
>>>>>>> GS> But I have a L3 switch and I want these assigned to each VLAN.
>>>>>>>
>>>>>>> GS> ---
>>>>>>> GS> So, I set up the conf file without a shared-network and it works fine
>>>>>>> GS> with the L3 DHCP helper/proxy. Clients on VLAN1 get 10.1.1.0 blocks
>>>>>>> GS> and VLAN2 get 10.1.2.0 blocks etc.
>>>>>>>
>>>>>>> GS> So, with the "failover" block commented out, it works charmingly! Very
>>>>>>> GS> cool!
>>>>>>>
>>>>>>> GS> ---
>>>>>>> GS> But I *also* want to use failover.
>>>>>>>
>>>>>>> GS> And when I put in a fail-over outside a shared-network, it complains
>>>>>>> GS> that it must be inside a shared network.
>>>>>>>
>>>>>>> GS> So, how do I use fail-over AND maintain the subnet grouping above?
>>>>>>>
>>>>>>> GS> ---
>>>>>>> GS> I'll keep reading, but I've tinkered with this quite a bit and for the
>>>>>>> GS> life of me, I can't see how one would go about it.
>>>>>>>
>>>>>>> GS> -Greg
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Gregory Sloop, Principal: Sloop Network & Computer Consulting
>>>>>>> Voice: 503.251.0452 x82
>>>>>>> EMail: gregs at sloop.net
>>>>>>> http://www.sloop.net
>>>>>>> ---
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dhcp-users mailing list
>>>>>>> dhcp-users at lists.isc.org
>>>>>>> https://lists.isc.org/mailman/listinfo/dhcp-users