excessive failover pool balancing, leases files getting out of sync

Gordon A. Lang glang at goalex.com
Fri Jun 17 18:47:58 UTC 2011


One of the things we considered is the possibility that the offers/acks are 
getting lost on the network (i.e. the client is behaving as you would 
expect if it doesn't receive the response it expects).  We have looked at 
the network and found no clues.  We have yet to capture a failure as it 
happens to see whether an offer or an ack that is logged never actually 
gets released onto the network.
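
To pick which clients to capture, the server log can be scanned for MACs 
that keep sending DISCOVERs without ever following up with a REQUEST. The 
following is only a rough sketch: the log line shapes in the comments and 
the names stuck_clients and threshold are assumptions for illustration, 
not something taken from our servers, so the regex may need adjusting to 
the actual log format.

```python
import re
from collections import defaultdict

# Assumed dhcpd-style log fragments (adjust to the real log format):
#   "DHCPDISCOVER from 00:11:22:33:44:55 via eth0"
#   "DHCPREQUEST for 10.0.0.5 from 00:11:22:33:44:55 via eth0"
MSG_RE = re.compile(
    r"(DHCPDISCOVER|DHCPREQUEST)\b.*?\bfrom ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)

def stuck_clients(lines, threshold=3):
    """Return {mac: longest run of DISCOVERs with no REQUEST in between}.

    Only MACs whose longest run reaches `threshold` (a hypothetical
    cut-off) are reported.
    """
    run = defaultdict(int)      # current run of DISCOVERs per MAC
    longest = defaultdict(int)  # longest run seen per MAC
    for line in lines:
        m = MSG_RE.search(line)
        if not m:
            continue  # OFFER/ACK and unrelated lines are ignored
        kind, mac = m.group(1).upper(), m.group(2).lower()
        if kind == "DHCPDISCOVER":
            run[mac] += 1
            longest[mac] = max(longest[mac], run[mac])
        else:
            run[mac] = 0  # a REQUEST means the client saw an OFFER
    return {mac: n for mac, n in longest.items() if n >= threshold}
```

Feeding it the lines of a log file (for example via open(path)) gives a 
short list of MACs worth putting a capture on; it says nothing about 
where the OFFER was lost.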

Let me know if you figure anything out, and I will likewise post my 
findings.

--
Gordon A. Lang

----- Original Message ----- 
From: Marc Perea
To: dhcp-users at lists.isc.org
Sent: Friday, June 17, 2011 10:55 AM
Subject: Re: excessive failover pool balancing,leases files getting out of 
sync


>From: "Gordon A. Lang" glang at goalex.com

>While most clients are happily getting leases, many clients keep
>retrying as if they never got the offer/acks or else they simply don't
>like what they are getting.
>
>The clients who experience trouble one day are not typically the
>same clients who experience trouble the next day -- the problems
>seem to be randomly and uniformly distributed across all users
>(thousands of users) and all subnets (hundreds of subnets).
>
>Does this ring a bell with anyone?

Hi Gordon,
this sounds exactly like a problem we are currently investigating. We've 
looked into our core, BRAS, transport, access, and CPE vendors alike. I 
wonder if we could compare notes and see if we have any similarities? We 
don't use failover; instead we have a couple of DHCP servers with the same 
config handing back static host IPs.

For us, the problem appears to be that at some point, just as you are 
seeing, traffic starts to fail in the downstream path: either the ACKs and 
OFFERs are not getting to the client, or the client is unhappy with what 
it is receiving. Things will be going along fine, with renewals occurring 
at 1/2 of the lease duration, and then at some point the backoff algorithm 
will get invoked, sending more and more renewal attempts until the lease 
expires, after which we'll see a DISCOVER - OFFER loop continue 
indefinitely. We're a DSL ISP, and a modem reboot or retrain fixes it, at 
which point a full DORA occurs. Our CPE vendor provided us custom firmware 
so that if the router sees a retrain it restarts the DHCP daemon, and it's 
interesting that a down/up on our access shelf for the port will fix it 
too.
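
For reference, the retry pattern described above can be sketched from the 
RFC 2131 (section 4.4.5) defaults: T1 at 0.5 of the lease, T2 at 0.875, 
and on no reply a wait of half the remaining time (to T2 while renewing, 
to expiry while rebinding), with a 60 second floor. This is an 
illustration only; renewal_schedule and its parameters are made-up names, 
and a real CPE may use different timers.

```python
def renewal_schedule(lease_secs, t1_frac=0.5, t2_frac=0.875, min_wait=60):
    """List (state, seconds_after_lease_start) for each DHCPREQUEST an
    unanswered client would send, using RFC 2131 default timers."""
    t1 = lease_secs * t1_frac
    t2 = lease_secs * t2_frac
    times = []

    # RENEWING: unicast to the leasing server; on silence, wait half the
    # time left until T2, but never less than min_wait.
    now = t1
    while now < t2:
        times.append(("RENEWING", now))
        now += max((t2 - now) / 2, min_wait)

    # REBINDING starts at T2: broadcast; on silence, wait half of the
    # remaining lease time, again with the min_wait floor.
    now = t2
    while now < lease_secs:
        times.append(("REBINDING", now))
        now += max((lease_secs - now) / 2, min_wait)

    # After lease_secs the lease has expired and the client falls back
    # to DISCOVER, which is where the D-O loop would begin.
    return times


if __name__ == "__main__":
    for state, t in renewal_schedule(3600):
        print(f"{t:8.1f}s  {state}")
```

With a one hour lease the attempts bunch up toward T2 and then again 
toward expiry, which is consistent with seeing more and more renewal 
attempts in the log shortly before the lease runs out.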

We've worked with our CPE vendor, who claims that the CPE isn't receiving 
any OFFERs when it gets stuck in the D-O loop, and our access vendor claims 
they are absolutely certain they are putting the OFFER (which we are 
certain is being ingressed by the access shelf) out on the wire to the 
customer as ATM. We're thinking that possibly the line conditions have 
changed enough that the lower frequencies on the ADSL line (upstream) are 
still good enough to pass our traffic upstream, but the higher frequencies 
(downstream) are so poor by this point that they are causing the CPE to 
discard the frames. We are still working on proving this theory.

Any of this sound familiar?

--Marc



_______________________________________________
dhcp-users mailing list
dhcp-users at lists.isc.org
https://lists.isc.org/mailman/listinfo/dhcp-users 



