excessive failover pool balancing, leases files getting out of sync

Marc Perea marccp at srttel.com
Fri Jun 17 14:55:37 UTC 2011


>From: "Gordon A. Lang" glang at goalex.com 

>While most clients are happily getting leases, many clients keep
>retrying as if they never got the offer/acks or else they simply don't
>like what they are getting.
>
>The clients who experience trouble one day are not typically the
>same clients who experience trouble the next day -- the problems
>seem to be randomly and uniformly distributed across all users
>(thousands of users) and all subnets (hundreds of subnets).
>
>Does this ring a bell with anyone?
Hi Gordon,
This sounds exactly like a problem we are currently investigating. We've looked into our core, BRAS, transport, access, and CPE vendors alike. I wonder if we could compare notes and see whether we have any similarities? We don't use failover; instead we have a couple of DHCP servers with the same config handing back static host IPs.
 
For us, the problem appears to be that at some point, just as you are seeing, traffic starts to fail in the downstream path: either the ACKs and OFFERs are not getting to the client, or the client is unhappy with what it is receiving. Things will be going along fine, with renewals occurring at half the lease duration, and then at some point the backoff algorithm gets invoked, sending more and more renewal attempts until the lease expires, after which we see a DISCOVER - OFFER loop that continues indefinitely. We're a DSL ISP, and a modem reboot or retrain fixes it, at which point a full DORA occurs. Our CPE vendor provided us custom firmware so that if the router sees a retrain it restarts the DHCP daemon, and it's interesting that a down/up on our access shelf for the port will fix it too.
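
For reference, the renewal backoff described above can be sketched roughly as follows. This is only an illustration and assumes the client follows the RFC 2131 defaults (T1 at 0.5 and T2 at 0.875 of the lease, retransmitting after half the remaining time with a 60-second floor). The function name and parameters are made up for the example, and real CPE firmware may well behave differently.

```python
def renewal_schedule(lease_secs, t1_frac=0.5, t2_frac=0.875, min_wait=60):
    """Times (seconds after the lease was granted) at which a client that
    never hears an ACK would send DHCPREQUEST, per RFC 2131 defaults.
    Hypothetical helper for illustration only."""
    t1 = lease_secs * t1_frac
    t2 = lease_secs * t2_frac
    events = []

    # RENEWING: unicast to the leasing server; wait half the time left
    # until T2, but never less than min_wait seconds.
    now = t1
    while now < t2:
        events.append((now, "RENEWING"))
        now += max((t2 - now) / 2, min_wait)

    # REBINDING: starts at T2, broadcast to any server; wait half the
    # remaining lease time, again with the min_wait floor.
    now = t2
    while now < lease_secs:
        events.append((now, "REBINDING"))
        now += max((lease_secs - now) / 2, min_wait)

    return events


if __name__ == "__main__":
    for when, state in renewal_schedule(3600):
        print(f"{when:8.1f}s  {state}")
```

With a one-hour lease this shows the requests bunching up closer and closer together toward expiry, which matches the "more and more renewal attempts" pattern in a packet capture.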
 
We've worked with our CPE vendor, who claims the CPE isn't receiving any OFFERs when it gets stuck in the D-O loop, while our access vendor claims to be absolutely certain that the OFFER, which we are certain is ingressing the access shelf, is being put out on the wire to the customer as ATM. Our current theory is that the line conditions may have changed enough that the lower frequencies on the ADSL (upstream) are still good enough to pass our traffic upstream, but the higher frequencies (downstream) are by this point so poor that the CPE discards the frames. We are still working on proving this theory.
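
One way to pick out the clients stuck in the D-O loop from server-side data is sketched below. It is a minimal example and assumes you have already extracted a chronological list of (client, message type) pairs from your logs or a capture; the event format, the function name, and the threshold of three are all assumptions made for the illustration, not anything specific to ISC dhcpd.

```python
from collections import defaultdict


def stuck_in_discover_loop(events, threshold=3):
    """Return the set of clients whose most recent activity is at least
    `threshold` DISCOVERs in a row with no REQUEST or ACK in between.

    events: iterable of (client_id, msg_type) in chronological order,
    where msg_type is "DISCOVER", "OFFER", "REQUEST" or "ACK".
    (Hypothetical input format for this sketch.)"""
    streak = defaultdict(int)
    for client, msg in events:
        if msg == "DISCOVER":
            streak[client] += 1
        elif msg in ("REQUEST", "ACK"):
            # The client saw an OFFER and moved on, so reset its count.
            streak[client] = 0
        # OFFERs leave the count unchanged: the server sending one says
        # nothing about whether the client actually received it.
    return {c for c, n in streak.items() if n >= threshold}


if __name__ == "__main__":
    sample = [
        ("modem-a", "DISCOVER"), ("modem-a", "OFFER"),
        ("modem-a", "REQUEST"), ("modem-a", "ACK"),
        ("modem-b", "DISCOVER"), ("modem-b", "OFFER"),
        ("modem-b", "DISCOVER"), ("modem-b", "OFFER"),
        ("modem-b", "DISCOVER"), ("modem-b", "OFFER"),
    ]
    print(stuck_in_discover_loop(sample))
```

Comparing the flagged set from day to day would also show whether the affected clients really are different each time.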
 
Any of this sound familiar?
 
--Marc