excessive failover pool balancing, leases files getting out of sync
Gordon A. Lang
glang at goalex.com
Fri Jun 17 18:47:58 UTC 2011
One possibility we considered is that the offers/acks are getting lost on
the network (i.e. the client is behaving as you would expect when it doesn't
receive the response it is waiting for). We have looked at the network and
found no clues. We have not yet tried to capture a failure as it happens, to
see whether an offer or an ack that gets logged never actually makes it onto
the network.
Let me know if you figure anything out, and I will likewise post my
findings.
--
Gordon A. Lang
----- Original Message -----
From: Marc Perea
To: dhcp-users at lists.isc.org
Sent: Friday, June 17, 2011 10:55 AM
Subject: Re: excessive failover pool balancing,leases files getting out of
sync
>From: "Gordon A. Lang" glang at goalex.com
>While most clients are happily getting leases, many clients keep
>retrying as if they never got the offer/acks or else they simply don't
>like what they are getting.
>
>The clients who experience trouble one day are not typically the
>same clients who experience trouble the next day -- the problems
>seem to be randomly and uniformly distributed across all users
>(thousands of users) and all subnets (hundreds of subnets).
>
>Does this ring a bell with anyone?
Hi Gordon,
this sounds exactly like a problem we are currently investigating. We've
looked into our core, BRAS, transport, access, and CPE vendors alike. I
wonder whether we can find any similarities between our setups. We don't use
failover; instead we have a couple of DHCP servers with the same config
handing back static host IPs.
For us, the problem appears to be that at some point, just like you are
seeing, traffic starts to fail in the downstream path: either ACKs and
OFFERs are not getting to the client, or the client is unhappy with what it
is receiving. Things will be going along fine, with renewals occurring at
1/2 of the lease duration, and then at some point the backoff algorithm gets
invoked, sending more and more renewal attempts until the lease expires,
after which we see a DISCOVER - OFFER loop continue indefinitely. We're a
DSL ISP, and a modem reboot or retrain fixes it, at which point a full DORA
occurs. Our CPE vendor provided us custom firmware so that if the router
sees a retrain it restarts the dhcp daemon, and it's interesting that a
down/up on our access shelf for the port will fix it too.
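To make the renewal pattern above concrete, here is a rough sketch of when a
client following RFC 2131 section 4.4.5 would retransmit its DHCPREQUEST if
it never hears a reply. It assumes the default timers (T1 at 0.5 and T2 at
0.875 of the lease) and the RFC's rule of waiting half the remaining time,
down to a 60 second minimum. The function name and parameters are made up
for illustration, and a given CPE's DHCP client may well behave differently.

```python
def renewal_schedule(lease_seconds, t1_fraction=0.5, t2_fraction=0.875,
                     min_wait=60):
    """Return (state, seconds_since_lease_start) for each DHCPREQUEST an
    unanswered client would send, per the RFC 2131 defaults (assumed)."""
    t1 = lease_seconds * t1_fraction
    t2 = lease_seconds * t2_fraction
    schedule = []

    # RENEWING: unicast to the leasing server, starting at T1.
    # After each unanswered request, wait half the time left until T2,
    # but never less than min_wait seconds.
    now = t1
    while now < t2:
        schedule.append(("RENEWING", now))
        now += max((t2 - now) / 2, min_wait)

    # REBINDING: broadcast to any server, starting at T2.
    # Same rule, measured against the lease expiry instead of T2.
    now = t2
    while now < lease_seconds:
        schedule.append(("REBINDING", now))
        now += max((lease_seconds - now) / 2, min_wait)

    return schedule


if __name__ == "__main__":
    for state, when in renewal_schedule(3600):
        print(f"{when:8.1f}s  {state}")
```

With a one hour lease the gaps between attempts shrink as T2 and then the
expiry approach, which is consistent with "more and more renewal attempts"
showing up in the server log shortly before the lease runs out.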
We've worked with our CPE vendor, who claims the CPE isn't receiving any
OFFERs when it gets stuck in the D-O loop, and our access vendor claims they
are absolutely certain they are putting the OFFER (which we are certain is
reaching the access shelf) out on the wire to the customer as ATM.
We're thinking that possibly the line conditions have changed enough that
the lower frequencies on the ADSL line (upstream) are still good enough to
pass our traffic upstream, but the higher frequencies (downstream) are so
poor by this point that they cause the CPE to discard the frames. We are
still working on proving this theory.
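One way to pick out candidates for a packet capture is to scan the server
log for clients sitting in the D-O loop. The sketch below is only an
illustration: it assumes log lines that resemble "DHCPOFFER on <ip> to <mac>
via <if>" and "DHCPREQUEST for <ip> from <mac> via <if>", and the function
name and threshold are hypothetical. It counts consecutive OFFERs sent to a
MAC with no REQUEST or ACK in between.

```python
import re
from collections import defaultdict

# Assumed log shape: a DHCP message name followed somewhere later on the
# same line by the client MAC address. Adjust to match the real log.
LINE_RE = re.compile(
    r"\b(DHCPDISCOVER|DHCPOFFER|DHCPREQUEST|DHCPACK)\b"
    r".*?\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)


def find_stuck_clients(lines, threshold=3):
    """Return {mac: longest_run} for clients that were sent at least
    `threshold` OFFERs in a row without a REQUEST or ACK in between."""
    current = defaultdict(int)   # OFFERs since the last REQUEST/ACK
    longest = defaultdict(int)   # longest such run seen per MAC
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        msg = match.group(1).upper()
        mac = match.group(2).lower()
        if msg == "DHCPOFFER":
            current[mac] += 1
            longest[mac] = max(longest[mac], current[mac])
        elif msg in ("DHCPREQUEST", "DHCPACK"):
            # The client answered an OFFER, so the loop is broken.
            current[mac] = 0
    return {mac: n for mac, n in longest.items() if n >= threshold}
```

A MAC that shows up here is only a hint that OFFERs are not being accepted;
it does not say whether they were lost downstream or rejected by the client.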
Any of this sound familiar?
--Marc