Tons of "lease imbalance" messages then crash

ashley.hatch at unlv.edu
Wed Feb 7 21:34:51 UTC 2007


I run a pair of Debian-based ISC DHCPd 3.0.1-2 servers, serving 5000+
clients reliably for over 4 years now. Recently (about 6 months ago) we
had to upgrade from an earlier version of ISC DHCPd, which did not work
properly with our redundant DHCP helpers. The old version never hiccuped
once (sadly I cannot recall the old version number, but it is from Dec
2003). The new version solved the multiple DHCP helpers issue, but seems
to have introduced a new failover-based problem. I wanted to get some
input on the problem before I blindly upgrade again, as I cannot find a
bug fix that I can say exactly matches our problem.
What happens is that one of the two servers (it has occurred on both, but
now only occurs on the primary) will stop "hearing" DHCP requests and will
only log entries like:
Feb  7 12:23:40 merry dhcpd: lease imbalance - lts = 13
Feb  7 12:23:40 merry dhcpd: lease imbalance - lts = 7
Feb  7 12:23:40 merry dhcpd: lease imbalance - lts = 3
Feb  7 12:23:40 merry dhcpd: lease imbalance - lts = 8
..... hundreds of times.

When it gets into this mode it stops providing any failover or DHCP
service: it handles maybe one DHCP request per 100 lts log entries, and
it writes a few dozen of the lts log entries a second for minutes at a
time. Sometimes it comes back on its own; other times it quietly exits
with no core dump or error message. While I have always seen the lts
messages as part of normal operation, they are normally spread apart and
happen only in small clusters, not 1000 at a time. I have verified that
network connectivity is not being lost at the servers by running constant
pings both to and from both machines to multiple hosts. I have also
verified that the memory and processor are working properly on both
servers using Memtest86+ and Prime95.
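
One way to put numbers on "small clusters" versus a flood is to count the
lease imbalance lines per second in the syslog output. The following is a
minimal Python sketch, not part of the setup described here. It assumes
the log lines have the same layout as the excerpt quoted above; the
function names and the threshold of 20 messages per second are
hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Matches syslog lines shaped like the excerpt above, e.g.
# "Feb  7 12:23:40 merry dhcpd: lease imbalance - lts = 13"
# An optional "[pid]" after "dhcpd" is tolerated (assumption).
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+dhcpd(?:\[\d+\])?:\s+"
    r"lease imbalance - lts = (-?\d+)"
)


def imbalance_counts(lines):
    """Count lease imbalance messages per syslog timestamp (1 s resolution)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def bursts(counts, threshold=20):
    """Return timestamps, in first-seen order, with at least `threshold` messages.

    The default threshold is an arbitrary illustrative value.
    """
    return [ts for ts, n in counts.items() if n >= threshold]


# Example use (the log path is an assumption; adjust for your system):
#     with open("/var/log/syslog") as f:
#         print(bursts(imbalance_counts(f)))
```

Comparing the per-second counts from a healthy day against a day with a
stall would show whether the bursts really differ in size or only in
duration.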

I have tried rebooting both boxes, which does not help. It can go a month
without crashing, but lately it has been crashing multiple times a day,
and it could become a real problem if it continues. I am at the point of
giving up and just upgrading to 3.0.4, but I hate changing versions when
I don't know what is causing the problem in the first place.

Any insight would be appreciated, or just "upgrade the server" from 
someone wise in the newer versions.

Thanks,
Ashley Hatch



More information about the dhcp-users mailing list