dhcp fails with big dhcpd.leases

dorian dorian33 at o2.pl
Tue Aug 31 21:05:11 UTC 2010


Simon Hobson wrote:
> dorian wrote:
>
>> Here is a little bit longer another log snippet
>> Aug 31 13:51:47 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>> Aug 31 13:51:47 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c
>> via br0
>> Aug 31 13:51:49 [dhcpd] DHCPDISCOVER from 00:23:14:c0:61:28 (BLU060)
>> via br0
>> Aug 31 13:51:49 [dhcpd] DHCPOFFER on 172.18.90.186 to 00:23:14:c0:61:28
>> (BLU060) via br0
>> Aug 31 13:51:50 [dhcpd] DHCPDISCOVER from 00:25:d3:d8:71:1c
>> (Malgos-Komputer) via br0
---cut ---
>> Aug 31 13:52:10 [dhcpd] DHCPOFFER on 172.18.93.237 to 00:22:43:95:d1:1e
>> (TWOJA-6VJZP1GTV) via br0
>> Aug 31 13:52:10 [dhcpd] DHCPDISCOVER from 00:25:bc:0e:09:83
>> (iPhone-SZAST) via br0
>>
>> If you wish I can post a whole log file which is rather long but I don't
>> think it is any meaning to do that.
>> There is nothing interesting inside (a bunch of lines with DHCPDISCOVER
>> & DHCPOFFER messages without DHCPACK between them) - no warnings nor
>> errors.
>>
>> Looking at the above snippet:   host with MAC 7c:c5:37:21:d9:7c asked
>> several times for dhcp data.
>> The first logs concerning this MAC which can be found are:
>> Aug 31 12:54:03 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>> Aug 31 12:54:04 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c
>> via br0
>>
>> It means the host haven't got IP.
>
> But note also, it does NOT request the address.
It does, but I omitted the request:
Aug 31 13:27:52 [dhcpd] DHCPREQUEST for 10.0.1.8 from 7c:c5:37:21:d9:7c
via br0: wrong network.
Aug 31 13:27:52 [dhcpd] DHCPNAK on 10.0.1.8 to 7c:c5:37:21:d9:7c via br0

I do not remember the whole DHCP protocol, so I don't know what is
really exchanged between client and server.
But according to my (maybe naive) reasoning, the host should be able to
ask for a brand new IP without first asking to be assigned the "old"
one, especially when it connects to a totally fresh network: there is
no address to ask about.
> If there is no Request, then the server has nothing to Ack.
Ok. I understand - the DHCPACK is sent only when the host asks for
the IP address and the IP is confirmed.
> The ONLY request in that snippet is where 00:18:51:ce:b3:69 requests
> 172.27.140.7 but it is not a known lease. There isn't another instance
> of that MAC address in the log you posted.
>
> Now, why is it unknown ? Probably because you have broken your DHCP
> server by deleting the leases file.
First of all - the core of the problem is:
a) when dhcpd.leases becomes "big" the server stops serving DHCP data
(or the clients don't receive it)
b) stopping the server, removing dhcpd.leases and starting the server
fixes the problem immediately
And this problem is the _main subject_ of my mails.

The discussion of the message exchange comes from my suspicions, which
are a result of my ignorance of the protocol.
BTW: I have never written that I just delete the leases file.

> This is something you really, really should not be doing as it breaks
> stuff badly. It means the server has no knowledge whatsoever of
> "promises" it has previously made to clients, and so it will tend to
> make offers for addresses that are already in use.
>
>>  > The leases file is a log file - the server only ever appends to it,
>>>  and during operations it never reads from it. It is only ever read
>>>  during startup when it reads each lease in turn and populates it's
>>>  internal tables. Even then, it does not (I assume) read the file into
>>>  memory - it just has to parse each lease as it munches through the
>>> file.
>>>
>> Well. Having big dhcpd.leases file (with the size near mentioned above)
>> I've found the server has to read the dhcpd.leases since start takes
>> ~10minutes (it is not an error  -10 minutes!)
>
> Which is what I wrote - it reads the file **during startup** in order
> to populate the internal data structures with the leases that have
> been previously given out. It is never read at any other time.
>
>> According to my experience - removing the dhcpd.leases and restart fixes
>> the disfunctionality of the server immediately whereas restarting the
>> server with big dhcpd.leases changes nothing (apart from the restart is
>> extremely long)
>
> But deleting the leases file DOES fundamentally break your server config.
Well. So, was Sten Carlsen wrong writing " The leases file is a log file
- the server only ever appends to it, and during operations it never
reads from it." ?
Because if he was right, deleting the leases file (while the server is
running) should not break the server - or there is a bug in the
software, as writing to an open file (i.e. through a file handle) which
has been removed should be detected by the software.
I know that no one expects such a stupid user action, but for services
running 24/7 anything may happen - the file system can crash (or the
whole HDD holding the partition with this file can get corrupted), so
such a service should "behave correctly" and report the error in some
other way.
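
A side note on the "removed but still open" case: on POSIX file systems,
unlinking a file that a process has open is not an error the process
gets to see. The name disappears, but the data stays reachable through
the open handle until it is closed. The sketch below is only an
illustration in Python, not dhcpd code; the temp file name and the
function name are made up for the example.

```python
import os
import tempfile

def write_after_unlink():
    """Unlink an open file, keep writing to it, and report what happened.

    Assumes a POSIX system (on Windows the unlink itself would fail).
    """
    fd, path = tempfile.mkstemp(prefix="leases-demo-")
    with os.fdopen(fd, "w") as handle:
        handle.write("lease 1\n")
        handle.flush()
        os.unlink(path)              # the directory entry is gone...
        handle.write("lease 2\n")    # ...but this write raises no error
        handle.flush()
        size = os.fstat(handle.fileno()).st_size
    return os.path.exists(path), size

if __name__ == "__main__":
    exists, size = write_after_unlink()
    print("path still exists:", exists, "- bytes in unlinked file:", size)
```

So a process that only appends to an open file has no write error to
react to; it would have to check for the file name explicitly.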

Anyway.
I have never written that I am deleting the file without stopping the
server.
And removing the dhcpd.leases file is "legal" when the server is not
running, isn't it ?
>
>>  > To avoid the file growing ever larger, the server will periodically
>>>  clean up. It does this by writing out it's current in-memory tables to
>>>  a new leases file, and swapping it into place by renaming the original
>>>  file and then renaming the new file into place.
>>>
>> How long is the "period" ?
>> I've never found the file dhcpd.leases became smaller...
>
> The period is a (compiled in) default of 1 hour. If you look, you
> should see something like "dhcpd.leases" and "dhcpd.leases~". The
> second of these is the previous version.
Ok. Thanks for info.
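
For reference, the clean-up described above (dump the in-memory table to
a new file, rename the old file to "dhcpd.leases~", move the new file
into place) can be sketched roughly as follows. This is a Python
illustration of the pattern only, not the server's actual code;
rewrite_leases and active_leases are hypothetical names.

```python
import os
import tempfile

def rewrite_leases(path, active_leases):
    """Write the current leases to a new file and swap it into place.

    The previous file is kept as path + "~", mirroring the
    "dhcpd.leases" / "dhcpd.leases~" pair mentioned in the thread.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the new file in the same directory so the rename stays on
    # one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="leases-new-")
    with os.fdopen(fd, "w") as tmp:
        for lease in active_leases:
            tmp.write(lease + "\n")
        tmp.flush()
        os.fsync(tmp.fileno())
    if os.path.exists(path):
        os.replace(path, path + "~")   # old version becomes the backup
    os.replace(tmp_path, path)         # new version takes the live name
```

Because only the latest state of each lease is written out, the new
file is smaller than the append-only one it replaces, but it still
carries one entry per address the server has ever handed out.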
>
>
> You should see the new version is slightly smaller than the old one
> immediately after the cleanup. 
Ok. Yes, it is smaller.
> It will never be 'small' on a server with that configuration because
> it will have to keep track of up to about 260,000 addresses. Even when
> a lease has expired, the last state of it is kept indefinitely in case
> the client should return to the network - and it is only replaced when
> the server runs out of "never used before" addresses and starts
> reusing expired leases in a "least recently used" manner.
Please be so kind and clarify the Sten Carlsen info mentioned above: if
he is right and there are no restarts, the track should be kept in RAM
rather than on HDD.
> I'm not trying to say you don't have a problem, but so far the log
> snippets don't show it. Have you tried picking a client MAC and
> 'grep'ing for that in the log ?
>
You are right: so far the log snippets don't show the problem.
And looking at the logs (I am keeping ALL of the logs - nothing is
deleted or rotated) I cannot find the problem.
The server process looks like it is working - but in fact it is not.

If you agree I can send you all the log files (even for the whole last
month if you wish).
But I am not sure it makes sense - grep'ing them for the word 'error'
gives nothing.
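
Since grep'ing for 'error' finds nothing, a per-MAC summary of message
types may be more telling than a search for a keyword. The following is
a minimal sketch in Python; it assumes one log entry per line in the
format of the snippets above, and the function names are made up for
this example.

```python
import re
from collections import Counter, defaultdict

# Message type (DHCPDISCOVER, DHCPOFFER, ...) followed later on the same
# line by a MAC address. Case-sensitive on purpose so "[dhcpd]" is not
# mistaken for a message type.
LINE_RE = re.compile(
    r"\b(DHCP[A-Z]+)\b.*?\b((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})\b"
)

def summarize(lines):
    """Count DHCP message types per client MAC address."""
    per_mac = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            per_mac[match.group(2).lower()][match.group(1)] += 1
    return per_mac

def offered_but_never_acked(per_mac):
    """MACs that received at least one DHCPOFFER but no DHCPACK."""
    return sorted(
        mac for mac, counts in per_mac.items()
        if counts["DHCPOFFER"] and not counts["DHCPACK"]
    )
```

Feeding the whole log through summarize() and listing the result of
offered_but_never_acked() would show whether the clients stuck in a
DISCOVER/OFFER loop ever send a DHCPREQUEST at all.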
>
>
>> Sorry. I do not understand.
>> What is illegal or unusual with it?
>>
>> 172.16.8.0 belongs to 172.16.0.0/14
>> and 172.16.0.0/14 is a part of 172.16.0.0/12 private class
>>
>> So what does mean 'behave "funny".' ?
>
> There is nothing illegal or funny, but it is known that a small number
> of badly programmed clients cannot cope with the last octet being 0 or
> 255 since everyone "knows" that 0 is the network address and 255 is
> the broadcast address.
???
Do you know which ones? Windows? MacOS? Mobile OSes?
This is quite new info for me! Detecting the network from the IP alone?
I have always assumed that to get the network & broadcast addresses I
need both the IP and the mask.
Well, it's very interesting...
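
The point about needing both IP and mask can be checked with Python's
standard ipaddress module. This is only an illustration; the describe
helper is a made-up name, and the /14 is the network quoted earlier in
this mail.

```python
import ipaddress

def describe(address, network):
    """Say whether an address is the network or broadcast address of a net."""
    net = ipaddress.ip_network(network)
    addr = ipaddress.ip_address(address)
    return {
        "in_network": addr in net,
        "is_network_address": addr == net.network_address,
        "is_broadcast_address": addr == net.broadcast_address,
    }

if __name__ == "__main__":
    # Ordinary host addresses inside the /14 ...
    print(describe("172.16.8.0", "172.16.0.0/14"))
    print(describe("172.16.8.255", "172.16.0.0/14"))
    # ... but special ones if a client wrongly assumes a /24.
    print(describe("172.16.8.0", "172.16.8.0/24"))
    print(describe("172.16.8.255", "172.16.8.0/24"))
```

Only a client that ignores the mask it was given and assumes /24 would
treat 172.16.8.0 or 172.16.8.255 as special.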

> Complete rubbish, but there are people who have never used anything
> but a /24 subnet and just cannot comprehend anything else - and that
> includes some supposedly professional IT people I've worked with !
>
> For that reason alone, it's suggested to avoid them by splitting your
> ranges thus :
>
> range 172.16.8.1 172.16.8.254;
> range 172.16.9.1 172.16.9.254;
> range 172.16.10.1 172.16.10.254;
> ...
> Something of a pain for the number of addresses you have !
Good idea, but I am afraid it will not solve the problem.
My Linux box could not obtain an IP when the server became
dysfunctional, and I assume this OS does not 'behave "funny".'
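
If the per-/24 ranges suggested above are ever wanted, typing them by
hand is not necessary. A small Python sketch (range_lines is a
hypothetical helper, and how many subnets to generate is left to the
caller) could print the statements for dhcpd.conf:

```python
import ipaddress

def range_lines(first_subnet, count):
    """Build 'range' statements that skip .0 and .255 in each /24."""
    first = ipaddress.ip_network(first_subnet)
    if first.prefixlen != 24:
        raise ValueError("first_subnet must be a /24")
    lines = []
    for i in range(count):
        base = first.network_address + i * 256   # start of the i-th /24
        lines.append(f"range {base + 1} {base + 254};")
    return lines

if __name__ == "__main__":
    # The three subnets spelled out in the suggestion above.
    for line in range_lines("172.16.8.0/24", 3):
        print(line)
```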

What is more interesting: the problem doesn't mean total dysfunction -
another PC received an IP.
And another one did not.
But generally IPs are not assigned to the clients.
>
>>  > That range is over a quarter of a million addresses. Does the server
>>>  still have issues with very large ranges ?
>> Yes it is.
>> And even if not - in my opinion this doesn't concern the point of the
>> problem...
>
> Well the point is that it's a large number of addresses, and from
> memory of threads I didn't pay much attention to as I only run small
> servers, there are aspects (hash table IIRC) that don't scale too well
> for very large address spaces. Even for addresses that aren't used,
> the server must build them into an internal list.
> There are few people running such large spaces, but from memory I
> don't think yours is the biggest that's been mentioned on this list.
>
>>  > I vaguely recall there used to be issues with memory usage and
>> startup
>>>  times.
>> The host is equipped with 16GB RAM so...
>>>
>>>  It does sound a rather excessive number of addresses - even for a
>>>  public access point.
>>>
>> As above: this is not a point of the problem - or maybe is it? But if so
>> please say it clearly.
>> Are there any limits on served IP ranges or classes?
>
> There are no specific limits, other than memory and I/O bandwidth. As
> mentioned above, there are some elements of the design that don't
> scale well - or didn't in earlier versions. On that point - what
> version are you using ?
Version of what?
I am using Gentoo Linux and the dhcp version is 3.1.2_p1.
Everything is compiled for a 64-bit platform.
>
>> I need such big IP range since in fact I have a network of hotspots
>> working in bridge and centrally controlled from one host.
>
> How many clients do you normally have on the network in any 2 hour
> period ?
Daily I have about 60 clients per point and the number is growing.
Now I have 10 points. The plan is to have up to 1000 points.
> Looking at your original log snippet, you seem to have less than one
> request per second. For 250,000 clients and a 2 hours lease, you
> should be seeing not less than about 34 request-ack or
> discover-offer-request-ack exchanges per second.
>
> I'd suggest it's worth cutting back on the address space and see if it
> makes a difference.
>
The lease time is 72000, not 7200, which gives 20 hours (in practice =
a whole day).
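
To see how the lease time changes the estimate quoted above, here is the
arithmetic as a tiny Python sketch. It assumes, as a lower bound, that
every client completes one exchange per lease period (real clients
usually renew earlier, so the true rate would be higher); the function
name is made up.

```python
def min_exchanges_per_second(clients, lease_seconds):
    """Lower bound: one exchange per client per lease period."""
    return clients / lease_seconds

if __name__ == "__main__":
    print(72000 / 3600, "hours per lease")
    print(min_exchanges_per_second(250000, 7200), "per second at 7200 s")
    print(min_exchanges_per_second(250000, 72000), "per second at 72000 s")
    # Current load from this mail: about 60 clients at each of 10 points.
    print(min_exchanges_per_second(60 * 10, 72000), "per second today")
```

With the 72000 second lease the expected rate for a full pool is a
tenth of the figure estimated for a 2 hour lease.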
The wide IP range lets me assume that the same client (= same MAC)
will keep the same IP for the whole day.
What is more, with high probability it will get the same IP at another
hotspot on the following day(s).

And differentiating between clients is very important for the business.
> Also, almost as an aside, I notice that you have timeouts for DNS
> updates. This suggests that your DDNS isn't set up correctly - it
> might be worth turning it off while you are trying to troubleshoot
> this problem.
>
Could you be more precise?
I am not an expert in dhcp settings.
In the small (<100 hosts) networks I have been involved with until now
the settings were ok, so I would be obliged for any advice/directions
on what else I need to learn to manage this case...



