dhcpd process hitting data size limit

Mon Mar 3 14:26:12 UTC 2008

I have a dhcpd 3.1.0 server running as primary in a failover configuration
with around 100k leases. The normal process size is around 90-100 MB. Today
the dhcpd process on this server ballooned to over 500 MB and then hit the
default data size limit of 512 MB. In the logs I found the following:

Mar  3 14:23:21 dhcp2 dhcpd: dhcp_failover_put_message: something went wrong.
Mar  3 14:23:21 dhcp2 dhcpd: peer dhcp1-dhcp2: disconnected
Mar  3 14:23:21 dhcp2 dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
Mar  3 14:23:22 dhcp2 dhcpd: uid lease 193.71.113.38 for client 00:00:e2:94:6a:61 is duplicate on 193.71.112/21
Mar  3 14:23:23 dhcp2 dhcpd: uid lease 81.191.9.183 for client 00:08:da:53:b9:df is duplicate on 81.191.0/20
Mar  3 14:23:26 dhcp2 dhcpd: dhcp_failover_put_message: something went wrong.
Mar  3 14:23:26 dhcp2 dhcpd: peer dhcp1-dhcp2: disconnected
Mar  3 14:23:26 dhcp2 dhcpd: failover: connect: no matching state.
Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
(repeat ad nauseam)

On the failover peer, where the dhcpd process stayed at its normal size,
I found the following:

Mar  3 14:23:21 slam2 dhcpd: peer dhcp1-dhcp2: disconnected
Mar  3 14:23:21 slam2 dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
Mar  3 14:23:22 slam2 dhcpd: uid lease 195.0.206.75 for client 00:17:3f:96:d8:06 is duplicate on 195.0.200/21
Mar  3 14:23:24 slam2 dhcpd: uid lease 81.191.61.134 for client 00:0b:82:0d:06:0a is duplicate on 81.191.48/20
Mar  3 14:23:26 slam2 dhcpd: peer dhcp1-dhcp2: disconnected
Mar  3 14:23:28 slam2 dhcpd: uid lease 81.191.126.180 for client 00:a0:c5:c0:35:ea is duplicate on 81.191.112/20
Mar  3 14:23:31 slam2 dhcpd: uid lease 193.90.168.171 for client 00:a0:c5:db:5a:97 is duplicate on 193.90.160/20
Mar  3 14:23:35 slam2 dhcpd: uid lease 81.191.199.70 for client 00:a0:c5:80:84:37 is duplicate on 81.191.192/20
Mar  3 14:23:40 slam2 dhcpd: uid lease 193.91.143.135 for client 00:17:3f:5c:28:64 is duplicate on 193.91.128/20
Mar  3 14:23:41 slam2 dhcpd: failover: link startup timeout
Mar  3 14:23:42 slam2 dhcpd: uid lease 81.191.182.196 for client 00:13:49:4a:c3:b0 is duplicate on 81.191.176/20
Mar  3 14:23:44 slam2 dhcpd: uid lease 81.191.2.218 for client 00:a0:c5:56:a5:cc is duplicate on 81.191.0/20
Mar  3 14:23:46 slam2 dhcpd: failover: link startup timeout
Mar  3 14:23:46 slam2 dhcpd: failover: link startup timeout
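To compare the two servers side by side, the log excerpts above can be
tallied per host and message type. This is only a rough sketch: it assumes
the classic syslog line layout shown above, and the names LINE_RE, classify
and tally are made up for illustration, not part of dhcpd.

```python
import re
from collections import Counter

# Matches classic syslog lines such as
# "Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer."
# (assumed layout: month, day, time, host, "dhcpd:", message)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+dhcpd:\s+(?P<msg>.*)$"
)


def classify(msg):
    """Collapse messages that differ only in addresses into one bucket."""
    if msg.startswith("uid lease"):
        return "duplicate uid lease"
    return msg.rstrip(".")


def tally(lines):
    """Count dhcpd messages per (host, message type); skip non-matching lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("host"), classify(m.group("msg")))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally.py < dhcpd.log
    for (host, kind), n in sorted(tally(sys.stdin).items()):
        print(f"{host}\t{n}\t{kind}")
```

Fed the full log from the primary, this should make it easy to see how many
"no memory for option buffer" messages were logged relative to the failover
state changes.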

I ended up restarting the dhcpd process on both servers, and everything
seems to be back to normal now. Both servers are running FreeBSD 6.3.
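For anyone wanting to check the data size limit in effect on their own
system, here is a minimal sketch using Python's standard resource module.
It reports the limit of the process that runs the script, which is not
necessarily the limit dhcpd itself inherits from its startup environment;
the helper name describe_limit is just for illustration.

```python
import resource


def describe_limit(value):
    """Render an rlimit value in MB, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // (1024 * 1024)} MB"


if __name__ == "__main__":
    # RLIMIT_DATA is the maximum size of the process's data segment (heap).
    soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    print("data size limit (soft):", describe_limit(soft))
    print("data size limit (hard):", describe_limit(hard))
```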

So, my questions are:

- Any idea what might have happened here? As far as we know, there has been
communication between the failover peers at all times.
- Are there any rules of thumb for how large the dhcpd process is expected
to grow, presumably as a function of the number of leases?

Steinar Haug, Nethelp consulting, sthaug at nethelp.no