tuning for maximum dhcp performance

Gordon A. Lang glang at goalex.com
Mon Apr 28 06:11:02 UTC 2008


----- Original Message ----- 
From: "Glenn Satchell" <Glenn.Satchell at uniq.com.au>
To: <dhcp-users at isc.org>
Sent: Sunday, April 27, 2008 10:32 AM
Subject: Re: tuning for maximum dhcp performance


> How long is your lease time? If you make it longer than 24 hours, then
> those clients that can't get a response from the dhcp server when
> everyone logs on at 9am (say) will still have a valid lease and could
> just keep on working? This is a bit simplistic.

2 day leases, but mandatory power-off of all workstations at the end of
each day.
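
For reference, the relevant lease settings in our dhcpd.conf look
roughly like this (addresses are made up for illustration):

    default-lease-time 172800;   # 2 days, in seconds
    max-lease-time 172800;

    subnet 10.10.0.0 netmask 255.255.0.0 {
      option routers 10.10.0.1;
      range 10.10.1.1 10.10.20.254;
    }

The longer lease helps with mid-day renewals, but with the nightly
power-off the whole population still has to check in with the server
at boot the next morning.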


> One option I have used where there were remote sites was to use a
> 'spoke' design. One central dhcp server and a peer server in each
> remote site. Each remote server only handles the local IP ranges for
> that site and peers with the central server. The central server has
> different failover peers for each of the remote sites. This also has
> the benefit that dhcp is available in remote sites if the network is
> isolated from the main office for any reason.

I am hoping to get to a point where we can have a DHCP server
at each remote site, but the justification has always been for fault
tolerance, which is a weak business case since there is relatively
little work that can be done at a remote site when the WAN
circuit is down.  But if I present it as a way to off-load the central
servers to reduce the risk of overload, then maybe it will fly.
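
If the off-load angle does fly, my understanding is that the central
server would carry one failover association per remote site, roughly
like this (peer names, addresses, and tuning values are only
illustrative):

    failover peer "site-a" {
      primary;
      address dhcp-central.example.com;
      port 647;
      peer address dhcp-site-a.example.com;
      peer port 647;
      max-response-delay 60;
      max-unacked-updates 10;
      mclt 3600;
      split 128;
      load balance max seconds 3;
    }

    subnet 10.20.0.0 netmask 255.255.0.0 {
      pool {
        failover peer "site-a";
        range 10.20.1.1 10.20.20.254;
      }
    }

Each remote box would hold the matching "secondary" declaration and
only its local ranges, so a central overload (or a WAN outage) only
touches the pools homed at that site.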


> The other pain you could be feeling here is routers that drop the UDP
> dhcp packets when they get overloaded. Is there some configuration that
> can be done on the routers to prefer to not drop dhcp packets?

I hadn't considered that as a possible explanation for the problems
we have had.  Maybe I'll open a Cisco TAC case to find out under
what load conditions broadcast packets will NOT get forwarded.
I'm not familiar with any configuration parameters related to
"helper-address" load, but I've never tried to find any before, so who knows.
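
For what it's worth, the routers today just have the standard relay
setup on the client-facing interfaces, along the lines of (interface
and server addresses invented):

    interface Vlan100
     ip helper-address 10.1.1.5
     ip helper-address 10.1.1.6

I don't know of any knob that protects relayed DHCP specifically when
the router is under load, so that is exactly what I would want TAC to
confirm rather than guess at.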


> I feel your pain, dhcp is meant to be one of those services that is
> "just there" all the time, but engineering it to do so is a difficult
> task. Sunfire 280R is pretty old now, is management prepared to pay for
> new hardware to provide the always there dhcp service?

We are in the process right now of doubling the number of 280R's
we have for DHCP and DNS.  The one pair will still carry about
75% of the aggregate load, but every bit helps.

Thanks for your response.

--
Gordon A. Lang


>>From: "Gordon A. Lang" <glang at goalex.com>
>>Date: Sat, 26 Apr 2008 10:29:56 -0400
>>
>>After considerable engineering, I have decided to do the following to
>>improve the robustness of our systems:
>>
>>We have a pair of SunFire 280R's doing both DHCP and DNS, using the
>>dhcp failover protocol.
>>
>>1. send all logs over the network (through a dedicated NIC) to a remote
>>   syslog server (partially to eliminate disk-write competition between
>>   named/syslog and dhcpd, partially to consolidate the multiple logs,
>>   and partially to move log processing off of the dns/dhcp boxes).
>>
>>2. introduce a third server to act as a hidden master and take on all
>>   dynamic dns traffic (and associated log messages, also sent to the
>>   remote syslog server).
>>
>>3. upgrade to dhcp 3.1.1 as soon as it is released.  This is mainly
>>   to take advantage of the improvements in the failover protocol
>>   since all of our past problems were related to using failover
>>   protocol under heavy load conditions.
>>
>>
>>And I am also looking for a battery-backed ramdisk (haven't found one
>>yet) to store nothing but the dhcp leases file.
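
To make the plan above a bit more concrete, this is roughly what I
have in mind on the dhcpd side -- the facility, key, zone, and paths
below are placeholders:

    # item 1: send dhcpd's own logging to a dedicated syslog facility
    log-facility local6;

    # item 2: steer dynamic DNS updates at the hidden master
    ddns-update-style interim;
    key ddns-key {
      algorithm hmac-md5;
      secret "cGxhY2Vob2xkZXI=";   # placeholder
    }
    zone example.com. {
      primary 10.1.1.20;           # the hidden master
      key ddns-key;
    }

    # /etc/syslog.conf on the dhcp/dns boxes: ship that facility to the
    # remote syslog server over the dedicated NIC
    local6.*        @loghost.example.com

    # leases file relocated onto the (hoped-for) battery-backed ramdisk
    dhcpd -lf /ramdisk/dhcpd.leases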
>>
>>(comments?)
>>
>>
>>Our environment:
>>
>>Consider a surge of dhcp requests in a medium sized corporate HQ where
>>95% of all requests are handled okay, but 5% of the users need to
>>manually do an "ipconfig" or reboot to try again.  That means 200 users
>>are calling the helpdesk all at the same time -- exceeding the capacity
>>of the helpdesk.
>>
>>This is a career-threatening event for someone on the I.T. staff --
>>whoever is stuck with the hot potato.
>>
>>In this environment, the required benchmark is that 100% of all dhcp
>>requests are always processed, and no client ever times out.  Or else!
>>In our environment, upper management expects that all systems will
>>continue to function in their full capacity at all times, or else a lot
>>of middle management is subjected to intense scrutiny.  And you know
>>which direction it rolls....
>>
>>Every day, 85% or more of the staff boots their computers up within a
>>10 minute window.  The system supports roughly 6000 dhcp clients
>>(including the remote sites) without problem most of the time.  But the
>>4 times in 3 years that the systems became overrun, causing dozens
>>or hundreds of helpdesk calls, were terribly unacceptable.
>>
>>So, some engineering was mandatory.  It's not like a bunch of cable
>>users whose expectations are lower and whose only recourse might be
>>to cancel service - but they rarely ever do from what I've seen.
>>Not a big deal in comparison.  And with cable users, it is not an
>>everyday event that the bulk of them are all seeking addresses at the
>>same time.
>>
>>Every environment is different.
>>
>>--
>>Gordon A. Lang
>>
>>
>>
>>----- Original Message ----- 
>>From: "Frank Bulk - iNAME" <frnkblk at iname.com>
>>To: <dhcp-users at isc.org>
>>Sent: Friday, April 25, 2008 9:49 PM
>>Subject: RE: tuning for maximum dhcp performance
>>
>>
>>>I serve up 10,000 leases ranging from 3 to 14 days.  I haven't spent a
>>> second optimizing it.  It just works and has worked no matter what the
>>> client outage conditions have been.
>>>
>>> Unless you're serving up a campus where there is a real possibility that
>>> thousands of like clients (i.e. VoIP phone) may power up and come back
>>> online, there's no need to spend time over-engineering.  If there were 20k
>>> computers on a campus that lost power and power came back on simultaneously,
>>> many of the PCs would stay off (configured in the BIOS), and those
>>> configured to power on after power failure would reach the DHCP request
>>> phase at different spots.  At 80/second, it would take just a bit over 4
>>> minutes to serve them all (if the requests were linear).  Would it really
>>> matter if in the worst of all cases it took 10 minutes for every client to
>>> be back online?
>>>
>>> It's those networks that serve hundreds of thousands of clients that need
>>> to spend time engineering a solution that serves up IPs in a timely fashion.
>>>
>>> Frank
>>>
>>> -----Original Message-----
>>> From: dhcp-users-bounce at isc.org [mailto:dhcp-users-bounce at isc.org] On Behalf Of Dan
>>> Sent: Friday, April 25, 2008 1:01 PM
>>> To: dhcp-users at isc.org
>>> Subject: tuning for maximum dhcp performance
>>>
>>>
>>> I'm currently constructing a replacement for an old Cisco Network
[................] 



