Watching performance on a DHCP Server

Sun Feb 10 22:35:37 UTC 2008

this experience is with a derivative of version 2 of the
server, but as the basic functionality has not changed
significantly for IPv4, it may be instructive....

at the time, our environment had about 12,000 clients split
roughly 55/45 between two servers...  each server was
connected by two links to each of approximately 120 remote
subnets, each link diversely routed to minimize disruption
due to network problems, but also delivering 2 copies of
every client message to the servers

we suffered a massive regional power failure that lasted
2-1/2 days before complete restoration...  our clients
received 7-day leases, with renewal times largely grouped
between 8 am and 6 pm, so in a 2-1/2 day outage, we could
expect renewal requests to come from about half of our
clients, and certainly init-reboot requests to come from
all...  so, that is roughly 18,000 requests (about 12,000
init-reboots plus about 6,000 renewals) to be serviced as
power is restored....
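
for anyone who wants to redo that estimate with their own
numbers, here is a small sketch of the arithmetic in Python...
the figures come from this message, the variable names are
only illustrative, and the doubling for the two links is my
assumption that every request really does arrive twice

```python
# back-of-envelope estimate of the requests expected after the outage
# (figures taken from the message; names are illustrative only)
clients = 12_000           # total clients across both servers
renewing_fraction = 0.5    # "about half" were due to renew during the outage
copies_per_message = 2     # two diversely routed links per subnet

init_reboot_requests = clients                       # every client restarts
renewal_requests = int(clients * renewing_fraction)  # about half also renew
total_requests = init_reboot_requests + renewal_requests

# assumption: each request is delivered once per link, so it is seen twice
messages_seen = total_requests * copies_per_message

print(total_requests)   # 18000
print(messages_seen)    # 36000
```

changing the lease lifetime changes renewing_fraction, which
is the only knob in this estimate that the administrator
really controls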

of course, the power restoration didn't occur all at once,
but was somewhat randomly distributed over a period of
roughly 32 hours

entirely by coincidence, we had instrumented the server to
capture detailed message arrival rates and response times,
expecting a normal, boring weekend...  but then the power
failed, and...  we got lots more data than we expected!

the real-time clock on our computers was capable of only 1
millisecond resolution, so I must extrapolate....  our
servers survived a nearly CONTINUOUS load of more than 1,000
requests per second for 32 hours...

of course, your mileage may vary, but by choosing an
appropriate lease lifetime, you will probably see similar or
better performance.

--Barr Hibbs

> -----Original Message-----
> From: dhcp-users-bounce at isc.org
> [mailto:dhcp-users-bounce at isc.org] On Behalf Of David W. Hankins
> Sent: Friday, February 08, 2008 08:55
> To: dhcp-users at isc.org
> Subject: Re: Watching performance on a DHCP Server
>
> On Thu, Feb 07, 2008 at 06:07:51PM -0600, Blake Hudson wrote:
> > By default in my distribution the leases file is stored in
> > /var/lib/dhcpd/dhcpd.leases. This happens to be on a RAID1 array with
> > 15k scsi disks and iostat shows the array as being maxed out once it
> > reaches ~ 300 I/O's per second. DHCP logging is done asynchronously to
> > the same array (which normally experiences ~ 50 I/O ops). With CPU and
> > memory barely breaking a sweat, this leads me to believe that the
> > limitation is with the disks (lots of tiny writes).
> >
> > I could move the leases file to a different array, or to tmpfs, but
> > before I do I just want to know if these results are typical and that I
> > have interpreted the test data correctly and made the correct
> > determination as to the bottleneck.
>
> those results are typical for that kind of hardware, and you have
> interpreted the test data correctly: fsync() is the biggest
> bottleneck.
>
> in 4.1.0a1, you will find a feature, however, which was provided to
> us in a patch by Christof Chen.  it permits the server to queue
> multiple ACKs behind a single fsync(); default 28 (576 byte DHCP
> packets filling default socket buffer send sizes).  the burst of acks
> are sent presently if the sockets go dry, and shortly will be backed
> up with a sub-second timeout.
>
> it has some bugs we're working on, particularly with failover, but
> we'll address those in alpha.
>
> you may find that it provides some form of multiplicative benefit to
> your performance stats, since fsync() is the bottleneck, and now there
> are 28 acks per fsync max.
>
> so if you are only pushing 50 requests/s currently, you may live
> comfortably in a 250 request/s buffer for some months until the
> 4.1.x code is stable?
>
> > Also, I would appreciate any anecdotal evidence with regards to how many
> > requests are typical in a large network under normal (or abnormal)
> > conditions. If 10,000 users all of a sudden came online, how many
> > requests would they really generate per second?
>
> there have been a few folks who suffered mass power outages, i don't
> know what search query to use, but you can find them on the old
> dhcp-server mailing list.  they did not report problems, rather the
> surprise at the lack of problem.
>
> --
> Ash bugud-gul durbatuluk agh burzum-ishi krimpatul.
> Why settle for the lesser evil?  https://secure.isc.org/store/t-shirt/
> --
> David W. Hankins	"If you don't do it right the first time,
> Software Engineer		     you'll just have to do it again."
> Internet Systems Consortium, Inc.		-- Jack T. Hankins
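
the idea of queuing multiple ACKs behind a single fsync(),
as described in the quoted message, can be sketched roughly
as below...  this is a minimal Python illustration and not
the ISC code; the class BatchedLeaseLog, its methods, and the
send_ack callback are hypothetical names, and the flush on
idle sockets or on a timeout is represented only by an
explicit flush() call

```python
import os
import tempfile

MAX_ACKS_PER_FSYNC = 28  # default quoted above; 28 * 576 = 16128 bytes


class BatchedLeaseLog:
    """Hypothetical sketch: write leases, fsync once per batch of ACKs."""

    def __init__(self, path, send_ack, max_batch=MAX_ACKS_PER_FSYNC):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._send_ack = send_ack   # callback that transmits one ACK
        self._max_batch = max_batch
        self._pending = []          # ACKs waiting for the next fsync
        self.fsync_count = 0

    def commit(self, lease_record, ack):
        # append the lease, but hold the ACK until the data is on disk
        os.write(self._fd, lease_record.encode() + b"\n")
        self._pending.append(ack)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        # one fsync covers every lease written since the last flush
        if not self._pending:
            return
        os.fsync(self._fd)
        self.fsync_count += 1
        for ack in self._pending:
            self._send_ack(ack)
        self._pending.clear()

    def close(self):
        self.flush()
        os.close(self._fd)


sent = []
with tempfile.TemporaryDirectory() as tmp:
    log = BatchedLeaseLog(os.path.join(tmp, "dhcpd.leases"), sent.append)
    for i in range(60):
        log.commit(f"lease 10.0.0.{i}", f"ACK {i}")
    log.close()  # flushes the final partial batch

print(len(sent), log.fsync_count)  # 60 3
```

with one fsync per ACK the 60 leases above would cost 60
fsync calls; batched at 28 they cost 3, and no ACK is sent
before its lease has been synced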