Watching performance on a DHCP Server

Wed Feb 13 15:59:41 UTC 2008

Enrique Perez-Terron <enrio at online.no> writes:
> The total should be, that in the event of a outage, i.e. once or twice
> per decade, a handfull, at most the pre-commit list size (eg.20), often
> far fewer, computers have to do DNS updates.

That's (eg.20) * number-of-pools, (or 17140 for me presently).

   ...

> Or, perhaps there is yet another way: delay responses to the clients,
> process other incoming request, and upon timeout (a few milliseconds),
> send all the offers to the log in a single transaction, fsync(), then
> send out all the offers to the clients. 

  At last you have lurched uncontrollably to the truth.
  This is what, as I understand it, the patch in the pipeline will do.

>                                         However, this may require a far
> bigger rewrite of the server.

  I wouldn't think so (but the dhcp source code is pretty inscrutable
  sometimes).  If we idealize a dhcp server as:

    while (!shutdown) {
        request = recv();
        (new_lease, reply) = decipher_and_process(request);
        if (new_lease) {
            write_lease(new_lease);
            fsync(lease_file);
        }
        send(reply);
    }

then it just becomes something like this:

    while (!shutdown) {
        if ((n_buffered_leases == MAX) ||           /* full or */
            ((n_buffered_leases > 0) &&             /*  something and */
             ((recv_queue == 0) ||                  /*    idle or */
              (time(NULL) > buffer_max_hold)))) {   /*    held long enough */
            write(buffered_leases);
            fsync(lease_file);
            send(buffered_replies);
            n_buffered_leases = 0; /* etc */
        }
        request = recv();
        (new_lease, reply) = decipher_and_process(request);
        if (new_lease) {
            buffer(new_lease, reply);
        } else { 
            send(reply);
        }
    }
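
The flush test at the top of that loop is the only really new logic, so
here it is pulled out as compilable C. This is only a sketch: the names
struct batch_state, should_flush and MAX_BUFFERED are made up for
illustration and are not taken from the dhcp source, and it uses >=
rather than == for the "full" check just to be defensive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define MAX_BUFFERED 4  /* hypothetical batch size, matching the example figures */

/* Hypothetical bookkeeping for the buffered-lease loop sketched above. */
struct batch_state {
    size_t n_buffered_leases;  /* leases buffered but not yet written and fsync'ed */
    size_t recv_queue;         /* requests still waiting to be read */
    time_t buffer_max_hold;    /* time after which the batch has been held long enough */
};

/* Returns true when the buffered leases should be written, fsync'ed,
 * and their replies sent: the buffer is full, or it holds something
 * and we are either idle or have held it long enough. */
bool should_flush(const struct batch_state *s, time_t now)
{
    assert(s != NULL);
    if (s->n_buffered_leases == 0)
        return false;                       /* nothing to flush */
    if (s->n_buffered_leases >= MAX_BUFFERED)
        return true;                        /* full */
    if (s->recv_queue == 0)
        return true;                        /* idle */
    return now > s->buffer_max_hold;        /* held long enough */
}
```

The main loop would call should_flush(&state, time(NULL)) before each
recv() and, after flushing, reset the count and the hold deadline.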

If the fsync time dominates, then whenever the load is high enough
that the input queue backs up, the first lease in each buffer waits
a tiny bit longer for its reply, but all the rest win big -- a big
overall win.

If we put an fsync at 1/100 sec (very fast disks indeed),
reading/processing a request at 1/10000 sec (fairly conservative, I should
think), and writing the reply packet at 1/100000 sec (easy on a 100Mb link),
and if our buffer is only 4 requests big (very conservative), then
let's assume we get 4 requests more or less simultaneously at time 0.
With hopefully a minimum of hand-waving, we get:

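              time to reply (sec)
request   standard    buffered     buffered = fsync + (4 * process) + (n * send)
     1      .01011    .01041       .01 + (4 * .0001) + (1 * .00001)
     2      .02022    .01042       .01 + (4 * .0001) + (2 * .00001)
     3      .03033    .01043       .01 + (4 * .0001) + (3 * .00001)
     4      .04044    .01044       .01 + (4 * .0001) + (4 * .00001)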
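
For anyone who wants to plug in their own numbers, the arithmetic behind
the table can be written as two small C functions. This is a sketch of
the idealized model only (everything strictly serial, no other costs);
the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Figures from the example above, in seconds. */
#define T_FSYNC   0.01     /* one fsync of the lease file */
#define T_PROCESS 0.0001   /* read and process one request */
#define T_SEND    0.00001  /* write one reply packet */

/* Standard loop: every request pays process + fsync + send, one after
 * another, so the k-th request (1-based) waits for k full rounds. */
double standard_reply_time(size_t k)
{
    assert(k >= 1);
    return (double)k * (T_PROCESS + T_FSYNC + T_SEND);
}

/* Buffered loop: all n requests are processed, one fsync covers the
 * whole batch, then replies go out in order, so the k-th reply waits
 * for n process steps, one fsync, and k sends. */
double buffered_reply_time(size_t k, size_t n)
{
    assert(k >= 1 && k <= n);
    return (double)n * T_PROCESS + T_FSYNC + (double)k * T_SEND;
}
```

With these figures, standard_reply_time(4) comes to about .04044 and
buffered_reply_time(4, 4) to about .01044, matching the last row.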

John