Performance issue ( maybe )

Bjarne Blichfeldt bjb at jndata.dk
Mon Sep 6 12:56:44 UTC 2010


OK, so I think I forgot something in the heat - must be getting old.., just checked the message logiles.

When I started on work Friday,  dhcpd  process on dhcp1 was not running, so I started it at 6:53.

In dhcpd.log, from the night I get a gazillion :
Sep  3 00:01:40 b00011100859 dhcpd: DHCPACK on 10.19.81.13 to 00:1f:29:39:6c:51 (B00019A0411064) via bond0
Sep  3 00:01:40 b00011100859 dhcpd: balancing pool aa9c7a8 10.242.230.0/24  total 200  free 78  backup 116  lts -19  max-own (+/-)19
Sep  3 00:01:40 b00011100859 dhcpd: balanced pool aa9c7a8 10.242.230.0/24  total 200  free 78  backup 116  lts -19  max-misbal 29
Sep  3 00:01:40 b00011100859 dhcpd: balancing pool aa93498 10.242.0.0/24  total 200  free 66  backup 98  lts -16  max-own (+/-)16
:: etc

The last entry was from 02:47.
During this time, dhcp1 has only answered a few INFORM with ACK.

All through the evening and night there are a lot of :
Sep  2 21:20:24 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: peer moves from communications-interrupted to communications-interrupted
Sep  2 21:20:34 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from communications-interrupted to normal
Sep  2 21:23:49 b00011100859 dhcpd: timeout waiting for failover peer ipc-dhcp1-ipc-dhcp2
Sep  2 21:23:54 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from normal to communications-interrupted
Sep  2 21:24:28 b00011100859 dhcpd: failover: link startup timeout
Sep  2 21:24:33 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: peer moves from communications-interrupted to communications-interrupted
Sep  2 21:24:39 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from communications-interrupted to normal
Sep  2 21:30:12 b00011100859 dhcpd: failover: link startup timeout
Sep  2 21:30:20 b00011100859 dhcpd: timeout waiting for failover peer ipc-dhcp1-ipc-dhcp2
Sep  2 21:30:25 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from normal to communications-interrupted
Sep  2 21:30:42 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: peer moves from communications-interrupted to communications-interrupted
Sep  2 21:30:53 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from communications-interrupted to normal
Sep  2 21:36:18 b00011100859 dhcpd: timeout waiting for failover peer ipc-dhcp1-ipc-dhcp2 

According to the network guys, nothing special was going on. No backups are running at that time, no (known) load during the night.
Am I using correct values in failover :

failover peer "ipc-dhcp1-ipc-dhcp2" {
        secondary;
        address 10.11.90.74;
        port 647;
        peer address 10.11.90.73;
        peer port 647;
        max-response-delay 90;
        max-unacked-updates 20;
        load balance max seconds 5;
}

Or do I have something else going on ? 

Maybe all this comes from some recovery problem.

Regards
Bjarne


> -----Original Message-----
> From: Bjarne Blichfeldt
> Sent: 6. september 2010 13:28
> To: Users of ISC DHCP
> Subject: RE: Performance issue ( maybe )
> 
> > -----Original Message-----
> > From: Tom Schmitt
> > Sent: 6. september 2010 09:01
> > To: Users of ISC DHCP
> > Subject: Re: Performance issue ( maybe )
> >
> >
> > > Von: Bjarne Blichfeldt <bjb at jndata.dk>
> > > Betreff: Performance issue ( maybe )
> >
> >
> > > During clients startup, the servers takes a very long time  ~20-30
> > > seconds> to answer a DISCOVER.
> >
> > As a first step before you look for the source on the server, I would do a
> > tcpdump on the server interface to make sure that the delay you are
> > experiencing are really originate there and not somewhere else on the
> network.
> Absolutely. Unfortunately, in the heat last Friday, the traces where not
> coordinated. If there comes a next time
> the traces will get started asap.
> 
> >
> >
> >
> >
> > > è Cpu2  :  0.0%us,  0.3%sy,  0.0%ni, 1.3%id,  98%wa,  0.0%hi,  0.0%si,
> > > 0.0%st
> >
> > So regarding the wait I/O: first step is looging without sync, second step
> is
> > logging only errors and higher third step is no logging at all (only for a
> > testperiod to see if your problem is going away)
> I have now changed to logging without sync.
> 
> 
> >
> > > The only process really working is kjournald.
> >
> > So are ther any other logmessages in high numbers beside the ones from
> dhcpd?
> No, but I have noticed that the all the balancing lines seems to be duplicated
> :
> Sep  3 06:53:40 b00011100859 dhcpd: balancing pool 9c8de78 10.229.16.0/24
> total 248  free 124  backup 124  lts 0  max-own (+/-)25
> Sep  3 06:53:40 b00011100859 dhcpd: balanced pool 9c8de78 10.229.16.0/24
> total 248  free 124  backup 124  lts 0  max-misbal 37
> Sep  3 06:53:40 b00011100859 dhcpd: balancing pool 9c822c8 10.229.15.0/24
> total 248  free 124  backup 124  lts 0  max-own (+/-)25
> Sep  3 06:53:40 b00011100859 dhcpd: balanced pool 9c822c8 10.229.15.0/24
> total 248  free 124  backup 124  lts 0  max-misbal 37
> 
> And one (of many different) specific :
> Sep  3 08:19:53 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:20:56 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:21:50 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:22:52 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:23:52 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:24:31 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:25:34 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:26:35 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:27:37 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:28:31 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:29:32 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:30:33 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> Sep  3 08:31:11 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
> 199  free 102  backup 88  lts 7  max-misbal 29
> 
> Also , why are there so many "balanced pool" messages ?. Is that a result of
> our high numbers of outstanding OFFERS ?
> 
> 
> >
> >
> >
> > >
> > > 1)      anybody seen something similar ?
> > No.
> >
> > >
> > > 2)      Good ideas to further investigate ?  What about the network
> > > topology ? Any gotcha's when sending DISCOVERY through two cisco routers ?
> > Shouldn't be an isuue. I do the same without having problems.
> > Or did you you just upgrade your IOS?
> No new IOS
> 
> >
> >
> > Is the problem occuring on both on your dhcpd-servers at the same time? Or
> > only on one of them?
> Both at the same time.
> 
> >
> > And are there any messages in the log which could give you a hint?
> Except for the very high number of "balanced pool", no.
> 
> >
> >
> > > Also, what would be the consensus of disabling pingcheck ?
> > >
> > > ping-check false;
> > >
> > > The ping adds at least one  second to every discovery/offer,
> >
> > You would win one second, but haveing other problems. I won't recomment it.
> > Better you find the real problem than working around it.
> Yes, I kind of thought so. I will not change that for now.
> 
> >
> >
> >
> >
> > >         option option-150 10.11.75.10 ;
> > >         filename "\\mboot.0<file:///\\mboot.0>" ;
> > >         next-server 10.2.2.240 ;
> >
> > Could it be, that not your server is the source of the problem, but the
> server
> > 10.2.2.240?
> No, this is just a small sample of the conf file, to show a typical example,
> we have 1352 subnets with different next-server statements.
> 
> 
> Thanks for your input. It is really appreciated.
> 
> Regards,
> Bjarne
> _______________________________________________
> dhcp-users mailing list
> dhcp-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/dhcp-users


More information about the dhcp-users mailing list