Performance issue ( maybe )

Glenn Satchell glenn.satchell at uniq.com.au
Mon Sep 6 13:08:40 UTC 2010


Ok, so this looks like some sort of networking issue, perhaps NIC, 
cables or switch port? Check you have up to date drivers for your NICs.

Run ifconfig and look for any errors or collisions. Check the speed and 
duplex settings for the NIC and ask the network guys to check the same 
settings on the switch port. Check cables are well seated in the server, 
and if you can, on the switch.

Try an ftp of a largish (few 10s of megabytes) file between each of the 
servers and a third one to see if one works well and the other has some 
problems? This will help isolate the system and give you a nice test case.

Good luck, but at least a problem has been found, now to fix it!

regards,
-glenn

On 09/06/10 22:56, Bjarne Blichfeldt wrote:
> OK, so I think I forgot something in the heat - must be getting old.., just checked the message logiles.
>
> When I started on work Friday,  dhcpd  process on dhcp1 was not running, so I started it at 6:53.
>
> In dhcpd.log, from the night I get a gazillion :
> Sep  3 00:01:40 b00011100859 dhcpd: DHCPACK on 10.19.81.13 to 00:1f:29:39:6c:51 (B00019A0411064) via bond0
> Sep  3 00:01:40 b00011100859 dhcpd: balancing pool aa9c7a8 10.242.230.0/24  total 200  free 78  backup 116  lts -19  max-own (+/-)19
> Sep  3 00:01:40 b00011100859 dhcpd: balanced pool aa9c7a8 10.242.230.0/24  total 200  free 78  backup 116  lts -19  max-misbal 29
> Sep  3 00:01:40 b00011100859 dhcpd: balancing pool aa93498 10.242.0.0/24  total 200  free 66  backup 98  lts -16  max-own (+/-)16
> :: etc
>
> The last entry was from 02:47.
> During this time, dhcp1 has only answered a few INFORM with ACK.
>
> All through the evening and night there are a lot of :
> Sep  2 21:20:24 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: peer moves from communications-interrupted to communications-interrupted
> Sep  2 21:20:34 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from communications-interrupted to normal
> Sep  2 21:23:49 b00011100859 dhcpd: timeout waiting for failover peer ipc-dhcp1-ipc-dhcp2
> Sep  2 21:23:54 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from normal to communications-interrupted
> Sep  2 21:24:28 b00011100859 dhcpd: failover: link startup timeout
> Sep  2 21:24:33 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: peer moves from communications-interrupted to communications-interrupted
> Sep  2 21:24:39 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from communications-interrupted to normal
> Sep  2 21:30:12 b00011100859 dhcpd: failover: link startup timeout
> Sep  2 21:30:20 b00011100859 dhcpd: timeout waiting for failover peer ipc-dhcp1-ipc-dhcp2
> Sep  2 21:30:25 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from normal to communications-interrupted
> Sep  2 21:30:42 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: peer moves from communications-interrupted to communications-interrupted
> Sep  2 21:30:53 b00011100859 dhcpd: failover peer ipc-dhcp1-ipc-dhcp2: I move from communications-interrupted to normal
> Sep  2 21:36:18 b00011100859 dhcpd: timeout waiting for failover peer ipc-dhcp1-ipc-dhcp2
>
> According to the network guys, nothing special was going on. No backups are running at that time, no (known) load during the night.
> Am I using correct values in failover :
>
> failover peer "ipc-dhcp1-ipc-dhcp2" {
>          secondary;
>          address 10.11.90.74;
>          port 647;
>          peer address 10.11.90.73;
>          peer port 647;
>          max-response-delay 90;
>          max-unacked-updates 20;
>          load balance max seconds 5;
> }
>
> Or do I have something else going on ?
>
> Maybe all this comes from some recovery problem.
>
> Regards
> Bjarne
>
>
>> -----Original Message-----
>> From: Bjarne Blichfeldt
>> Sent: 6. september 2010 13:28
>> To: Users of ISC DHCP
>> Subject: RE: Performance issue ( maybe )
>>
>>> -----Original Message-----
>>> From: Tom Schmitt
>>> Sent: 6. september 2010 09:01
>>> To: Users of ISC DHCP
>>> Subject: Re: Performance issue ( maybe )
>>>
>>>
>>>> Von: Bjarne Blichfeldt<bjb at jndata.dk>
>>>> Betreff: Performance issue ( maybe )
>>>
>>>
>>>> During clients startup, the servers takes a very long time  ~20-30
>>>> seconds>  to answer a DISCOVER.
>>>
>>> As a first step before you look for the source on the server, I would do a
>>> tcpdump on the server interface to make sure that the delay you are
>>> experiencing are really originate there and not somewhere else on the
>> network.
>> Absolutely. Unfortunately, in the heat last Friday, the traces where not
>> coordinated. If there comes a next time
>> the traces will get started asap.
>>
>>>
>>>
>>>
>>>
>>>> è Cpu2  :  0.0%us,  0.3%sy,  0.0%ni, 1.3%id,  98%wa,  0.0%hi,  0.0%si,
>>>> 0.0%st
>>>
>>> So regarding the wait I/O: first step is looging without sync, second step
>> is
>>> logging only errors and higher third step is no logging at all (only for a
>>> testperiod to see if your problem is going away)
>> I have now changed to logging without sync.
>>
>>
>>>
>>>> The only process really working is kjournald.
>>>
>>> So are ther any other logmessages in high numbers beside the ones from
>> dhcpd?
>> No, but I have noticed that the all the balancing lines seems to be duplicated
>> :
>> Sep  3 06:53:40 b00011100859 dhcpd: balancing pool 9c8de78 10.229.16.0/24
>> total 248  free 124  backup 124  lts 0  max-own (+/-)25
>> Sep  3 06:53:40 b00011100859 dhcpd: balanced pool 9c8de78 10.229.16.0/24
>> total 248  free 124  backup 124  lts 0  max-misbal 37
>> Sep  3 06:53:40 b00011100859 dhcpd: balancing pool 9c822c8 10.229.15.0/24
>> total 248  free 124  backup 124  lts 0  max-own (+/-)25
>> Sep  3 06:53:40 b00011100859 dhcpd: balanced pool 9c822c8 10.229.15.0/24
>> total 248  free 124  backup 124  lts 0  max-misbal 37
>>
>> And one (of many different) specific :
>> Sep  3 08:19:53 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:20:56 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:21:50 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:22:52 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:23:52 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:24:31 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:25:34 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:26:35 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:27:37 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:28:31 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:29:32 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:30:33 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>> Sep  3 08:31:11 b00011100859 dhcpd: balanced pool 8a552c0 10.20.65.0/24  total
>> 199  free 102  backup 88  lts 7  max-misbal 29
>>
>> Also , why are there so many "balanced pool" messages ?. Is that a result of
>> our high numbers of outstanding OFFERS ?
>>
>>
>>>
>>>
>>>
>>>>
>>>> 1)      anybody seen something similar ?
>>> No.
>>>
>>>>
>>>> 2)      Good ideas to further investigate ?  What about the network
>>>> topology ? Any gotcha's when sending DISCOVERY through two cisco routers ?
>>> Shouldn't be an isuue. I do the same without having problems.
>>> Or did you you just upgrade your IOS?
>> No new IOS
>>
>>>
>>>
>>> Is the problem occuring on both on your dhcpd-servers at the same time? Or
>>> only on one of them?
>> Both at the same time.
>>
>>>
>>> And are there any messages in the log which could give you a hint?
>> Except for the very high number of "balanced pool", no.
>>
>>>
>>>
>>>> Also, what would be the consensus of disabling pingcheck ?
>>>>
>>>> ping-check false;
>>>>
>>>> The ping adds at least one  second to every discovery/offer,
>>>
>>> You would win one second, but haveing other problems. I won't recomment it.
>>> Better you find the real problem than working around it.
>> Yes, I kind of thought so. I will not change that for now.
>>
>>>
>>>
>>>
>>>
>>>>          option option-150 10.11.75.10 ;
>>>>          filename "\\mboot.0<file:///\\mboot.0>" ;
>>>>          next-server 10.2.2.240 ;
>>>
>>> Could it be, that not your server is the source of the problem, but the
>> server
>>> 10.2.2.240?
>> No, this is just a small sample of the conf file, to show a typical example,
>> we have 1352 subnets with different next-server statements.
>>
>>
>> Thanks for your input. It is really appreciated.
>>
>> Regards,
>> Bjarne



More information about the dhcp-users mailing list