Failover peer separation revisited

Robert Blayzor rblayzor.bulk at inoc.net
Mon Nov 17 10:47:24 UTC 2008


On Nov 15, 2008, at 3:45 PM, David Pick wrote:
> For what it's worth, I've been hunting a problem between pairs
> of (FreeBSD 6) machines on a backbone LAN, but nothing to do
> with DHCP traffic. So far, I've found that under some yet-to-be-
> defined circumstances one machine gets into a state where it
> issues an ARP request, receives a reply (according to "tcpdump"),
> but does not put the MAC address in that received packet into the
> ARP tables. At the same time (more-or-less) using the "arp" user-
> level program to try and delete an entry takes 15-20 seconds to
> complete, but with the usual very small processor time. I'm starting
> to suspect some sort of lock problem in the kernel, but can't pin
> it down yet. The problem eventually clears itself (for a while)...
>
> I'd be interested in hearing anything you find to either confirm
> or refute the possibility that it's the same problem.



I don't believe that's the problem in our situation.  If you look at
the packet traces, each and every packet the servers send to each other
makes it, i.e. we see the TCP ACKs.  The timeout comes quickly, right at
a time when the servers are actually talking back and forth.  If it were
a loss of ARP or any other network problem, I think you'd see one server
send packets and the other not receive them.  Though YOUR problem on
FreeBSD sounds interesting; I've not seen that one yet.

So why they actually disconnect is still a mystery.  In our traces the
network level looks fine; it looks like the application just decides
the other side has timed out and closes the connection, within a second
or two (so that'd be a pretty fast timeout!).  It definitely looks like
an application-level problem.
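
As a rough way to check that timing in a trace, here is a minimal Python
sketch.  It assumes the capture has already been parsed into time-ordered
records; the Packet record and the silence_before_close function are
hypothetical names made up for illustration, not part of dhcpd or
tcpdump, and the sketch does not read pcap files itself.  For each
sender it reports how long that sender had been silent when the first
FIN or RST showed up.

```python
from typing import Dict, List, NamedTuple, Optional


class Packet(NamedTuple):
    """One parsed packet from a trace (hypothetical record layout)."""
    ts: float    # capture timestamp, in seconds
    src: str     # sender address
    flags: str   # TCP flags as letters, e.g. "A", "PA", "FA", "R"


def silence_before_close(packets: List[Packet]) -> Optional[Dict[str, float]]:
    """Return, per sender, the seconds between its last packet and the
    first FIN/RST in the trace.  Returns None if nothing closed the
    connection.  Assumes `packets` is sorted by timestamp."""
    close = next((p for p in packets if "F" in p.flags or "R" in p.flags), None)
    if close is None:
        return None

    last_seen: Dict[str, float] = {}
    for p in packets:
        if p.ts >= close.ts:
            break  # only look at traffic strictly before the close
        last_seen[p.src] = p.ts

    return {src: close.ts - ts for src, ts in last_seen.items()}
```

If both peers show only a second or two of silence before the close,
that fits what we see: both sides were still talking when the
connection was torn down, which points away from the network.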

The second part of the question also remains: why do they never try
to reconnect?  I think there is some internal bug that's breaking the
session and making failover go away altogether.

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor at inoc.net
http://www.inoc.net/~rblayzor/





