meaning of "xid mismatch"??

David W. Hankins David_Hankins at isc.org
Mon Feb 25 17:12:16 UTC 2008


On Mon, Feb 25, 2008 at 11:16:22AM -0500, Jeff A. Earickson wrote:
> bind update on [IP] got ack from colby: xid mismatch.
> 
> I googled and searched the MARC mailing list archives for
> a plain-english description of what this means (and what to
> do about it), but didn't find one.  So what does it really
> mean?

It's a reminder for future work.  It happens when the server sends
multiple updates for one lease over the failover wire before
receiving the earlier ack(s).

The 3.1.x failover changes were made to track the latest failover
draft.  One of the various things this meant was that you couldn't
rely on the peer to transmit a BNDACK with a potential expiration
included (3.0.x used to do this; in fact, it would copy the entire
lease state into the ACK).

So in 3.0.x, we would accept any BNDACK, even when multiple updates
were outstanding, and just record the potential expiry from each ack
as we received it.

Since in 3.1.x we couldn't rely on this, we had to track the 'xid'
on the BNDUPD we transmit, make sure the ack's xid matches it, and
then use the potential expiry that we transmitted.  The mismatches
you're seeing are the consequence: acks for earlier updates.
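
A minimal sketch of the xid check described above, in Python rather
than the server's C.  Every name here (Lease, Sender, send_bndupd,
receive_bndack) and the log text are hypothetical illustrations, not
taken from the ISC DHCP source.  It assumes only the newest xid per
lease is remembered, which is what produces a mismatch when an ack
for an older update arrives.

```python
# Hypothetical sketch; names and log text are illustrative only and
# do not come from the ISC DHCP source.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    address: str
    pending_xid: Optional[int] = None     # xid of the newest unacked BNDUPD
    pending_expiry: Optional[int] = None  # potential expiry sent with it
    acked_expiry: Optional[int] = None    # potential expiry the peer has acked


class Sender:
    def __init__(self) -> None:
        self.next_xid = 1
        self.log: List[str] = []

    def send_bndupd(self, lease: Lease, potential_expiry: int) -> int:
        """Record the xid and expiry we transmit; a newer update overwrites them."""
        xid = self.next_xid
        self.next_xid += 1
        lease.pending_xid = xid
        lease.pending_expiry = potential_expiry
        return xid

    def receive_bndack(self, lease: Lease, xid: int) -> bool:
        """Accept the ack only if it answers the newest update for this lease."""
        if xid != lease.pending_xid:
            self.log.append(f"bind update on {lease.address}: xid mismatch.")
            return False
        # The ack carries no expiry we can rely on, so use the one we sent.
        lease.acked_expiry = lease.pending_expiry
        lease.pending_xid = None
        lease.pending_expiry = None
        return True


# Two updates go out before the first ack comes back.
sender = Sender()
lease = Lease("192.0.2.10")
first = sender.send_bndupd(lease, 1000)
second = sender.send_bndupd(lease, 2000)
sender.receive_bndack(lease, first)   # older xid: logged as a mismatch
sender.receive_bndack(lease, second)  # newest xid: expiry 2000 recorded
```

In this sketch the ack for the first update is simply dropped, so the
lower expiry (1000) is never recorded.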

At the time I was neck-deep in failover guts and I didn't know if the
condition was possible, and wasn't convinced it was safe, so I put a
log line there so we could evaluate it if it happened.

Since then we've seen it on our own systems and had quite a few
reports from others.  It turns out to be quite possible, and so far
has been harmless.  We should probably remove the log line in maint,
but I'd like to see us come up with a different strategy in the long
run.

The only implication is that we /could/ have accepted a lower
potential expiry from the peer, but didn't.  We could keep a queue of
pending xid's per lease, but this seems spammy.  I'm thinking the
better approach is this: if we try to update a lease that is already
in the ack queue, we mark it, so that when we receive the ack (and
record that potential expiry), we requeue the lease for another
update.  Where the number of updates is more than 2, this also
suppresses the redundant ones.
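
The mark-and-requeue idea can be sketched roughly as follows.  This is
only an illustration in Python under assumed names (UpdateQueue,
update, transmit, ack are all hypothetical); it is not how the server
is implemented, and it leaves out xids and the wire protocol entirely.

```python
# Hypothetical sketch of "mark, then requeue on ack"; not ISC DHCP code.
from collections import deque
from typing import Deque, Dict, Set


class UpdateQueue:
    def __init__(self) -> None:
        self.to_send: Deque[str] = deque()     # leases waiting for a BNDUPD
        self.awaiting_ack: Dict[str, int] = {} # lease -> expiry sent, unacked
        self.marked: Set[str] = set()          # changed again while unacked
        self.current: Dict[str, int] = {}      # latest potential expiry
        self.acked: Dict[str, int] = {}        # expiry the peer has acked
        self.sent = 0                          # BNDUPDs actually transmitted

    def update(self, addr: str, expiry: int) -> None:
        self.current[addr] = expiry
        if addr in self.awaiting_ack:
            self.marked.add(addr)              # mark it; send nothing yet
        elif addr not in self.to_send:
            self.to_send.append(addr)

    def transmit(self) -> None:
        while self.to_send:
            addr = self.to_send.popleft()
            self.awaiting_ack[addr] = self.current[addr]
            self.sent += 1

    def ack(self, addr: str) -> None:
        # Record the potential expiry that this ack confirms.
        self.acked[addr] = self.awaiting_ack.pop(addr)
        if addr in self.marked:
            # The lease changed in the meantime: requeue it once.
            self.marked.discard(addr)
            self.to_send.append(addr)


# Four changes to one lease, three of them while the first is unacked.
q = UpdateQueue()
q.update("192.0.2.10", 100)
q.transmit()
for expiry in (200, 300, 400):
    q.update("192.0.2.10", expiry)
q.ack("192.0.2.10")   # records 100, requeues the lease
q.transmit()          # sends only the latest state (400)
q.ack("192.0.2.10")
```

Here four changes to the lease result in two transmitted updates,
every ack is usable, and the intermediate states (200, 300) are never
sent.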

-- 
David W. Hankins	"If you don't do it right the first time,
Software Engineer		you'll just have to do it again."
Internet Systems Consortium, Inc.	-- Jack T. Hankins


More information about the dhcp-users mailing list