Warning: ID mismatch:

Maria Iano maria at iano.org
Mon Sep 13 16:56:04 UTC 2004


I agree with you on this - the ID mismatch error was a red herring. I'm
wondering now if there is some issue with unexpectedly high memory use.

I am still experiencing this issue, now almost daily during the work
week. At the times of day when we get the most lookups (lunchtime, when
everyone starts surfing), one or the other of the servers stops
responding to queries. In the debugging output it looks like, when this
happens, the server receives queries and forwards them to the outside
resolver, but doesn't recognize the reply from the outside server. I can
see the packets returning from the outside server. The breakage seems to
occur at that point.

One thing I have noticed about these inside resolvers is that they are
running at about 100% memory use (1 GB of RAM on each) at all times.
Everything else in the system is fine. The load reports as 0. Things
like UDP socket use, and all sorts of data from sar, are all fine. The
outside resolvers that they forward to are identical builds on identical
hardware, yet they run at about 50% memory use. The outside resolvers
are also used by about 180 other locations, and get at least 10 times
the number of queries, yet they are the ones doing fine.

There are really two differences between the servers that are fine (the
outside ones) and the servers that keep ceasing to resolve (the inside
ones). The first is that the outside ones resolve queries in the usual
iterative way, while the inside ones resolve queries by forwarding to
other servers. The other difference is that the inside servers get a lot
of reverse RFC 1918 queries, which forward to other internal (Windows)
servers, and those servers sometimes don't answer. In fact those servers
sometimes forward the queries back out, but thankfully I don't see a
loop occurring, so the inside resolvers seem smart enough to drop things
there. I am about to get this issue fixed, in that the Windows servers
are about to be told they own all of that space; it has just taken a
week to get the process for this completed. Last week I also created a
lot of dummy zones for the reverse space on our inside resolvers, so the
servers could answer right away. I'm not convinced that will fix the
issue anyway.

I am trying to determine why the inside servers run at 100% memory usage
while the outside servers run at about 50%. I'm also building an updated
replacement for one of the inside resolvers that uses Fedora in place of
RH8 and no longer uses the grsecurity patch, to see if that helps.
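
As a starting point for that memory comparison, here is a minimal
sketch in Python. It rests on an assumption that has not been checked on
these servers: that part of the "100%" figure may be Linux buffers and
page cache, which the kernel can reclaim, rather than memory held by
named. The function names are hypothetical, and the field names
(MemTotal, MemFree, Buffers, Cached) are the ones /proc/meminfo normally
provides.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Skip header-style lines whose first field is not a number.
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_breakdown(info):
    """Split total memory into free, reclaimable cache, and the rest."""
    total = info["MemTotal"]
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    used = total - free - cache
    return {
        "total_kb": total,
        "free_kb": free,
        "cache_kb": cache,
        "used_kb": used,
        "used_pct": round(100.0 * used / total, 1),
    }


if __name__ == "__main__":
    # Reads the live values on a Linux host.
    with open("/proc/meminfo") as handle:
        print(memory_breakdown(parse_meminfo(handle.read())))
```

Run on one inside and one outside resolver, the two used_pct values
would show whether the gap is still there once cache is excluded. If it
is not, the memory figure is probably another red herring.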

Thanks,
Maria

-----Original Message-----
From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org] On
Behalf Of Ladislav Vobr
Sent: Friday, September 10, 2004 7:39 PM
Cc: BIND Users Mailing List
Subject: Re: Warning: ID mismatch:

Sometimes, when you query unreachable domains, your recursive server
retries several times against all of the remote name servers, and most
of the time there is no reply from your caching server before the dig
time-out. Sometimes a SERVFAIL reply arrives later than the time-out.

So if you repeat the dig command several times for the same domain, the
second dig you issue might receive the reply to the first one. You then
see this message (ID mismatch). The reply is perfectly valid, it just
came at the wrong time :-). Nothing is wrong with your firewall or the
server itself.
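
The effect described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: build_query,
fake_reply and accept_reply are hypothetical helpers, not part of dig or
BIND, and the "reply" is just the query with the QR bit set. The IDs are
the ones from the dig output quoted below.

```python
import struct


def build_query(query_id, name):
    """Build a minimal DNS query for an A record (RD flag set)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def message_id(packet):
    """Return the 16-bit ID stored in the first two bytes of a DNS message."""
    return struct.unpack("!H", packet[:2])[0]


def fake_reply(query):
    """Hypothetical server reply: the same packet with the QR bit set."""
    flags = struct.unpack("!H", query[2:4])[0] | 0x8000
    return query[:2] + struct.pack("!H", flags) + query[4:]


def accept_reply(expected_id, packet):
    """Mimic a client that only accepts a reply carrying the ID it sent."""
    got = message_id(packet)
    if got != expected_id:
        return "Warning: ID mismatch: expected ID %d, got %d" % (expected_id, got)
    return "accepted reply with ID %d" % got


if __name__ == "__main__":
    first = build_query(10590, "www.silver.com")   # earlier query, timed out
    second = build_query(56696, "www.silver.com")  # repeated query, new ID
    late = fake_reply(first)                       # answer to the earlier query
    print(accept_reply(message_id(second), late))
    print(accept_reply(message_id(second), fake_reply(second)))
```

The late answer is well formed. It is rejected only because its ID
belongs to a query the client has already given up on.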

So you have to think a little bit about the situation :-) I remember
using nslookup once, and it is so stupid that it doesn't even check the
source IP address of the reply packets. I was troubleshooting it through
a firewall with misconfigured NAT, and nslookup kept working even when
the reply came from a different IP :-) than the one the query was sent
to. (The server obviously did not :-) Somebody did a really poor job
with nslookup. But this is a different story :-)
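
For comparison, a client that does check where a reply came from could
look roughly like the Python sketch below. It is not how dig or nslookup
is implemented. recv_validated is a hypothetical helper that drops any
datagram whose source address or transaction ID does not match the query
that was sent.

```python
import socket
import struct
import time


def recv_validated(sock, server_addr, query_id, timeout=2.0):
    """Return the first datagram from server_addr carrying query_id, else None."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        sock.settimeout(remaining)
        try:
            data, addr = sock.recvfrom(4096)
        except socket.timeout:
            return None
        if addr != server_addr:
            continue  # reply came from an address we never queried
        if len(data) < 2 or struct.unpack("!H", data[:2])[0] != query_id:
            continue  # stale or unrelated transaction ID
        return data
```

Here server_addr is the (host, port) tuple the query was sent to, as
returned by recvfrom() for a UDP socket over IPv4.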

Ladislav


Maria Iano wrote:
> This same issue is recurring! This time it is on res1 again. res1 has
> address 172.21.0.100 and res2 has address 172.21.0.200. Below I have
> pasted in the series of dig commands I ran on res2 sending queries to
> res1. Below that I have pasted in the tethereal output during those
> commands.
>
> Since this issue seems to only be a problem for data which isn't
> cached, I wonder if there is any connection with the thread with subject
> 'Weird named act!'. So I also issued this command suggested in that
> thread:
>
> res1 in:  bind$ ps -flp 24708
> Warning: /boot/System.map has an incorrect kernel version.
>   F S UID        PID  PPID  C PRI  NI ADDR    SZ  WCHAN STIME TTY          TIME CMD
> 140 S bind     24708     1  0  74   0    -  3596 14372d Sep07 ?        00:00:55 [named]
>
> This server has a non-modular kernel with the grsecurity patch. In
> case it's relevant, here is the output of uname -a:
> res1 in:  bind$ uname -a
> Linux ent-mocux15.moc.gci 2.4.20-grsec #3 Tue Mar 25 09:21:41 EST 2003 i686 i686 i386 GNU/Linux
>
> Thanks in advance for any help!
> Maria
>
> ###################################################
> Commands issued on res2
> ###################################################
>
> res2 in:  bind$ dig @res1.moc.gci www.silver.com
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.silver.com
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
> res2 in:  bind$ dig @res1.moc.gci www.silver.com
> ;; Warning: ID mismatch: expected ID 56696, got 10590
> ;; Warning: ID mismatch: expected ID 56696, got 10590
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.silver.com
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56696
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;www.silver.com.                        IN      A
>
> ;; ANSWER SECTION:
> www.silver.com.         86400   IN      A       205.150.176.184
>
> ;; AUTHORITY SECTION:
> silver.com.             259200  IN      NS      ns1.ktrafic.com.
> silver.com.             259200  IN      NS      ns2.ktrafic.com.
>
> ;; Query time: 2716 msec
> ;; SERVER: 172.21.0.100#53(res1.moc.gci)
> ;; WHEN: Wed Sep  8 12:19:43 2004
> ;; MSG SIZE  rcvd: 92
>
> res2 in:  bind$ dig @res1.moc.gci www.gold.com
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
> res2 in:  bind$ dig @res1.moc.gci www.gold.com
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
> res2 in:  bind$ dig @res1.moc.gci www.gold.com
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
> res2 in:  bind$ dig @res1.moc.gci www.purple.com
> ;; Warning: ID mismatch: expected ID 58216, got 51960
> ;; Warning: ID mismatch: expected ID 58216, got 51960
> ;; Warning: ID mismatch: expected ID 58216, got 36737
> ;; Warning: ID mismatch: expected ID 58216, got 36737
> ;; Warning: ID mismatch: expected ID 58216, got 20208
> ;; Warning: ID mismatch: expected ID 58216, got 20208
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.purple.com
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
> res2 in:  bind$ dig @res1.moc.gci www.gold.com
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46790
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;www.gold.com.                  IN      A
>
> ;; ANSWER SECTION:
> www.gold.com.           86313   IN      CNAME   gold.com.
> gold.com.               86311   IN      A       198.70.201.51
>
> ;; AUTHORITY SECTION:
> gold.com.               86311   IN      NS      extns1.jewels.com.
> gold.com.               86311   IN      NS      extns2.jewels.com.
>
> ;; Query time: 1 msec
> ;; SERVER: 172.21.0.100#53(res1.moc.gci)
> ;; WHEN: Wed Sep  8 12:21:41 2004
> ;; MSG SIZE  rcvd: 109
>
> <performed rndc flush on res1>
>
> res2 in:  bind$ dig @res1.moc.gci www.gold.com
>
> ; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
>
> ###################################################
> Output of tethereal during those commands
> ###################################################
>
>   0.000000 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.blue.com
>   0.000124 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME blue.com A 216.91.187.86
>   4.991126 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP Who has 172.21.0.200? Tell 172.21.0.100
>   4.991493 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP 172.21.0.200 is at 00:02:55:7b:a4:a3
>   6.320441 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.silver.com
>  11.318427 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>  11.318438 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>  11.328548 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.silver.com
>  24.820791 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.silver.com
>  27.536065 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 205.150.176.184
>  27.536121 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 205.150.176.184
>  27.536184 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 205.150.176.184
>  36.446784 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>  41.449517 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>  49.777125 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>  54.769991 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>  54.770002 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>  54.779985 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>  61.418983 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>  66.420344 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>  76.502267 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.purple.com
>  77.687081 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>  77.687142 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>  77.687208 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>  77.687263 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>  77.687328 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>  77.687382 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>  81.510874 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.purple.com
>  82.684071 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP Who has 172.21.0.200? Tell 172.21.0.100
>  82.684293 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP 172.21.0.200 is at 00:02:55:7b:a4:a3
>  96.508164 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 153.104.63.227
>  96.508232 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 153.104.63.227
>  96.508587 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
>  96.508589 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
> 101.501576 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
> 101.501587 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
> 145.126659 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
> 145.127129 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
> 150.123148 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
> 150.123159 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>
> <performed rndc flush on res1>
>
> 229.285189 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
> 234.276056 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
> 234.276067 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
> 234.286050 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
> 269.304469 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
> 269.304526 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
> 269.304821 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
> 269.304822 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
> 274.297311 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
> 274.297324 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
> On Wed, Sep 08, at 10:58%P so wrote Ladislav Vobr (lvobr at ies.etisalat.ae):
>
>>Maria Iano wrote:
>>
>>>I have two caching servers, res1 and res2, running BIND 9.2.3 on Red
>>>Hat Linux release 8.0 (Psyche). They sit inside a firewall, and forward
>>>queries to four different caching servers on the outside, as well as
>>>some internal servers authoritative for internal zones.
>>>
>>>Last week res2 started being slow and failing resolution
>>>intermittently. Dig queries sent from res2 to the outside resolvers
>>>worked correctly. Dig queries sent from res2 to res1 worked correctly.
>>>However, dig queries from res1 to res2 produced error messages like
>>>this:
>>>
>>>;; Warning: ID mismatch: expected ID 3325, got 34596
>>>
>>>with various different IDs produced from different queries. It was
>>>late at night (I had been paged) so I went ahead and rebooted res2. This
>>>cleared up the issue.
>>>
>>>Now, a week later, this same issue is occurring on res1. res1 is slow
>>>to respond to queries and intermittently failing to resolve names. Digs
>>>issued on res1 pointing to the outside resolvers work fine. Digs issued
>>>on res1 pointing to res2 work fine. Digs issued on res2 pointing to res1
>>>produce the ID mismatch errors again.
>>>
>>>I suspect that if I reboot it the error will clear up again, but
>>>before I do that I want to try and work out what is going on.
>>>
>>>Any advice?
>>
>>You might use a packet sniffer to see what you send and what the
>>other side receives, and similarly for the reply. On Linux you can use
>>tcpdump or ethereal, for example. I once ran into these messages when I
>>was using query-source port 53 on my recursive nameserver and had
>>patched dig to use port 53 as its source port as well. Then I got a lot
>>of these every time I issued such a command from the recursive server's
>>prompt, but it was understandable, since regular replies coming to my
>>nameserver confused dig.
>>
>>
>
>




----- End forwarded message -----

