how to log all recursive query responses?
Kevin Darcy
kcd at chrysler.com
Fri Aug 8 22:44:14 UTC 2008
David Sparks wrote:
> Does the above log the responses or just the queries?
>
> I'm trying to debug why two 1000qps BIND servers side by side are giving out
> different (cached?) results (one SERVFAIL, one correct answer) from a close
> (one Internet hop but in the same data centre) rbldnsd server. The SERVFAIL
> is incorrect and I can't figure out how named got things wrong in the first place.
>
> The incorrect SERVFAIL also seems to be cached but I can't see anything about
> the query from rndc dumpdb output.
>
> rndc dumpdb -cache shows that the server with the correct answer has cached
> values. What I don't understand is why the named that doesn't have a cached
> answer doesn't resolve the query, instead it returns SERVFAIL immediately?
>
> This only happens after named has been running hard for several days. I've
> pasted an example below, ns1 gets SERVFAIL and ns2 gets the proper answer.
>
> daves at sentinel ~ $ host -v -t a X.X.X.213.fur.ca1.sophosxl.com. ns1
> Trying "X.X.X.213.fur.ca1.sophosxl.com"
> Received 49 bytes from 10.99.159.11#53 in 89 ms
> Trying "X.X.X.213.fur.ca1.sophosxl.com"
> Using domain server:
> Name: ns1
> Address: 10.99.159.11#53
> Aliases:
>
> Host X.X.X.213.fur.ca1.sophosxl.com not found: 2(SERVFAIL)
> Received 49 bytes from 10.99.159.11#53 in 88 ms
>
>
> daves at sentinel ~ $ host -v -t a X.X.X.213.fur.ca1.sophosxl.com. ns2
> Trying "X.X.X.213.fur.ca1.sophosxl.com"
> Using domain server:
> Name: ns2
> Address: 10.99.159.12#53
> Aliases:
>
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36177
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
>
> ;; QUESTION SECTION:
> ;X.X.X.213.fur.ca1.sophosxl.com. IN A
>
> ;; ANSWER SECTION:
> X.X.X.213.fur.ca1.sophosxl.com. 2100 IN A 127.0.4.2
>
> ;; AUTHORITY SECTION:
> fur.ca1.sophosxl.com. 352 IN NS ca1.sophosxl.com.
>
> ;; ADDITIONAL SECTION:
> ca1.sophosxl.com. 569 IN A 209.17.179.166
>
> Received 95 bytes from 10.99.159.12#53 in 26 ms
>
>
>> If you want to capture the contents of the actual *packets* that named
>> is generating, I'd recommend a packet capture utility such as "tcpdump".
>> It's not too hard to restrict the captures to responses only, where the
>> RD flag in the header is set to 1 (indicating that the original query
>> was recursive). For the PC platform, there's also WireShark, but to be
>> honest, I haven't played much with its filtering capabilities.
>>
>
> I'm not sure how to filter on the RD flag? Will this filter be sufficient or
> do I also need the query packet to figure out what happened?:
>
> tcpdump -s 1024 src port 53 and not src host ns1
>
>
Ugh, this is a bit of a difficult problem, especially if you're at the
1000qps level (lots of data to wade through, eyeballing is not really an
option).
Looking at the traffic between the client and the BIND server is probably
not going to be very useful: you'll just see a question come in and, at
some point, a SERVFAIL response going back.
To get to the root cause, you'll probably want to look at the data
passing back and forth between the BIND boxes and rbldnsd, to pinpoint
why BIND caches a SERVFAIL in the first place. Is it a timeout? Is it a
SERVFAIL response from rbldnsd? Something else?
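To make the "timeout or SERVFAIL from rbldnsd?" question concrete, here is a rough Python sketch of how already-decoded capture records could be sorted into those cases. The function name, the tuple layout and the outcome labels are made up for illustration. It also matches on the message ID alone, which is a simplification: IDs are only 16 bits, so at 1000qps a real tool would have to match on source port and question name as well.

```python
SERVFAIL = 2  # DNS RCODE value for "server failure"


def classify_exchanges(events):
    """Pair upstream queries with responses by DNS message ID.

    events: iterable of (msg_id, is_response, rcode) tuples in capture
    order (hypothetical layout, decoded from the capture beforehand).
    Returns a dict mapping msg_id to "no_response", "servfail" or "replied".
    """
    outcome = {}
    for msg_id, is_response, rcode in events:
        if not is_response:
            # A query went out; assume no reply until one is seen.
            outcome[msg_id] = "no_response"
        elif msg_id in outcome:
            # A reply to a query we saw: note whether it was a SERVFAIL.
            outcome[msg_id] = "servfail" if rcode == SERVFAIL else "replied"
    return outcome


if __name__ == "__main__":
    sample = [(1, False, 0), (2, False, 0), (2, True, 2), (3, False, 0), (3, True, 0)]
    print(classify_exchanges(sample))
```

Queries left as "no_response" at the end of the capture point towards a timeout; "servfail" means the failure really came back from the upstream server.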
If there is a *specific* name you want to focus on, it's possible to do
that with tcpdump, but it's rather painful, e.g.
tcpdump -v -x udp and port 53 and 'udp[20] == 3' and 'udp[21] == 102' \
    and 'udp[22] == 111' and 'udp[23] == 111'
would limit the capture to packets whose Question Section begins with
the label "foo". The Question Section starts at offset 20 of the UDP
datagram (8 bytes of UDP header plus 12 bytes of DNS header); 3 is the
label length, 102 is the ASCII code for "f", and 111 is the ASCII code
for "o". The Question Section is copied from the original query to the
response, so this should catch responses too.
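Working out those byte values by hand gets tedious for longer labels, so here is a small Python sketch that generates the same kind of filter expression. The function name is hypothetical; the output is just text to paste (quoted) onto a tcpdump command line. It matches bytes exactly, so a query sent with different letter case would not be caught.

```python
def first_label_filter(label):
    """Build a tcpdump filter matching DNS packets whose first question
    label equals `label` (exact byte match, so case-sensitive)."""
    raw = label.encode("ascii")
    if not 1 <= len(raw) <= 63:
        raise ValueError("a DNS label must be 1 to 63 bytes long")
    # udp[20] is the first byte after the 8-byte UDP header and the
    # 12-byte DNS header: the length byte of the first label.
    terms = ["udp", "port 53", "udp[20] == %d" % len(raw)]
    # The label's own bytes follow at offsets 21, 22, ...
    for i, byte in enumerate(raw):
        terms.append("udp[%d] == %d" % (21 + i, byte))
    return " and ".join(terms)


if __name__ == "__main__":
    print(first_label_filter("foo"))
```

For "foo" this produces the same conditions as the hand-written filter above.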
If, on the other hand, you're trying to answer the question "why do I
get a SERVFAIL, some of the time, for some names, seemingly at random?",
then I don't know that a targeted tcpdump is going to help. You might
have to capture *everything*, detect the error, and then wade through
the data later.
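For the "capture everything and wade through it later" approach, the wading can at least be scripted. Below is a minimal Python sketch that decodes the fixed 12-byte DNS header and picks out SERVFAIL responses to recursive queries. It assumes the DNS payloads have already been extracted from the capture as byte strings (reading the capture file itself is not shown), and the function names are made up for illustration.

```python
import struct

QR_MASK = 0x8000     # set in responses
RD_MASK = 0x0100     # "recursion desired", copied from query to response
RCODE_MASK = 0x000F  # low four bits of the flags word
SERVFAIL = 2


def parse_header(payload):
    """Return (msg_id, is_response, rd, rcode) from a DNS message."""
    if len(payload) < 12:
        raise ValueError("truncated DNS header")
    msg_id, flags = struct.unpack("!HH", payload[:4])
    return msg_id, bool(flags & QR_MASK), bool(flags & RD_MASK), flags & RCODE_MASK


def servfail_response_ids(payloads):
    """IDs of responses with RD set and RCODE SERVFAIL, in capture order."""
    ids = []
    for payload in payloads:
        msg_id, is_response, rd, rcode = parse_header(payload)
        if is_response and rd and rcode == SERVFAIL:
            ids.append(msg_id)
    return ids


if __name__ == "__main__":
    # Hand-built headers: a recursive query, then a SERVFAIL response to it.
    query = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    reply = struct.pack("!HHHHHH", 0x1234, 0x8182, 1, 0, 0, 0)
    print(servfail_response_ids([query, reply]))
```

The IDs it reports can then be used to go back into the full capture and look at what happened upstream around the same time.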
- Kevin