What should BIND do after receiving a SERVFAIL?

Tue Jun 17 19:02:16 UTC 2008

The two servers you're looking at are misbehaving, most likely due to  
a software bug. Here's what dig reports (minus the irrelevant parts) -  
the response is the same regardless of server:

$ dig www.deltapoint.be +norec @ns4.combell.net
;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
[...]
;; QUESTION SECTION:
;www.deltapoint.be.		IN	A

;; ANSWER SECTION:
www.deltapoint.be.	3600	IN	CNAME	virtualhosting.brightsites.be.
virtualhosting.brightsites.be. 3600 IN	CNAME	virtualhosting.newlink.cz.

I haven't done a packet capture to see just what's malformed about the  
UDP response, but from what you describe, it sounds like the auth  
servers are sending a SERVFAIL response that also contains CNAME  
records. That's just bizarre. The rcode should be NOERROR, with the  
dangling CNAMEs contained in a referral.
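
As a rough aid for anyone who does take a packet capture, the sketch below
decodes the fixed 12-byte DNS header (RFC 1035, section 4.1.1) and flags the
combination described above: a SERVFAIL rcode with a non-zero answer count.
The function name and the example header bytes are made up for illustration;
they are not taken from a capture of the Combell servers.

```python
import struct

# Standard RCODE values from RFC 1035, section 4.1.1.
RCODE_NAMES = {
    0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
    3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED",
}

def summarize_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", message[:12]
    )
    rcode = flags & 0x000F  # low four bits of the flags word
    return {
        "id": msg_id,
        "truncated": bool(flags & 0x0200),  # TC bit
        "rcode": RCODE_NAMES.get(rcode, str(rcode)),
        "ancount": ancount,
        # The odd case from this thread: an error rcode that carries answers.
        "servfail_with_answers": rcode == 2 and ancount > 0,
    }

# Hypothetical header: QR=1, AA=1, TC=1, RCODE=2 (SERVFAIL), 1 question, 2 answers.
example = struct.pack("!6H", 0x1234, 0x8602, 1, 2, 0, 0)
print(summarize_header(example))
```

Feeding it the first 12 bytes of the UDP payload from a capture should be
enough to confirm or rule out the SERVFAIL-plus-answers theory.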

I'm not surprised that BIND reacts badly to this. It probably believes  
the SERVFAIL rcode and considers that server to be lame. After getting  
this response from both auth servers, it gives up, having no other  
auth servers to query. However, if you manually query for CNAME  
records, you get the first CNAME record in your cache. Then when you  
ask your server for the A record for that name, it re-queries these  
misbehaving servers for the second CNAME - which they return without  
incident - and then goes and finds the final address record.
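
The sequence above can be sketched as a toy model. This is only an
illustration of the guess made here, not BIND's actual logic: auth_query()
and resolve_a() are hypothetical names, a single function stands in for
every authoritative server involved, and the address is a placeholder from
the RFC 5737 documentation range.

```python
SERVFAIL, NOERROR = "SERVFAIL", "NOERROR"

# Toy zone data; the names come from the dig output above.
CNAMES = {
    "www.deltapoint.be.": "virtualhosting.brightsites.be.",
    "virtualhosting.brightsites.be.": "virtualhosting.newlink.cz.",
}
# Placeholder address (RFC 5737 documentation range), not the real record.
ADDRESSES = {"virtualhosting.newlink.cz.": "192.0.2.1"}

def auth_query(name, qtype):
    """Model the buggy server: SERVFAIL when an A query hits CNAME->CNAME."""
    if qtype == "A" and name in ADDRESSES:
        return NOERROR, [("A", ADDRESSES[name])]
    target = CNAMES.get(name)
    if target is None:
        return NOERROR, []
    if qtype == "A" and target in CNAMES:
        # Assumed misbehaviour: error rcode, but the CNAME is still included.
        return SERVFAIL, [("CNAME", target)]
    return NOERROR, [("CNAME", target)]

def resolve_a(name, cache):
    """Toy resolver that trusts the rcode and discards SERVFAIL responses."""
    for _ in range(8):  # arbitrary hop limit for the sketch
        if name in cache:
            name = cache[name]  # follow an alias that is already cached
            continue
        rcode, answers = auth_query(name, "A")
        if rcode != NOERROR or not answers:
            return None  # give up; nothing from this response is cached
        rtype, value = answers[0]
        if rtype == "A":
            return value
        cache[name] = value
        name = value
    return None

cache = {}
print(resolve_a("www.deltapoint.be.", cache))  # None: the A lookup fails

# A manual CNAME query succeeds and seeds the cache with the first alias.
rcode, answers = auth_query("www.deltapoint.be.", "CNAME")
cache["www.deltapoint.be."] = answers[0][1]
print(resolve_a("www.deltapoint.be.", cache))  # 192.0.2.1
```

With the first alias cached, the resolver only ever asks the misbehaving
servers about a single CNAME hop, which is the case they handle correctly.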

I suppose the CNAME->CNAME with SERVFAIL response could be caused by a
name server reacting badly to this bad configuration (CNAME chains are
technically against the rules), but RFC 1034 (section 3.6.2) states
that name server software should tolerate this anyway and follow the
chain. So even though the records violate the RFC, so does the name
server's misbehavior.
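
For illustration, tolerating a chain amounts to following each alias in turn
and treating only a loop as an error. A minimal sketch, assuming the records
are available as a simple name-to-target mapping; follow_cnames() and the
max_hops limit of 16 are arbitrary choices for this example, not values taken
from any RFC or from BIND.

```python
def follow_cnames(name, cname_map, max_hops=16):
    """Return the names visited, ending at the first name with no CNAME."""
    chain = [name]
    seen = {name}
    while name in cname_map:
        name = cname_map[name]
        if name in seen:
            # A loop is the one case that should be reported as an error.
            raise ValueError(f"CNAME loop detected at {name}")
        if len(chain) >= max_hops:
            raise ValueError("CNAME chain exceeds the hop limit")
        seen.add(name)
        chain.append(name)
    return chain

# The two-step chain from the dig output earlier in this message.
records = {
    "www.deltapoint.be.": "virtualhosting.brightsites.be.",
    "virtualhosting.brightsites.be.": "virtualhosting.newlink.cz.",
}
print(follow_cnames("www.deltapoint.be.", records))
```

The final name in the returned list is the one whose address record would be
looked up next.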

Chris Buxton
Professional Services
Men & Mice

On Jun 17, 2008, at 11:46 AM, Holemans Wim wrote:

> We have a problem reaching a domain www.deltapoint.be, which is a
> webserver hosted by Combell. It seems there is something wrong with
> the name resolution, but I can't figure out if it is our nameserver
> (BIND 9.2.4) or the authoritative server that is doing something
> wrong.
> The record www.deltapoint.be is a CNAME.
> The NS records for deltapoint.be point to ns3.combell.net and
> ns4.combell.net.
> If we use host or nslookup or dig without a TYPE option, the lookup
> fails. If we specify the type=cname option, the query succeeds and the
> entry is put into the cache and the host is 'known' to our users.
>
> I used dig on our nameserver and nslookup on Windows (with a packet
> capture) with server=ns3.combell.net and found the following:
> if I don't specify a type or set type=all, the Combell server responds
> with a SERVFAIL error, but the response also contains the relevant
> CNAME information.
> It seems as if BIND sees the SERVFAIL and stops the query, ignoring
> the data in the resource records sent along.
>
> I did some Google searches and looked at the BIND mailing list but
> can't figure out what the expected behaviour should be. Should BIND
> ignore the SERVFAIL warning and use the extra info in the data to
> continue its queries, or is the responding nameserver making an error
> by sending a SERVFAIL error code along with the response?
>
> Is there a way to instruct BIND to ignore these SERVFAIL messages if
> the message also contains extra RRs that contain useful information?
>
> Greetings,
>
> Wim Holemans
> NetworkServices Universiteit Antwerpen