Help with unresolvable domain (subdomain, actually)

Wed Mar 2 19:18:56 UTC 2011

On 3/2/2011 1:57 PM, David Sparro wrote:
> On 3/2/2011 1:20 PM, Kevin Darcy wrote:
>
>> I'm not saying I agree with this perspective, only that I've dealt with
>> load-balancer vendors enough (Cisco in particular) to understand that
>> this is where they're coming from.
>>
>> Besides, what alternative is there? If the load-balancer returns an
>> address that it knows to not be working, then it's purposely causing the
>> client to go into a relatively-slow connection-timeout failure mode. Is
>> that responsible behavior?
>
> Short answer: yes.  The DNS side of the load-balancer doesn't know
> why it got the query.  Maybe I was trying to ping the endpoint, or I
> could have been trying to make an FTP connection, or HTTPS, etc.  In
> order for it to be consistent, it would have to be able to figure out
> that a SERVFAIL should be returned for the query from my gopher://
> connection, but an IP should be returned for http://.
That's an implementation decision. If an implementor decides to run a 
bunch of disparate services under a single FQDN (as opposed to, say, 
www.example.com/ftp.example.com/gopher.example.com and so forth), then 
they'd need to come up with a reasonable way for their load-balancer 
keepalives to decide whether the whole thing is "down". If the vast 
majority of their traffic is web-based (typical), they may choose to 
call the whole thing "down" if the web part is down, and the other parts 
(FTP, gopher, whatever) will just have to suffer. That's the price to be 
paid for the convenience of having a single name for a bunch of 
different services -- lack of granularity.
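
To make that trade-off concrete, here is a minimal sketch of the kind of
"is the whole name down?" decision described above. It is not any vendor's
actual keepalive logic; the ServiceCheck type, the "critical" flag and the
name_is_up function are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceCheck:
    name: str       # e.g. "http", "ftp", "gopher" (illustrative labels)
    healthy: bool   # result of the most recent keepalive probe
    critical: bool  # True if this service decides the fate of the whole name


def name_is_up(checks):
    """Treat the shared name as up only if every critical service is healthy."""
    critical = [c for c in checks if c.critical]
    if not critical:
        # Nothing marked critical: fall back to "up if anything is healthy".
        return any(c.healthy for c in checks)
    return all(c.healthy for c in critical)
```

With only "http" marked critical, a dead web tier takes the name down even
though FTP is fine, and a dead FTP server goes unnoticed as long as the web
tier is healthy. That is the lack of granularity in question.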

Things would be better, of course, if clients used SRV records for 
accessing resources -- then a single "service" name could be 
differentiated by protocol. But for whatever reason client software 
authors have not, by and large, embraced this idea.
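
For illustration, this is roughly what an SRV-aware client would do once it
had looked up something like _http._tcp.example.com or _ftp._tcp.example.com
(example owner names only). The sketch is a simplified version of the
RFC 2782 selection rule, not the full ordering algorithm: it takes the
lowest priority and then makes a weighted random pick. The SrvRecord type
and pick_target function are hypothetical, and the DNS lookup itself is
left out.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int  # lower value is preferred
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def pick_target(records, rng=random):
    """Pick one record: lowest priority wins, weight decides among equals."""
    if not records:
        raise ValueError("no SRV records")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights zero: no preference expressed, choose uniformly.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Because each protocol gets its own record set, a load-balancer could mark
the HTTP targets down without affecting what FTP or gopher clients see.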
>
>> If it gives a "normal" response that is
>> lacking answer information (NODATA, NXDOMAIN), then this response gets
>> negatively cached, and the negative cache entry may delay clients from
>> re-trying the resource even after it recovers. So, what's left? NOTIMP?
>> FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a
>> strictly functional perspective, and are even more misleading and
>> confusing with respect to the real source of the problem.
>
> SERVFAIL caching is coming to a BIND server release this year.  (I 
> listened to the BIND 9.8 features webinar this morning.  I don't 
> remember which version (9.9 or 9.10) had this attached to it on the 
> What's Next slide.)
>
I think Mark has the right approach: return a "special" address (e.g. 
0.0.0.0 or the IPv6 equivalent) in this situation, instead of messing 
with the RCODE.
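
A client or monitoring script could recognize such a "special" answer with
a check along these lines. This is only a sketch that assumes the sentinel
is the unspecified address (0.0.0.0 or ::); the function name
is_down_sentinel is made up here, and real load-balancers may use some
other convention.

```python
import ipaddress


def is_down_sentinel(addr: str) -> bool:
    """Return True if addr is the unspecified address (0.0.0.0 or ::)."""
    return ipaddress.ip_address(addr).is_unspecified
```

A caller would fail fast when this returns True for every address in the
answer, rather than waiting on a connection timeout.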

- Kevin