Help with unresolvable domain (subdomain, actually)

Wed Mar 2 18:49:09 UTC 2011

On Mar 2, 2011, at 1:20 PM, Kevin Darcy wrote:

> On 3/2/2011 10:34 AM, David Sparro wrote:
>>
>>
>> On 3/1/2011 5:27 PM, Kevin Darcy wrote:
>>> See my other post. This is designed-in behavior for Cisco GSSes,
>>> since there is no "service unavailable, try again later" RCODE.
>>>
>>
>> When the question is "what is the IP address of 'foo'", an answer of
>> "the web server is down" is nonsensical.
>>
> Hmmm... matter of perspective I suppose. Load-balancer architecture  
> sees DNS as just the externally-visible portion of a whole  
> subsystem. The SERVFAIL, in their view, does not communicate a DNS  
> problem _per_se_, but a problem with the whole subsystem. It's more  
> of a "what you're trying to get to is unavailable right now"  
> message, communicated, in their view, _through_ DNS (as a sort of  
> conduit), not necessarily _about_ DNS. They don't see it as  
> specifically meaning "I've got a DNS problem".

But, everyone else *will*.

>
> I'm not saying I agree with this perspective, only that I've dealt  
> with load-balancer vendors enough (Cisco in particular) to  
> understand that this is where they're coming from.
>
> Besides, what alternative is there? If the load-balancer returns an  
> address that it knows to not be working, then it's purposely causing  
> the client to go into a relatively-slow connection-timeout failure  
> mode. Is that responsible behavior? If it gives a "normal" response  
> that is lacking answer information (NODATA, NXDOMAIN), then this  
> response gets negatively cached, and the negative cache entry may  
> delay clients from re-trying the resource even after it recovers.
> So, what's left? NOTIMP? FORMERR? REFUSED? NOTAUTH? Those aren't any  
> better than SERVFAIL from a strictly functional perspective, and are  
> even more misleading and confusing with respect to the real source  
> of the problem.
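On the negative-caching point: per RFC 2308 (section 5), a resolver caches an NXDOMAIN/NODATA answer for the smaller of the SOA record's own TTL and the SOA MINIMUM field. Resolvers may also cap that value; BIND's max-ncache-ttl defaults to 3 hours. The sketch below shows the arithmetic. The class, the function and the `resolver_cap` parameter are hypothetical names for illustration, and the 10800 s default only mirrors BIND's default cap, not every resolver's.

```python
from dataclasses import dataclass


@dataclass
class SoaRecord:
    ttl: int      # TTL of the SOA RR itself, as returned in the authority section
    minimum: int  # MINIMUM field (last field of the SOA RDATA)


def negative_cache_ttl(soa: SoaRecord, resolver_cap: int = 10800) -> int:
    """Seconds a resolver may keep a negative (NXDOMAIN/NODATA) answer.

    RFC 2308: min(SOA TTL, SOA MINIMUM). The optional cap is a hypothetical
    stand-in for a resolver-side limit such as BIND's max-ncache-ttl.
    """
    ttl = min(soa.ttl, soa.minimum)
    return min(ttl, resolver_cap)


if __name__ == "__main__":
    # A zone with a one-day SOA TTL and MINIMUM: without a resolver cap,
    # clients could keep seeing "no such name" for a full day after recovery.
    print(negative_cache_ttl(SoaRecord(ttl=86400, minimum=86400)))
```

So a load balancer that answered NXDOMAIN during an outage could keep clients away for hours after the back-ends came back, depending on the SOA values and resolver limits.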

A few options:
1: once the LB knows that all back-ends are down, it can continue to  
answer with the correct A, but drop the TTL to be much shorter -- this  
allows things to recover faster.
2: have the LB itself serve a 'sorry' page. Serving static content
locally should be simple, but if the LB is not able to do so, it can
always return the addresses of a set of 'sorry' servers optimized for
this purpose.
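Both options amount to always giving a usable A answer instead of SERVFAIL, and only varying the address and TTL. A rough sketch of that decision follows, assuming a pool with one virtual address (VIP) and optional 'sorry' servers. The names `Pool` and `answer`, the TTL values and the example addresses are all hypothetical, not any vendor's configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NORMAL_TTL = 300    # hypothetical TTL while at least one back-end is healthy
DEGRADED_TTL = 10   # hypothetical short TTL once every back-end is down


@dataclass
class Pool:
    vip: str                                   # address normally handed out
    backends_healthy: list[bool]               # health-check result per back-end
    sorry_servers: list[str] = field(default_factory=list)


def answer(pool: Pool, serve_sorry_locally: bool) -> tuple[list[str], int]:
    """Return (A-record addresses, TTL); never signal an outage via SERVFAIL."""
    if any(pool.backends_healthy):
        return [pool.vip], NORMAL_TTL
    if not serve_sorry_locally and pool.sorry_servers:
        # Option 2 fallback: hand out dedicated 'sorry' servers, briefly.
        return list(pool.sorry_servers), DEGRADED_TTL
    # Option 1 (or option 2 with the LB serving the page itself): keep the
    # correct address but shorten the TTL so recovery is noticed quickly.
    return [pool.vip], DEGRADED_TTL


if __name__ == "__main__":
    down = Pool("192.0.2.10", [False, False], ["198.51.100.5"])
    print(answer(down, serve_sorry_locally=False))
```

The short TTL is what bounds the damage here: whichever address is returned during the outage, resolvers will ask again within seconds rather than caching a failure.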

You shouldn't be breaking both your serving *and* 'sorry' backends  
often enough for there to be special handling needed (and, if you are,  
you shouldn't make things worse by making other folk waste their time  
debugging your problem).

W

>
> - Kevin
>
>
> _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
>

-- 
I had no shoes and wept.  Then I met a man who had no feet.  So I  
said, "Hey man, got any shoes you're not using?"