BIND9 negative cache after timeout.

Ladislav Vobr lvobr at ies.etisalat.ae
Sun Aug 10 04:30:56 UTC 2003


Jan,

As I wrote earlier, I agree with you completely: this should be
considered, and I believe implementing it in BIND would have more
positive impact than negative.
I personally had a hard time with this situation and could not find an
easy solution.  We are also an ISP, and some time back we had a lot of
users infected by the Yaha-H virus.  There was no spoofing involved;
they were all our valid, paying customers.  From the traffic I
estimated that 20-40 thousand users had the virus at that time, with
the number growing every day.  The virus on all the client PCs was
continuously trying to resolve one domain name in order to flood it
later, and all the remote nameservers timed out, presumably because
the whole world was flooding them.  Whenever the virus did not get an
answer from our recursive DNS servers it immediately retried, forever,
putting an incredible load on our recursive servers, which were in
turn retrying forever against the unreachable remote DNS servers.
Caching of timeouts would definitely have helped here; anti-spoofing
filters and recursion ACL restrictions are useless in this case.  I
raised this issue here at the time and was told that BIND 9 copes with
this better because recursive and authoritative requests are separated
into different queues (in fact, authoritative requests do not have any
queue).
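
The timeout-caching idea can be sketched roughly like this.  This is a
hypothetical illustration only: the class name, the TTL value, and the
whole API are made up, and BIND's internals look nothing like this.

```python
import time

class NegativeTimeoutCache:
    """Sketch of negative caching for timed-out lookups: once an
    upstream query for a name has timed out, suppress retries for
    that name for `ttl` seconds instead of re-querying unreachable
    authoritative servers.  Hypothetical, not BIND code."""

    def __init__(self, ttl=60):
        self.ttl = ttl          # seconds to suppress retries (made up)
        self._failed = {}       # name -> monotonic time of last timeout

    def record_timeout(self, name):
        # Remember that this name's authoritative servers timed out.
        self._failed[name] = time.monotonic()

    def should_retry(self, name):
        # True if we may query upstream; False while the negative
        # entry is still fresh.
        t = self._failed.get(name)
        if t is None:
            return True
        if time.monotonic() - t > self.ttl:
            del self._failed[name]   # entry expired, try upstream again
            return True
        return False
```

With something like this in front of the resolution path, the tens of
thousands of infected clients would each get an immediate local failure
for up to `ttl` seconds, instead of each query triggering a fresh round
of retries against the dead remote servers.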

    There is PowerDNS, which implements something called query
throttling, which should give you more control over such repetitive
queries.  You can use smart L4-L7 load balancers that analyze traffic
at the DNS application level and blackhole requests for a particular
domain, never letting them reach the nameservers; you can become
authoritative for such a domain, sending the clients a reply without
doing recursion; or you can mark those nameservers as bogus so that
BIND stops retrying them.  I tried decreasing the recursive-clients
limit, but it did not take effect, and I did not test it again later.
I believe the way BIND handles this is fine for a traditional client
or server, but it should be able to survive even the worst one
(whether set up purposely or not).  DNS people should not have to say
"I can not do anything, call the network guys" :-).
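
Two of the workarounds above (becoming authoritative for the flooded
name, and marking the dead servers bogus) can be sketched in named.conf
roughly as follows.  The zone name, zone file, and server address are
placeholders, so treat this as an assumption about your setup rather
than a drop-in configuration:

```
// Become authoritative for the flooded name so clients get an
// immediate local answer instead of triggering recursion.
// "flooded.example" and the zone file name are placeholders.
zone "flooded.example" {
    type master;
    file "db.flooded.example";
};

// Mark the unreachable remote nameserver as bogus so BIND
// stops sending queries to it (192.0.2.1 is a placeholder).
server 192.0.2.1 {
    bogus yes;
};
```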

I am sure the BIND developers have a long list of to-do items, but I
believe this feature would really enhance the robustness of the
recursive service.

Ladislav


Jan Gyselinck wrote:

>On Tue, Aug 05, 2003 at 02:49:40PM +0100, Simon Waters wrote:
>
>>Jan Gyselinck wrote:
>>
>>>On Thu, Jul 03, 2003 at 09:47:21AM +1000, Mark_Andrews at isc.org wrote:
>>>
>>>
>>>>	And it is also an easy one to prevent.  Don't have a wide
>>>>	open caching server.  Apply anti-spoofing filters at the
>>>>	IP level.
>>>>
>>>It helps somewhat, but it's not preventing the problem.
>>>You don't need a wide-open resolver to get this; enough
>>>customers using the resolver will hit it often enough too.
>>>
>>Mark is addressing the question of deliberate attack.  Ultimately you
>>can make any service unusable if it doesn't restrict your clients'
>>(or others') ability to use it.
>>
>
>And I'm arguing that you don't need a deliberate attack to get this
>kind of problem.  Restricting who can use your nameserver isn't
>necessarily enough to solve those problems.
>
>>Have you tried upping the "recursive client" limit to a value more
>>suitable for your number of clients?
>>
>
>Yes, but that means you'll lose lots of memory on that too.  And it
>still doesn't solve the problem; it does not scale.  You know, some
>clients like to use the available resolvers to process their
>I-don't-know-what logs and get all the IPs in there resolved to
>hostnames.  Of course there are ways to do that that don't stress ISP
>resolvers, but of course you'll always find clueless people on that
>subject.  So it's a fact of life that you will get repetitive queries,
>and sometimes a lot more than you would have wished.  It's not that
>rare to encounter multiple non-resolving IPs in such logs.  And
>because not all such scripts do caching, you can be sure that if a
>certain unresolvable IP appears 5000 times in a short period, you lose
>5000 "recursive client" slots on one customer (as it takes time before
>BIND realizes that the authoritative nameservers don't reply); in
>reality it's probably even worse, as stub resolvers tend to retry when
>an answer doesn't come back in time.
>
>>>there are lots of stub resolvers out there that keep
>>>querying for the same name if it doesn't resolve (ServFail and
>>>friends).
>>>
>>If you can identify it, log a bug report for the application re: RFC 1123.
>>
>
>That's a solution.  Another solution would be to have the customer run
>his own caching resolver.  All very nice, but a) you can't fix all
>defective applications and b) you can't fix all your customers.  It
>would be nice if we could fix c) our caching resolvers to handle this
>gracefully.
>
>
>Jan Gyselinck
>




More information about the bind-users mailing list