request drops in BIND?

KyoungSoo Park kyoungso at cs.princeton.edu
Fri May 14 15:21:07 UTC 2004


Hi Paul,

Paul Vixie wrote:

>>But the problem here is: at what level of bulk requests does BIND stop
>>responding to queries, either because of UDP stack overflow or because
>>of a mistake in BIND?
>
>there is no level of bulk requests at which BIND8 will intentionally drop
>requests.  we drop requests only if we spend too much time (more than 3
>seconds) reloading a zone or performing some other long synchronous
>operation.  this is because after 3 seconds, the queries in the kernel
>socket buffer have likely been retransmitted by their clients, or have
>been tried on other servers, and a reply by us would only cause client
>kernels to generate ICMP-portunreach (because the resolver has moved on.)
Yes, I saw that code. BIND drops requests after handling 500 requests from
the UDP buffer. That makes sense when the buffer size is 32 KB, since 500
requests at roughly 64 bytes each is about 32 KB. But what if the buffer is
bigger? My Linux stack says the default buffer size is 64 KB, so you might
end up silently dropping the other ~500 requests still sitting in the buffer.
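To make concrete what I mean (this is only a sketch of the pattern, not
BIND's actual code; MAX_PER_PASS and drain_queries are names I made up):

/*
 * Sketch of the pattern: per wakeup, drain at most MAX_PER_PASS datagrams
 * from the UDP socket.  If the kernel receive buffer holds more than that
 * (e.g. a 64 KB buffer full of ~64-byte queries), everything past the cap
 * waits for the next pass, or is lost if the buffer overflows in the
 * meantime.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>

#define MAX_PER_PASS 500                /* the cap discussed above */

static void drain_queries(int udp_fd)
{
    unsigned char pkt[512];             /* classic DNS/UDP message limit */
    int handled = 0;

    while (handled < MAX_PER_PASS) {
        struct sockaddr_in from;
        socklen_t fromlen = sizeof(from);
        ssize_t n = recvfrom(udp_fd, pkt, sizeof(pkt), MSG_DONTWAIT,
                             (struct sockaddr *)&from, &fromlen);
        if (n < 0)
            break;                      /* buffer drained (or a real error) */
        /* ... parse the query and send a response here ... */
        handled++;
    }
    printf("handled %d datagrams this pass\n", handled);
}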

I don't know the default retransmission timeout on the server side (maybe
2-3 seconds), but the default retransmission timeout of the BIND stub
resolver on the client side is five seconds. (Although you can set it to
another value, most people don't even bother.) So this could hurt the
response time seen by clients.
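For reference, the stub resolver's defaults can be read back through the
traditional <resolv.h> interface (just a quick sketch; retrans and retry
are the values that the timeout/attempts options in resolv.conf override):

#include <resolv.h>
#include <stdio.h>

int main(void)
{
    /* Fills in _res from /etc/resolv.conf (or the compiled-in defaults). */
    res_init();

    /* retrans is the retransmission interval in seconds (traditionally 5),
     * retry is the retry count. */
    printf("retrans = %d s, retry = %d\n", _res.retrans, _res.retry);
    return 0;
}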

One question: does BIND use the same UDP socket to handle queries to remote
servers? I think I saw some code that reads data from the UDP buffer and
decides whether it is a request or a response. If the same buffer is shared
between queries to remote servers (like queries to the root servers / gTLD
servers) and requests from clients, the buffer would overflow even faster.
(200 responses of 300 bytes each is already 60 KB, so 300 such responses
would easily overflow a 64 KB buffer.)
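A quick way to see how much room that socket really has (a sketch;
report_rcvbuf is just an illustrative helper):

#include <sys/socket.h>
#include <stdio.h>

/* Report the kernel receive buffer on a UDP socket and a rough estimate of
 * how many ~300-byte upstream responses fit before it overflows.  On Linux,
 * getsockopt returns a value that already includes bookkeeping overhead, so
 * the usable payload space is somewhat smaller than what is printed. */
static void report_rcvbuf(int udp_fd)
{
    int rcvbuf = 0;
    socklen_t len = sizeof(rcvbuf);

    if (getsockopt(udp_fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len) == 0)
        printf("SO_RCVBUF = %d bytes, roughly %d 300-byte responses\n",
               rcvbuf, rcvbuf / 300);
}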

>>Actually I tested sending 10000 requests/second at an equal interval
>>(1 request / 100 us).  BIND could handle about 2000 requests/second,
>>while my UDP PING answered all of them perfectly.  Every test did just
>>a single local domain name lookup.
>
>can i know the operating system, CPU, and memory bandwidth which
>generated this result?  those numbers are lower than my results on a
>1GHz FreeBSD i386-style box with 400MByte/sec memory read bandwidth (as
>measured by lmbench.)
It's Linux 7.2 on a Compaq au600 with 1 GB of memory. I don't know the
memory read bandwidth of this machine. My test scheme is to send that many
requests at once, then wait and record the responses for 10 seconds.

One interesting behavior is that the first run (I do this 10 times) always
gets more responses than the subsequent ones. I don't know why at this point.

Here is the trace from my program, issuing bursts of 300 requests per second:

1-steps lost : 28(9.333%), total lost : 28/300(9.333%)
2-steps lost : 121(40.333%), total lost : 149/600(24.833%)
3-steps lost : 122(40.667%), total lost : 271/900(30.111%)
4-steps lost : 122(40.667%), total lost : 393/1200(32.750%)
5-steps lost : 121(40.333%), total lost : 514/1500(34.267%)
6-steps lost : 122(40.667%), total lost : 636/1800(35.333%)
7-steps lost : 122(40.667%), total lost : 758/2100(36.095%)
8-steps lost : 121(40.333%), total lost : 879/2400(36.625%)
9-steps lost : 122(40.667%), total lost : 1001/2700(37.074%)
10-steps lost : 122(40.667%), total lost : 1123/3000(37.433%)
Recvd = 1877, avg responses per one time = 187.7
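For what it's worth, the harness is roughly the following (only a sketch:
the 127.0.0.1 nameserver address, the example.com query name, and the burst
size of 300 are placeholders, not the values from the test above):

#include <arpa/inet.h>
#include <arpa/nameser.h>
#include <netinet/in.h>
#include <resolv.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    unsigned char query[512], reply[512];
    int qlen, sent = 0, recvd = 0;

    /* Build one A query and reuse it for the whole burst. */
    res_init();
    qlen = res_mkquery(ns_o_query, "example.com", ns_c_in, ns_t_a,
                       NULL, 0, NULL, query, sizeof(query));
    if (qlen < 0)
        return 1;

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(53);
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);  /* placeholder server */

    /* Send one burst of 300 back-to-back queries. */
    for (int i = 0; i < 300; i++)
        if (sendto(s, query, qlen, 0,
                   (struct sockaddr *)&dst, sizeof(dst)) == qlen)
            sent++;

    /* Count responses until the socket has been idle for 10 seconds. */
    struct timeval tv = { 10, 0 };
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    while (recv(s, reply, sizeof(reply), 0) > 0)
        recvd++;

    printf("sent %d, received %d, lost %.1f%%\n", sent, recvd,
           sent ? 100.0 * (sent - recvd) / sent : 0.0);
    close(s);
    return 0;
}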

>>P.S. I still believe BIND should increase the UDP buffer size for port 53.
>>That will ease the situation a lot.
>
>can you patch it to do so and tell us the results?  i think that it will
>change the granularity of your packet loss but not the total percentage.
>note that we chose 32K because that was a local maximum here.  if it's
>possible to set it even higher then we're willing to consider a patch.
>
Yes. I'll try it sometime after my paper deadline.
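The patch itself should amount to little more than this on the port-53 UDP
socket (a sketch; 256 KB is only an example value, and on Linux the kernel
silently caps the request at net.core.rmem_max, so that sysctl may need
raising as well):

#include <sys/socket.h>

/* Ask the kernel for a larger receive buffer on the listening UDP socket.
 * The value is an example, not a recommendation; on Linux the grant is
 * capped by net.core.rmem_max. */
static void bump_rcvbuf(int udp_fd)
{
    int want = 256 * 1024;
    setsockopt(udp_fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want));
}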

Thanks,
KyoungSoo



