BIND 9.2.4rc8 Multithreading on Win32

Sat Sep 11 14:42:00 UTC 2004

At 11:29 PM 9/10/2004, you wrote:
>At 11:54 PM 9/9/2004, Vinny Abello wrote:
>>>>What I'm trying to accomplish is to make BIND continue responding to 
>>>>queries if I try to load a large RBL zone file. Is this the correct 
>>>>direction to go in, or does BIND still have trouble with doing this? 
>>>>Would increasing the number of I/O Completion Port worker threads allow 
>>>>BIND to continue responding to queries while it's loading a zone file?
>>>
>>>
>>>In this case it might well be helpful since other worker threads are
>>>available to respond to queries while it's downloading. I highly recommend
>>>IXFR rather than AXFR for this zone.
>>
>>OK... The unfortunate part is any RBL that I serve secondary for works in 
>>either one of two ways. The first is AXFR. I don't have control over this.
>
>You can't get them to run BIND 9 and use IXFR?

It's a free service (sbl.spamhaus.org). I never even contacted them. They 
allow anyone to do zone transfers and they are not IXFR (based on the 
information I see in the logs anyway) so I don't believe I have any say in 
it, unfortunately. I could try to reach out to them to find out.

>>  The second way is rsync where it's reloaded in a script after being 
>> transferred. Both these methods result in BIND not responding to queries 
>> for a period of time. I've seen that this is an issue that's gone back 
>> as far as RBL zones have existed and people have been trying to use them 
>> in BIND. It seems a lot of people use alternate programs that handle 
>> this better. I just can't understand after all these years why BIND is 
>> unable to both load a large file and continue to respond to queries. 
>> MSDNS handles this just fine as does other DNS software, but I prefer 
>> BIND of course. :)
>
>Multithreading and multiple CPUs largely solve this. I don't know what you
>are seeing so it's hard to answer. I had set it up to have one more worker
>thread than CPUs (n+1) to allow for situations like this.

It doesn't seem to work like that unless you have more than one CPU, and 
even then only in certain situations.

As for the rsync RBL, what I am seeing is that if I do an "rndc reload 
zonename" on the server after the rsync is done, my server stops responding 
to queries for a while and CPU usage rockets on a single CPU (the load 
actually bounces from one CPU to another over the time this happens). 
This happens even on a machine with two hyperthreaded processors 
(Windows 2003).

The zone is around 31MB in size. Even though BIND detects "found 4 CPUs, 
using 4 worker threads", whenever that zone is reloaded, I can query the 
server all I like and it does not reply even for zones it is master for. As 
long as I see the one CPU pegged, it will not respond (even though there 
are three other "processors" doing nothing). This also happens on BIND 
9.3.0rc4 on Windows, which I have since upgraded to (I like some of the 
additional logging information and check-names, and am reading up on the 
other new features).
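
For anyone who wants to measure the stall rather than eyeball it, here is a 
minimal sketch of a probe in Python. It is not part of BIND or rndc; the 
function names (build_query, probe) and the one-second timeout are 
assumptions made for illustration. It sends a plain A query over UDP and 
reports either the round-trip time or the lack of a reply.

```python
import socket
import struct
import time


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (default 1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, timeout: float = 1.0, port: int = 53):
    """Return the round-trip time in seconds, or None if no reply arrived."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and ICMP-triggered errors alike.
            return None
        if reply[:2] != query[:2]:
            # The reply ID does not match the query; treat it as no answer.
            return None
        return time.monotonic() - start
```

Calling probe() once a second against the server, for a name in a zone it 
is master for, while the reload runs in another window would show how long 
the gap in replies actually lasts.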

On the other machines, which have a single processor, I noted that when an 
AXFR zone transfer occurs, they also stop responding to queries for a brief 
time, despite your n+1 worker thread design based on the number of CPUs. 
That zone is a lot smaller (around 5MB), and I've never seen this problem 
during a zone transfer on the machine with the two hyperthreaded 
processors, only on the ones with a single CPU, which is kind of 
interesting.

My synopsis is that when doing large AXFR zone transfers, multiple CPUs (or 
worker threads) help in keeping BIND responding to queries. However, if a 
reload or reconfig is done via rndc that causes BIND to load a large zone, 
this does not apply and it will still stop responding to queries. That is 
basically what I have observed, again, even with multiple worker threads. 
Is there a reason for this or is this a flaw/bug? And why does this happen 
even with zone transfers on a single CPU server when it's supposed to be 
doing n+1 worker threads?
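
To make the question concrete, here is a toy model in Python of one way 
extra worker threads could fail to help. This is purely an illustration and 
not a claim about how BIND is implemented: the zone_lock, the loader and 
the sleep standing in for parsing a large zone file are all hypothetical. 
If answering a query needs a lock that the loading thread holds for the 
whole load, the query waits no matter how many other threads are idle.

```python
import threading
import time


def query_wait(needs_zone_lock: bool, load_seconds: float = 0.3) -> float:
    """Return how long one query waited while a simulated load was running."""
    zone_lock = threading.Lock()   # hypothetical lock guarding zone data
    loading = threading.Event()    # set once the loader holds the lock

    def loader() -> None:
        with zone_lock:
            loading.set()
            time.sleep(load_seconds)  # stands in for loading a large zone

    worker = threading.Thread(target=loader)
    worker.start()
    loading.wait()                 # make sure the load is in progress

    start = time.monotonic()
    if needs_zone_lock:
        with zone_lock:            # the query blocks until the load ends
            pass
    waited = time.monotonic() - start

    worker.join()
    return waited
```

In this model query_wait(False) returns almost immediately, while 
query_wait(True) waits for roughly the length of the load, even though the 
querying thread was otherwise free to run.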

>>I'm very curious how the GTLD servers (I assume running BIND) are able to 
>>deal with the huge .com, .net, etc. zones without this problem. Have they 
>>been modified to work better with these large zones? Or are they not 
>>running BIND?
>
>You should be asking in the newsgroup. Paul might tell you since he runs the
>F-root servers. I believe all of the GTLD servers run BIND. Paul knows those
>kinds of details.

Oops. I thought I did cc the newsgroup. No big deal. :) This one will be 
going to the list/newsgroup. I think this is a valuable conversation.

Vinny Abello
Network Engineer
Server Management
vinny at tellurian.com
(973)300-9211 x 125
(973)940-6125 (Direct)
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0  E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

There are 10 kinds of people in the world. Those who understand binary and 
those that don't.