BIND 9.2.4rc8 Multithreading on Win32

Tue Sep 14 04:35:33 UTC 2004

At 09:46 PM 9/12/2004, Vinny Abello wrote:
>At 12:21 PM 9/12/2004, Danny Mayer wrote:
>>At 10:42 AM 9/11/2004, Vinny Abello wrote:
>>>At 11:29 PM 9/10/2004, you wrote:
>>>>>OK... The unfortunate part is any RBL that I serve secondary for works 
>>>>>in either one of two ways. The first is AXFR. I don't have control over this.
>>>>
>>>>You can't get them to run BIND 9 and use IXFR?
>>>
>>>It's a free service (sbl.spamhaus.org). I never even contacted them. 
>>>They allow anyone to do zone transfers and they are not IXFR (based on 
>>>the information I see in the logs anyway) so I don't believe I have any 
>>>say in it, unfortunately. I could try to reach out to them to find out.
>>
>>The IXFR is initiated by the client, not the server. If the server is
>>unable to handle IXFR the client falls back to AXFR. If they're running
>>BIND 9 there should be no problem.
>
>If the server isn't lying, they're running the same version that I am now. 
>9.3.0rc4. I was assuming it was AXFR because I didn't see mention of IXFR 
>in the logs... but now that I'm looking at it, I don't see either. Is
>there an easy way to tell from the slave side what's happening? I don't 
>see much reference to what the zone transfer types are in the logs unless 
>they're from my server to a slave.

If it's running 9.3.0 then it certainly can handle IXFR. You need to specify
it on your side and they need to allow it on their side. See section 6.2.18
of the ARM. You need to specify request-ixfr for the zone and they need to
specify provide-ixfr for the zone. It's to their benefit to do this, as it
lessens the load on the server.
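
A minimal sketch of what that could look like in named.conf, using server
statements, which is one place both options are accepted. The addresses are
placeholders from the documentation range, not the real hosts on either side,
and the same options can also be set globally in the options block. Whether
a given version also accepts them inside a zone statement is worth checking
in the ARM. In BIND 9 both options are understood to default to yes, so
explicit settings mainly matter if one side has turned them off.

```
// On the slave (your side): request incremental transfers from this master.
// 192.0.2.1 is a placeholder for the master's address.
server 192.0.2.1 {
    request-ixfr yes;
};

// On the master (their side): answer IXFR requests from this slave.
// 192.0.2.53 is a placeholder for the slave's address.
server 192.0.2.53 {
    provide-ixfr yes;
};
```

If the master cannot satisfy an IXFR request, the slave falls back to AXFR
as described earlier in the thread.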

>>>>>The second way is rsync where it's reloaded in a script after being
>>>>>transferred. Both these methods result in BIND not responding to
>>>>>queries for a period of time. I've seen that this is an issue that's
>>>>>gone back as far as RBL zones have existed and people have been
>>>>>trying to use them in BIND. It seems a lot of people use alternate
>>>>>programs that handle this better. I just can't understand after all
>>>>>these years why BIND is unable to both load a large file and continue
>>>>>to respond to queries. MSDNS handles this just fine as does other DNS
>>>>>software, but I prefer BIND of course. :)
>>>>
>>>>Multithreading and multiple CPUs largely solve this. I don't know what
>>>>you are seeing so it's hard to answer. I had set it up to have one more
>>>>worker thread than CPUs (n+1) to allow for situations like this.
>>>
>>>It doesn't seem to work like that at all unless you have more than one 
>>>CPU and in certain situations only.
>>
>>Just to clarify, what I said about worker threads only applies to the I/O, not
>>the tasks that need to be managed. I did check the code and multithreading
>>is enabled on Windows so the task manager should be using more than
>>one thread to handle the zone transfer and handle queries. This should
>>be okay on a multi-CPU system at least. There may be a bug in the task
>>code but that's much harder to figure out.
>
>Yes, it seems fine on multi-CPU systems as normal large zone transfers 
>don't seem to cause any interruption at all on my multi-processor system. 
>Just my two single processor ones. Yet you said it was n+1. Oddly enough 
>on the 2 (4 logical) processor system, it doesn't display this problem so 
>there it would seem there would be 5 worker threads for I/O. On the single 
>processor ones it would be 2 if I understand correctly. Despite the two on
>the single processor system (the default if I'm understanding right) it 
>still stops responding to queries. As I think you are trying to explain, 
>this may have nothing to do with worker threads for I/O, but what else 
>would be different between the multiprocessor system and the single 
>processor systems as far as BIND goes that might affect this?

You won't get much of a benefit on single-CPU systems as you can't run threads
simultaneously.

I will leave it to Mark to comment on whether or not what you are seeing is
normal on a single-CPU system.

>>>My synopsis is that when doing large AXFR zone transfers, multiple CPUs
>>>(or worker threads) help in keeping BIND responding to queries.
>>>However, if a reload or reconfig is done via rndc that causes BIND to 
>>>load a large zone, this does not apply and it will still stop responding 
>>>to queries. That is basically what I have observed, again, even with 
>>>multiple worker threads. Is there a reason for this or is this a 
>>>flaw/bug? And why does this happen even with zone transfers on a single 
>>>CPU server when it's supposed to be doing n+1 worker threads?
>>
>>It's possible that there is a bug, but not in those worker threads, which
>>don't deal with file I/O.
>
>It would sound that way... I was just curious if it was limited to Win32 
>platforms or not seeing as I'm sure there are many BIND servers (mostly 
>running on some *nix variant) that load large zones. I thought I'd heard
>this problem existed just within BIND itself based on research I've done 
>and combing through the list. That's why I was wondering how the GTLD 
>servers reload such huge zones without downtime in answering queries... 
>seeing that in my experience (although Windows and *nix are very 
>different) BIND stops responding to queries when reloading a very large 
>zone from a file it is master for.

This is no different on Windows than it is on Unix running a multithreaded
server.

Danny

>By the way, thanks for all your detailed information and time in 
>responding to these questions, Danny! :)
>
>Vinny Abello
>Network Engineer
>Server Management
>vinny at tellurian.com
>(973)300-9211 x 125
>(973)940-6125 (Direct)
>PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0  E935 5325 FBCB 0100 977A
>
>Tellurian Networks - The Ultimate Internet Connection
>http://www.tellurian.com (888)TELLURIAN
>
>There are 10 kinds of people in the world. Those who understand binary and 
>those that don't.
>