Bind 9.2.1 W2K server crashes repeatedly

Danny Mayer mayer at gis.net
Mon Nov 18 00:15:16 UTC 2002


At 01:20 PM 11/17/02, Scott MacLean wrote:

>Yesterday I migrated a production server running 4.9.5 to 9.2.1,
>running on a two-processor W2K box with a gig of RAM, serving
>approximately 200 domains. About 3 hours after it started, 9.2.1
>stopped answering queries. It still showed running in the services
>applet however. Running a restart failed:
>
>C:\>rndc start
>errno2result.c:61: unable to convert errno to isc_result: 10057:
>Socket is not connected
>rndc: send failed: connection refused

You can't start named from rndc. On Windows you can use
net start "ISC BIND" to start the service. Similarly, rndc restart
won't work either. In any case, this response seems to indicate
that named was not listening on port 953, which is why you got
the rndc failure you did. Check the application log when
it starts to ensure that it is listening on port 953.
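
As a complement to reading the application log, the port 953 check
can be scripted. The following is only a minimal sketch in Python,
assuming the control channel is on the default 127.0.0.1:953; the
function name is_listening is made up for illustration and is not
part of BIND. It only tests whether a TCP connection is accepted,
not whether rndc authentication would succeed.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 953,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted.

    Assumption: named's control channel uses the default address and
    port (127.0.0.1:953). Adjust if the controls statement differs.
    """
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_listening() right after the service starts, and again
when rndc fails, shows whether the control channel ever came up.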

>I stopped BIND from the service applet and started it again, which
>solved the problem. Looking at the debug logs, I found this:
>
>Nov 15 19:22:43.130 general: critical: socket.c:1621:
>INSIST(!sock->pending_recv) failed
>Nov 15 19:22:43.130 general: critical: exiting (due to assertion
>failure)

That sounds like a possible threading problem. BIND 9 on Win32 is
multithreaded, and since you are running a two-processor system,
it may have a locking problem.


>I checked the executables and DLL's, and all are the proper versions
>and dates.
>
>Two hours after I restarted it, the same thing happened again.
>
>Going back to 4.9.5 is not an option at this point. Would I be better
>off:
>
>- Writing some kludge application that keeps querying BIND, and if it
>doesn't get an answer, forcibly stops and restarts the service

IPSentry does a good job of this and can watch more than just a
nameserver.
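
For anyone who would rather script the check than use a monitoring
product, the idea described above (query the server, restart the
service if no answer comes back) can be sketched roughly as follows.
This is an illustration under stated assumptions, not a tested
production tool: the function names are hypothetical, the service
name "ISC BIND" is the one used with net start earlier in this
message, and the query name should be one the server is expected to
answer (for example a name in one of its own zones).

```python
import socket
import struct
import subprocess
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 format)."""
    # Header: ID, flags (RD set), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def server_answers(server: str, name: str, port: int = 53,
                   timeout: float = 3.0) -> bool:
    """Return True if the server sends any DNS response to the query."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeout, connection reset, unreachable
            return False
    # Accept any reply with a matching ID and the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def restart_service(service: str = "ISC BIND") -> None:
    """Stop and start the Windows service with the net command."""
    # The stop may fail if named has already exited; ignore that.
    subprocess.run(["net", "stop", service], check=False)
    subprocess.run(["net", "start", service], check=True)


def watch(server: str, name: str, interval: float = 60.0) -> None:
    """Poll the server forever and restart the service when it is silent."""
    while True:
        if not server_answers(server, name):
            restart_service()
        time.sleep(interval)
```

Running watch("127.0.0.1", "example.com") on the server itself would
poll once a minute; example.com is only a placeholder name. A single
lost UDP packet triggers a restart in this sketch, so retrying the
query a couple of times before restarting would be a sensible
refinement.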


>or
>
>- Upgrading to the 9.2.2rc1

You should do this anyway. There were some other bugs in 9.2.1
which got fixed in 9.2.2.

>Is this a known problem?

I've never seen a report of this specific error, though I've run into
it frequently during debugging.

>  Is there a workaround or fix?

It shouldn't be an issue in BIND 9.3.0 as the code was totally rewritten.

>  Right now I've
>set the service to auto-restart, and have scheduled a "kill named.exe"
>to run every 15 minutes, so the most it could be down for is 15
>minutes. Not really an acceptable solution.

I agree.

Danny
>No unhelpful wisecracks about changing operating systems, please. It's
>not my decision, and it's not an option.


