BIND 9.4.2-P2-W1 stops responding

Vinny Abello vinny at tellurian.com
Fri Sep 5 22:06:28 UTC 2008


> -----Original Message-----
> From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org] On
> Behalf Of Vinny Abello
> Sent: Friday, September 05, 2008 5:20 PM
> To: mayer at gis.net
> Cc: bind-users at isc.org
> Subject: RE: BIND 9.4.2-P2-W1 stops responding
>
> > -----Original Message-----
> > From: Danny Mayer [mailto:mayer at gis.net]
> > Sent: Friday, September 05, 2008 4:18 PM
> > To: Vinny Abello
> > Cc: bind-users at isc.org
> > Subject: Re: BIND 9.4.2-P2-W1 stops responding
> >
> > Vinny Abello wrote:
> > > OK, this happened again. This time I noticed that BIND was not
> > > responding on the primary IP bound to the server, the one it would
> > > usually respond on. It kept answering queries on a secondary IP
> > > bound to the NIC, however. Again, nothing in the logs indicates any
> > > type of problem that I can see. Perhaps this is related to having
> > > multiple IPs bound to the machine. I restarted the service and it
> > > started working again on both IP addresses.
> > >
> > > Any ideas?
> > >
> > >> -----Original Message-----
> > >> From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org]
> > >> On Behalf Of Vinny Abello
> > >> Sent: Friday, September 05, 2008 1:33 PM
> > >> To: bind-users at isc.org
> > >> Subject: BIND 9.4.2-P2-W1 stops responding
> > >>
> > >> I just upgraded from BIND 9.4.2 to BIND 9.4.2-P2-W1 on Windows
> > >> Server 2003. The service no longer crashes like it did in P1 and
> > >> P2; however, after about 12 hours of load, named just stops
> > >> responding to queries completely. The service appears to still be
> > >> running but will not respond to any type of query. I've restarted
> > >> it and it came back to life again. I'm going to watch it more
> > >> carefully to look for any other types of symptoms. I checked the
> > >> log files and nothing out of the ordinary was in the logs. In
> > >> fact, according to the logs, it appears that zone transfers were
> > >> still happily taking place while it was not responding to queries.
> > >>
> > >> I don't know if these have anything to do with the issue, but
> > >> there are a few odd errors I noted after starting it back up that
> > >> are appearing in the logs. They are:
> > >>
> > >> 05-Sep-2008 13:19:26.827 dispatch: dispatch 03E25098: shutting down
> > >> due to TCP receive error: <unknown address, family 48830>: network
> > >> unreachable
> > >>
> > >> 05-Sep-2008 13:20:38.171 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:20:38.171 general: unable to convert errno to
> > >> isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >> 05-Sep-2008 13:21:14.733 dispatch: dispatch 03E288B0: shutting down
> > >> due to TCP receive error: <unknown address, family 48830>: network
> > >> unreachable
> > >>
> > >> 05-Sep-2008 13:21:44.122 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:21:44.122 general: unable to convert errno to
> > >> isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >> 05-Sep-2008 13:23:35.351 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:23:35.351 general: unable to convert errno to
> > >> isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >> 05-Sep-2008 13:24:41.300 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:24:41.300 general: unable to convert errno to
> > >> isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >>
> > >> There are other normal messages in between those errors. I just
> > >> picked them out.
> > >>
> > >> Some information about this server's configuration that might
> > >> help: it has multiple IPv4 addresses bound to the same network and
> > >> the same NIC. There is no IPv6 stack installed on the server. The
> > >> server currently does recursion and also hosts some secondary
> > >> zones.
> > >>
> > >>
> > >> -Vinny
> >
> > Try setting max-cache-size and see if that helps with the queries.
> > Don't worry about those other error messages. They're harmless.
> >
> > Danny
>
> OK, I've added avoid-v4-udp-ports to my named.conf with all the UDP
> ports I could identify as being used by other applications, including
> my RADIUS service. I've restarted and I'll see if this helps at all.

Well, that had no effect. It still seems to stop responding pretty frequently. I can't easily catch the failure and restart BIND every 30 minutes, so I'm going to have to replace this server with a different one running an operating system that behaves better with BIND. I already did this on my other two name servers. If you have any other ideas, or reasons I shouldn't abandon BIND on Windows, let me know while I can still test it.
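
A minimal sketch of how the failure on individual addresses could be detected automatically, assuming Python 3 is available on the machine: it sends a hand-built DNS A query over UDP to one address and reports whether a matching reply arrives within a timeout. The function names, the probe name example.com, and the 3-second timeout are all illustrative assumptions and not part of BIND.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (class IN), recursion desired."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def is_reply_to(query: bytes, reply: bytes) -> bool:
    """Return True if reply carries the query's ID and has the QR (response) bit set."""
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def probe(server_ip: str, name: str = "example.com", timeout: float = 3.0) -> bool:
    """Send one UDP query to server_ip port 53; True only if a matching reply arrives."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as network or port unreachable errors.
            return False
    return is_reply_to(query, reply)
```

Calling probe() on each address bound to the NIC from a scheduled task would show which one has gone quiet. Restarting the service in response is left out here, since how that should be done depends on the setup.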

-Vinny

