BIND dying

Terrence Koeman root at mediamonks.net
Fri Dec 14 20:23:31 UTC 2001


 
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[errors]
> > > >12-Dec-2001 09:32:34.000 default: warning: zone transfer timeout
> > > >for "zone"; pid 536 kill failed Errcode: 10035: Operation would block
> > > >12-Dec-2001 09:33:04.000 default: warning: zone transfer timeout
> > > >for "zone"; second kill pid 536 - forgetting, processes may accumulate
[version]
> > > >BIND 8.2.5-NT on Windows 2000 AS SP2
> > >
> > > It's most unlikely that warning caused BIND to die. These messages
> > > indicate that you have a timeout during the attempted zone transfer.
> > > You should check your network for connectivity to the other server.
> >
> >The connectivity is not perfect, but it's good enough.
> 
> That would concern me.

Excuse me, it _was_ good enough.  I just checked, and the average ping
to the master was 900 ms.
So the zone transfer probably times out, but I still don't think BIND
should die because of it, right?

> > > Is this the slave?
> >Yes.
> > > What other messages are there in the logs to indicate that it
> > > died? 
> >
> >None, they are always the last messages in the log before BIND dies
> >and is automatically restarted by the service manager. The service
> >manager logs an unexpected termination to the eventlog.
> >
> >Event ID: 7031
> >
> >"The ISC BIND service terminated unexpectedly.  It has done this 56
> >time(s).  The following corrective action will be taken in 1000
> >milliseconds: Restart the service."
> >
> > > How large is the zone file?
> >
> >About 32 MB.
> 
> That's huge.  Is this by any chance the antispam list that maintains?
> I would not be surprised if you get timeouts with a large zone transfer
> like that and so-so connectivity.

OK, I'll come clean. It is in fact an anti-spam list, and I don't want
to reveal the zone name on this list until I have an idea of what's
going on. For all I know, someone could be using some DoS exploit
against my server (though that's highly unlikely). I'll mail you the
zone name off-list if you need it.
 
> >I wrote a script that saves the PID files to a different directory at
> >each startup. I'm not certain, but it seems the PID BIND is trying to
> >kill ('pid xxx kill') belongs not to named-xfer but to named itself,
> >so BIND is killing itself when it should be killing a named-xfer
> >process. 
> 
> Can you be sure of that? Win32 doesn't actually use the PID file 
> for anything; it's just there because Unix uses one.

Yes, I'm aware of that, but the named.pid should contain the actual
pid, right?

> named-xfer should exit by itself; I don't think named tries to kill
> it, but I haven't checked the code to be sure of this.

Well, I think the errors at the top at least show that BIND is trying
to kill _something_ and failing twice.

All I can say about that is that the PID in the errors is often (I
can't say always) the same as the PID of named.exe, and that named
always dies right after these errors, which differ only in the PID each
time.
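A minimal sketch of that comparison in Python (a hypothetical helper, not anything shipped with BIND): it pulls the PIDs out of the "pid NNN kill failed" warnings and checks them against named's own PID. It assumes the warning text matches the excerpt quoted at the top, and that named.pid holds the current PID of named.exe, which, as noted, Win32 doesn't guarantee.

```python
import re

# Assumed pattern, based only on the BIND 8.2.5-NT warning quoted above:
#   '... pid 536 kill failed Errcode: 10035: ...'
# \s+ tolerates the line wrapping seen in mail-quoted logs.
KILL_FAILED = re.compile(r"pid\s+(\d+)\s+kill\s+failed")


def failed_kill_pids(log_text):
    """Return the PIDs named in 'kill failed' warnings, in log order."""
    return [int(m.group(1)) for m in KILL_FAILED.finditer(log_text)]


def read_pid_file(path):
    """Read the PID stored in a pid file such as named.pid (first token)."""
    with open(path) as f:
        return int(f.read().split()[0])


def self_kill_attempts(log_text, named_pid):
    """Return the failed-kill PIDs that equal named's own PID."""
    return [pid for pid in failed_kill_pids(log_text) if pid == named_pid]
```

Usage would be something like `self_kill_attempts(open("named.log").read(), read_pid_file("named.pid"))`, where "named.log" is a hypothetical name for wherever the logging channel writes. A non-empty result means named tried to kill its own PID.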

Furthermore, a named-xfer.exe process is started every second and
writes an informational event to the event log. These processes seem
to exit by themselves.

This is, by the way, the same server that is unable to query other
servers over TCP.

I'll let BIND run for a few hours at debug level 4; maybe that will
reveal something. I wouldn't be surprised if BIND died right after some
TCP socket error, since this problem appeared at almost the same time
as the 'can't query by TCP' problem, which you forwarded to the bug list.

Let me know if you need anything.

- -- 
Regards,

Terrence Koeman

Technical Director/Administrator
MediaMonks B.V. (www.mediamonks.nl)

Please quote all replies in correspondence. 

-----BEGIN PGP SIGNATURE-----
Version: PGP 7.1

iQA/AwUBPBpfw3xo/qu3lMSREQIQqgCfWsKxPWPEYBExo0e7pinGzgem+4MAoLZu
XlHiGdXFJqzy5P52QlITLml5
=k90Y
-----END PGP SIGNATURE-----



More information about the bind-users mailing list