problems with rip.psg.com

Andrew McNamara andrewm at connect.com.au
Tue Nov 16 10:53:51 UTC 1999


>> I must admit, I've seen your postings to the bind mailing list, but
>> didn't really let it sink in. From memory, you're seeing a selective
>> failure to load new zones (what about updated old zones - are these
>> also affected)? Is it failing to transfer, or failing to load zones?
>
>as it is utterly unlogged, i have no idea and suspect the bug is causing
>bind to just ignore a few entries in $include files.

Do you have a lot of included files? We only use one. Maybe it's a file
descriptor leak in the handling of $includes, or maybe you're just
bumping up against the user file descriptor limit? 

I can't remember what OS you use - if it's Solaris, you should have
"/usr/proc/bin/pfiles <pid>"; otherwise "lsof" may be an option. Either
will list the file descriptors named has open.

If named has run out of file descriptors, it's possible (depending on
how it's doing its logging) that there wouldn't be enough file
descriptors to log a message (syslog delays the opening of its socket
until the first message is logged).
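
As a rough illustration of the descriptor check, here is a minimal Python
sketch. It assumes a Linux-style /proc/<pid>/fd directory (on Solaris,
pfiles gives the same information), and the helper names and the margin
of 8 are made up for the example. Note that soft_fd_limit() reports the
limit of the process running the script, which is only a stand-in for
named's own limit if both are started from the same environment.

```python
import os
import resource


def soft_fd_limit():
    """Soft RLIMIT_NOFILE of the *current* process (not of named itself)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def count_open_fds(pid):
    """Count open descriptors of `pid`; assumes a Linux-style /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def near_limit(open_count, limit, margin=8):
    """True when fewer than `margin` descriptors remain below `limit`.

    `margin` is an arbitrary illustrative threshold, not a BIND setting.
    """
    if limit == resource.RLIM_INFINITY:
        return False
    return limit - open_count < margin
```

Something like near_limit(count_open_fds(pid), soft_fd_limit()), with pid
set to named's process id, would then flag a named that is about to run
out.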

Another option in this case would be to run truss/strace on the
offending named - if there's an unreported error, that should show up
in the truss output.
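
To make a long trace easier to read, a small filter along these lines
could tally the failed system calls. This is only a sketch: it assumes
the usual strace form ("= -1 EMFILE (Too many open files)") and the
usual truss form ("Err#24 EMFILE"), and the function name is
hypothetical.

```python
import re
from collections import Counter

# Matches the errno name after "= -1 " (strace) or "Err#<n> " (truss).
# Both line formats are assumptions about the tracer's output.
FAILED_CALL = re.compile(r"(?:= -1 |Err#\d+ )(E[A-Z0-9]+)")


def failed_calls(trace_text):
    """Count failed system calls in a trace, keyed by errno name."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = FAILED_CALL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A pile of EMFILE entries in the result would point at the descriptor
limit; ENOENT or EACCES on the included files would point somewhere else.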

>> I presume you've looked at obvious things - like it's not bumping up
>> against the max number of named-xfer processes? This particular one
>> bit here the other day - I hadn't bothered checking it because it never
>> used to be a problem, but with the number of clueless admins on the net
>> these days, broken name servers were permanently eating up my reserves
>> of named-xfer slots.
>
>how would one determine if this was the cause?

In our case, it was just a lot of named-xfer processes. A ps should
reveal this.

I think we secondary fewer domains than you, and we seem to be sitting
at a steady 20-odd named-xfers at the moment.
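
If you'd rather watch the number over time than eyeball ps, a sketch
like the following would do it. It assumes a ps that accepts the POSIX
"-e -o comm=" options; the function names are made up for the example.

```python
import os
import subprocess


def count_processes(ps_output, name="named-xfer"):
    """Count lines of `ps -e -o comm=` output whose command basename is `name`."""
    count = 0
    for line in ps_output.splitlines():
        command = line.strip()
        if command and os.path.basename(command) == name:
            count += 1
    return count


def running_count(name="named-xfer"):
    """Run ps and count matching processes; assumes POSIX ps options."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_processes(result.stdout, name)
```

Calling running_count() every few minutes and logging the result would
show whether the count sits pinned at whatever transfer limit you have
configured.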

>  and, if it was, would they
>not get picked up on a subsequent pass.

I guess it depends if they're loaded sequentially - if so, the pattern
may be the same each time a reload is done. It does eventually time
out, but if there are a lot of broken primaries, the same zones could
be starved of a slot every time (and we're certainly not helping with
the data we feed you - I'm sorry I haven't been able to make much
progress cleaning things up).

>remember, we're talking about zones not being secondaried for weeks.

If it's repeatable across reloads, that suggests it should be possible
to insert debug statements into the code and, in particular, to
reproduce the problem offline somewhere just by copying your named data
file structure.

 ---
Andrew McNamara (System Architect)

connect.com.au Pty Ltd
Lvl 3, 213 Miller St, North Sydney, NSW 2060, Australia
Phone: +61 2 9409 2117, Fax: +61 2 9409 2111

