Secondaries sometimes don't respond to notify

John Wobus jw354 at cornell.edu
Thu Feb 3 20:37:26 UTC 2005


I haven't solved this yet, so here is an update in case anyone has
good ideas to offer.

The problem is that two secondaries randomly "drop" notifies from
the primary (BIND 9.3; more details in my previous message, quoted below).

The servers run Solaris 8, and I used snoop (similar to tcpdump)
to verify that the notify packets do reach the secondary over the
network.  Sometimes the notify works as advertised, but at random
times one of two kinds of failure occurs.  In the more common case,
named on the secondary never logs that it received the notify.  In
far fewer instances, named does log it, but snoop never shows the
secondary sending the notify response.  I've tried looking at truss
output, but it didn't help me piece together any more of the picture.
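
For anyone matching packets up by hand in a capture, here is a minimal
sketch of what a NOTIFY request and its response look like on the wire,
following my reading of RFC 1996 (opcode 4, AA bit set, one SOA question
for the zone).  It is an illustration only, not code taken from BIND;
the function names and the zone "example.com." are made up for the
example.

```python
import struct

OPCODE_NOTIFY = 4   # NOTIFY opcode per RFC 1996
QTYPE_SOA = 6
QCLASS_IN = 1

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:          # skip the empty label produced by the root name
            continue
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_notify(zone, msg_id):
    """Build a NOTIFY request: QR=0, opcode 4, AA=1, one SOA question."""
    flags = (OPCODE_NOTIFY << 11) | (1 << 10)
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", QTYPE_SOA, QCLASS_IN)
    return header + question

def parse_header(packet):
    """Pull out the header fields needed to pair a request with its response."""
    msg_id, flags = struct.unpack("!HH", packet[:4])
    return {
        "id": msg_id,
        "is_response": bool(flags & 0x8000),  # QR bit
        "opcode": (flags >> 11) & 0xF,
        "rcode": flags & 0xF,
    }
```

A response to a given notify should carry the same ID and opcode with the
QR bit set, so pairing on (id, opcode) is one way to spot a notify in the
capture that never got an answer.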

The ideas I have left to try are (1) turn up BIND's debug logging
and pore through it, or (2) move the secondary to a bigger server
and see if the problem disappears.  I can believe that the secondary
is simply too busy, but I would expect some sort of logging in that
case, or to hear that other sites have seen this before.  Supporting
the load theory, I do see the problem occur more often during
busier times.
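
One way to put numbers on the "busier times" impression is to count the
primary's failure messages per hour of day.  The sketch below is only
that: it assumes each log line starts with a timestamp of the form
"03-Feb-2005 14:05:12.123" and contains the text "retries exceeded".
Both the line layout and the function name are assumptions and would
need adjusting to the real log.

```python
import re
from collections import Counter

# Assumed timestamp layout at the start of each line: DD-Mon-YYYY HH:MM:SS
TIMESTAMP = re.compile(r"^(\d{2}-\w{3}-\d{4}) (\d{2}):\d{2}:\d{2}")

def failures_by_hour(lines, marker="retries exceeded"):
    """Count lines containing `marker`, grouped by hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts
```

Feeding it the lines of the primary's log file and comparing the
per-hour counts against query load for the same hours would show
whether the failures really do track load.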

Any ideas/inspirations appreciated.

John Wobus

On Jan 24, 2005, at 5:23 PM, John Wobus wrote:

> Notifies sometimes get lost between our bind 9.3 servers.  What can I
> look for as a cause?
>
> Two secondary servers are showing the problem with the same primary
> server.  When the failure occurs, the primary server logs that it sent
> notifies, then logs 'notify retries exceeded' for the secondary in
> question.  The secondary's log shows nothing.  Zones and secondaries
> affected at any particular instance are random: failure occurs for only
> 10-40% of the notifications.  When one secondary fails for a particular
> zone, the other one often succeeds in loading it.  The new zone files
> have updated SOA serial numbers. The failing secondary later transfers
> the zone successfully, when the refresh interval expires.  None of the
> servers have firewall software.  The servers serve fewer than 300
> zones.
>
> I've checked the network, the bind config file options (which are
> generally the defaults), looked for other problems in the logs,
> searched my bind books/manuals and searched online and I have run out
> of ideas.
>
> John Wobus
> Cornell CIT
>
>


