loss of masters over ipsec hoses bind
Matt LaPlante
cyberdog3k at gmail.com
Sat Dec 22 16:10:56 UTC 2007
On Dec 21, 2007 10:29 PM, Barry Margolin <barmar at alum.mit.edu> wrote:
> In article <fkh44f$199f$1 at sf1.isc.org>,
> "Matt LaPlante" <cyberdog3k at gmail.com> wrote:
>
> > I'm currently running Bind 9.4.1 (Ubuntu Gutsy). I have several zones
> > in master->slave setups, which normally works just fine. The other
> > day, however, I ran into an odd problem. A couple of the slave zones
> > generally update over an ipsec connected network. The ipsec
> > connection went away, and shortly thereafter bind royally wedged
> > itself, refusing to serve any data (including basic forward lookups)
> > and was not even responding to rndc restarts. It took me a good while
> > of restarting the system and poking around logs to decide to strace
> > the process, which eventually led me to remove the ipsec-dependent
> > slave zones from the config. As soon as I did this, Bind became
> > stable again. Interestingly, zones which updated over public IP space
> > behaved fine, even if the master server was unreachable. It was only
> > zones that were trying to go over the down ipsec connection that hosed
> > the daemon.
> >
> > This whole issue is logged in a bit more detail here, including output
> > from strace:
> > https://bugs.launchpad.net/ubuntu/+source/bind/+bug/177489
> >
> > I can (apparently) reproduce this issue again with little difficulty,
> > so I'd be glad to help debug it.
> >
> > -
> > Matt LaPlante
>
> I'm having a hard time imagining how IPSEC could be impacting this.
> named uses TCP and UDP exclusively, and the underlying connection
> topology should be transparent to it. Are you sure there aren't some
> configuration differences between the public and private zones, such as
> the refresh and retry intervals? If the retry intervals are extremely
> short, named could spend all its time retrying the zone transfers after
> a failure.
Here are the SOA timers from one of the private zones (there are only two):
    182        ; serial
    3600       ; refresh (1 hour)
    600        ; retry (10 minutes)
    2419200    ; expire (4 weeks)
    86400      ; minimum (1 day)
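As a rough sanity check on whether these timers could make named spend all its time retrying, here is a minimal Python sketch. It assumes a simplified model in which the slave retries exactly once per retry interval after a failed refresh, until the zone expires; the real scheduling in named may differ (jitter, for example). The function names are hypothetical, chosen only for illustration.

```python
# SOA timers copied from the zone above, in seconds.
REFRESH = 3600
RETRY = 600
EXPIRE = 2419200

def retry_attempts_per_hour(retry_seconds):
    """Retries per hour, assuming one attempt per retry interval (simplified model)."""
    return 3600 / retry_seconds

def max_retries_before_expiry(retry_seconds, expire_seconds):
    """Upper bound on retry attempts before the zone expires (simplified model)."""
    return expire_seconds // retry_seconds

print(retry_attempts_per_hour(RETRY))            # 6.0
print(max_retries_before_expiry(RETRY, EXPIRE))  # 4032
```

Under that model, each private zone would be retried about six times an hour, which does not look like an "extremely short" retry interval.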
I realize that things *should* be transparent, but the fact is I can
reproduce the outage exactly as documented. My working theory is that
when the ipsec connection is down, some TCP or UDP packet never comes
back as timed out or unreachable, and bind just waits forever. That
wait locks up the code, which in turn causes all functionality to
cease. It may in fact be an ipsec bug in some way, but I don't think
the error condition itself should DoS bind in the process. I can
certainly try to gather more information if it would be helpful
(although it may take extra time given the holidays).
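One way to gather that information would be to check, from the slave, how a connection attempt to the master actually ends while the tunnel is down. The following is a minimal Python sketch, not anything taken from bind itself: it tries a TCP connect to port 53 with a bounded timeout and reports whether the attempt succeeded, was refused, timed out, or failed with some other error. The function name `probe_tcp` and the example address are hypothetical.

```python
import socket

def probe_tcp(host, port=53, timeout=5.0):
    """Classify how a TCP connect attempt to host:port ends."""
    try:
        # create_connection gives up after `timeout` seconds instead of blocking forever.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or something on the path) actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: packets are being silently dropped.
        return "timeout"
    except OSError as exc:
        # For example "network is unreachable" or "no route to host".
        return f"error: {exc}"

# Example call (192.0.2.10 is a placeholder documentation address, not a real master):
# print(probe_tcp("192.0.2.10"))
```

A "timeout" result with the tunnel down, versus "refused" or an immediate error for the masters on public IP space, would be consistent with the theory above. This only tests the network path, so it cannot show what named itself is doing.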
>
> --
> Barry Margolin, barmar at alum.mit.edu
> Arlington, MA
> *** PLEASE post questions in newsgroups, not directly to me ***
> *** PLEASE don't copy me on replies, I'll read them in the group ***
>
>
>