loss of masters over ipsec hoses bind
Matt LaPlante
cyberdog3k at gmail.com
Thu Jan 10 06:13:26 UTC 2008
On Jan 9, 2008 8:45 AM, Adam Tkac <atkac at redhat.com> wrote:
> On Wed, Jan 09, 2008 at 07:33:31AM -0600, Matt LaPlante wrote:
> > > > > >
> > > > > >
> > > > > > I would say that some I/O is blocking when it shouldn't
> > > > > > with sockets which use ipsec. If this is the case it is
> > > > > > a kernel bug and named can't do anything to prevent it.
> > > > > > Named marks all sockets as non-blocking.
> > > > > >
> > > > > > Mark
>
> I also suspect a kernel bug.
>
> >
> > Ping...
> >
> > I'm still seeing this any time one of the ipsec endpoints goes away
> > (and it happens on either end, so it's definitely repeatable).
> >
>
> I've run into the same problem in RH
> (https://bugzilla.redhat.com/show_bug.cgi?id=427629). Would it be
> possible to send (to me or here) stack traces showing exactly where
> named hangs?
I attempted to follow the instructions in the Red Hat bug report and
got the following output:
(gdb) info threads
4 Thread -1213547632 (LWP 3040) 0xb7c612a1 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
3 Thread -1221936240 (LWP 3041) 0xb7c61512 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
2 Thread -1230324848 (LWP 3042) 0xb7bdfad7 in select () from /lib/libc.so.6
1 Thread -1213163856 (LWP 3039) 0xb7b4ddfd in sigsuspend () from /lib/libc.so.6
(gdb) bt 1
#0 0xb7b4ddfd in sigsuspend () from /lib/libc.so.6
(More stack frames follow...)
(gdb) bt 2
#0 0xb7b4ddfd in sigsuspend () from /lib/libc.so.6
#1 0xb7cb416c in isc_app_run () from /usr/lib/libisc.so.32
(More stack frames follow...)
(gdb) bt 3
#0 0xb7b4ddfd in sigsuspend () from /lib/libc.so.6
#1 0xb7cb416c in isc_app_run () from /usr/lib/libisc.so.32
#2 0x0806950b in ?? ()
(More stack frames follow...)
(gdb) bt 4
#0 0xb7b4ddfd in sigsuspend () from /lib/libc.so.6
#1 0xb7cb416c in isc_app_run () from /usr/lib/libisc.so.32
#2 0x0806950b in ?? ()
#3 0x00000000 in ?? ()
(gdb)
I don't have a lot of gdb-fu to draw on, so feel free to give more
extensive instructions and I'll be glad to run through them.
> It
> will point us to where the problem is. Mark's patch also made me
> notice that the internal_connect function uses errno directly
> (something like a switch (errno) statement etc.). I'm not sure whether
> something modifies errno so that the socket code behaves unexpectedly.
> The code should start using statements like
>
> err = errno;
> switch (err) ...
>
> instead of using errno directly.
>
> Adam
>
> --
> Adam Tkac, Red Hat, Inc.
>
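For reference, the errno pattern Adam describes could be sketched roughly as below. This is only an illustration, not BIND's actual internal_connect code: the enum, classify_connect_errno() and demo_saved_errno() are hypothetical names, and the failed connect() is simulated by assigning errno directly. The point is that errno is copied into a local variable right after the failing call, so a later library call that also fails cannot change which case the switch takes.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical outcome codes, for illustration only. */
enum connect_outcome { CONNECT_PENDING, CONNECT_RETRY, CONNECT_FAILED };

/*
 * Map a saved errno value to an outcome. The value is passed in as an
 * argument, so nothing that runs later can change what is examined.
 */
static enum connect_outcome classify_connect_errno(int err)
{
    switch (err) {
    case EINPROGRESS:   /* non-blocking connect still in progress */
    case EALREADY:
        return CONNECT_PENDING;
    case EINTR:
    case EAGAIN:
        return CONNECT_RETRY;
    default:
        return CONNECT_FAILED;
    }
}

/*
 * Simulate a failed call followed by intervening work that overwrites
 * errno before the error is finally examined.
 */
static enum connect_outcome demo_saved_errno(void)
{
    int err;

    errno = EINPROGRESS;   /* stand-in for connect() returning -1 */
    err = errno;           /* save it immediately */

    /* Intervening call that fails and sets errno to EBADF. */
    (void)close(-1);

    /* Switching on errno here would see EBADF; err is unaffected. */
    return classify_connect_errno(err);
}
```

In this sketch demo_saved_errno() still reports a pending connect, even though errno itself holds EBADF by the time the switch runs.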