Listen queue overflow
Lawrence K. Chen, P.Eng.
lkchen at ksu.edu
Mon Nov 18 23:57:24 UTC 2013
On 2013-11-14 17:04, Mark Andrews wrote:
> In message
> <FD9B2CB2B33E394FAE3B7466954760571D666C24 at DFWX10HMPTC01.AMER.DELL.CO
> M>, Vinny_Abello at Dell.com writes:
>> Hi Everyone,
>>
>> I recently had a recursive server running BIND 9.9.4 on FreeBSD 9.2
>> appear to wedge and stop responding to clients. I had a flurry of these
>> errors on the console:
>>
>> sonewconn: pcb 0xfffffe007211d930: Listen queue overflow: 16 already in
>> queue awaiting acceptance
>>
>> I couldn't trace that directly back to the named process by the time I
>> looked at it, but I suspect that's what it was since it's really the only
>> thing this machine is used for and it stopped working. It seems to have
>> oddly become unstuck when I logged into the machine and started looking
>> around. I never restarted named. Everything else on the server was
>> running normally from what I could tell and no other errors existed that
>> I could find. Unfortunately my logs rolled over too fast to check if
>> named had logged anything else interesting.
>>
>> From what I've found in googling, this is an OS level error stating the
>> process isn't accepting new TCP connections and it's an application
>> fault. I've only ever seen this on this particular machine, and just this
>> once. My other recursive servers are running older versions of FreeBSD.
>
> Or it's just a plain DoS attack. For any service it is possible to
> send tcp connection requests faster than the service can handle it.
>
>> Has anyone come across this before and know how to prevent or correct
>> this properly?
>
> You can tune tcp-listen-queue in named.conf. The current default is 10.
>
>> Thanks!
>>
>> -Vinny
>>
My logs have been filling up with
sonewconn: pcb 0xfffffe02bb7187a8: Listen queue overflow: 10 already in queue
awaiting acceptance
This seems to have started after upgrading to FreeBSD 9.2 (though there have
been other changes, mostly on the email front, so looking at BIND hadn't
crossed my mind at all until I spotted this thread). It's only happening on
one server, so I had been hunting around trying to figure out where it's
coming from.
The hex number doesn't correspond to any socket that shows up in lsof,
though the addresses of the sockets that lsof does show bear some resemblance
to it.
Doing "lsof -i -T fqs" and looking at QLIM=, I had thought sendmail was the
culprit, since its default listen queue is 10. But bumping it to 128 didn't
stop the messages, and I couldn't find any other sockets with QLIM=10 this
way.
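To make that QLIM hunt less of an eyeball exercise, here is a rough Python sketch that pulls the QLIM= value out of each line of "lsof -i -T fqs" output. The function name listen_limits and the SAMPLE line (one of the named sockets from this message, unwrapped) are just for illustration, and it assumes each socket is on a single line with the name right after the "TCP" column.

```python
import re

# QLIM= is one of the comma-separated socket options printed by `lsof -T fqs`.
QLIM_RE = re.compile(r"\bQLIM=(\d+)")

def listen_limits(lsof_output):
    """Return (command, pid, name, qlim) for each line that carries a QLIM= field."""
    rows = []
    for line in lsof_output.splitlines():
        match = QLIM_RE.search(line)
        fields = line.split()
        if not match or len(fields) < 2:
            continue  # header line, or a socket that is not listening
        # Assumption: the NAME column directly follows the "TCP" column.
        name = fields[fields.index("TCP") + 1] if "TCP" in fields[:-1] else "?"
        rows.append((fields[0], fields[1], name, int(match.group(1))))
    return rows

# One unwrapped line of lsof output, used here only as sample input.
SAMPLE = (
    "named 1276 bind 20u IPv4 0xfffffe00a73697a0 0t0 TCP zen:domain "
    "(LISTEN QR=0 QS=0 SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,"
    "RCVBUF=524288,REUSEADDR,SNDBUF=524288 SS=NBIO "
    "TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)"
)

for command, pid, name, qlim in listen_limits(SAMPLE):
    print(command, pid, name, "QLIM=%d" % qlim)
```

Feeding it the real lsof output instead of SAMPLE and filtering on the qlim value (10 or 16, say) should show at a glance which daemons have a listen queue of the size named in the sonewconn message.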
As for the sockets associated with named: the TCP domain sockets have QLIM=3
and the rndc socket has QLIM=128. These systems are all running the system
BIND (9.8.4-P2).
named 1276 bind 20u IPv4 0xfffffe00a73697a0 0t0 TCP zen:domain
(LISTEN QR=0 QS=0
SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288
SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named 1276 bind 21u IPv4 0xfffffe00a73693d0 0t0 TCP
zen2:domain (LISTEN QR=0 QS=0
SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288
SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named 1276 bind 22u IPv4 0xfffffe00a738b3d0 0t0 TCP
localhost:domain (LISTEN QR=0 QS=0
SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288
SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named 1276 bind 23u IPv4 0xfffffe00a75223d0 0t0 TCP
localhost:rndc (LISTEN QR=0 QS=0
SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=128,RCVBUF=524288,REUSEADDR,SNDBUF=524288
SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
FWIW, the only socket with QLIM=16 on my system is upsd (nut).
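For reference, the tcp-listen-queue setting Mark mentions above goes in the options block of named.conf. A minimal sketch only: the value 128 is an assumption (picked to match the rndc socket's QLIM shown above), not a tested recommendation.

```
options {
    // Assumed value for illustration; tune to your own load.
    tcp-listen-queue 128;
};
```

After reloading named, the QLIM= reported by lsof for the TCP domain sockets would be the thing to check to see whether the change took effect.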
--
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally