Listen queue overflow

Mon Nov 18 23:57:24 UTC 2013

On 2013-11-14 17:04, Mark Andrews wrote:
> In message
> <FD9B2CB2B33E394FAE3B7466954760571D666C24 at DFWX10HMPTC01.AMER.DELL.COM>,
> Vinny_Abello at Dell.com writes:
>> Hi Everyone,
>> 
>> I recently had a recursive server running BIND 9.9.4 on FreeBSD 9.2
>> appear to wedge and stop responding to clients. I had a flurry of these
>> errors on the console:
>> 
>> sonewconn: pcb 0xfffffe007211d930: Listen queue overflow: 16 already in
>> queue awaiting acceptance
>> 
>> I couldn't trace that directly back to the named process by the time I
>> looked at it, but I suspect that's what it was since it's really the only
>> thing this machine is used for and it stopped working. It seems to have
>> oddly become unstuck when I logged into the machine and started looking
>> around. I never restarted named. Everything else on the server was
>> running normally from what I could tell and no other errors existed that
>> I could find. Unfortunately my logs rolled over too fast to check if
>> named had logged anything else interesting.
>> 
>> From what I've found in googling, this is an OS level error stating the
>> process isn't accepting new TCP connections and it's an application
>> fault. I've only ever seen this on this particular machine, and just this
>> once. My other recursive servers are running older versions of FreeBSD.
> 
> Or it's just a plain DoS attack.  For any service it is possible to
> send tcp connection requests faster than the service can handle it.
> 
>> Has anyone come across this before and know how to prevent or correct
>> this properly?
> 
> You can tune tcp-listen-queue in named.conf.  The current default is 10.
> 
>> Thanks!
>> 
>> -Vinny
>> 

My logs have been filling up with

sonewconn: pcb 0xfffffe02bb7187a8: Listen queue overflow: 10 already in queue awaiting acceptance

This seems to have started after upgrading to FreeBSD 9.2. There have been
other changes, but they were on the email front, so looking at BIND hadn't
crossed my mind at all until I spotted this thread. It's only happening on
one server, so I had been hunting around trying to figure out where it's
been coming from.

The hex number doesn't correspond to any socket that shows up in lsof,
though the addresses of the sockets that lsof does show bear some
resemblance to it.

Running "lsof -i -T fqs" and looking at QLIM=, I had thought sendmail was
the culprit, since its default listen queue is 10. But bumping it to 128
didn't stop the messages, and I couldn't find any other sockets with
QLIM=10 this way.

As for the sockets associated with named: the TCP domain sockets have QLIM=3
and the rndc socket has QLIM=128. These systems are all running the system
BIND (9.8.4-P2).

named   1276 bind   20u    IPv4 0xfffffe00a73697a0      0t0    TCP zen:domain (LISTEN QR=0 QS=0 SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288 SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named   1276 bind   21u    IPv4 0xfffffe00a73693d0      0t0    TCP zen2:domain (LISTEN QR=0 QS=0 SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288 SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named   1276 bind   22u    IPv4 0xfffffe00a738b3d0      0t0    TCP localhost:domain (LISTEN QR=0 QS=0 SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288 SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named   1276 bind   23u    IPv4 0xfffffe00a75223d0      0t0    TCP localhost:rndc (LISTEN QR=0 QS=0 SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=128,RCVBUF=524288,REUSEADDR,SNDBUF=524288 SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)

FWIW, the only socket with QLIM=16 on my system is upsd (nut).
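
For anyone wanting to repeat the QLIM hunt on a box with more than a handful
of listeners, here is a minimal Python sketch that pulls the QLIM value out
of saved "lsof -i -T fqs" output. It is only a sketch under a few
assumptions: each socket is on a single line (not wrapped by a mail client),
the columns are in lsof's usual COMMAND PID USER FD TYPE DEVICE SIZE/OFF
NODE NAME order, and command names contain no spaces. The names
listen_limits and SAMPLE are made up for illustration.

```python
import re

# Matches the QLIM=<n> token inside lsof's SO=... socket-option list.
QLIM_RE = re.compile(r"\bQLIM=(\d+)\b")


def listen_limits(lsof_text):
    """Return (command, pid, name, qlim) for each listening-socket line.

    Assumes one socket per line and lsof's usual column order:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ...
    """
    results = []
    for line in lsof_text.splitlines():
        match = QLIM_RE.search(line)
        # Skip headers, non-listening sockets, and lines without a QLIM.
        if match is None or "(LISTEN" not in line:
            continue
        fields = line.split()
        command, pid, name = fields[0], int(fields[1]), fields[8]
        results.append((command, pid, name, int(match.group(1))))
    return results


# Two of the lines from the output above, joined back onto single lines.
SAMPLE = """\
named   1276 bind   20u    IPv4 0xfffffe00a73697a0      0t0    TCP zen:domain (LISTEN QR=0 QS=0 SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288 SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named   1276 bind   23u    IPv4 0xfffffe00a75223d0      0t0    TCP localhost:rndc (LISTEN QR=0 QS=0 SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=128,RCVBUF=524288,REUSEADDR,SNDBUF=524288 SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
"""

for command, pid, name, qlim in listen_limits(SAMPLE):
    print(command, pid, name, "QLIM=%d" % qlim)
```

Filtering the returned tuples on the qlim field is then a quick way to list
every listener with a particular queue limit, such as 10 or 16.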

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally