Socket buffer space?

Patrik Lundin patrik at sigterm.se
Fri Jun 7 08:01:37 UTC 2019


On Tue, Dec 11, 2018 at 05:46:10PM +0100, Havard Eidnes wrote:
> 
> Hmm, I already have that, but I wonder, how big is "bigger"?  Well,
> looks like the answer is that BIND tries to probe for the biggest it
> can be allowed to set on startup, by starting with a large value and
> approximately halfing it successively if I read the code right.  BIND
> doesn't log what setting it is using, though...
> 

I stumbled across this thread today while also investigating what socket
buffer size BIND actually chooses. I noticed the code behaves
a bit differently than I thought from first looking at it.

On my Linux machine, with net.core.rmem_max set to the system default of
212992, I was expecting the "again" goto loop to decrease rcvbuf
until setsockopt succeeded, but setsockopt actually succeeds even when the
requested size is larger than that limit allows.

What happens is that setsockopt succeeds, but the value actually applied is
capped at net.core.rmem_max, and the kernel then doubles it. The
doubling is described in socket(7):
===
SO_RCVBUF
 Sets or gets the maximum socket receive buffer in bytes. The kernel doubles this value (to allow space for
 bookkeeping overhead) when it is set using setsockopt(2), and this doubled value is returned by
 getsockopt(2). The default value is set by the /proc/sys/net/core/rmem_default file, and the maximum allowed
 value is set by the /proc/sys/net/core/rmem_max file. The minimum (doubled) value for this option is 256.
===
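
The cap-then-double behaviour can be written down as a small piece of
arithmetic. This is only a sketch of the Linux behaviour described above:
expected_rcvbuf() is a hypothetical helper (not part of BIND or the kernel),
and it deliberately ignores the minimum value mentioned in socket(7).

```c
/*
 * Hypothetical helper: the SO_RCVBUF value getsockopt(2) is expected to
 * report on Linux after an unprivileged setsockopt(2) request.
 * Assumption: the request is capped at net.core.rmem_max and then doubled;
 * the small minimum described in socket(7) is not modelled here.
 */
long
expected_rcvbuf(long requested, long rmem_max) {
        long capped = (requested < rmem_max) ? requested : rmem_max;

        return (2 * capped);
}
```

For example, expected_rcvbuf(16777216, 212992) gives 425984, which is
net.core.rmem_max*2.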

Here is a standalone version of set_rcvbuf() that I yanked out of the BIND 9.11.6
codebase, with some printfs sprinkled in for added visibility:
===
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>

#define ISC_SOCKADDR_LEN_T unsigned int
#define ISC_PLATFORM_HAVEIPV6 1

#define TUNE_LARGE 1

/*%
 * The size to raise the receive buffer to (from BIND 8).
 */
#ifdef TUNE_LARGE
#ifdef sun
#define RCVBUFSIZE (1*1024*1024)
#else
#define RCVBUFSIZE (16*1024*1024)
#endif
#else
#define RCVBUFSIZE (32*1024)
#endif /* TUNE_LARGE */

static int              rcvbuf = RCVBUFSIZE;

static void
set_rcvbuf(void) {
        int fd;
        int max = rcvbuf, min;
        ISC_SOCKADDR_LEN_T len;

        // Added stuff
        int final;
        ISC_SOCKADDR_LEN_T final_len;

        printf("requested SO_RCVBUF size (max): %d\n", max);

        fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
#if defined(ISC_PLATFORM_HAVEIPV6)
        if (fd == -1) {
                switch (errno) {
                case EPROTONOSUPPORT:
                case EPFNOSUPPORT:
                case EAFNOSUPPORT:
                /*
                 * Linux 2.2 (and maybe others) return EINVAL instead of
                 * EAFNOSUPPORT.
                 */
                case EINVAL:
                        fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);
                        break;
                }
        }
#endif
        if (fd == -1)
                return;

        len = sizeof(min);
        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&min, &len) == 0 &&
            min < rcvbuf) {

                printf("initial SO_RCVBUF size (min) %d is less than %d, attempting to increase it\n", min, rcvbuf);
 again:
                printf("attempting to set SO_RCVBUF to rcvbuf (%d)\n", rcvbuf);
                if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&rcvbuf,
                               sizeof(rcvbuf)) == -1) {
                        printf("setsockopt failed\n");
                        if (errno == ENOBUFS && rcvbuf > min) {
                                printf("errno was ENOBUFS\n");
                                printf("max: %d\n", max);
                                max = rcvbuf - 1;
                                printf("new max: %d\n", max);
                                rcvbuf = (rcvbuf + min) / 2;
                                printf("new rcvbuf: %d\n", rcvbuf);
                                goto again;
                        } else {
                                //printf("errno was not ENOBUFS (was: %s)\n", strerror(errno));
                                rcvbuf = min;
                                printf("min rcvbuf: %d\n", rcvbuf);
                                goto cleanup;
                        }
                } else {
                        printf("setsockopt succeeded\n");
                        min = rcvbuf;
                        printf("new min: %d\n", min);
                }
                if (min != max) {
                        printf("min (%d) not equal to max (%d)\n", min, max);
                        rcvbuf = max;
                        goto again;
                }
        }

        final_len = sizeof(final);
        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&final, &final_len) == 0 ) {
                printf("final SO_RCVBUF size: %d\n", final);
        }

 cleanup:
        close (fd);
}

int main(void) {
        set_rcvbuf();
        return 0;
}
===

And the result from running it on my machine:
===
$ sysctl net.core.rmem_max
net.core.rmem_max = 212992

$ gcc -Wall -pedantic -Wextra bind_rcvbuf.c -o bind_rcvbuf

$ ./bind_rcvbuf
requested SO_RCVBUF size (max): 16777216
initial SO_RCVBUF size (min) 212992 is less than 16777216, attempting to increase it
attempting to set SO_RCVBUF to rcvbuf (16777216)
setsockopt succeeded
new min: 16777216
final SO_RCVBUF size: 425984
===

So here the socket buffer ends up at 425984 (that is, net.core.rmem_max*2), and
after setting net.core.rmem_max to 16777216 (the requested value when using
--with-tuning=large):
===
$ ./bind_rcvbuf
requested SO_RCVBUF size (max): 16777216
initial SO_RCVBUF size (min) 212992 is less than 16777216, attempting to increase it
attempting to set SO_RCVBUF to rcvbuf (16777216)
setsockopt succeeded
new min: 16777216
final SO_RCVBUF size: 33554432
===

As Håvard pointed out, BIND does not log what it ends up using, which, given the
above, can be pretty confusing. Would it make sense to add some logging to
set_rcvbuf()? From what I can tell it would only run once, since it is guarded
by rcvbuf_once. For later versions of BIND I guess the same holds true for
set_sndbuf().
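
As a rough illustration of the kind of logging meant here, the sketch below
reads the value back with getsockopt() and reports it next to the requested
size. report_rcvbuf() is a hypothetical helper, and fprintf() to stderr is
only a stand-in for whatever logging call BIND would actually use.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper (not BIND code): report the receive buffer size the
 * kernel actually applied to fd, next to the size that was requested.
 * Returns the effective size, or -1 if getsockopt() fails.
 */
int
report_rcvbuf(int fd, int requested) {
        int effective = 0;
        socklen_t len = sizeof(effective);

        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&effective,
                       &len) != 0)
                return (-1);

        /* Stand-in for a real log call. */
        fprintf(stderr, "SO_RCVBUF: requested %d, kernel reports %d\n",
                requested, effective);
        return (effective);
}
```

Calling this once after the setsockopt() loop in set_rcvbuf() has finished
would make the effective size visible without changing the probing logic.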

>
> However, it appears that BIND applies this same setting to each and
> every UDP socket BIND creates, ref. lib/isc/unix/socket.c's
> opensocket() function, which is probably not required.  I would have
> thought it would be sufficient to set it on those sockets which serve
> port 53, and not on those temporary sockets BIND creates to talk to
> other name servers in the process of doing recursion.  On a system
> which doesn't overcommit resources, this is responsible for needless
> waste.
>

I noticed this as well. Is there a reason the increased SO_RCVBUF is used by
all sockets, not just the ones listening for requests?

-- 
Patrik Lundin

