Spurious SERVFAIL error returns?

Havard Eidnes he at uninett.no
Mon Apr 10 11:48:03 UTC 2017


Hi,

our recursive resolver, running BIND 9.9.9-P6, sometimes returns
SERVFAIL to certain queries, apparently spuriously.

I suspect this is related to expiry of entries from the cache, as
this sort of error appears to occur more often for entries which
are published with a low TTL, e.g. '1': some local admins have
seen fit to publish cltrd012.mysql.db.uninett.no with a 1s TTL.
After rebuilding BIND with --enable-querytrace I find these
entries in the "client" log:

client.log.15:09-Apr-2017 12:34:52.490 query client=0x7f7feb0dd000 thread=0x7f7ff7b92000 (cltrd005.pgsql.db.uninett.no/A): query_find: unexpected error after resuming: address not available
client.log.15:09-Apr-2017 12:34:53.539 query client=0x7f7fe3404800 thread=0x7f7ff7b96000 (cltrd012.mysql.db.uninett.no/A): query_find: unexpected error after resuming: address not available
client.log.15:09-Apr-2017 12:34:53.590 query client=0x7f7fe7302800 thread=0x7f7ff7b92000 (cltrd012.mysql.db.uninett.no/A): query_find: unexpected error after resuming: address not available

etc. etc.

Sometimes names with more sensible TTLs experience these failures
as well:

client.log.25:09-Apr-2017 04:55:10.194 query client=0x7f7fe9f14000 thread=0x7f7ff7b98000 (ns.uninett.no/AAAA): query_find: unexpected error after resuming: address not available
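
To see which names are hit most often, the matching lines can be
tallied with a small script.  This is only a minimal sketch: the
regular expression is an assumption based on the log lines quoted
above, and the function name tally_failures is made up for
illustration and is not part of BIND.

```python
import re
from collections import Counter

# Pattern assumed from the querytrace lines quoted above:
#   ... (name/TYPE): query_find: unexpected error after resuming: <reason>
LINE_RE = re.compile(
    r"\((?P<name>[^/\s()]+)/(?P<rtype>[A-Z0-9]+)\): "
    r"query_find: unexpected error after resuming: (?P<reason>.+)$"
)


def tally_failures(lines):
    """Count matching failures per (name, type, reason).

    `lines` is any iterable of log lines, e.g. an open file.
    Lines which do not match the assumed pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line.rstrip("\n"))
        if m:
            counts[(m["name"], m["rtype"], m["reason"])] += 1
    return counts
```

Passing an open log file to tally_failures() and printing the
result of .most_common() should show whether the low-TTL names
really dominate.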

After browsing the code a bit I thought that a lack of FDs available
to BIND triggered this, but I've bumped the system FD limit, and BIND
has been restarted and has all its 4096 FDs available, so I doubt
that's the actual reason.  This recursive resolver typically
processes between 500 and 1000 queries/second.

I tried googling the log message, but I could not find anything which
exactly matched; the closest appeared to be the thread from June last
year about "Issues resolving outlook.office365.org", but there I find
another error message after the "after resuming", so maybe it's a
different bug.

I would also expect that 9.9.9-P6 is later than the "next
maintenance release" Mark mentioned in that thread.

Regards,

- Håvard

