rpz testing -> shut down hung fetch while resolving

Havard Eidnes he at uninett.no
Thu Jan 26 18:03:37 UTC 2023


Hi,

I recently upgraded BIND to version 9.18.11 on our
resolver cluster, following the recent announcement.  Shortly
thereafter I received reports that the check which validates
lookups of "known entries" in our quite small RPZ feed (it's
around 1MB on-disk) no longer succeeds as expected; instead the
lookups take a long time and finally give SERVFAIL to the client.
Associated with this we get this log message:

Jan 26 18:41:27 xxx-res named[6179]: shut down hung fetch while resolving 'known-rpz-entry.no/A'
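
To compare affected and unaffected nodes, a tally of these messages
per host and query name might help.  The following is a minimal
Python sketch, assuming the syslog layout of the sample line above
(other syslog formats or BIND versions may need a different
pattern); the name tally_hung_fetches is hypothetical and not part
of BIND:

```python
import re
from collections import Counter

# Pattern for the syslog-style line quoted above.  The layout is an
# assumption based on that single sample line.
HUNG_FETCH = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\snamed\[\d+\]:\s"
    r"shut down hung fetch while resolving '(?P<name>[^/']+)/(?P<rtype>[^']+)'"
)


def tally_hung_fetches(lines):
    """Count hung-fetch messages per (host, query name, query type)."""
    counts = Counter()
    for line in lines:
        m = HUNG_FETCH.match(line)
        if m:
            counts[(m["host"], m["name"], m["rtype"])] += 1
    return counts
```

Feeding it the lines of each node's log (for example from an open
file object) gives a per-name count that can be compared across
nodes.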

Initially I thought that this was new behaviour between BIND
9.18.10 and 9.18.11, but after downgrading to 9.18.10 on one of
the affected nodes, this problem is still observable there.
Also, only a subset of our 4 nodes exhibits this behaviour,
even though the unaffected ones also run 9.18.11, which is quite
strange.  None of the name servers are under severe strain by any
measure -- one affected sees around 200qps, another around 50qps
at the time of writing.

I want to ask if this sort of issue is already known (I briefly
searched the issues on ISC's gitlab and came up empty), and also
to ask if there is any particular sort of information I should
collect to narrow this down if it is a new issue.

Regards,

- Håvard

