Zone refresh error: refresh: retry limit for master a.b.c.d#53 exceeded

Mon Jul 13 19:31:18 UTC 2015

Dear BIND users and developers,

I have two BIND 9.10.2-P2 servers, running the same OS and OS version
but on different networks, both configured as slaves for many zones.

On one server, everything works well, and there is not a single error
in the log. On the other, I see many errors like this:

13-Jul-2015 17:06:33.356 general: zone Z/IN/main: refresh: retry limit for master a.b.c.d#53 exceeded (source 0.0.0.0#0)
13-Jul-2015 17:07:03.681 general: zone Z/IN/main: refresh: retry limit for master a.b.c.e#53 exceeded (source 0.0.0.0#0)
13-Jul-2015 17:07:34.517 general: zone Z/IN/main: refresh: retry limit for master a.b.c.f#53 exceeded (source 0.0.0.0#0)

My understanding of this error is that an SOA query over UDP for the
zone failed. However, if I use dig on the server that logs these errors
to query each master for the zone's SOA record, every query succeeds.
There are no errors, and no timeouts.
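
To make the check above concrete, here is a minimal sketch of a plain,
non-recursive SOA query sent over UDP, using only the Python standard
library. It is an assumption about the general shape of such a query,
not a description of what named actually sends (named may add EDNS or
TSIG, for example). The function names, the zone "example.com" and the
address 192.0.2.1 are hypothetical placeholders.

```python
import socket
import struct


def build_soa_query(zone, query_id=0x1234):
    """Build a minimal DNS query for the SOA record of `zone` (RD bit clear)."""
    # Header: ID, flags (all zero: standard query, recursion not desired),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
        if label
    ) + b"\x00"
    # QTYPE=6 (SOA), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 6, 1)


def query_soa_udp(zone, master, port=53, timeout=5.0):
    """Send the query over UDP; return the raw response, or None on timeout."""
    query = build_soa_query(zone)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (master, port))
        try:
            response, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
    # Only accept a response whose ID matches the query that was sent.
    return response if response[:2] == query[:2] else None
```

Calling query_soa_udp("example.com", "192.0.2.1") against each master
in turn would show whether a bare UDP SOA query gets an answer within
the timeout; it does not parse the response or compare serial numbers.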

On both servers, "try-tcp-refresh" is set to "no", because I don't want
the servers to waste time trying TCP and timing out after the UDP SOA
query has failed. Both servers have 4826 zones on them.

If I run tcpdump, I see queries for SOA records originating from the
server to the masters, and responses arriving from the masters, from the
correct source address and port, so the master servers are certainly okay.

The effect of these errors is that zones on the affected server are
frequently late with updates, whereas the other server updates promptly.

So what could cause these SOA lookup failures in BIND on one server, but
not on the other? Could the developers tell me how BIND does SOA queries
over UDP, and whether there is any way to mimic this with dig?

Regards,

Anand Buddhdev
RIPE NCC