TSIG issues, but only for one zone

Chris Peterson chris at lameness.info
Mon Jun 22 22:05:24 UTC 2009


... and only on one host.

So to start, yes my clocks are in sync to within 5 seconds.
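
For context on why clock sync matters here: a TSIG verifier rejects a message when its own clock differs from the signer's timestamp by more than the fudge value, which is 300 seconds in the TSIG record in the dig output further down. A minimal shell sketch of that comparison follows; the function name within_fudge is made up for illustration, and the commented usage line assumes GNU date (for +%s) and ssh access to the master.

```shell
# within_fudge LOCAL_EPOCH REMOTE_EPOCH [FUDGE_SECONDS]
# Hypothetical helper: succeeds (exit 0) when the two epoch timestamps
# differ by no more than FUDGE_SECONDS (default 300, the fudge value
# that appears in the TSIG record in this message).
within_fudge() {
    local_ts=$1
    remote_ts=$2
    fudge=${3:-300}
    diff=$((local_ts - remote_ts))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$fudge" ]
}

# Example with literal timestamps 5 seconds apart:
within_fudge 1245707011 1245707016 && echo "within fudge"

# Against live hosts it could be called like this (not run here):
# within_fudge "$(date -u +%s)" "$(ssh ns00.example.net date -u +%s)"
```

A 5-second skew is far inside a 300-second window, so by this measure the clocks should not be the cause.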

First the info on the setup:

There's one master server, ns00.example.net, and two slave servers,  
ns01.example.net and ns11.example.net.
The master serves about a dozen zones to the slaves and uses TSIG for  
the transfers.
To make it more interesting, I can't replicate the issue when  
transferring example.net to ns01; its named handles it fine, albeit  
with a different TSIG key.
This is on CentOS 5.3 i386, which has BIND 9.3.4-P1 (more specifically  
RPM says bind-9.3.4-10.P1.el5).


[root at ns11 ~]# rndc reload example.net
zone refresh queued
[root at ns11 ~]# Jun 22 14:28:21 ns11 named[1744]: 22-Jun-2009  
14:28:21.775 general: debug 1: received control channel command 'null'
Jun 22 14:28:21 ns11 named[1744]: 22-Jun-2009 14:28:21.776 general:  
debug 1: received control channel command 'reload example.net'
Jun 22 14:28:21 ns11 named[1744]: 22-Jun-2009 14:28:21.776 general:  
debug 1: queue_soa_query: zone example.net/IN: enter
Jun 22 14:28:21 ns11 named[1744]: 22-Jun-2009 14:28:21.776 general:  
debug 1: soa_query: zone example.net/IN: enter
Jun 22 14:28:22 ns11 named[1744]: 22-Jun-2009 14:28:22.247 general:  
debug 1: refresh_callback: zone example.net/IN: enter
Jun 22 14:28:22 ns11 named[1744]: 22-Jun-2009 14:28:22.247 general:  
info: zone example.net/IN: refresh: failure trying master 1.1.2.50#53  
(source 0.0.0.0#0): tsig verify failure
Jun 22 14:28:22 ns11 named[1744]: 22-Jun-2009 14:28:22.247 general:  
debug 1: queue_soa_query: zone example.net/IN: enter
Jun 22 14:28:22 ns11 named[1744]: 22-Jun-2009 14:28:22.278 general:  
debug 1: soa_query: zone example.net/IN: enter
Jun 22 14:28:22 ns11 named[1744]: 22-Jun-2009 14:28:22.279 general:  
debug 1: cancel_refresh: zone example.net/IN: enter

But when I do another zone, it works. Keep in mind this is to the same  
master, so the TSIG settings are exactly the same (I've set them up  
per-IP, not per-zone):
Jun 22 14:31:14 ns11 named[1744]: 22-Jun-2009 14:31:14.008 general:  
info: zone example.com/IN: Transfer started.
Jun 22 14:31:14 ns11 named[1744]: 22-Jun-2009 14:31:14.008 general:  
debug 1: zone example.com/IN: requesting IXFR from 1.1.2.50#53
Jun 22 14:31:14 ns11 named[1744]: 22-Jun-2009 14:31:14.100 general:  
debug 1: zone example.com/IN: zone transfer finished: success
Jun 22 14:31:14 ns11 named[1744]: 22-Jun-2009 14:31:14.100 general:  
info: zone example.com/IN: transferred serial 2009062204: TSIG  
'ns11.example.net-ns01.example.net'
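
A per-IP (per-server) TSIG setup of the kind described above generally has the following shape in named.conf on the slave. This is an illustrative sketch, not the actual configuration: the key name, algorithm, and master address are taken from the logs and dig output in this message, and the secret is a placeholder.

```
// Sketch only: the secret below is a placeholder, not a real key.
key "ns11.example.net-ns01.example.net." {
    algorithm hmac-md5;
    secret "PLACEHOLDER-BASE64-SECRET==";
};

// Sign every request sent to this master address with the key above,
// regardless of which zone the request is for.
server 1.1.2.50 {
    keys { "ns11.example.net-ns01.example.net."; };
};
```

With a layout like this the key is chosen by the master's address alone, which is why a failure limited to one zone is surprising.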

I can't make heads or tails of *WHY* exactly TSIG is throwing the  
verify error; even with debugging turned up to 99, the above is all I  
get in my logs.

Just to make things more interesting, if I do a TSIG AXFR query  
directly from dig on ns11, it works with example.net!

[root at ns11 ~]# dig @1.1.1.50 example.net axfr -y ns11.example.net-ns01.example.net.:2HL0vpUE2JYFxv0YaAtrVg==
; <<>> DiG 9.3.4-P1 <<>> @1.1.1.50 example.net axfr -y ns11.example.net-ns01.example.net.
; (1 server found)
;; global options:  printcmd
example.net.		86400	IN	SOA	example.net. support.example.net.  
2009062202 600 300 3600000 86400
[snip]
example.net.		86400	IN	SOA	example.net. support.example.net.  
2009062202 600 300 3600000 86400
ns11.example.net-ns01.example.net. 0 ANY TSIG hmac-md5.sig-alg.reg.int. 1245707011 300 16 l+rb6H0RuqwXCT6H4G6JgQ== 49169 NOERROR 0
;; Query time: 272 msec
;; SERVER: 1.1.1.50#53(1.1.1.50)
;; WHEN: Mon Jun 22 14:43:31 2009
;; XFR size: 32 records (messages 1)

Help? I'm open to trying just about any crazy ideas at this point.




