2 serious DNSSEC issues

klaus.mailinglists at pernau.at
Tue Jan 17 22:41:35 UTC 2017


Hi!

We use Bind with inline-signing as "bump-in-the-wire". We started with 
Bind 9.9, used 9.10 (several versions) and recently we switched to 
9.11.0-P2.

All of them showed the same 2 problems:

1. Bind gets into a signing loop and consumes memory until it is killed 
by the Linux OOM killer
2. Bind produces broken zones (signatures not updated, invalid 
signatures, missing RRSIGs ...)

Problem 1 was already reported in detail to bind-bugs at isc.org but we 
never received an answer.

I will describe both problems in more detail below. It would be great 
if you could give us some advice on how to track them down.

ad 1) Bind endlessly re-signs a zone. In the logs this shows up as 
repeated "sending notifies" messages, caused by the constantly 
increasing SOA serial, followed by the slaves fetching the zone. Bind 
itself slaves the zone from a hidden master, but the zone on the hidden 
master is not updated while this happens:

20:38:09 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): 
sending notifies (serial 5691271)
20:38:10 named[3374]: client @0x7fe570031500 11.22.34.27#53632 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR started
  (serial 5691289)
20:38:10 named[3374]: client @0x7fe570031500 11.22.34.27#53632 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:10 named[3374]: client @0x7fe5780cb530 11.22.34.29#57629 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR started
  (serial 5691302)
20:38:10 named[3374]: client @0x7fe5780cb530 11.22.34.29#57629 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:14 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): 
sending notifies (serial 5691381)
20:38:15 named[3374]: client @0x7fe578496d60 11.22.34.27#36770 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR started
  (serial 5691416)
20:38:15 named[3374]: client @0x7fe578496d60 11.22.34.27#36770 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:15 named[3374]: client @0x7fe570031500 11.22.34.29#45449 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR started
  (serial 5691421)
20:38:15 named[3374]: client @0x7fe570031500 11.22.34.29#45449 
(klaus-dev.dnssec-signiert.at): transfer of 
'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:19 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): 
sending notifies (serial 5691509)
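
To put a number on the loop, the serial growth between consecutive 
"sending notifies" entries can be computed from the log. The following 
is only a minimal sketch: the regular expression is fitted to the log 
lines quoted above (including the line wrapping added by the mail 
client), the function names are made up for illustration, and the 
timestamps are assumed to fall within a single day.

```python
import re
from datetime import datetime

# Matches a "sending notifies" entry even when it was wrapped across two
# lines; \s+ absorbs the line break. Fitted to the log excerpt above only.
NOTIFY_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\s+named\[\d+\]:\s+zone\s+(\S+)\s+\(signed\):\s+"
    r"sending\s+notifies\s+\(serial\s+(\d+)\)"
)


def notify_events(log_text):
    """Return (time, zone, serial) tuples for every NOTIFY log entry."""
    events = []
    for stamp, zone, serial in NOTIFY_RE.findall(log_text):
        when = datetime.strptime(stamp, "%H:%M:%S")
        events.append((when, zone, int(serial)))
    return events


def serial_rates(events):
    """Serial increments per second between consecutive NOTIFYs of a zone."""
    rates = []
    last = {}
    for when, zone, serial in events:
        if zone in last:
            prev_when, prev_serial = last[zone]
            seconds = (when - prev_when).total_seconds()
            # Skip pairs that cross midnight (the log carries no date).
            if seconds > 0:
                rates.append((zone, (serial - prev_serial) / seconds))
        last[zone] = (when, serial)
    return rates


# The three NOTIFY entries from the excerpt above.
SAMPLE = """\
20:38:09 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed):
sending notifies (serial 5691271)
20:38:14 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed):
sending notifies (serial 5691381)
20:38:19 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed):
sending notifies (serial 5691509)
"""

if __name__ == "__main__":
    for zone, rate in serial_rates(notify_events(SAMPLE)):
        print(f"{zone}: {rate:.1f} serial increments per second")
```

For the excerpt above this gives 22.0 and 25.6 serial increments per 
second, i.e. more than 100 new serials between two NOTIFYs that are 
only 5 seconds apart.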

While doing this, Bind consumes more and more memory until it is killed 
by the OOM killer. After a restart, Bind runs fine again.
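
The memory growth could be recorded alongside the log so that it can be 
correlated with the NOTIFY bursts. This is a hedged sketch for Linux 
only: it reads the VmRSS field from /proc/<pid>/status, and the helper 
names are hypothetical, not part of Bind or any existing tool.

```python
import re


def parse_vmrss_kib(status_text):
    """Extract VmRSS (resident set size in kiB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kib(handle.read())


def growth_kib_per_second(samples):
    """Average growth between the first and last (unix_time, rss_kib) sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) / (t1 - t0) if t1 > t0 else 0.0
```

Calling read_rss_kib() with the pid of named (3374 in the log above) 
once per minute from a small loop or cron job would be enough to see 
when the growth starts.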

On our production server we see this issue every 2 or 3 months. On our 
development server we see it every second day. The difference between 
the two is the ZSK rollover timing:
prod: ZSK rollover every 90 days, sig-validity-interval=30days, ~350 
zones
dev:  ZSK rollover every  2 days, sig-validity-interval=1day, ~10 zones, 
dnssec-dnskey-kskonly

On the dev system we have multiple published and active keys, which is 
certainly an atypical setup, but nevertheless Bind should not endlessly 
re-sign the zone.


ad 2) Before we deploy the signed zone on the public name servers, we 
verify the zone with validns, dnssec-verify and ldns-verify. When 
receiving a NOTIFY from Bind, we AXFR the zone and then let the tools 
inspect it. Once a month we have a broken zone (reported identically by 
all 3 tools). Typical errors are (here as reported by validns):
no corresponding NSEC3 found for ...
NSEC3 mentions RRSIG, but no such record found for ...
NSEC3 without a corresponding record (or empty non-terminal)
bad SHA-256 hash length
broken NSEC3 chain, expected ... but found ...
NSEC3 mentions NSEC3PARAM, but no such record found for
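
The verification step described above can be sketched roughly as 
follows. This is not our actual script, only an illustration: the 
command lines in DEFAULT_COMMANDS are assumptions that should be 
checked against the man pages of the installed versions (in particular 
the origin flags, and the ldns tool being installed as 
ldns-verify-zone), and each tool is assumed to signal a broken zone 
with a non-zero exit status. The zone file is assumed to have been 
fetched via AXFR already.

```python
import subprocess

# Assumed invocations, not taken from the setup described in this mail.
# "{origin}" and "{zone_file}" are placeholders filled in below.
DEFAULT_COMMANDS = {
    "validns": ["validns", "-z", "{origin}", "{zone_file}"],
    "dnssec-verify": ["dnssec-verify", "-o", "{origin}", "{zone_file}"],
    "ldns-verify-zone": ["ldns-verify-zone", "{zone_file}"],
}


def run_verifiers(zone_file, origin, commands=DEFAULT_COMMANDS,
                  run=subprocess.run):
    """Run every verifier; return {tool: (passed, combined_output)}."""
    results = {}
    for name, template in commands.items():
        argv = [
            arg.replace("{zone_file}", zone_file).replace("{origin}", origin)
            for arg in template
        ]
        proc = run(argv, capture_output=True, text=True)
        results[name] = (proc.returncode == 0, proc.stdout + proc.stderr)
    return results


def zone_is_deployable(results):
    """Deploy only if every tool accepted the zone."""
    return all(passed for passed, _ in results.values())
```

Keeping the output of all tools, not just the pass/fail flag, makes it 
possible to archive the error messages together with the broken zone 
file.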

Sometimes we are lucky and can solve the problem with "rndc sign ..." 
or "rndc retransfer ...". Most of the time none of these tricks work 
and even a Bind restart does not help. In such a case we have to stop 
Bind, delete the zone file and the journal file, and then start Bind 
(causing a fresh incoming AXFR and signing). We have archived these 
broken zone files for inspection.


We are willing to spend time debugging these issues (when they happen 
again) if you can give us some advice on what we should check in case 
of an error.

Thanks
Klaus






