BIND 9.7.0 introduced automatic in-server signature refreshing and automatic key rollover. This allows BIND 9.7, if provided with the DNSSEC private key files, to sign records as they are added to the zone, or as the signatures need to be refreshed. This refresh happens periodically to spread out the load on the server and to even out zone transfer load.
However, BIND 9.7.0 and 9.7.1 will not perform this slow signing step in one case. When rolling to a new Zone Signing Key (ZSK), BIND will very quickly re-sign the zone using the new key and remove the old key from use.
This was not an accident. We believed that this was what operators would want, since getting an old key out of use as quickly as possible means it can be removed from the DNSKEY record set as quickly as possible.
For operators of large zones, this caused problems. A small zone, which can be re-signed quite quickly, is unlikely to notice the effect of the current method. Large zones may see sudden, large CPU increases as signatures are created, and large zone transfers that may interfere with publication requirements.
I will go into some technical detail about how the key timers work, how they are used in the command-line tools, and how BIND 9.7.0 and 9.7.1 use them. I will also explain the functional change we plan to include in BIND 9.7.2 to change this behavior and ease operational problems.
Description of Key Timers
Before we can dig too deeply into how a gradual key roll may occur, I need to describe the internal state BIND 9 maintains on a particular key. This is a description of how a Zone Signing Key (ZSK) is tracked within BIND 9.7. A similar method is used for Key Signing Keys but is not documented here.
In BIND 9.6 external tools were used to re-sign the zone, namely dnssec-signzone. The keys were managed externally to the server process, usually manually. Key management was performed by controlling which private keys the command-line tool had access to when signing was performed. In BIND 9.7, management of keys has been moved into the server.
A ZSK changes state at defined points in time. The states are: Created, Published, Active, Inactive, and Removed. The private key file itself maintains these timers, and they are set upon key creation or through a command-line tool. Once specified, these timers trigger the necessary state changes. The server uses the key states to select which keys to sign with and which keys to include in the zone's DNSKEY record set.
Figure 1: ZSK Timer Relationships
In Figure 1, the arrows represent events (blue) or times (yellow) when the key state changes. The periods between these events or times that we are most concerned with inside BIND 9 are marked with the letters A, B, C, and D.
- A: The key is created and on disk, but not yet included in the zone file and not used to sign any records. Keys can be pre-created as early as desired.
- B: The key is published in the zone, but is not yet used to sign any records. The new key must be pre-published for at least as long as the TTL on the DNSKEY record plus any master to secondary transfer delays.
- C: The key is published in the zone, and is active for use in signing records. How long a key is used is a local policy decision. Recommendations range from rolling very frequently to five years or more. Each has its merits, and they are not discussed here.
- D: The key is published in the zone, but is inactive and is no longer used to sign any records. How long a key remains in this state is dependent both on when it was last used and the TTL values in the zone.
- The last state (not shown in the figure) is removed. It is no longer used by BIND in any way. This state is terminal.
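The periods above can be sketched as a simple function of the key's timers and the current time. This is only an illustration of the state model described here, not BIND's implementation; the names KeyTimers, Period, and period_at are hypothetical, and an unset timer is assumed to mean "never".

```python
from datetime import datetime
from enum import Enum
from typing import NamedTuple, Optional


class KeyTimers(NamedTuple):
    """Hypothetical container for the timers kept with a key."""
    created: datetime
    publish: Optional[datetime] = None
    activate: Optional[datetime] = None
    inactive: Optional[datetime] = None
    delete: Optional[datetime] = None


class Period(Enum):
    A_CREATED = "A"      # on disk, not in the zone
    B_PUBLISHED = "B"    # in the DNSKEY set, not signing
    C_ACTIVE = "C"       # in the DNSKEY set, signing
    D_INACTIVE = "D"     # in the DNSKEY set, no new signatures
    REMOVED = "removed"  # terminal state


def period_at(timers: KeyTimers, now: datetime) -> Period:
    """Return the period a key is in at `now` (assumption: unset timers never fire)."""
    def reached(when: Optional[datetime]) -> bool:
        return when is not None and now >= when

    # Check the latest transition first so the most advanced state wins.
    if reached(timers.delete):
        return Period.REMOVED
    if reached(timers.inactive):
        return Period.D_INACTIVE
    if reached(timers.activate):
        return Period.C_ACTIVE
    if reached(timers.publish):
        return Period.B_PUBLISHED
    return Period.A_CREATED
```

Checking the transitions from last to first keeps the function correct even when several timers have already passed.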
Section D is somewhat interesting because it has two components. The first part of this box is the interval during which the BIND 9.6 tools would have removed all signatures made with this key as signatures were refreshed over time. In BIND 9.7.0 and 9.7.1 the duration of this first part is 0, causing an immediate spike in all types of resource requirements.
The second part of box D represents the maximum TTL value in use in the zone. The key cannot be removed from the zone until all records signed with this key have expired from caches. This is based on zone contents and is unchanged between server versions.
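As a worked example of the two components of box D, the earliest time a key could safely leave the DNSKEY record set can be sketched as the inactivation time plus both parts. The function name and the sample values are assumptions chosen for illustration only.

```python
from datetime import datetime, timedelta


def earliest_safe_removal(inactive_at: datetime,
                          resign_duration: timedelta,
                          max_zone_ttl: timedelta) -> datetime:
    """Inactivation time plus both parts of box D.

    resign_duration: time until no signatures by this key remain in the zone
                     (0 for the immediate re-sign done by BIND 9.7.0 and 9.7.1).
    max_zone_ttl:    largest TTL in the zone, so cached signatures can expire.
    """
    return inactive_at + resign_duration + max_zone_ttl


inactive_at = datetime(2010, 6, 1)
max_ttl = timedelta(days=1)

# Immediate re-sign: the first part of box D has zero length.
immediate = earliest_safe_removal(inactive_at, timedelta(0), max_ttl)

# Gradual re-sign spread over an assumed 7 days.
gradual = earliest_safe_removal(inactive_at, timedelta(days=7), max_ttl)
```

The gradual case keeps the old key in the DNSKEY record set longer, which is the cost of spreading the signing load.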
Figure 2 describes a typical ZSK roll. Exactly one ZSK is active at all times. This is the form of key rolling that is expected to be used in production, as it puts the least strain on resources by not having more than one signature on a particular record.
The rolling period dnssec-signzone uses is described by R. This is directly related to the zone’s signature validity period. If signatures last 30 days, they must be refreshed before those 30 days have passed or they will become stale, and the zone will fail to validate. The refresh typically happens sooner than strictly necessary, but (using 30 days as an example) it must occur no later than (30 days – MaxTTL) after signing to avoid problems.
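The refresh deadline can be illustrated with a little date arithmetic. This is a minimal sketch that assumes a single MaxTTL for the whole zone; the function name latest_refresh and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def latest_refresh(signed_at: datetime,
                   validity: timedelta,
                   max_ttl: timedelta) -> datetime:
    """Latest time a signature can be refreshed without risking stale data.

    A cached copy may live for up to max_ttl, so the signature must be
    replaced at least max_ttl before it expires.
    """
    if max_ttl >= validity:
        raise ValueError("MaxTTL must be shorter than the signature validity period")
    return signed_at + (validity - max_ttl)


# Signatures valid for 30 days, largest TTL in the zone of 1 day.
deadline = latest_refresh(datetime(2010, 1, 1),
                          validity=timedelta(days=30),
                          max_ttl=timedelta(days=1))
print(deadline)  # 2010-01-30 00:00:00
```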
With the command-line tools, when a ZSK becomes Inactive, the key would no longer be used to sign records. The new key would be used for all signatures, and this happens as signatures are refreshed. This does not mean no signatures exist for this key, only that no new ones will be created using that key.
BIND 9.7.0 and BIND 9.7.1 treat this transition from old to new key as an immediate trigger to re-sign the entire zone with the new key and remove the old key’s signatures. This causes a huge delta in zone contents, increases server CPU load, and increases the resources needed to hold and transfer the zone data.
This change was not an accident. The purpose of this immediate re-signing was to remove the old key as quickly as possible. This is a good idea in an emergency roll, but the behavior change was unexpected and has caused operational problems.
In Figure 2, one key stops signing records at the exact moment another key begins signing. This is what is expected to be done with ZSKs in practice as it minimizes all overhead of storing and transmitting two signatures.
Nothing currently in BIND 9.7 or proposed here disallows overlapping active regions. A record may be signed many times by many keys, and the overhead may be necessary at times for particular types of key rolling. However, it is critical that while active regions may overlap, they must never be disjoint. If at any time there is a gap between keys, BIND 9 cannot correctly maintain the zone and the zone will appear broken to validating resolvers.
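The "overlap is fine, gaps are not" rule is easy to check ahead of time from the planned active periods. The sketch below is an assumption about how an operator might validate a key schedule, not something BIND provides; find_gaps is a hypothetical name.

```python
from datetime import datetime
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (activate, inactive) for one key


def find_gaps(active_windows: List[Window]) -> List[Window]:
    """Return every interval in which no key is active.

    Overlapping or back-to-back windows are accepted; only disjoint
    windows produce a gap.
    """
    if not active_windows:
        return []
    windows = sorted(active_windows)
    gaps = []
    covered_until = windows[0][1]
    for start, end in windows[1:]:
        if start > covered_until:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


schedule = [
    (datetime(2010, 1, 1), datetime(2010, 4, 1)),
    (datetime(2010, 4, 5), datetime(2010, 7, 1)),  # starts four days late
]
print(find_gaps(schedule))
```

An empty result means that some key is active at every moment between the first activation and the last inactivation.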
Proposed change in 9.7.2
The proposed change to 9.7.2 is to make the command-line tool behavior the default visible behavior. Signatures will transition from the old key to the new key as each record's re-signing timer expires. Additionally, a key will not be removed from the zone until BIND 9 knows that all signatures using that key have been removed from the zone and that it is safe, based on the TTL, to remove that key.
It is important that keys have sane timer values set or the zone may become broken when rolling to a new ZSK. BIND 9 may need to alter the administrator-supplied values for pre- and post-use publish in order to ensure the zone does not break. It should also be possible for BIND 9 to be provided only key start/stop times and have a reasonable pre- and post-use publish time calculated based on zone TTL values and last use of a key.
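For the pre-use side, one way such an adjustment could work is to move a publish time earlier whenever the administrator's value leaves too little lead time. This is a sketch of the idea only; effective_publish and its parameters are hypothetical, and the rule it applies is the pre-publish requirement from period B (DNSKEY TTL plus transfer delay).

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_publish(activate_at: datetime,
                      dnskey_ttl: timedelta,
                      transfer_delay: timedelta,
                      requested_publish: Optional[datetime] = None) -> datetime:
    """Pick a publish time that leaves enough lead time before activation.

    If no publish time was supplied, or the supplied one is too late,
    fall back to the latest time that still satisfies the requirement.
    """
    latest_ok = activate_at - (dnskey_ttl + transfer_delay)
    if requested_publish is None or requested_publish > latest_ok:
        return latest_ok
    return requested_publish


activate_at = datetime(2010, 3, 1)
ttl = timedelta(days=1)
delay = timedelta(hours=1)

# Only a start time supplied: the publish time is calculated.
calculated = effective_publish(activate_at, ttl, delay)

# A requested publish time that is too late is moved earlier.
adjusted = effective_publish(activate_at, ttl, delay, datetime(2010, 2, 28))
```

A real implementation would presumably also log when it overrides a supplied value, so the administrator knows the timers were altered.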
It may also be necessary for some keys to be used past their end date. An example of this would be if a key is added but no following key is provided. Rather than break the zone, the older key may continue to be used, with sufficient notification in the log files to indicate this is happening.
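The fallback described above might look like the following. This is an assumption about one reasonable policy (reuse the most recently retired key and log a warning), not a description of what BIND 9.7.2 will do; Zsk and signing_keys are hypothetical names.

```python
import logging
from datetime import datetime
from typing import List, NamedTuple, Optional

log = logging.getLogger("zsk-sketch")


class Zsk(NamedTuple):
    name: str
    activate: datetime
    inactive: Optional[datetime]  # None means no end date was set


def signing_keys(keys: List[Zsk], now: datetime) -> List[str]:
    """Return the names of the keys to sign with at `now`."""
    active = [k for k in keys
              if k.activate <= now and (k.inactive is None or now < k.inactive)]
    if active:
        return [k.name for k in active]

    # No active key: rather than break the zone, keep using the key
    # that went inactive most recently, and say so in the log.
    expired = [k for k in keys if k.inactive is not None and k.inactive <= now]
    if not expired:
        return []
    last = max(expired, key=lambda k: k.inactive)
    log.warning("no active ZSK at %s; still signing with %s (inactive since %s)",
                now, last.name, last.inactive)
    return [last.name]
```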
The impact of this change is expected to be minimal. Those anticipating the same transitional key rollover when migrating from command-line tools to BIND 9.7’s autosign feature will see the expected behavior. Those who were unaware of the difference and for whom it was operationally insignificant will remain unaware. There will be a short-term difference in zone size as a result of the DNSKEY record set being larger for a longer period of time. A safety control may need to be added to ensure that a key is not removed after deactivation until it is safe to do so, producing warnings if triggered.
There may be installations or circumstances where the current 9.7.0 and 9.7.1 behavior is needed. To accommodate this, we plan to introduce a new control that allows this immediate behavior to be enabled. I believe a “re-signing roll duration” set to “automatic” or “immediate” may suffice. This will allow the administrator to choose the old immediate behavior if they really want it, or when they face urgent issues such as compromised keys. By default, the control balances updates over time, choosing the best rate at which to re-sign so that all signatures are updated before the old key is scheduled to be removed.
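To make the "automatic" rate concrete, here is one possible way to pace the work: divide the signatures still made with the old key evenly across the re-signing intervals left before the deadline. This is purely illustrative; the function name, the idea of a fixed re-signing interval, and the sample numbers are all assumptions rather than the planned BIND design.

```python
import math
from datetime import datetime, timedelta


def signatures_per_interval(remaining: int,
                            now: datetime,
                            deadline: datetime,
                            interval: timedelta) -> int:
    """How many old-key signatures to replace in each re-signing pass.

    deadline: time by which no old-key signatures may remain.
    interval: assumed fixed time between re-signing passes.
    """
    if deadline <= now:
        # "Immediate" behavior: everything must be re-signed at once.
        return remaining
    intervals_left = max(1, (deadline - now) // interval)
    return math.ceil(remaining / intervals_left)


# One million signatures, ten days until the deadline, hourly passes.
per_pass = signatures_per_interval(1_000_000,
                                   now=datetime(2010, 6, 1),
                                   deadline=datetime(2010, 6, 11),
                                   interval=timedelta(hours=1))
print(per_pass)  # 4167
```

Rounding up ensures the work finishes on or before the deadline rather than just after it.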