Notice of plan to deprecate map zone file format

Timothe Litt litt at acm.org
Fri Sep 10 14:40:34 UTC 2021


On 10-Sep-21 08:36, Victoria Risk wrote:
>
>
>> On Sep 10, 2021, at 7:24 AM, Timothe Litt <litt at acm.org> wrote:
>>
>> Clearly map format solved a big problem for some users.  Asking
>> whether it's OK to drop it with no statement of what those users
>> would give up today is not reasonable.
>>
> Actually, we are not sure there ARE any users. In fact, the one
> example I could come up with was Anand, who has replied to the list
> that he is in fact NOT using map zone.  I should have asked directly -
> is anyone on this list USING MAP ZONE format?
>
Well, if the answer is "no one", that simplifies matters :-)

I do remember that startup time was a big issue before map came out, and
that the complaints subsided thereafter.  No personal knowledge as to
whether that was cause and effect or a realignment of the planets.  In
general, I don't look to Astrology for answers :-)

>> After all the "other improvements in performance" that you cited,
>> what is the performance difference between map and the other formats?
>
> I don’t know that, to be honest. We don’t have the resources to
> benchmark everything. Maybe someone on this list could?  We would also
> like to be able to embark on a wholesale update to the rbtdb next year
> and this is the sort of thing that might complicate refactoring
> unnecessarily.

IIRC, when I did some work on the stats channel & was concerned with
scalability, Evan said that you keep some large datasets (1M+ zones)
around for testing and produced some numbers for that.  So it ought to
be possible to get some basic data.

I'm not suggesting a full benchmarking campaign - but one or two
datapoints are a lot better than none.  E.g., if there's no difference
with 1 or 10M zones with, say, 10K records each, it's pretty clear that
map's time is past.  If it's orders of magnitude faster (and it's used),
it's not.

I don't remember - did your user survey ask about how many/how large
zones people serve?  I vaguely think so, but it's been a while...

>> For a case which took 'several hours' before map was introduced, what
>> would the restart time be for named if raw format was used now?
>>
>>> If I knew that I would have said. 'Raw’ was much faster than the
>>> text version. Map was faster than raw. Raw is apparently not a
>>> problem to maintain.  I believe the improvement with raw was ~3x.
>>>
>
I think the questions are: (a) is startup time an issue (however it's
solved)?, (b) if so, is map format the solution? (c) If it is and people
are using it, what would the consequences be to them if it went away?
(d) If it is, and people aren't using it - is the documentation too
scary (as Anand said it is for him)?

>> It's pretty clear to me that if map format saves a few seconds in the
>> worst case, it's not worth keeping.  If it saves hours for large
>> operators, then the alternative isn't adequate.  Maybe "map" isn't
>> the answer - how might 'raw' compare to a tuned database back end? 
>> (Which has other advantages for some.)  What if operators specified a
>> priority order for loading zones?  Or zones were loaded on demand
>> during startup, with low activity zones added as a background task? 
>> Or???
>
> Well, back when we added map zone format, startup time was a major
> pain point for some users. Now, it seems as though large operators are
> updating their zones all the time (also updating RPZ feeds) and
> efficiency in transfers seems to be a bigger issue. 
>
What I was getting at is how hard the definition of "startup time" is.
Time to serving all zones?  Important zones? Is it OK for responses to
be slow during startup, or is startup only complete when responses are
at nominal speed?
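
To make the distinction concrete, here is a minimal sketch of two of those
definitions computed from the same data.  Everything in it is hypothetical:
the function name, the zone names, and the idea that you have a per-zone
"seconds after start until the zone was answering" figure (named does not
hand you this in this form; you'd have to derive it from logs or probes).

```python
from typing import Dict, Iterable


def startup_metrics(loaded_at: Dict[str, float],
                    important: Iterable[str]) -> Dict[str, float]:
    """Return two candidate "startup time" figures, in seconds.

    loaded_at maps zone name -> seconds after process start at which the
    zone began answering (hypothetical input, e.g. scraped from logs).
    """
    important = set(important)
    missing = important - loaded_at.keys()
    if missing:
        raise KeyError(f"no load time recorded for: {sorted(missing)}")
    return {
        # Definition 1: every zone is being served.
        "all_zones": max(loaded_at.values(), default=0.0),
        # Definition 2: only the zones the operator cares most about.
        "important_zones": max((loaded_at[z] for z in important),
                               default=0.0),
    }


# Made-up numbers: two busy zones load quickly, one bulk zone loads late.
example = {
    "example.com": 2.0,
    "mail.example.com": 3.5,
    "bulk-0001.example": 840.0,
}
metrics = startup_metrics(example, ["example.com", "mail.example.com"])
print(metrics)
```

With those made-up numbers the two definitions differ by more than two
orders of magnitude, which is why the question of what counts as "started"
matters before anyone benchmarks anything.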

I wonder if this comes from large operators using a database (DLZ) back
end.  Database developers tend to have a single-minded focus on
performance, and direct updates are probably faster than going thru
named & its generalized authentication/validation.  Plus, depending on
how you set up your server architecture, DB replication can replace DNS
zone transfers.

> We don’t have any direct data on what features are being used, we can
> only judge based on complaints we receive via bug tickets or posts on
> this list.

You did a survey a while back...

>>
>> A fair question for users would be what restart times are acceptable
>> for their environment - obviously a function of the number and
>> size/content of zones.  And is a restart "all or nothing", or would
>> some priority/sequencing of zone availability meet requirements?
>>
> That is a good question. Can you answer it for yourself?

Sure.  I'm not a large operator, but I've always thought big and
implemented smaller.  About 350 zones, 2 real views and 2 static-stub
recursive views.  50 to a couple of hundred records/zone - not counting
the DNSSEC signatures & overhead that named generates.  ~10 servers.  Plus a
3rd party backup service.  Anything under a minute is a reasonable
startup time for named - though most of my servers are underpowered.
(e.g. RPi class machines with USB disks that sleep a lot.)  Two minutes
is tolerable.  Longer than that, I'd have issues.

If I were a larger operator and had to choose, I'd prioritize external
views so that key services (e.g. e-mail, webservers, vpns,...) aren't
seen to be slow/down.  The internal network has plenty of redundancy &
tolerance for slow resolution.  The external views are smaller, with
fewer servers.  Another priority would be zones for which a server is
primary, since it's required for updates.

If I were a DNS provider/registrar, I'd guess that of the (hopefully)
millions of zones that I sold, only a few actually get a lot of
traffic.  So a scheme where historical query stats drove reload order
would be attractive.  And since I'd sell SLAs, prioritizing the
higher-paying customers would be good business.
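
As a rough sketch of what such a scheme might look like (this is not
anything named does today; the Zone fields, the tier numbering, and the
zone names are all invented for illustration), the load order could be a
simple sort on SLA tier first and historical query volume second:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Zone:
    name: str
    queries_per_day: int  # historical query count (hypothetical stat)
    sla_tier: int         # 0 = highest-paying tier (hypothetical scale)


def load_order(zones: List[Zone]) -> List[str]:
    """Return zone names in the order they would be loaded.

    Higher-paying tiers load first; within a tier, busier zones load
    first.  The name is a final tie-breaker so the order is stable.
    """
    ranked = sorted(
        zones,
        key=lambda z: (z.sla_tier, -z.queries_per_day, z.name),
    )
    return [z.name for z in ranked]


zones = [
    Zone("quiet.example", queries_per_day=12, sla_tier=2),
    Zone("busy.example", queries_per_day=900_000, sla_tier=2),
    Zone("premium.example", queries_per_day=500, sla_tier=0),
]
order = load_order(zones)
print(order)
```

Here the premium zone loads first despite modest traffic, then the busy
zone, then the quiet one.  Whether tier should outrank traffic, or the
reverse, is a business decision rather than a technical one.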

Of course, none of that matters if reload times are small enough to
cover expected outage durations with an affordable number of servers.

The key would be the downtime on the database primaries (masters) - that
would prevent my customers from activating/updating their zones.  And a
reason for a database back-end rather than named-managed files - since
DB persistence, consistency, and replication are solved problems in that
world.

Since you're lucky to get through to a (competent technical) help desk
in tens of minutes, a total downtime (meaning rebooting a server thru
named serving at least key zones and updates) on the order of 15 minutes
is probably the outer limit.  That's a thumb-in-the-air number, not science.

Hope this helps.

>
> Thank you!
>
> Vicky
>
>