Timeouts during cache cleaning and zone collection

Mon Jun 20 10:43:40 UTC 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Karl,

The way BIND is designed, those moments when it goes "offline" for
cache cleaning and reloading zones are normal. That may be one reason
why you should have more than one DNS server in the first place.

My own BIND, with some 40 zones, is not very annoying, but I do observe
a lot of timeouts and even give-ups when it checks serial numbers on
its masters. They show up in the logs, but I do not notice them otherwise.
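
To make the serial-number check concrete: a secondary compares the SOA
serial it gets from a master with the serial it already holds, and only
transfers the zone when the master's serial is newer. The following is a
minimal sketch of that comparison using RFC 1982 serial number arithmetic.
The function names are made up for illustration and are not BIND
internals.

```python
# Sketch of the SOA serial comparison a secondary performs before a transfer.
# Names are hypothetical; this is not BIND's actual code.

SERIAL_BITS = 32
MODULUS = 2 ** SERIAL_BITS          # serials wrap around at 2^32
HALF = 2 ** (SERIAL_BITS - 1)       # comparison window from RFC 1982


def serial_newer(master: int, local: int) -> bool:
    """Return True if `master` is newer than `local` under RFC 1982 rules."""
    diff = (master - local) % MODULUS
    # diff == 0 means equal; diff == HALF is undefined in RFC 1982,
    # so both cases are treated as "not newer" here.
    return 0 < diff < HALF


def needs_transfer(master_serial: int, local_serial: int) -> bool:
    """A zone transfer is only worthwhile when the master's serial is newer."""
    return serial_newer(master_serial, local_serial)
```

A timeout on the SOA query simply means this comparison cannot be made
yet, so the secondary keeps serving the copy it already has and retries
later.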

I do notice that it takes more time to start bind with every additional
zone it loads.

If you can, split the roles: authoritative servers for the outside and
caches for the inside.

Nevertheless, it is a good idea to load some zones on the inside resolvers:

1. Your own zones should be permanently on your resolvers. You know better
than the root-servers what belongs to you. Nobody can poison your cache
about information your server is authoritative for.

2. The root zone. It is the single most problematic point of failure. There
have been attacks on the root-servers. The root zone file is really not
big at all. I clone a.public-root.net for the "." zone. All the
public-root.net servers allow AXFR transfers. If the root breaks, my
nameserver will continue running for at least two weeks. I know a lot of
other root-servers do not allow zone transfers for security reasons. What
security?

3. Companies you do business with. Banks especially know about the security
problems with poisoned servers. Many think about publishing their zone
information and exchanging it with regular customers.

Windows allows 2 nameservers; other operating systems allow 3. So you
should have 2 or 3 resolvers inside your company. Hide them behind a
firewall. Do not let normal workstations query outside nameservers.

With 2 or 3 nameservers, nobody will notice that one of them is
daydreaming from time to time.

If you arrange for server1 to ask outside1 (backup outside2, backup outside3),
server2 to ask server1 (backup outside2, backup outside3) and server3 to ask
server2 (backup outside3, backup outside1), then your servers will rarely
update at the same time. Nobody will notice the failure of a single server.

Auer, Karl James wrote:
| Hi there.
|
| We are seeing a problem with BIND 9.3.0, compiled with threading on
| Solaris, whereby the servers stop answering queries for a couple of
| seconds. Queries in this interval time out. That is, they are not
| answered slowly, they are not answered at all.
|
| The servers do this a) when they clean their caches and b) when they are
| downloading zones.
|
| Archived messages on the matter of cache cleaning suggest that these
| timeouts are normal for BIND, and that the only way to avoid them is to
| turn cache cleaning off. I've tried setting the cleaning interval to
| only a few minutes, but it just caused more timeouts - there seems to be
| a sort of minimum interruption due to cache cleaning.
|
| Of more concern are the interruptions due to zone downloads. We have a
| (poorly designed) system whereby our zone files are generated from a
| database if required; the zone files are completely rewritten, with a
| new serial number, and a hidden master is reloaded. That causes a bunch
| of notifies to hit the secondaries, which then reload those zones from
| the hidden master. That process causes these timeouts. The secondaries
| are the nameservers that field all our queries. Note that about 400
| zones are downloaded, though most are very small or even empty. There
| are a couple of larger ones, but even there we are only loading 50000 or
| so entries.
|
| We have not separated authoritative and caching nameserver functions.
| The affected secondaries handle internal and external queries, but are
| not under a particularly heavy load. Even if we did separate the
| functions, it wouldn't help (I think) because the caching issue would
| still be there on the caching servers, and the download issue would
| still be there on the authoritative servers. We want to separate the
| functions anyway, for all the usual reasons.
|
| Note that this problem isn't new - it's just that our monitoring has
| improved :-)
|
| So my question: Is it normal for a BIND server to stop answering queries
| during zone downloads? If not what might be the problem here?
|
| Regards, K.
|
| --
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Karl Auer (karl.auer at id.ethz.ch)       Geschaeft/work +41- 1-6327531
| Kommunikation, ETHZ RZ                    Privat/home +41-43-2660706
| Eidgenoessische Technische Hochschule, Zuerich    Fax +41- 1-6321225
| Clausiusstrasse 59 CH-8092 ZUERICH Switzerland
|
|

- --
Peter and Karin Dambier
Public-Root
Graeffstrasse 14
D-64646 Heppenheim
+49-6252-671788 (Telekom)
+49-6252-599091 (O2 Genion)
+1-360-226-6583-9738 (INAIC)
mail: peter at peter-dambier.de
http://iason.site.voila.fr
http://www.kokoom.com/iason
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQFCtp3aPGG/Vycj6zYRArZlAJwO5TdSHp1C0fgf95qLNNKLXQKEIACffc0t
Asz9XF48Te9fcX5Q9q2Bhj0=
=u7Ml
-----END PGP SIGNATURE-----