What kind of hardware?

Thu Mar 8 23:44:13 UTC 2001

Brad Knowles wrote:

> At 2:44 PM -0500 3/8/01, Kevin Darcy wrote:
>
> >  As some of you may know, I have a low opinion of forwarding. In my view,
> >  it's a necessary evil in some situations to get around connectivity
> >  problems, e.g. firewall boundaries, and in *some* network/DNS
> >  architectures conveys performance benefits *some* of the time, but an
> >  evil nonetheless and something generally to be avoided. So I'm
> >  naturally biased against your "preferred" setup to begin with.
>
>         On the networks on which I've set this stuff up, the central
> caching nameservers were already doing multiple hundreds or tens of
> thousands of DNS queries per second, and without the caching
> forwarding servers running locally on each machine performing a major
> service, the central caching nameservers would get pummeled into the
> ground.  On busy networks, I simply don't see any other alternative.

I think you misunderstood. I'm not denying that mail servers should run local
caching nameservers. Obviously they should. I was taking issue only with the
*forwarding* part. If every mail server runs an *autonomous* caching server, this
would actually alleviate the load on those central caching servers, would it not?
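
As a rough illustration of that question, here is a minimal sketch of where the
cache misses go in each setup. The function name and all of the numbers (host
count, per-host query rate, local miss rate) are hypothetical, chosen only to
make the comparison visible; they are not measurements from any real network.

```python
def central_qps(hosts, qps_per_host, local_miss_rate, forwarding):
    """Estimate queries/sec reaching the central caching servers from mail hosts.

    All inputs are hypothetical illustration values, not measurements.
    """
    if not forwarding:
        # An autonomous local cache resolves its own misses by querying
        # authoritative servers directly, so the central servers see none.
        return 0.0
    # A forwarding local cache sends every local cache miss to the
    # central caching servers.
    return hosts * qps_per_host * local_miss_rate


if __name__ == "__main__":
    for forwarding in (True, False):
        load = central_qps(hosts=50, qps_per_host=200,
                           local_miss_rate=0.1, forwarding=forwarding)
        print("forwarding" if forwarding else "autonomous", load)
```

Note that the misses do not vanish in the autonomous case; in this simple model
they go to the authoritative servers instead of to the central caches.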

> >  Having disclosed that, I have to wonder, why would a non-forwarding
> >  caching server get *inconsistent* results?
>
>         Well, if you have a case where some authoritative servers haven't
> picked up the latest changes, and one local caching nameserver
> decides to ask a particular question of one authoritative server, and
> another local caching nameserver decides to ask the same question of
> a different authoritative server, they could obviously get totally
> different answers.

Okay, propagation delay, I understand that perfectly. But how many folks run
non-NOTIFY-capable nameservers these days, really? These propagation problems
should eventually go away, or at least subside to "noise" level.

> >                                              You mean, because the
> >  administrators of the target domain screw something up, so that some
> >  authoritative servers for their domain give out different data than
> >  the others?
>
>         Yup.  See above.
>
> >               Frankly, I don't see this as my problem to fix (or work
> >  around); it's *their* problem to fix.
>
>         Actually, this is a problem that is inherent in the nature of the
> way the DNS works, since secondaries can always be more or less
> out-of-sync with the primaries, and records for mail (or other
> services) could well be different between those different copies of
> the zone(s).
>
> >                                                                I mean,
> >  what if somebody's slave DNS server is temporarily hacked, with the
> >  hacker attempting to intercept the domain's mail? If you happen to
> >  cache the malicious MX record, now suddenly *all* of the mail destined
> >  from your servers to that domain is going into the bad guy's mailbox,
> >  perhaps long after the intrusion is detected and the genuine MX record
> >  restored (obviously if the bad guy is any good, he'll set a high TTL
> >  value on the malicious record to maximize the effect of the hijacking).
>
>         Consistency is more important than occasionally getting the
> correct answer.  This is the lesson behind all cache poisoning
> attacks.

Hmmm... I thought "the" lesson behind all cache poisoning attacks was "don't
implicitly trust all of the RRs one sees in a DNS response packet". Of course, there
could be multiple lessons, but I'm not sure that "consistency over occasional
correctness" rates very high on the list, if it's on there at all...
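
One common form of "don't believe every RR in the packet" is a bailiwick check:
keep only records whose owner name falls inside the zone the queried server is
supposed to answer for. The sketch below is a simplified illustration under
that assumption, not a description of how any particular nameserver implements
it; the function names and the (owner, type, rdata) tuple layout are made up
for the example.

```python
def in_bailiwick(owner, zone):
    """Return True if owner is at or below zone (label-wise, case-insensitive)."""
    owner_labels = [l for l in owner.lower().rstrip(".").split(".") if l]
    zone_labels = [l for l in zone.lower().rstrip(".").split(".") if l]
    if not zone_labels:
        return True  # the root zone contains every name
    # Compare whole labels, so "badexample.com" is not inside "example.com".
    return owner_labels[-len(zone_labels):] == zone_labels


def filter_response(records, zone):
    """Keep only (owner, rtype, rdata) tuples whose owner lies inside zone."""
    return [r for r in records if in_bailiwick(r[0], zone)]


if __name__ == "__main__":
    response = [
        ("example.com", "MX", "10 mail.example.com"),
        ("mail.example.com", "A", "192.0.2.1"),
        ("www.victim.test", "A", "203.0.113.9"),  # unrelated record slipped in
    ]
    for record in filter_response(response, "example.com."):
        print(record)
```

A check like this only drops out-of-zone records. It does nothing about a bad
record served from inside the zone, which is the hacked-slave case above.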

>         Try being the Internet mail systems administrator for twenty
> million people, and literally getting hundreds or thousands of
> complaints *PER DAY* in your private mailbox, where Customer A is
> complaining that they sent three different messages to Recipient B,
> and only the first and third messages were delivered, because the
> second one was routed through a different machine which had a
> different view of the Internet.

With all due respect, if one mailbox is getting thousands of complaints a day from
a userbase of 20 million people, then a) that's actually not a very high
percentage, and b) it sounds like an organizational problem -- isn't that what
low-level tech-support helpdesks largely exist for, i.e. to filter out "routine"
(non-)problems so that people with clues can get their work done?
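
For a sense of scale on point a), a quick back-of-the-envelope calculation. The
daily complaint counts below are assumed values standing in for "thousands";
only the 20 million figure comes from the message being quoted.

```python
def complaint_rate_percent(complaints_per_day, users):
    """Daily complaints as a percentage of the userbase."""
    return 100.0 * complaints_per_day / users


if __name__ == "__main__":
    users = 20_000_000  # userbase figure from the quoted message
    for complaints in (1_000, 5_000):  # assumed stand-ins for "thousands"
        rate = complaint_rate_percent(complaints, users)
        print(f"{complaints} complaints/day = {rate:.4f}% of users")
```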

>         I'll say it again -- consistency in always giving the same answer
> is more important than occasionally getting the "right" answer.
>
Maybe it's just a difference in userbase and/or requirements. I assume this is
some ISP or ISP-like entity serving these 20 million people, where email *is* one
of the main reasons, if not *the* reason, for the business relationship between
the provider and the end-user in the first place. When your main business is
something *other* than providing Internet services, however, e.g. building cars,
then often it's far more important that the mail *gets*through*, even if it has to
be re-sent over and over, than just the *appearance* that everything is working
smoothly. If an email miscommunication with one of our parts suppliers results in
the shipping of the wrong part and ultimately the idling of a production line (at
millions of dollars a minute), then it will be little consolation to hear that,
despite the fact that a good MX was available in DNS, at least the email servers
ignored that MX and failed *consistently*. We operate on a "get the mail there no
matter what" basis, rather than on a "hide the inherent inconsistencies of the
Internet from the users so it won't confuse and/or annoy them" basis. I'm not
saying one is inherently more "valid" or "legitimate" than the other; just that
they are driven by different needs.

To give you another example of this _modus_operandi_: if the quality of Internet
DNS administration keeps dropping like it has been, I may even have to start
*restarting* the nameservers periodically on my mail gateways, in order to flush
out all the botched NS RRsets that I keep seeing in the cache. That, of course,
would be a *terrible* waste of resources -- not only my resources, but those of
the TLD servers and of all of the Internet nameservers for the domains with which
we communicate. But my mandate is "get the mail through", and to accomplish that,
I'll resort to draconian measures like periodic restarts, if necessary...

- Kevin