Centralized DNS Caching for SMTP Servers? (was Re: What kind of hardware?)

Brad Knowles brad.knowles at skynet.be
Sat Mar 10 04:07:23 UTC 2001


At 7:50 PM -0500 3/9/01, Kevin Darcy wrote:

>  But the delays to go fetch that data tend to be measured in milliseconds,
>  which isn't noticeable in the context of mail delivery.

	I used to think that too, until I set up a forwarding caching 
system of this sort.  The reality is that when you're talking about 
hundreds of thousands or millions of mail messages per day, each of 
which can have dozens of additional milliseconds of delay added for 
every address (the sender and all recipients), this really does add 
up.
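The arithmetic is easy to sketch.  Here's a back-of-envelope in 
Python -- all the traffic figures are made-up illustrations, not 
measurements from any real system:

```python
# Hypothetical back-of-envelope: how per-lookup DNS latency
# accumulates across a large mail system.  All numbers below are
# illustrative assumptions, not measured values.
messages_per_day = 2_000_000
lookups_per_message = 3       # e.g., sender plus two recipients
extra_ms_per_lookup = 40      # added latency of an uncached remote query

total_seconds = (messages_per_day * lookups_per_message
                 * extra_ms_per_lookup) / 1000
print(f"{total_seconds / 3600:.0f} machine-hours of added delay per day")
```

Even with modest per-lookup figures, the aggregate runs to tens of 
machine-hours of delay per day, which is the point being made above.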

>                                                           I'd rather spend
>  that few milliseconds to increase my chances of a successful delivery.

	Or increase your chances that you'll get a screwed-up copy this 
time and actually make the situation worse.  Remember -- if the glass 
is half full, by definition it is also half empty.  If you are going 
to take the gamble, then you are guaranteed to sometimes lose it, 
and it's precisely when you lose the gamble that consistency matters 
most.  The wins don't gain you as much as the losses cost you, so 
consistency is the best overall solution.

	Note that the same principle applies in the stock market -- your 
best method of investing is simply to put in a small amount of money 
on a regular basis, and not to play games with precisely how much 
goes in when, because more often than not you will guess wrong. 
Likewise, the best
long-term strategy is to put your money in index funds as opposed to 
funds that try to outperform the indexes, because you have the entire 
population of everyone in the market putting their intelligence 
behind the indexes, while you have only a subset of that intelligence 
behind the funds that try to outperform the indexes.

>  Get a grip. I was addressing your previous point about master/slave
>  propagation delays -- "if you have a case where some authoritative
>  servers haven't picked up the latest changes". Obviously, NOTIFY figures
>  prominently in that equation.

	That's assuming that everyone supports NOTIFY, which is patently 
untrue.  Like it or not, not all the world is recent versions of BIND.

>  I don't want to tell you how to run your business, but it seems to 
>me that, no
>  matter how you slice it, if the complaints are bogus, i.e. some messages went
>  through, but not others, or messages were received out of order, 
>solely due to
>  problems with *other*people's* DNS, then the bogosity of those complaints
>  should have been recognized and disposed of by 3 levels of Help Desk.

	But those complaints aren't bogus.  The problem may actually have 
been with a network between our site and theirs, and may have only 
lived for a few seconds -- but those were the critical few seconds 
that caused one out of the three messages to be lost.  Likewise, we 
can't prove that BIND is 100% correct in all circumstances, so the 
problem might actually have been on our local server.

	Since we can't be certain where the real problem lies, we must 
fall back on the old adage "Be conservative in what you send, and 
liberal in what you accept".  In this case, what the customer 
*INSISTS* on is consistency, and if you don't provide that, you do 
so at your own peril.  When you're talking about businesses that are 
trying to shave the last thousandth of a penny off their per-user 
costs -- because that might be the hair's-breadth difference between 
profitability and loss -- this is a *VERY* important issue.

>  By the way, don't you give your Help Desks *tools* so that they can easily
>  diagnose common problems on their own?

	What kind of tools are you going to give them?  There are over 
10,000 helpdeskers.  Do you really want to give them all accounts on 
your mail servers?  Do you really want to give them all the root 
password?  Please explain to me how you are going to give each and 
every one of them the ten-plus years' worth of experience it may 
very well take to track down the source of a problem -- and even 
then, if the problem is transient in nature, you may simply never know.

>  In your world, perhaps this is a "loss". In my world, it is only a temporary
>  inconvenience.

	The more times you throw the dice, the more opportunities you 
have to lose.  Myself, I prefer to gamble as little as possible when 
running a mission-critical core infrastructure business.

>                  We have a formal policy stating that Internet email is
>  inherently unreliable and that for time-sensitive and mission-critical
>  information, *multiple* communication methods should be used, not just
>  email.

	Maybe you can get away with that in private industry, but as an 
ISP, that simply is not an acceptable answer.  As an ISP, you have to 
take this matter every bit as seriously as the customer demands, and 
while you can make some caveats about the Internet being unreliable, 
etc... you must treat this as the mission-critical application that 
the users consider it to be.

	If you want to make a comparison to manufacturing, this is a six 
sigma issue, and the entire business literally depends on getting 
this right.

>          Yes, I know that sounds like just a big CYA, but in this
>  particular case, it happens to be *true* -- Internet email *is*
>  inherently unreliable.

	That doesn't matter.  Users don't care.  That's what they're 
paying you for.

>                          We say that up front to our users, and
>  most of them understand and accept it. If they don't, then, frankly, too
>  bad for them.

	When you've got a captive market that can't go anywhere else, you 
can pretty much do whatever you want to the serfs, and they have no 
avenue of recourse.

>  (This is not to deny that we are becoming increasingly dependent on email
>  for doing business. But we are unlikely to change our policy in the
>  foreseeable future with respect to Internet email.

	At the kick-off meeting for the Internet Mail Consortium in 1995 
(at which I was the representative for AOL), I seem to recall that a 
representative from one of the major US automakers was very actively 
involved in most of the discussions that took place.  Their primary 
plan for this technology was to make it a key part of their EDI 
plans, so that they could improve their just-in-time manufacturing 
and their six-sigma processes, and reduce their on-hand inventory.

	If your company is not involved in these sorts of things, and 
looking at all the potential uses for the technology in that arena, 
then I think I can begin to understand the recent announcements 
regarding profitability problems.

>                                                     If we _were_ to amend
>  the policy, it would probably only be with respect to closed,
>  better-controlled business-exchange networks like ANX, the whole
>  _raison_d'etre_ of which is to provide greater reliablity for
>  mission-critical functions).

	Some of the problems are inherent in the technology itself, 
regardless of the infrastructure below it.  Since it is not 
technically possible to prove non-trivial programs correct, you 
simply have to accept that there will occasionally be failures, and 
you have to deal with that.


	Moreover, if you want to build this stuff on top of Internet 
technology (which seems to be the direction everyone is going, even 
if they are doing it over private networks), then there is a whole 
host of additional potential problems that you can run into, not the 
least of which is that the code coming out of Sendmail and Nominum is 
not coming from CMM level 5 organizations.

	These two companies do the best they can, and in the Internet 
world they are both a damn sight better than just about anyone else 
out there, but they are simply not structured to produce software of 
the sort of quality you'd be willing to put into mission-critical 
applications on a Shuttle launch.  And even in CMM level 5 
organizations, there are occasionally catastrophic failures of 
unforgivable proportions.

>                If the failure occurred because of inconsistency in *their*
>  DNS data, and we exercised due diligence in the face of that inconsistency,
>  then perhaps that's *all* that matters, in the eyes of upper management.
>  The consistency or inconsistency of our mail servers' delivery behavior,
>  in response to that DNS inconsistency, matters not a whit, only that we
>  tried our best and that the other guy blew it. That's the brutal
>  *political* reality under which we operate.

	You obviously are not a lawyer.  Neither am I, but I am married 
to one -- the term "due diligence" has a very specific legal meaning, 
and should not be bandied about carelessly.


	The truth of the matter is that you don't have a permanent record 
of every single DNS packet that went into that server, you don't have 
an event-driven model of precisely what was going on at the 
application level, the OS level, and the hardware level, and 
therefore you cannot possibly know whether a problem was truly the 
fault of the remote server or your local server (or some network in 
between that perhaps neither of you are aware of or have any control 
over) in all cases.

	Indeed, I would submit that you probably can't answer a question 
like this with 100% certainty in the majority of cases, precisely 
because of the transient nature of the DNS.

>  Sorry, you lost me there. Are parts A & C somehow *dependent* on part B???
>  It's not clear to me how the delivery or non-delivery of one message affects
>  the consequences of any of the other messages...

	They form part of a whole.  If you don't have all three, you 
don't have the necessary components to make the whole object.

>  It occurs to me, however, that where several email messages have
>  dependencies between each other, long-standing business conventions
>  (which predate email), like the "X of Y" convention, e.g. "part 3 of 7"
>  can help in the detection of missing or delayed messages.

	This assumes that the same people on both sides are always the 
ones communicating with each other, and that both have complete 
knowledge of the entire transaction.

	In the real world, people have to deal with partial knowledge all 
the time, and they frequently don't have the same two people on 
either end that are always handling all conversations on a particular 
topic.

>  But in most cases, given the nature of email, these kinds of data
>  inconsistencies cause *delays*, rather than immediate failures, don't they?
>  E.g. MX points to a dead address, MX points to a machine that's not running
>  SMTP, etc.

	In cases of just-in-time manufacturing, or where you are trying 
to get work done by a particular deadline, or where all bids have to 
be in by a certain deadline and you need certain sub-bids before you 
can complete your umbrella bid, a delay is just as bad as a total 
failure.

>              In the majority of cases, I assume, the mail *does* eventually
>  get through, the only question is whether it gets through in a timely
>  fashion.

	Perhaps, perhaps not.  Again, we get back to consistency being 
more important than things occasionally getting through and 
occasionally not: people need a known response within a given period 
of time, so that if they get a failure they can take quick 
alternative action.  Silently delaying things can kill them as 
quickly as being hit by a speeding planetoid.

>            In our business, anything that's important and time-sensitive
>  gets a phone confirmation.

	Again, this gets back to your business.  Not everyone operates 
this way -- many depend on the Internet for mission-critical 
services, of which the only generally mission-critical application 
(in my experience) is Internet e-mail.

>                              If not confirmed in a timely fashion, then,
>  yes, the sender will send the message again. At worst case, our mail
>  software sends a "4-hour warning" message; *then* they'll know that
>  something went wrong, and re-send the message.

	Ahh, so you're saying that you *do* have some internal server 
consistency after all, and that you send back warnings if the message 
hasn't been sent within a four hour period of time.  I'm glad to see 
that you haven't completely thrown consistency to the wind.

>  So, basically I think *notification* is a red herring here: users get
>  notification for failed delivery attempts *regardless* of whether the
>  cache is centralized or not. The real bone of contention is whether to
>  take an "all or nothing" (centralized cache) or a "do as much as you
>  can" (de-centralized cache) approach.

	Tell you what: why don't you try a real-world experiment, and 
turn on forwarding caching for all your servers for a month?  When 
you see the levels of reduced traffic, when you can see the reduced 
transaction times (due to decreased latency resulting from the larger 
centralized caches), and when you can put all that together into 
cold, hard numbers, then come back and give us a report.

	My experience is that while it doesn't *seem* that forwarding 
caching servers of this sort would make all that much difference, 
when you start talking about real-world observed behaviour, there is 
actually quite a noticeable difference.
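For the record, turning on forwarding in BIND is only a few lines of 
named.conf on each mail server's local resolver -- something roughly 
like the sketch below, where the addresses are placeholders for your 
central caching farm, not real servers:

```
// named.conf fragment (sketch) -- forward all queries to the
// centralized caching farm instead of iterating independently.
options {
        forward only;
        forwarders {
                192.0.2.10;     // placeholder: central cache #1
                192.0.2.11;     // placeholder: central cache #2
        };
};
```

With "forward only", the local server never falls back to iterating 
on its own, so every mail server sees the same cache and hands back 
the same answers.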

>  Please clarify what you mean by "returning inconsistent answers". With a
>  centralized cache, either all of the mail goes through normally, or all of
>  the mail fails or gets delayed.

	No, all the mail gets handled the same way.  This is something 
that the users can understand, and in fact this is something that the 
users *INSIST* on having.


	They can understand when mail bounces -- sometimes they've got 
the wrong address, sometimes the server at the other end is screwed 
up, etc....

	They can obviously understand that usually mail goes through just 
fine, and there aren't any bounces.

	What confuses them to *NO END* is when they send three messages 
in short succession to the exact same address, and the first and 
third get through but the second one doesn't.  This is when they go 
thermonuclear, and they spend hundreds of dollars of customer 
support time trying to get the issue dealt with.

	This is also a situation where a single call to the customer 
support facilities that month (or maybe that year) costs the company 
more than the revenue that customer would generate, so *ABOVE ALL 
ELSE*, you have to give the customer consistent answers that they 
can understand, so they don't get confused and call the help desk.

>                                                     But since DNS cache
>  data is inherently ephemeral anyway, and, as I pointed out earlier, all
>  you're really doing is affecting the *granularity* of the inconsistency,
>  it's difficult to imagine how what we're discussing even makes it onto
>  the radar screen of "Unix Security"...

	Problems can come from anywhere.  You could have an issue with 
sunspot activity that affects your machine for a brief period of 
time, and this affects the answer that it hands out.  There could be 
network problems on your end.  There could be network problems 
somewhere between your end and the other end.  The potential sources 
for problems in this matter are literally endless.

	Therefore, the more you can do to smooth out this potential 
roller-coaster, the better.  This also means that you have to try to 
reduce or eliminate unnecessary and duplicative traffic wherever 
possible, so that regardless of the nature of the answer, you do 
everything you possibly can to make sure that you consistently hand 
out the *same* answer to the *same* question, in accordance with the 
information that you have in your database.

	Obviously, when the information in your database changes (e.g., 
a record expires), you hand out a different answer for that 
question -- but you do so consistently.  And so long as all the 
clocks on all the machines are in sync via NTP, you have a decent 
chance of being able to say "as of time XYZ, we noticed this change 
and the status went from Purple to Chartreuse", and of having that 
statement be an accurate statement of fact that will hold up in 
court (should you get sued over the matter), and be one that is 
applicable to all servers.

>  Lame servers aren't the main problem.

	They are a problem, but they're probably not the biggest problem. 
However, they are a relatively easy problem to detect and fix, and 
they are a good indicator of the overall level of pure total crap 
that is out there.  The more lame delegations there are, the more 
other garbage is out there that shouldn't be.

>  I think you inverted my argument somewhere along the line. I thought I
>  had made clear that I really didn't *want* to have to do periodic
>  restarts, expressly *because*, _inter_alia_, it would put a load on the
>  TLD nameservers.

	Unfortunately, IMO doing periodic reloads is simply a part of the 
way business has to be done these days, and therefore your desire to 
avoid the unnecessary additional load that this would present on the 
root nameservers is an empty and specious argument.  That load will 
be there, that load will *have* to be there, simply because of all 
the crap that is in the DNS and the limited number of ways in which 
you can get rid of the crap.

	What a centralized caching nameserver farm gives you is an 
additional level of indirection through which this traffic would be 
filtered, so as to reduce the unnecessary additional load on the root 
nameservers, while still supporting the reduction of the garbage that 
is cached locally.  Indeed, it's the only way to do that.
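From the mail servers' side, that layer of indirection can be as 
simple as pointing the stub resolver at the farm -- a sketch, with 
placeholder addresses standing in for the central caches:

```
# /etc/resolv.conf on each mail server (sketch) -- resolve through
# the centralized caching farm rather than a local full resolver.
# Addresses are placeholders, not real servers.
nameserver 192.0.2.10
nameserver 192.0.2.11
```

Only the handful of farm machines ever talk to the root and TLD 
nameservers; everything behind them shares one cache.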

--
Brad Knowles, <brad.knowles at skynet.be>

#!/usr/bin/perl -w
# 531-byte qrpff-fast, Keith Winstein and Marc Horowitz <sipb-iap-dvd at mit.edu>
# MPEG 2 PS VOB file on stdin -> descrambled output on stdout
# arguments: title key bytes in least to most-significant order
# Usage:
# qrpff 153 2 8 105 225 /mnt/dvd/VOB_FILE_NAME | extract_mpeg2 | mpeg2_dec -
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map{$_%16or$t^=$c^=(
$m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;$t^=(72, at z=(64,72,$a^=12*($_%16
-2?0:$m&17)),$b^=$_%64?12:0, at z)[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h
=5;$_=unxb24,join"", at b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$
d=unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d>>12^$d>>4^
$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*8^$q<<6))<<9,$_=$t[$_]^
(($h>>=8)+=$f+(~$g&$t))for at a[128..$#a]}print+x"C*", at a}';s/x/pack+/g;eval


More information about the bind-users mailing list