Why forwarding is a Bad Thing

Sun Mar 25 23:53:09 UTC 2001

At 12:45 PM +0100 3/25/01, Jim Reid wrote:

>                                                So, all things being
>  equal, overall DNS lookup times will have no significant difference on
>  the delivery of subsequent messages to the list (assuming there were
>  any significant differences for the initial lookups, which I doubt,
>  but won't quibble about).

	Thing is, all things are almost never equal.

>                            If you factor in the overhead of
>  replenishing expired RRs -- with or without forwarding -- they're
>  likely to be lost in the noise. I doubt if anyone could measure the
>  difference in this scenario between a forwarding and non-forwarding
>  server. The chances are that the names will have expired from the name
>  server that's the forwarding target at the same time they expired from
>  the local server. Therefore the lookup overhead will be the same apart
>  from the extra delay of the local server waiting for the target server
>  to answer a lookup that the forwarding server could have done for
>  itself if it didn't forward.

	You're ignoring second-level caching effects resulting from 
multiple clients hitting the same set of central caching servers.  If 
L2 caches never worked under any circumstances whatsoever, then we 
wouldn't have the term "L2 cache".

	The real question is whether L2 caches provide a significant, 
measurable advantage in this particular situation.  My anecdotal 
personal experience is that yes, they do.  However, I do not have 
any hard numbers to back this up.
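
	To make the second-level caching argument concrete, here is a toy 
model in Python.  It is a sketch, not a measurement: the client count, 
the name popularity distribution, and the absence of TTL expiry are 
all assumptions picked for illustration, and it counts only queries 
sent to authoritative servers, not latency.

```python
import random


def simulate(num_clients=20, num_names=500, lookups_per_client=200, seed=1):
    """Count authoritative lookups with and without a shared central cache.

    Assumptions (not from any real workload): caches are unbounded, TTLs
    never expire, and name i is requested with weight 1/(i+1).
    """
    rng = random.Random(seed)
    names = list(range(num_names))
    weights = [1.0 / (i + 1) for i in names]

    local_caches = [set() for _ in range(num_clients)]
    central_cache = set()
    direct = 0     # authoritative lookups if every client resolves for itself
    forwarded = 0  # authoritative lookups if clients forward to a shared cache

    for client in range(num_clients):
        for name in rng.choices(names, weights=weights, k=lookups_per_client):
            if name in local_caches[client]:
                continue               # answered from the client's own cache
            local_caches[client].add(name)
            direct += 1                # non-forwarding: ask the authority
            if name not in central_cache:
                central_cache.add(name)
                forwarded += 1         # forwarding: only a central miss goes out
    return direct, forwarded


direct, forwarded = simulate()
print(f"without shared cache: {direct} authoritative lookups")
print(f"with shared cache:    {forwarded} authoritative lookups")
```

	In this model the shared cache can never send more queries 
upstream than the clients would on their own.  Whether that saving 
outweighs the extra hop through the forwarder is exactly the part 
that needs real measurements.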

>  IIUC, the biggest latency problem for mailing lists is not DNS
>  lookups. It's the tardiness of the remote mail servers. The main
>  performance factor is having smart mail software which can parallelise
>  delivery: ie the same message can be sent simultaneously to several
>  recipients. There is a very interesting paper on tuning sendmail for
>  large mailing lists by Rob Kolstad. It barely mentions DNS and none of
>  his tuning tricks relied on anything the name servers did. The URL is:
> 
>	http://www.usenix.org/publications/library/proceedings/lisa97/full_papers/21.kolstad

	I talked to Rob extensively before writing my "Sendmail 
Performance Tuning for Large Systems" paper that I presented at 
SANE'98.  I carefully re-read and reviewed his paper for the "Design 
and Implementation of Highly Scalable E-mail Systems" paper I 
co-wrote with Nick Christenson, and presented at LISA 2000.  I also 
carefully read and reviewed Strata Chalup's paper (among all the 
others I could find on related subjects, all listed in my 
bibliography):

	Chalup, S. R., Hogan, C., Kulosa, G., et al.
	"Drinking from the Fire(walls) Hose: Another Approach to Very 
Large Mailing Lists"
	USENIX, LISA XII Proceedings, December 1998
	<http://www.usenix.org/events/lisa98/full_papers/chalup/chalup_html/chalup.html>


	Unfortunately, the summaries I wrote of their papers for the LISA 
2000 paper had to be omitted from the presentation, but you can read 
them at <http://www.shub-internet.org/brad/papers/dihses/mta-review/>.


	Rob is of the opinion that sendmail gets within a hair's breadth 
of the theoretical maximum performance possible on a local network, 
and therefore no further work ever need be done on it.  It certainly 
doesn't need parallelization, etc.  Once you apply the 
optimizations he discovered (very few of which are directly 
applicable to sendmail itself), he feels that there is little that 
can be done to improve its performance with respect to large mailing 
lists.

	I disagree strongly with Rob, and feel that adding 
parallelization to the mix will significantly help improve the 
performance of handling large mailing lists, which is a large part of 
the reason why qmail and postfix have been so successful in this role.

	I also feel that optimizations of the sort you've suggested, 
and which I've also heard from Bryan Beecher, are a good idea: for 
example, having a "fast" machine with very low timeouts handle the 
initial delivery attempt, and dumping anything that doesn't make it 
on that attempt onto a set of "slow" machines with more normal 
timeouts.
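
	The fast/slow split can be sketched in a few lines of Python.  
Everything named here is hypothetical: `try_deliver` stands in for a 
real SMTP attempt, and the timeout values and example addresses are 
made up for illustration rather than taken from any MTA's 
configuration.

```python
def two_pass_delivery(recipients, try_deliver, fast_timeout=5.0, slow_timeout=300.0):
    """Try each recipient once with a short timeout, then retry the
    leftovers with a longer one.

    Returns (delivered_fast, delivered_slow, failed).
    """
    delivered_fast, deferred = [], []
    for rcpt in recipients:
        if try_deliver(rcpt, fast_timeout):
            delivered_fast.append(rcpt)
        else:
            deferred.append(rcpt)      # hand off to the "slow" machines

    delivered_slow, failed = [], []
    for rcpt in deferred:
        if try_deliver(rcpt, slow_timeout):
            delivered_slow.append(rcpt)
        else:
            failed.append(rcpt)
    return delivered_fast, delivered_slow, failed


# Stand-in for real remote hosts: seconds each one takes to accept mail.
RESPONSE_TIME = {
    "a@fast.example": 1.0,
    "b@slow.example": 60.0,
    "c@dead.example": 10_000.0,
}


def fake_deliver(rcpt, timeout):
    """Pretend delivery succeeds if the host answers within the timeout."""
    return RESPONSE_TIME[rcpt] <= timeout


fast, slow, failed = two_pass_delivery(list(RESPONSE_TIME), fake_deliver)
print(fast, slow, failed)
```

	The point of the split is that one tardy remote host only ties up 
the fast pass for `fast_timeout` seconds, instead of stalling it for 
the full normal timeout.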

	However, I also believe in optimizations that can (and should) be 
applied at the DNS level.


	While I do not have any concrete proof that a second-level 
caching/forwarding design significantly improves overall performance, 
my personal experience is that this is the case.

>  I'd be delighted if you or anyone else could point me at another
>  serious analysis of mailing list throughput and how a forwarding name
>  server "improved" performance.

	As you know, I am interviewing with certain companies where I 
might be able to test theories like this, if I end up getting hired 
by them.  This could potentially lead to another paper to be 
presented at an upcoming conference.

	If I were hired by a suitable company and did have the 
opportunity to conduct tests of this sort, would you be willing to 
join me as co-author of the paper, to explore all possible avenues 
of what does and does not work, and to try to come up with suitable 
explanations as to why we believe these things to be true?
-- 
Brad Knowles, <brad.knowles at skynet.be>

/*        efdtt.c  Author:  Charles M. Hannum <root at ihack.net>          */
/*       Represented as 1045 digit prime number by Phil Carmody         */
/*     Prime as DNS cname chain by Roy Arends and Walter Belgers        */
/*                                                                      */
/*     Usage is:  cat title-key scrambled.vob | efdtt >clear.vob        */
/*   where title-key = "153 2 8 105 225" or other similar 5-byte key    */

dig decss.friet.org|perl -ne'if(/^x/){s/[x.]//g;print pack(H124,$_)}'