DNS for a search engine

Kevin Darcy kcd at daimlerchrysler.com
Fri Feb 16 22:36:09 UTC 2001


If you ignore TTLs, then you're going to be serving stale data at least
part of the time. Don't you care about accuracy?

Perhaps you should take the approach of writing an *adaptive*
resolver: one that proactively re-queries names whenever they expire
from cache, if those names are popular with the resolver's clients. It
might also be useful to try to parallelize resolution across multiple
boxes and/or to make the cache available to local clients via IPC (e.g.
shared memory or whatever).
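A minimal sketch of the adaptive-resolver idea, in Python. The resolver callback, the `AdaptiveCache` class, its `popular_threshold` parameter and the injectable clock are hypothetical illustrations, not part of BIND or any existing resolver. Expiry still follows the TTL returned upstream, so nothing stale is served. The only change is that popular names are re-fetched when they expire instead of waiting for the next client query.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical resolver callback: takes a name, returns (address, ttl_seconds).
Resolver = Callable[[str], Tuple[str, float]]


class AdaptiveCache:
    """Sketch of a cache that re-queries popular names when their TTL runs out."""

    def __init__(self, resolver: Resolver, popular_threshold: int = 3,
                 clock: Callable[[], float] = time.monotonic):
        self.resolver = resolver
        self.popular_threshold = popular_threshold
        self.clock = clock
        # name -> (address, absolute expiry time); expiry honors the upstream TTL
        self.entries: Dict[str, Tuple[str, float]] = {}
        # name -> number of client lookups seen so far
        self.hits: Dict[str, int] = {}

    def _fetch(self, name: str) -> str:
        address, ttl = self.resolver(name)
        self.entries[name] = (address, self.clock() + ttl)
        return address

    def lookup(self, name: str) -> str:
        """Client-facing lookup: count the hit, serve from cache if still fresh."""
        self.hits[name] = self.hits.get(name, 0) + 1
        entry = self.entries.get(name)
        if entry is not None and entry[1] > self.clock():
            return entry[0]          # fresh cache hit
        return self._fetch(name)     # miss or expired: query at lookup time

    def refresh_expired(self) -> List[str]:
        """Re-query expired names that are popular; drop expired unpopular ones."""
        now = self.clock()
        refreshed: List[str] = []
        for name, (_, expiry) in list(self.entries.items()):
            if expiry > now:
                continue
            if self.hits.get(name, 0) >= self.popular_threshold:
                self._fetch(name)
                refreshed.append(name)
            else:
                del self.entries[name]
        return refreshed
```

In practice `refresh_expired()` would presumably be called periodically from a background thread or event loop (with a lock around the two dictionaries), and the hit counts would need to decay over time so that a name eventually stops counting as popular.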


- Kevin

Eric Billingsley wrote:

> How would you configure a set of DNS servers for a search engine?  There
> are two separate issues:
>
> Sites - Name lookups of sites while crawling the web - somewhere around
> 1-20 million names
>
> Clients - for log processing on a daily basis - somewhere around 50
> million IPs
>
> My view of the perfect DNS server for this application:
>
> I wouldn't even try to serve DNS for our own site from these servers.
> This will only be for the two tasks above.  I'm picturing a caching name
> server, but standard installations just don't work for this.  I need to
> tweak the cache so that entries don't expire for weeks, rather than
> obeying standard TTLs.  I would want a very short timeout for the query,
> but I would want to requeue the address on the crawler and have the DNS
> server try the query again immediately with a very long timeout.  That
> way, the next time I try to crawl the site, the answer would be in the
> cache and the lookup wouldn't just time out again.  I also want a system
> where I can restart the daemon and preserve the cache (write it to a
> file).  If possible, I would even like to be able to copy the cache
> itself for some processes to use directly (very simple format).  I would
> then want a separate thread that automatically attempts to update the
> cache when my local TTL expires, rather than doing it at query time.
>
> Beyond that, any way that I can speed up DNS resolution for IPs/names
> not in the cache would be good to know.  If any of you have ever done
> reverse DNS on a log file with 30M IPs, you know that ALL of the
> processing time is spent doing DNS rather than actual work.  I used to
> work at AltaVista and this was the biggest pain I ran into.
>
> Any ideas out there about how to configure BIND as it stands today to do
> some of this?  Any help would be appreciated.  If anyone is bored and
> wants to add any of the functionality that isn't already there, that
> would be even more greatly appreciated ; )
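For the log-processing side, one technique that needs no change to BIND is to deduplicate the IPs first, resolve only the distinct addresses that are not already cached, run many lookups concurrently, and persist the results in a plain file that other processes can read directly. The sketch below does this with the system resolver via `socket.gethostbyaddr`. The function names, the default of 64 worker threads and the tab-separated `ip<TAB>name` file format are assumptions for illustration. It does not implement the short-timeout/long-timeout retry scheme described above, since `gethostbyaddr` takes no per-call timeout.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

Cache = Dict[str, Optional[str]]  # ip -> hostname, or None if the lookup failed


def system_reverse(ip: str) -> Optional[str]:
    """Reverse-resolve one IP with the system resolver; None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def resolve_unique(ips: Iterable[str], cache: Cache,
                   reverse: Callable[[str], Optional[str]] = system_reverse,
                   workers: int = 64) -> Cache:
    """Resolve each distinct, not-yet-cached IP exactly once, many in parallel."""
    todo = sorted({ip for ip in ips if ip not in cache})
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with todo
        for ip, name in zip(todo, pool.map(reverse, todo)):
            cache[ip] = name
    return cache


def save_cache(cache: Cache, path: str) -> None:
    """Write the cache as 'ip<TAB>name' lines; '-' marks a failed lookup."""
    with open(path, "w", encoding="utf-8") as f:
        for ip, name in sorted(cache.items()):
            f.write(f"{ip}\t{name if name is not None else '-'}\n")


def load_cache(path: str) -> Cache:
    """Read a cache written by save_cache; a missing file yields an empty cache."""
    cache: Cache = {}
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                ip, name = line.rstrip("\n").split("\t", 1)
                cache[ip] = None if name == "-" else name
    except FileNotFoundError:
        pass
    return cache
```

Because log files tend to repeat client addresses heavily, the deduplication step alone can cut the number of queries considerably; the thread pool mainly hides per-query network latency, and its size would need tuning to what the local resolver can absorb. Reloading the file with `load_cache` before each daily run carries results across restarts.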




