How stuff is resolved

Tue Jan 30 02:54:39 UTC 2001

Kingsley Tart wrote:

> Is this right?
>
> eg. I want to send Email to fred at company.com, I presume this is what
> happens:
>
> 1. resolver asks ?.root-servers.net for NS record for "com."
> 2. resolver asks results of (1) for NS record of "company.com."
> 3. resolver asks results of (2) for MX record of "company.com."
> etc

More like:

1. resolver asks ?.root-servers.net for "company.com" MX record
2. based on the NS and A records in Authority and Additional Sections of the
response, resolver asks a "com" server for "company.com" MX record
3. based on the NS and A records in Authority and Additional Sections of the
response, resolver asks a "company.com" server for "company.com" MX record.
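
The three steps above can be sketched as a toy simulation. This is only an
illustration, not how any real resolver is implemented: there is no network
involved, the "servers" are dictionaries, and every server name other than
the root server (a.gtld-servers.example, ns1.company.com) and the MX data are
made up for the example. The point it shows is that the resolver asks the
*full* question at every level and follows referrals downward.

```python
# Toy model of referral chasing. No real DNS traffic; all data is invented.
ROOT = "a.root-servers.net"

# Each hypothetical server holds the zones it delegates and the answers it has.
SERVERS = {
    "a.root-servers.net": {
        "referrals": {"com.": ["a.gtld-servers.example"]},
        "answers": {},
    },
    "a.gtld-servers.example": {
        "referrals": {"company.com.": ["ns1.company.com"]},
        "answers": {},
    },
    "ns1.company.com": {
        "referrals": {},
        "answers": {("company.com.", "MX"): ["10 mail.company.com."]},
    },
}


def query(server, name, rtype):
    """Return ('answer', records) or ('referral', nameservers)."""
    data = SERVERS[server]
    if (name, rtype) in data["answers"]:
        return "answer", data["answers"][(name, rtype)]
    # Find delegated zones that enclose the name, matching on label boundaries.
    matches = [z for z in data["referrals"]
               if name == z or name.endswith("." + z)]
    if not matches:
        raise LookupError(f"{server} has no data or referral for {name}")
    best = max(matches, key=len)  # most specific delegation wins
    return "referral", data["referrals"][best]


def resolve(name, rtype, start=ROOT):
    """Ask the same full question at each step, following referrals."""
    server, trail = start, []
    while True:
        trail.append(server)
        kind, payload = query(server, name, rtype)
        if kind == "answer":
            return payload, trail
        server = payload[0]  # follow the first nameserver in the referral


records, trail = resolve("company.com.", "MX")
print(records)  # the MX records from the authoritative server
print(trail)    # the servers asked, in order
```

The trail has three entries, one per step in the list above.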

Note however that this is the sequence *only* when the resolver has no
relevant referral information cached at all. If referral information is
cached, then some of the steps can be skipped. Also, if the resolver gets
NS records in the Authority Section of a response with no matching A records
in the Additional Section, it may have to generate some interstitial queries
to resolve those A records.
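
A rough sketch of those two points, under the assumption that the resolver
keeps a simple zone-to-nameservers cache (the cache layout, the function names
and the server names are all hypothetical): the resolver starts from the
deepest zone it already has NS records for, and any nameserver that came
without an A record in the Additional Section needs a lookup of its own.

```python
def closest_cached_zone(name, ns_cache):
    """Return the deepest enclosing zone of name that has cached NS records."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        zone = ".".join(labels[i:]) + "."
        if zone in ns_cache:
            return zone, ns_cache[zone]
    return ".", ns_cache["."]  # nothing cached: fall back to the root servers


def missing_glue(ns_names, additional):
    """List nameservers that arrived with no A record in Additional."""
    return [ns for ns in ns_names if ns not in additional]


# Cold cache: only the root is known, so all three steps are needed.
ns_cache = {".": ["a.root-servers.net."]}
print(closest_cached_zone("company.com.", ns_cache))

# After an earlier lookup cached the "com." referral, Step 1 can be skipped.
ns_cache["com."] = ["a.gtld-servers.example."]
print(closest_cached_zone("company.com.", ns_cache))

# A referral with two NS names but glue for only one of them.
print(missing_glue(["ns1.company.com.", "ns2.other.example."],
                   {"ns1.company.com.": "192.0.2.1"}))
```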

Note also that in Step 3, if the response from the authoritative
"company.com" nameserver includes NS records in the Authority Section, then
those NS records will *override* the ones obtained in Step 2. This is because
NS records from an authoritative server are considered more "credible" than
NS records obtained from a higher-level server's referral. In practice,
though, people often screw up the NS records in their zones, or those NS
records drift out of sync with the delegation NS records, with the result
that what is considered more "credible" isn't necessarily more *reasonable*.
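
To make the override concrete, here is a minimal sketch. The two numeric
ranks, the cache layout and the nameserver names are assumptions of the
example, not anything a particular nameserver is known to use. It shows how a
stale NS set published in the zone itself replaces the set learned from the
parent's referral.

```python
# Hypothetical credibility ranks; higher wins. The numbers are arbitrary.
REFERRAL = 1       # NS set learned from a higher-level server's referral
AUTHORITATIVE = 2  # NS set returned by the zone's own authoritative server


def store_ns(cache, zone, ns_set, credibility):
    """Cache ns_set for zone unless a more credible set is already held."""
    held = cache.get(zone)
    if held is None or credibility >= held[0]:
        cache[zone] = (credibility, list(ns_set))
        return True
    return False


cache = {}
# Step 2: the "com" server refers us to the delegated nameserver.
store_ns(cache, "company.com.", ["ns1.company.com."], REFERRAL)
# Step 3: the zone's own (possibly out-of-date) NS set overrides it.
store_ns(cache, "company.com.", ["ns-old.company.com."], AUTHORITATIVE)
print(cache["company.com."])

# A later referral does not displace the more credible data.
print(store_ns(cache, "company.com.", ["ns1.company.com."], REFERRAL))
```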

> I have a problem with a mail server rejecting mail from addresses where it
> can't find the MX records for a domain where they actually do exist. If I
> telnet in to the mail server box and do "host -t mx domain.com" a few times
> so that it finds it, the mail server then accepts the mail from that domain
> because the local DNS has now cached the records.
>
> This is causing us a big problem! Any help would be appreciated ...

A lot of folks out there have screwed up their DNS, and this means it can
sometimes take an excessive amount of time to resolve MX records. Add to this
two facts. First, BIND 8 lacks "query restart": in some pathological
situations it just gives up on a query it has partially resolved, hoping that
the client will keep retrying and that it will eventually be able to resolve
the name fully. Second, sendmail's resolver subsystem has typically been a
"black box" (of course I don't know whether you're using sendmail or some
other mail software). Given all that, the phenomenon you describe is hardly
surprising. Oftentimes mailers just give up before the MX record resolves. As
you have seen, you can force the MX record into the cache by issuing a bunch
of queries, and then the mail server can look up the MX record just fine for
a while, but this is hardly a reasonable long-term strategy.

If you're running sendmail 8.10.1 or later, or are willing to upgrade, try
tweaking the various Timeout.resolver settings in your config. Also, you might
want to consider running BIND 9.1, which supposedly has query restart and
should therefore be able to better deal with partial responses from other
nameservers without requiring retries from the client.

- Kevin