IP addresses in NS records seem to be breaking hostname resolution

Wed Jul 17 21:07:49 UTC 2002

Chris Davis wrote:

> >I'm not sure what you're trying to accomplish here. "Reject or dump" bogus
> >NS records? The NS records are already unresolvable, how much more
> >"rejection" do you want?
>
> The bad NS records are unresolvable but they are accepted and kept in the
> cache and are tried until they expire, resulting in name resolution failure.
> I would prefer the bad NS records be either:
>
> 1- flatly rejected via a "let me see if these contain something other than
> just dots and numbers before I cache them" routine
>
>  or
>
> 2- dumped if they're all discovered to be unresolvable after they're tried
> via a "well heck, I tried all the NSes for this domain but none worked, so
> I'm dumping these NS records from my cache" routine.  Where's the sense in
> bind holding onto known unresolvable NS RRs?

For the same reason that named caches *any* RRs: because someone may ask for
them in the future (e.g. an explicit QTYPE=NS query for the relevant name).

Also, just because a name is "unresolvable" at one point in time doesn't mean
it won't become resolvable in a few seconds, a few minutes or a few hours.
Do you throw away good parts of your car, just because they happen to depend on
other parts that may be defective or worn out? No, you keep the good parts and
repair/replace the bad parts. Same with cached DNS data. Keep the NS records
because maybe the associated A records will show up eventually and then you'll
have the entire "chain".

As a general rule, then, named does not throw away records whose RDATA contains
names which do not resolve. While it may on the surface seem pointless in this
particular instance, it is an application of that general rule.

> 3- refused by named at startup on the server with the bad NS records.

On the master server, you mean. How does the master know that a given NS record
is "bad" or not? Are you suggesting that it go out and do an A record query for
*every* name it sees in an NS record RDATA? Why limit it to NS records?
Shouldn't it be doing the same for MX records? CNAMEs? PTRs? SRVs? This is
going to add a lot of overhead and time to the zone-loading process. Or do we
just pick on NS records because they are "special"?
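
To put a rough shape on that overhead, here is a minimal Python sketch that
counts how many distinct target names a load-time check would have to resolve
if it covered every record type that carries a name in its RDATA. This is not
anything named does; names_to_verify, TARGET_FIELD and the sample zone are
made up for illustration, and the "zone" is a simplified list of tuples rather
than a real master file.

```python
# Position of the target name inside the RDATA, counted in
# whitespace-separated fields. MX carries a preference first; SRV carries
# priority, weight and port first. (Hypothetical helper table.)
TARGET_FIELD = {"NS": 0, "CNAME": 0, "PTR": 0, "MX": 1, "SRV": 3}


def names_to_verify(records):
    """Collect every distinct target name a load-time check would need to resolve.

    `records` is an iterable of (owner, rrtype, rdata) string tuples.
    """
    targets = set()
    for _owner, rrtype, rdata in records:
        index = TARGET_FIELD.get(rrtype.upper())
        if index is None:
            continue  # record type with no name in its RDATA (e.g. A, TXT)
        fields = rdata.split()
        if len(fields) > index:
            targets.add(fields[index].lower())
    return targets


# Illustrative zone contents (documentation names and addresses only).
sample_zone = [
    ("example.com.", "NS", "ns1.example.com."),
    ("example.com.", "NS", "192.0.2.1."),
    ("example.com.", "MX", "10 mail.example.com."),
    ("www.example.com.", "A", "192.0.2.10"),
    ("_sip._tcp.example.com.", "SRV", "0 5 5060 sip.example.com."),
]

pending = names_to_verify(sample_zone)
print(len(pending), "names would need a lookup at load time")
```

Even this toy zone yields four lookups from five records, and the result is
only a snapshot: it says nothing about whether those names still resolve an
hour after the zone was loaded.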

And what if an NS record is valid at time of load, but someone subsequently
deletes the A record, so it goes from being valid to being invalid? Are you
going to periodically check *all* names referred to by NS records? If you're
not willing to do that, then you have a situation where the existence of a
given NS record may depend on how often or how recently the zone containing it
was reloaded. Moreover, if you're going to be deleting an NS record from the
master server, then according to the rules, the zone itself has changed
(because the master is the origin of the zone data), and you need to increment
the serial number, replicate to slaves, etc.

And, what is named to do if *all* of the NS records for a zone are bad? Fail to
load the zone for lack of valid NS records? That's really no different than
just giving the data out as is, bogus records and all. Either way, the zone
becomes unresolvable. Somebody eventually notices and the problem gets fixed.

> >If an NS RRset consists of mixture of good and bad names,
> >nameservers will automatically find the good ones after maybe a few wasted
> >queries; if the RRset contains only bad names, it's useless anyway.
>
> The only broken records are NS (and SOA but that's another story).  The A
> and MX records that I need are good.  It's not useless at all.  It's just a
> little broken but in a very unfortunate way.

A *fatal* way, if everyone follows the standards.

> >Seems like what you really want here is a "treat DNS failure as a fatal
> >error" option/setting in your mailer software. That's not really a DNS
> >issue, _per_se_...
>
> That's not all I want.  I also cannot pull up their website with the bad NS
> records in my dns cache, making it a browser issue in addition to being a
> mail issue, if it's not a dns issue.
>
> However, it is a DNS issue.
>
> In my opinion, bind should reject malformed NS records rather than inserting
> them into the cache, dump them from the cache after they're deemed to be
> unresolvable, or alternatively, named should refuse to load zones with
> malformed NS records.

They're not "malformed". They are syntactically perfect. They just happen to
point to non-existent nodes.
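
A small Python sketch of the distinction, under the assumption that "syntax"
means the usual letters-digits-hyphen host-name rules. The function names are
hypothetical and 192.0.2.1 is a documentation address, not one taken from the
zone in question. The first check is the kind of syntax test a nameserver
could reasonably apply; the second is the "just dots and numbers" test being
proposed.

```python
import ipaddress
import re

# One label under the letters-digits-hyphen rules: 1 to 63 characters,
# not starting or ending with a hyphen. A leading digit is allowed.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def _strip_root_dot(name: str) -> str:
    """Drop a single trailing root dot, if present."""
    return name[:-1] if name.endswith(".") else name


def is_syntactically_valid_hostname(name: str) -> bool:
    """Return True if every label of `name` is a legal host-name label."""
    stripped = _strip_root_dot(name)
    if not stripped or len(stripped) > 253:
        return False
    return all(_LABEL.match(label) for label in stripped.split("."))


def looks_like_ipv4_literal(name: str) -> bool:
    """Return True if `name`, minus a trailing dot, parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(_strip_root_dot(name))
    except ValueError:
        return False
    return True


for target in ("ns1.example.com.", "192.0.2.1."):
    print(target,
          "valid syntax:", is_syntactically_valid_hostname(target),
          "IPv4 literal:", looks_like_ipv4_literal(target))
```

Both targets pass the syntax check. Only the second heuristic singles out the
all-numeric name, and that heuristic is a policy decision layered on top of
the syntax rules, not part of them.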

> Refusal by named to load zone with bad NS RRs is probably the most likely
> not to violate RFCs, but it's also the least effective near term solution
> because it depends on admins of defective zones to upgrade their bind
> versions to a version with the new feature.  Long term, the "named refuses
> to load zones with defective NS RRs" routine seems like the best solution to
> minimize the problem.

See above. You're essentially arguing that named -- or nameservers in general
-- should break its general rules in the special case of NS records pointing to
all-numeric names. I don't think it is good engineering practice to carve out
arbitrary special cases like that. This is a DNS operational problem, like any
of the myriad of other possible DNS operational problems. If someone screws up
their NS records, things break, they get notified, they fix it, and with any
luck they'll learn a lesson and never screw up in the same way again. Life goes
on.

Plus, there really is no viable alternative. If we go back to the bad old
pre-RFC-2181 behavior, falling back on referral data, then we invite back all
of those old problems with bad/stale delegations. I'd rather put the
responsibility where it belongs -- with the maintainer of the zone -- than
confuse the troubleshooting process, and invite even more varieties of admin
error, by adding referral data to the mix. The data-ranking rules in RFC 2181
were quite deliberately written to achieve the current state of affairs. Let's
not backslide.
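
For anyone who hasn't read that part of RFC 2181, here is a much-simplified
Python sketch of the ranking idea from section 5.4.1. It is an illustration
only, not named's cache: the Trust and Cache names are made up, TTLs are
ignored, and the names and addresses are documentation examples. The point is
just that an NS RRset learned from the zone's own servers outranks the NS
RRset seen in the parent's referral, and once cached it is not displaced by
the referral data.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Trust levels in the spirit of RFC 2181 section 5.4.1, lowest first."""
    ADDITIONAL_OR_REFERRAL = 1  # additional data; authority section of a non-authoritative answer
    NONAUTH_ANSWER = 2          # answer section of a non-authoritative answer
    GLUE = 3                    # glue from a primary zone or zone transfer
    AUTH_AUTHORITY = 4          # authority section of an authoritative answer
    AUTH_ANSWER = 5             # authoritative data in the answer section
    ZONE_TRANSFER = 6           # zone transfer data, other than glue
    PRIMARY_ZONE = 7            # primary zone file data, other than glue


class Cache:
    """Toy RRset cache that refuses to replace data with less trusted data."""

    def __init__(self):
        self._store = {}

    def offer(self, name, rrtype, rdata, trust):
        """Store the RRset unless a more trusted one is already cached."""
        key = (name.lower(), rrtype)
        current = self._store.get(key)
        if current is not None and current[1] > trust:
            return False
        self._store[key] = (tuple(rdata), trust)
        return True

    def lookup(self, name, rrtype):
        entry = self._store.get((name.lower(), rrtype))
        return entry[0] if entry else None


cache = Cache()
# 1. The parent's referral lists a sane nameserver name.
first = cache.offer("example.com.", "NS", ["ns1.example.net."],
                    Trust.ADDITIONAL_OR_REFERRAL)
# 2. The zone's own server answers authoritatively with a broken NS RRset.
second = cache.offer("example.com.", "NS", ["192.0.2.1."],
                     Trust.AUTH_AUTHORITY)
# 3. A later referral cannot push the authoritative data back out.
third = cache.offer("example.com.", "NS", ["ns1.example.net."],
                    Trust.ADDITIONAL_OR_REFERRAL)
print(first, second, third, cache.lookup("example.com.", "NS"))
```

In this sketch the broken-but-authoritative RRset wins, which is exactly the
behavior being complained about, and also exactly what the ranking is meant
to produce.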

- Kevin