Very odd problems.

Bryan McClendon jidar at par1.net
Wed Sep 27 20:44:23 UTC 2000


I am having what I consider to be strange problems with BIND at our
site.

The long story:
First, the setup. We used BIND 4 for a few years on Red Hat Linux 5.x
and had no problems with it. A few months ago we purchased a new server
and installed RH 6.2 on it, which comes with BIND 8. I moved all of our
old zone files over, wrote the named.conf by hand, and added a $TTL to
the top of each zone file. This seemed to work fine.
Then about a week later one of our customers called and said she could
not send mail to a rival ISP in town (terraworld.net). From her
description it sounded like she was unable to resolve the mail host. So
I dropped to a shell and used the 'host' command to look up their domain
and got "Host not found". I told her their DNS was down and figured
that was that. She called back about 20 minutes later and told me that
their tech says everything is fine. So I checked again, and it still
appeared to be down. I then went to www.traceroute.org to see if the
lookup would work from various other sites. Every CGI traceroute I tried
was able to look up the domain, so I knew the problem was on our end.
The first thing I did was restart named and look up the host again, and
this time it worked. Ack!

I worked on this problem off and on for the next two weeks and never
solved it. I tried new memory, swapping CPUs, a new motherboard, and
upgrading to the latest BIND and 2.2 kernel, all to no avail. I don't
know why, but whenever it happens I have to restart named. As a
workaround I have set up a cron job that restarts named every 2 hours.
Nasty, I know, but what can I do?
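
A less blunt variant of that cron job could test a few lookups first and
record which ones fail and how long they take. The following is only a
minimal sketch, assuming Python is available on the server. The names
probe and failed_names and the resolver parameter are made up for
illustration; this is not what is currently running here.

```python
import socket
import time


def probe(names, resolver=socket.gethostbyname):
    """Try to resolve each name; return {name: (address_or_None, seconds)}."""
    results = {}
    for name in names:
        start = time.time()
        try:
            address = resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError,
            # so a failed lookup ends up here.
            address = None
        results[name] = (address, time.time() - start)
    return results


def failed_names(results):
    """Return the sorted names whose lookup did not produce an address."""
    return sorted(
        name for name, (address, _) in results.items() if address is None
    )
```

Calling failed_names(probe(["terraworld.net"])) from cron and writing the
result to a log with a timestamp would at least show when the failures
start, instead of hiding them behind a restart.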

To see the configs for our setup, look here:
http://morpheus.par1.net/~gspot/named.par1.net
http://morpheus.par1.net/~gspot/named.conf
In fact, all of the relevant files are there, knock yourself out.

If only it stopped there.

I recently set up a Linux server for a different ISP (oswego.net).
Again I used RH 6.2, took his old named files from BIND 4, and set them
up on BIND 8. I gave him a single A record for each IP and then used
multiple CNAME entries to point the various other names at each IP. This
problem is a little different. Everything seems to work fine locally,
but from my site I was having trouble resolving some of his names. I
didn't understand why, but after reading the HOWTO I decided to just
change every entry to an A record. It seemed to work okay over the
weekend, but today he called me and said he couldn't resolve his mail
server from his place of work (he has a day job at a bank). I tried
resolving it myself and couldn't. Hrm. So I telnetted into his machine
and tried to resolve it there, and it worked fine. The last thing I
tried was going to a few places on www.traceroute.org and, wouldn't you
know it, they resolved his mail server fine as well. I then restarted my
name server, after which it was able to resolve his name too, but only
after a long delay (maybe 2 minutes?).
As of right now he still cannot resolve that mail server name from his
office at the bank, which uses yet a third ISP. I have a feeling that if
I had access to that ISP's name server and restarted it, it would work.
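
To make the CNAME-to-A change described above concrete, here is a rough
sketch in Python. The (name, type, value) tuples are a simplified
stand-in for zone file lines, and flatten_cnames, the host names, and
the 192.0.2.x addresses are all assumptions for illustration, not the
contents of the actual oswego.net zone.

```python
def flatten_cnames(records):
    """Replace each CNAME with A records copied from its eventual target.

    CNAMEs whose target has no A record in the list (or whose alias
    chain loops back on itself) are left untouched.
    """
    a_records = {}
    cnames = {}
    for name, rtype, value in records:
        if rtype == "A":
            a_records.setdefault(name, []).append(value)
        elif rtype == "CNAME":
            cnames[name] = value

    flattened = []
    for name, rtype, value in records:
        if rtype != "CNAME":
            flattened.append((name, rtype, value))
            continue
        # Follow the alias chain until it reaches a non-alias name.
        target, seen = value, {name}
        while target in cnames and target not in seen:
            seen.add(target)
            target = cnames[target]
        addresses = a_records.get(target)
        if addresses:
            flattened.extend((name, "A", addr) for addr in addresses)
        else:
            flattened.append((name, rtype, value))
    return flattened


# Hypothetical zone: one real host with two aliases.
zone = [
    ("host1", "A", "192.0.2.10"),
    ("www", "CNAME", "host1"),
    ("mail", "CNAME", "host1"),
]
flat_zone = flatten_cnames(zone)
```

With this input every name ends up with its own A record pointing at
192.0.2.10, which is roughly the layout I switched his zone to.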
Relevant files here:
http://morpheus.par1.net/~gspot/oswego/named.conf
http://morpheus.par1.net/~gspot/oswego/named.oswego.net
Others are there too...

The thing is, what the heck is going on here? I am thoroughly confused.
Is this some kind of strange thing with RH? If so, why is the bank
having the same problem? And why does it only ever happen to a few
domains? What can I do to resolve this, or at least get more info? I'm
open to any suggestions; I'm at the end of my rope.

Bryan McClendon
Parsons Internet
jidar at par1.net


