stops answering at same time every day

Allie M Hopkins allie at lsu.edu
Tue Mar 1 19:29:50 UTC 2005





I finally figured it out.  I had run tcpdump on the master's interface a
few times before with no luck.  It was only by chance, while debugging
another server, that I found the culprit.  Our TSM backup was scheduled to
start at 6pm every afternoon (not by my choice) and it was hogging all the
network bandwidth for the duration of the backup.  Our master talks to the
backup server over a different interface than the one it uses for bind,
which is why I never saw it.  When I started logging the traffic on one of
my slaves, which happens to use the same interface for both bind and TSM
backups, I saw all the traffic from the backup and hardly anything for bind
queries.  As soon as the backup ended, the queries started up again.  There
was no "strange" process and there were no strange connections, and CPU
actually decreased over time because the box was really only doing this one
thing.  Of course not: this was all by design, well, except for the fact
that TSM was taking over.
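
For anyone chasing something similar, here is a minimal sketch of how the
comparison could be automated.  It is not what I actually ran.  It assumes
the text output of a recent `tcpdump -n` (one "HH:MM:SS.usec IP src.port >
dst.port: ..." line per packet) and it assumes the backup client talks on
port 1500; both the line format and the port are assumptions you should
check against your own capture.  The function name is made up for the
example.

```python
import re
from collections import Counter, defaultdict

# Assumed line shape from `tcpdump -n` text output, for example:
# 18:00:01.123456 IP 10.1.1.5.1500 > 10.1.1.9.40022: Flags [.], length 1448
LINE_RE = re.compile(r"^(\d\d:\d\d):\S+ IP6? \S+\.(\d+) > \S+\.(\d+):")

# Port 53 is DNS.  Port 1500 for the backup client is an assumption;
# substitute whatever port your backup software actually uses.
LABELS = {53: "dns", 1500: "backup"}


def packets_per_minute(lines, labels=LABELS):
    """Count packets per HH:MM bucket, grouped by service label."""
    buckets = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not an IP packet summary
        minute = match.group(1)
        src_port, dst_port = int(match.group(2)), int(match.group(3))
        # A packet belongs to a service if either end uses its port.
        label = labels.get(src_port) or labels.get(dst_port) or "other"
        buckets[minute][label] += 1
    return buckets
```

Feed it the lines of a saved capture and print the buckets in time order.
If the "backup" count climbs while the "dns" count collapses in the same
minutes, the two are competing for the same interface.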

Thanks for all the input.  I'm just glad I figured it out.

Allie


From: "Evan Xinos" <ex at bcapub.com>
Sent: 02/28/2005 07:04 AM
To: "'Allie M Hopkins'" <allie at lsu.edu>
Subject: RE: stops answering at same time every day




Can you tcpdump the interface that bind should be answering on?  I have
heard of cases where bind would not answer while many updates were being
sent out at the same time, but I believe that was on an older version...


__________________________________
Evan Xinos
System Support Analyst
BCA Publications
(514)499-9550 ext 222
mailto:evan at bcaresearch.com
http://www.bcaresearch.com
__________________________________


-----Original Message-----
From: Allie M Hopkins [mailto:allie at lsu.edu]
Sent: Friday, February 25, 2005 5:30 PM
To: Evan Xinos
Cc: bind9-users at isc.org
Subject: RE: stops answering at same time every day









No local queries work.  It's as if named isn't even running, but it is.  It
receives all the queries and zone transfer requests, but just doesn't
answer.


Allie





From: "Evan Xinos" <ex at bcapub.com>
Sent: 02/25/2005 09:55 AM
To: "'Allie M Hopkins'" <allie at lsu.edu>
Subject: RE: stops answering at same time every day







Allie, does the server respond to a local dig during the outage?
Could it be some sort of firewall policy somewhere on the network?
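
If running dig by hand at the right moment is awkward, a small probe can do
the same check on a timer.  What follows is only a sketch: it hand-builds a
plain A-record query in the RFC 1035 wire format using the Python standard
library, sends it over UDP to port 53, and reports either the round-trip
time or "no answer".  The function names, the 2-second timeout and the
10-second interval are assumptions chosen for the example, not anything
taken from your setup.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Terminating zero-length label, then QTYPE=A (1) and QCLASS=IN (1)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server, name, timeout=2.0):
    """Return the round-trip time in seconds, or None if no reply arrives."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return None  # timeout or network error: treat as "no answer"
        # Accept only a reply that echoes our query ID
        if reply[:2] != query[:2]:
            return None
        return time.monotonic() - start


def watch(server, name, interval=10.0, rounds=6):
    """Probe repeatedly and print one timestamped line per attempt."""
    for _ in range(rounds):
        rtt = probe(server, name)
        stamp = time.strftime("%H:%M:%S")
        status = "no answer" if rtt is None else f"{rtt * 1000:.1f} ms"
        print(f"{stamp} {server} {name}: {status}")
        time.sleep(interval)
```

Calling something like watch("192.0.2.53", "www.example.edu") from the
name server itself and again from a client on another subnet (both values
here are placeholders) would show whether the outage is visible locally,
remotely, or both.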








__________________________________
Evan Xinos
System Support Analyst
BCA Publications
(514)499-9550 ext 222
mailto:evan at bcaresearch.com
http://www.bcaresearch.com
__________________________________





-----Original Message-----
From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org] On
Behalf Of Allie M Hopkins
Sent: Friday, February 25, 2005 9:57 AM
To: Allie M Hopkins
Cc: bind-users-bounce at isc.org; bind9-users at isc.org
Subject: Re: stops answering at same time every day












I turned on query logging last night and nothing seems out of the ordinary.
Only one server stopped responding this time, but it was our master so all
zone transfers are stopped for the same amount of time.  The CPU
utilization was low, no crazy connections, no different processes running.
I'm truly stumped.





Doesn't anyone have any advice or suggestions?





I can log into the machine remotely during this downtime and do anything
else as usual.  The only thing different is the name server stops
responding to all types of requests.  What could possibly cause this at
such a regular interval?





Many departments take internet based tests in the afternoon, so this is
really affecting many people.  I'm just out of ideas.  Tonight I will have
a sniff going on the servers to see if there is something else I'm missing.





I'm begging for suggestions.  I've been troubleshooting for weeks.





Allie








From: Allie M Hopkins <allie at lsu.edu>
Sent by: bind-users-bounce@isc.org
Sent: 02/24/2005 08:39 AM
To: bind9-users at isc.org
Cc: (bcc: Allie M Hopkins/allie/LSU)
Subject: stops answering at same time every day














All three of our nameservers - two running 9.3.0 and one running 9.2.3 -
stop answering at the same time every afternoon.  Not all three every day -
sometimes just one, sometimes two, sometimes all three, but always at
6:30pm for 10-15 minutes.  My first thought was that a script or other OS
application was stopping the service from answering, but I've been running
logs and nothing points to anything peculiar.  No crontabs, no extra
processes running.








I'm going to turn on query logging during this interval to see what the
clients are doing; perhaps something is hosing the servers, making them
unable to respond.  I know about the cleaning interval, but I thought the
default was 60 minutes, and I don't have this option set anywhere to change
the default.  I just can't seem to figure it out.  No weird connections at
this time either, according to netstat.





Anyone have any ideas???  I'm running AIX 4.3.3.  What other logs can I
turn on to get a better handle on the situation?





MRTG is graphing the traffic load.  You can see the dips on the servers:
http://kahuna.net.lsu.edu/mrtg/dns.html





Allie M Hopkins










****************************************************************************************************************************************************************************************************************************************************


The information contained in this e-mail transmission (including any
accompanying attachments) is intended solely for its authorized
recipient(s), and may be confidential and/or legally privileged. If you are
not an intended recipient, or responsible for delivering some or all of
this transmission to an intended recipient, you have received this
transmission in error and are hereby notified that you are strictly
prohibited from reading, copying, printing, distributing or disclosing any
of the information contained in it. Please note that BCA Publications Ltd
accepts no liability for the content of this email, or for the consequences
of any actions taken on the basis of the information provided. The
recipient should check this email and any attachments for the presence of
viruses. The company accepts no liability for any damage caused by any
virus transmitted by this email. Should you have any questions please
contact BCA Publications Ltd at (514) 499-9550 or email at
support at bcaresearch.com.


****************************************************************************************************************************************************************************************************************************************************






