2 simultaneous hung Bind boxes
Justin Shore
justin at justinshore.com
Wed Oct 28 05:30:09 UTC 2009
I got a call from a remote tech earlier this evening. He was at home on
our service and couldn't get on the Internet. His IP connectivity was
fine and he could hit my NOC website via IP only. DNS, however, was hosed.
About the time I got in a position to check the bind logs and sniff his
traffic the problem went away. We chalked it up to a local problem
until a few minutes later across the SP network I too experienced the
same thing. My DNS requests simply timed out. I turned on querylog on
our boxes and could see what appeared to be successful hits and replies.
The boxes were just not replying to queries. Traffic on our main
upstream dropped by about 90% within a few short minutes (users' DNS
stopped and outbound usage basically ground to a halt). Not knowing
what else to try, I restarted bind on both NSs. That fixed it.
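
For anyone who wants to check for the same symptom (a server that is up
but not answering), here is a minimal sketch of a probe using only the
Python standard library. It hand-builds a single A query in RFC 1035
wire format, sends it over UDP, and reports whether a reply with the
matching ID comes back before a timeout. The function names build_query
and probe, the test name example.com, and the 192.0.2.x addresses are
placeholders of my own choosing, not anything from our setup.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe(server, name="example.com", timeout=2.0, port=53):
    """Return True if the server replies with a matching ID before the timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and ICMP-driven errors
            return False
    # A usable reply has at least a 12-byte header and echoes our query ID
    return len(reply) >= 12 and reply[:2] == query[:2]


if __name__ == "__main__":
    # Placeholder addresses (TEST-NET-1); substitute the real name servers.
    for ns in ("192.0.2.53", "192.0.2.54"):
        print(ns, "answering" if probe(ns) else "NOT answering")
```

Run from a customer-side host every minute or so, something like this
would flag the hang without waiting for a phone call. It only checks
that some reply arrives; it does not validate the answer itself.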
The boxes are running fairly old Bind code, 9.5.1b2. Tomorrow I will
upgrade to 9.6.1rc1 (unless people believe 9.7.0b1 is ready for use).
My question is: are there any known ways for a crafted query or crafted
reply to cause what I've described on that old release of Bind? I
recall hearing about assorted things over the past couple of years
though I thought that they were things that would cause actual crashing,
not the mental hosing my boxes appeared to take this evening. Does
anything else come to mind? The views on the servers only permit
recursive lookups internally from our customer prefixes. Externally you
can only get responses for things that we have authority over. Thoughts?
Thanks
Justin
More information about the bind-users
mailing list