DDNS failures

Wed Dec 14 00:47:50 UTC 2005

Peter Kringle wrote:

>I am currently working out a problem with DDNS failures.
>We are currently seeing about 10 to 20 updates per second, but every once in a while the app used to send the updates gives the 
>following error:
>
>java.net.SocketTimeoutException: Receive timed out
>        at java.net.PlainDatagramSocketImpl.receive(Native Method)   
>        at java.net.DatagramSocket.receive(DatagramSocket.java:711)  
>        at org.xbill.DNS.SimpleResolver.send(SimpleResolver.java:248)
>        at com.alopa.prov.core.ddns.DDNS.sendUpdate(DDNS.java:435)
>        at com.alopa.prov.core.ddns.DDNS.notify(DDNS.java:336)
>        at com.alopa.prov.core.ddns.DDNSClient.run(DDNSClient.java:98)
>
>At this time I see nothing in the logs that shows an error.
>
>### BIND CONFIG ###
>
>options {
>        directory "/etc/bind";
>        auth-nxdomain no;    # conform to RFC1035
>        recursion yes;
>        version "Off with your head!";
>        interface-interval 0;
>        notify no;
>        transfer-format many-answers;
>        recursive-clients 100000;
>
>        allow-transfer {
>          xfer;
>        };
>
>        allow-query {
>          trusted;
>        };
>
>        blackhole {
>          bogon;
>        };
>};
>
>
>zone "{ZONE}" {
>        type master;
>        file "{ZONEFILE}";
>        notify explicit;
>        also-notify {{IP};};
>        allow-transfer { {IP}; };
>        allow-update { {IP}; {IP};};
>};
>
>### END ###
>
>
>What I am looking for is suggestions on how I can diagnose the problem, 
>
Well, you've left out a critical piece of diagnostic information: when 
you get this error message, is the update being made or not? If the 
update is being made, then this is nothing more than the app timing out 
before the Dynamic Update response gets back to it, which, in turn, may 
be because a) the response packet is getting lost (probably a network 
problem), b) the master server is taking too long to respond (probably a 
performance/capacity problem on the master), or c) the app is not 
waiting long enough for the response packet (this may be tunable). On 
the other hand, if the update is *not* being made, then there is 
another, somewhat parallel set of possible causes: a) the Dynamic Update 
request packet is never getting to the master (again, probably a network 
problem), or b) the master is dropping the packet because it's too busy 
(again, a performance/capacity problem on the master). There are other 
possibilities too, e.g. packet corruption; I'm just mentioning the most 
likely ones.
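
The two branches above can be written down as a small lookup. This is only a
sketch in Java (the language of the client in the stack trace): the class
DdnsTimeoutTriage, the Cause enum and the candidateCauses method are names
made up for illustration, and the caller is assumed to have already checked,
by querying the master after the timeout, whether the update was applied.

```java
import java.util.List;

/** Hypothetical helper: maps "was the update applied?" to the likely causes. */
final class DdnsTimeoutTriage {

    /** The most likely causes listed above; others (e.g. corruption) are omitted. */
    enum Cause {
        RESPONSE_LOST_IN_NETWORK,   // update made, response packet lost
        MASTER_SLOW_TO_RESPOND,     // update made, master answered too late
        CLIENT_TIMEOUT_TOO_SHORT,   // update made, app gave up too early
        REQUEST_LOST_IN_NETWORK,    // update not made, request never arrived
        MASTER_DROPPED_REQUEST      // update not made, master too busy
    }

    /**
     * @param updateApplied true if the record is present on the master
     *                      after the client saw the timeout
     */
    static List<Cause> candidateCauses(boolean updateApplied) {
        if (updateApplied) {
            return List.of(Cause.RESPONSE_LOST_IN_NETWORK,
                           Cause.MASTER_SLOW_TO_RESPOND,
                           Cause.CLIENT_TIMEOUT_TOO_SHORT);
        }
        return List.of(Cause.REQUEST_LOST_IN_NETWORK,
                       Cause.MASTER_DROPPED_REQUEST);
    }

    public static void main(String[] args) {
        System.out.println("update applied:     " + candidateCauses(true));
        System.out.println("update not applied: " + candidateCauses(false));
    }
}
```

Logging the result of that check next to each timeout would show which branch
the failures actually fall into.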

>and any ideas on how to increase performance of bind to   
>support this many updates.   
>
You seem to be assuming a performance/capacity problem on the master. 
All other things being equal, I would eliminate the possibility of a 
network problem first, by putting a sniffer at various points along the 
path between the client(s) and the master. Unless, of course, you have 
independent reasons to suspect a performance/capacity problem.
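
A sniffer is still the right tool for locating where packets disappear. As a
rough complement from the client host, one can time a single UDP
request/response exchange with an explicit receive timeout, which helps tell
"slow" apart from "lost". The sketch below uses only java.net; the class
UdpRoundTripProbe and its methods are hypothetical names, and the payload is
arbitrary bytes echoed by a loopback socket. Against a real name server the
payload would have to be a valid DNS message, which this sketch does not
build.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

/** Hypothetical probe: times one UDP request/response exchange. */
final class UdpRoundTripProbe {

    /**
     * Sends payload to host:port and waits up to timeoutMillis for any reply.
     * Returns the elapsed milliseconds, or -1 if the receive timed out.
     */
    static long roundTripMillis(InetAddress host, int port,
                                byte[] payload, int timeoutMillis) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(timeoutMillis);
            long start = System.nanoTime();
            socket.send(new DatagramPacket(payload, payload.length, host, port));
            byte[] buffer = new byte[512];
            try {
                socket.receive(new DatagramPacket(buffer, buffer.length));
            } catch (SocketTimeoutException e) {
                return -1; // same condition as "Receive timed out" in the trace
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo only: probes a one-shot echo socket bound to the loopback address. */
    static long loopbackEchoDemo(int timeoutMillis) {
        try (DatagramSocket server =
                 new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> {
                try {
                    byte[] buf = new byte[512];
                    DatagramPacket in = new DatagramPacket(buf, buf.length);
                    server.receive(in);
                    // Send the received bytes straight back to the sender.
                    server.send(new DatagramPacket(buf, in.getLength(),
                                                   in.getAddress(), in.getPort()));
                } catch (IOException ignored) {
                    // Socket closed before a request arrived; nothing to echo.
                }
            });
            echo.setDaemon(true);
            echo.start();
            return roundTripMillis(InetAddress.getLoopbackAddress(),
                                   server.getLocalPort(),
                                   new byte[] {1, 2, 3}, timeoutMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("loopback round trip (ms): " + loopbackEchoDemo(1000));
    }
}
```

Round-trip times that creep up toward the client's timeout would point at the
master's response time; clean timeouts with otherwise short round trips would
point at packet loss.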

Of course, a "network" problem could actually be a type of 
performance/capacity problem, in the sense that too much query traffic 
could congest the master's local segment or NIC(s) and cause UDP packets 
to be lost. You said that the master is also doing recursive lookups, 
and although I wouldn't presume to flame you for that (since one of my 
masters also supports a small amount of recursive traffic), it does 
suggest that separating those functions may ease the network 
congestion and reduce the incidence of Dynamic Update packets (requests 
or responses) being dropped.

                                                                  -Kevin