named UDP retransmit timeouts ?

Fri Jul 23 19:13:27 UTC 2021

Jason Vas Dias <jason.vas.dias at gmail.com> wrote:
>
>  Please can anyone advise the best way to optimize named's
>  UDP timeout settings for caching-only local resolver usage
>  over a slow network link - I can't seem to find any in the
>  Bv9ARM document specifically describing how named
>  implements UDP re-transmits - please could someone
>  point me at the right pages or place to look, besides
>  the source code, which I am reading now, if there are any ?

I remember being surprised a while back that the retry intervals
and timeouts were more hard-coded than I expected. (But, be warned! I have
not refreshed my memory.)

The rough idea is that there's a certain amount of co-design between the
libc stub resolver (which back in the day came from BIND) and the
recursive server. IIRC, the libc resolver has a query timeout of 10s and
tries three times (so the overall timeout is about half a minute), and
named's resolver has a timeout of about 3s and also tries three times
(about 9s in all), which neatly fits inside libc's 10s timeout.

At least that's what my memory tells me, but it may be wrong.
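
To make that arithmetic concrete, here is a minimal sketch using only the
figures recalled above (10s x 3 for libc, about 3s x 3 for named). They are
from memory and are not verified defaults; the constant and function names
are made up for illustration, and no backoff between tries is assumed.

```python
# Figures as recalled in the text above; NOT verified defaults.
LIBC_TIMEOUT_S = 10   # stub resolver per-try timeout (recalled)
LIBC_TRIES = 3
NAMED_TIMEOUT_S = 3   # named per-try timeout (recalled as "about 3s")
NAMED_TRIES = 3


def total_budget(timeout_s, tries):
    """Worst-case wait if every try times out (no backoff assumed)."""
    return timeout_s * tries


libc_total = total_budget(LIBC_TIMEOUT_S, LIBC_TRIES)
named_total = total_budget(NAMED_TIMEOUT_S, NAMED_TRIES)
# Do all of named's tries finish before a single libc try expires?
fits = named_total < LIBC_TIMEOUT_S

print(f"libc gives up after about {libc_total}s")
print(f"named gives up after about {named_total}s")
print(f"named's tries fit inside one libc try: {fits}")
```

If the real numbers differ, the same check applies: named's total budget
wants to stay below the stub resolver's per-try timeout.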

But, I think you will not be successful fixing your problems by tweaking
DNS software. One of the problems with DNS as a protocol is that its
transport layer is very simple and very stupid, so if the underlying
network has problems, the DNS isn't able to fight its way through.

>  My problem is that at home my whole internet goes through
>  one 100M CAT-6 ethernet cable to a GSM 3G/4G modem (90% 3G WCDMA) ,
>  it seems no more than about 128 kilobyte/sec download & less upload
>  bandwidth is available, whenever my browser decides to download
>  something large (like a JavaScript blob) , then DNS requests
>  start timing out, the browser keeps re-issuing its requests,
>  and similar nasty feedback situations occur when the GSM
>  modem's DHCP lease expires and it has to re-setup its NAT for
>  the ethernet link, so all UDP requests time out for about
>  10 seconds, building up quite a backlog.

Ugh, that sounds horrible.

I think the basic problem is that TCP is very aggressive about filling up
whatever bandwidth it thinks might be available, but the DNS is not, and
TCP's congestion control algorithms will happily overwhelm a comparatively
reticent protocol like the DNS.

You probably also have buffer bloat, which makes these problems worse.
(check out https://www.bufferbloat.net/ for LOTS of information)
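
A rough back-of-envelope sketch of why a bloated buffer hurts DNS. The link
rate is the "about 128 kilobyte/sec" quoted above; the buffer depth and the
packet sizes are assumptions chosen for illustration, not measurements of
any real modem.

```python
LINK_BYTES_PER_S = 128 * 1024   # "about 128 kilobyte/sec" from the quoted text
BULK_PACKET_BYTES = 1500        # assumed full-size TCP segment
DNS_PACKET_BYTES = 100          # assumed small DNS response
QUEUED_BULK_PACKETS = 256       # assumed buffer depth; real devices vary


def fifo_delay_s(packets_ahead, packet_bytes, own_bytes, rate):
    """Seconds for a packet to drain through a FIFO behind packets_ahead others."""
    return (packets_ahead * packet_bytes + own_bytes) / rate


# DNS packet stuck behind a buffer full of bulk TCP data.
bloated = fifo_delay_s(QUEUED_BULK_PACKETS, BULK_PACKET_BYTES,
                       DNS_PACKET_BYTES, LINK_BYTES_PER_S)
# Same packet on an otherwise idle link.
idle = fifo_delay_s(0, BULK_PACKET_BYTES, DNS_PACKET_BYTES, LINK_BYTES_PER_S)

print(f"behind a full buffer: {bloated:.2f}s")
print(f"on an idle link:      {idle * 1000:.2f}ms")
```

With these assumed numbers the DNS packet waits nearly three seconds in the
queue, which is the same order as the per-try timeout recalled earlier, so
the query can time out without a single packet being lost.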

I am lucky enough that I haven't needed to deal with your problems myself,
so the best I can do is give you a few hints, but no specific advice. The
main idea is to prevent your TCP flows from overwhelming your uplink,
and/or from interfering with DNS traffic. You can (with the right
know-how) do this with some stunt network configuration on your Linux
gateway.

* Use traffic classification and priority queueing to ensure that DNS
  packets can jump ahead of everything else. This probably won't be enough
  by itself because of buffer bloat: the queue that fills up is likely to
  be inside the modem, where your priorities don't apply.

* You can use traffic shaping to ensure that the aggregate traffic from
  your Linux box never tries to over-fill your uplink. Years and years
  ago a friend of mine did this to avoid buffer bloat in their cable
  modem.

* Configure FQ-CoDel on your Linux gateway. This is a queueing algorithm
  specifically designed to avoid buffer bloat and to make TCP back off
  before everything becomes terrible.
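
As a starting point for the shaping and FQ-CoDel hints above, here is a
hedged sketch that only prints candidate `tc` commands rather than running
them. The interface name `eth0`, the 512 kbit/s uplink figure, the 85%
headroom and the helper name are all assumptions for illustration; measure
your own uplink before picking a rate. It shapes egress (upload) only, so the
download direction would need ingress shaping, which is not shown.

```python
def shaping_commands(dev, uplink_kbit, headroom_percent=85):
    """Return tc commands that cap egress below the uplink and attach fq_codel.

    The rate is kept below the measured uplink so that the queue builds on
    the Linux box (where fq_codel manages it) instead of inside the modem.
    """
    rate = uplink_kbit * headroom_percent // 100
    return [
        # HTB root qdisc; unclassified traffic goes to class 1:10.
        f"tc qdisc replace dev {dev} root handle 1: htb default 10",
        # One class capped at the shaped rate.
        f"tc class replace dev {dev} parent 1: classid 1:10 "
        f"htb rate {rate}kbit ceil {rate}kbit",
        # fq_codel as the leaf qdisc inside the shaped class.
        f"tc qdisc replace dev {dev} parent 1:10 fq_codel",
    ]


# Hypothetical values; these commands would need root to actually run.
for cmd in shaping_commands("eth0", uplink_kbit=512):
    print(cmd)
```

No explicit DNS class is shown, on the assumption that fq_codel's per-flow
queueing already lets sparse flows such as DNS lookups get ahead of bulk
transfers; a separate priority class could be added if that turns out not to
be enough.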

That's approximately everything I know about tackling your problem, so I
hope it points you in the right direction...

Tony.
-- 
f.anthony.n.finch  <dot at dotat.at>  https://dotat.at/