Strange named freezing

Nikita Druba admin at npo-lencor.ru
Mon Dec 13 06:18:55 UTC 2021


Hi!

My system is FreeBSD 12.2 with ZFS. Samba 4.13.14 runs in a jail with
Bind 9.16.23 as its DNS backend. I also have Bind 9.16.23 on another
server, working as a secondary DNS server; it gets the zones from the DC
via zone transfers authenticated with a TSIG key. The DC listens on
several networks: loopback and 3 other subnetworks.

Some time ago I moved the DC from one jail to another, and Bind on the
new DC behaves strangely.

When resolv.conf on the new DC points to another DNS server, for example
the old DC or the secondary Bind, everything works fine: the new DC
successfully resolves any record with nslookup or host, both from itself
and from other hosts.

When resolv.conf on the new DC points to localhost or to its own
internal IP, Bind periodically freezes with the following pattern:

- Bind stops replying to requests for about 5 minutes. Then it starts
working again without a service restart, and later freezes again.

- During the day (when employees are in the office), it freezes after
less than 1 minute of work; at night, after 10-15 minutes.

- If I switch resolv.conf between the secondary Bind and the internal
IP, I don't need to restart Bind or Samba to start or stop the periodic
freezing; I just change the nameserver line and wait. If Bind was frozen
when I changed resolv.conf, it stays frozen for ~5 minutes from the
start of that freeze and then works fine.

- If I switch resolv.conf between the secondary Bind and loopback, I DO
need to restart Bind to start or stop the freezing.

- While Bind is frozen, it cannot be stopped with the service command
and is not killed by a default kill; only kill -9 works.

- Samba's internal DNS works fine and does not freeze when resolv.conf
points to localhost.

- Sometimes Bind does not freeze on all subnetworks: it can freeze on
localhost and 2 subnetworks while, on the remaining subnetwork, the DC's
Bind still successfully resolves any record for any subnetwork. But I
have seen this only once and have not been able to reproduce it so far.

- With "debug 50" there are no special Bind log records during or
before a freeze. It freezes after arbitrary messages, and all of those
messages also appear in the log when Bind works without freezing.

- I tried running Bind with logging to the terminal, but saw no
additional information during a freeze; the terminal output is the same
as in the log files.

- rndc also hangs while Bind is frozen.
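One way to pin down exactly when, and on which addresses, Bind stops answering is to probe each listening address directly, independently of resolv.conf. Below is a minimal Python sketch using only the standard library: it sends a plain UDP A query to each address and prints a timestamped ok/TIMEOUT line per round. The function names, record name and addresses are hypothetical placeholders, not part of BIND or Samba, and the sketch checks only UDP, not TCP.

```python
import socket
import struct
import time


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query (QTYPE=A, QCLASS=IN) with recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if server sends back a DNS response within timeout seconds."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(query, (server, port))
            data, _ = s.recvfrom(4096)
        except OSError:  # timeouts and ICMP errors both count as "no answer"
            return False
    # A reply must echo our query ID and have the QR (response) bit set.
    return len(data) >= 12 and data[:2] == query[:2] and bool(data[2] & 0x80)


def watch(servers, name, interval=10.0, rounds=None):
    """Print one timestamped status line per round; rounds=None runs until Ctrl-C."""
    done = 0
    while rounds is None or done < rounds:
        status = " ".join(
            f"{srv}={'ok' if probe(srv, name) else 'TIMEOUT'}" for srv in servers
        )
        print(time.strftime("%Y-%m-%d %H:%M:%S"), status, flush=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

For example, watch(["127.0.0.1", "192.0.2.10"], "dc.example.lan", interval=5), run on the DC and ideally also from a host in each subnetwork, would show whether the timeouts start on all addresses at once or only on some of them.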

I found one way to work around this problem. The server hosting the DC
jail has 40 CPUs (20 cores, 40 threads). Therefore, when I start named,
it creates 40 workers for every listening IP, i.e. 40 TCP and 40 UDP
workers per IP.
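To put the numbers in perspective, a small arithmetic sketch. It assumes the DC listens on 4 addresses (loopback plus the 3 subnetworks mentioned above) and takes the per-address worker counts exactly as observed here; it does not model BIND's actual threading internals, and the helper name is hypothetical.

```python
def named_socket_workers(n_workers: int, n_listen_ips: int) -> dict:
    """Hypothetical helper: worker totals if named starts n_workers UDP and
    n_workers TCP workers for every listening address, as observed above."""
    udp = n_workers * n_listen_ips
    tcp = n_workers * n_listen_ips
    return {"udp": udp, "tcp": tcp, "total": udp + tcp}


# Assumption: 4 listening addresses (loopback plus 3 subnetworks).
print(named_socket_workers(40, 4))  # {'udp': 160, 'tcp': 160, 'total': 320}
print(named_socket_workers(10, 4))  # {'udp': 40, 'tcp': 40, 'total': 80}
```

Under these assumptions the automatic setting gives 320 listener workers, against 80 with "-n 10".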

Because that seemed too many for my configuration, I decided, on
intuition, to reduce the number of named workers to 10 with "-n 10".
Since then everything has worked without freezing, with the correct
resolv.conf, for the last 2 weeks.

Afterwards, I tried setting "-n 40", the same value named chooses
automatically, and after a restart named froze again. Maybe it was a
coincidence, but with other settings named keeps freezing. I also
noticed that when named works without freezing, the "number of zones"
in the "rndc status" output decreases from 9 to 3. It seems that named
is missing the Samba zones, yet records from them still resolve fine.

I tried collecting traces with ktrace and caught the moment of a freeze.
After the last record in the usual log (when Bind freezes), the kdump
output shows the following records repeated many times:

  36460 named    CALL  nanosleep(0x7fffffffea30,0)
  36460 named    RET   nanosleep 0
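A long ktrace capture is easier to read if the repeated records are counted rather than scrolled through. The following Python sketch (standard library only; the function name is hypothetical) parses records in the format shown above, counts CALL records per system call, and reports the longest run of consecutive calls to the same one. Without thread IDs in the kdump output, records from all named threads share one PID, so a "run" may interleave several threads; the sketch only summarizes the trace and does not identify a cause.

```python
import re
from collections import Counter

# Matches kdump records of the form shown above, e.g.
#   "  36460 named    CALL  nanosleep(0x7fffffffea30,0)"
RECORD = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<cmd>\S+)\s+(?P<kind>CALL|RET)\s+(?P<sys>\w+)"
)


def summarize(lines):
    """Count CALL records per syscall and find the longest run of
    consecutive CALLs to the same syscall (RET records are skipped)."""
    counts = Counter()
    longest_name, longest_len = None, 0
    current_name, current_len = None, 0
    for line in lines:
        m = RECORD.match(line)
        if not m or m.group("kind") != "CALL":
            continue
        name = m.group("sys")
        counts[name] += 1
        if name == current_name:
            current_len += 1
        else:
            current_name, current_len = name, 1
        if current_len > longest_len:
            longest_name, longest_len = current_name, current_len
    return counts, (longest_name, longest_len)
```

For instance, after saving the kdump output to a file (the name kdump.txt is only an example), opening it and passing the file object to summarize() returns the per-syscall counts and the longest repeated run.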

What could be wrong here? How can I narrow down the problem further?



More information about the bind-users mailing list