Strange named freezing
Nikita Druba
admin at npo-lencor.ru
Mon Dec 27 09:24:33 UTC 2021
I apologize for the persistence, but are there any recommendations
for debugging this?
13.12.2021 7:18, Nikita Druba wrote:
> Hi!
>
> My system runs FreeBSD 12.2 with ZFS. Samba 4.13.14 runs in a jail
> with Bind 9.16.23 as its DNS backend. I also have Bind 9.16.23 on
> another server, working as a secondary DNS. The secondary Bind gets
> zones from the DC by zone transfer with a TSIG key. The DC listens
> on several subnetworks (loopback and 3 others).
>
> Some time ago I moved the DC from one jail to another, and now I see
> strange Bind behaviour on the new DC.
>
> When I point resolv.conf on the new DC at another DNS server, for
> example the old DC or the secondary Bind, everything works fine. The
> new DC successfully resolves any record with the nslookup or host
> commands, both from itself and from other hosts.
>
> When I point resolv.conf on the new DC at localhost or at its own
> internal IP, Bind freezes periodically, with the following pattern:
>
> - Bind stops replying to requests for ~5 minutes. Then it starts
> working again without a service restart, and later freezes again.
>
> - In the daytime (when employees are in the office) it freezes after
> less than 1 minute of work; at night, after 10-15 minutes.
>
> - If I switch resolv.conf between the secondary Bind and the internal
> IP, there is no need to restart Bind or Samba for the periodic
> freezing to start or stop. I just change the nameserver line and
> wait. If Bind was frozen at the moment resolv.conf was changed, it
> stays frozen until ~5 minutes after the freeze began and then works
> fine.
>
> - If I switch resolv.conf between the secondary Bind and loopback,
> then I NEED to restart Bind for the freezing to start or stop.
>
> - While Bind is frozen, it cannot be stopped with the service command
> or with a plain kill (default signal); only kill -9 works.
>
> - The internal Samba DNS works fine and does not freeze when
> resolv.conf points to localhost.
>
> - Sometimes Bind does not freeze for all subnetworks. It can freeze
> for localhost and 2 subnetworks while, on the one remaining
> subnetwork, the DC's Bind still resolves any record from any
> subnetwork. But I saw this only once and cannot reproduce it for now.
>
> - With "debug 50" there are no special Bind log records at or before
> the time of a freeze. The freeze follows no particular message, and
> I see all the same messages in the log when Bind works without
> freezing.
>
> - I tried running Bind with logging to the terminal, but saw no
> additional information when it froze. The terminal output is the
> same as in the log files.
>
> - rndc hangs as well.
>
> I found one workaround for this problem. The server hosting the DC
> jail has 40 CPUs (20 cores, 40 threads). Therefore, when I start
> named, it creates 40 workers for every listening IP, i.e. 40 TCP and
> 40 UDP for every IP.
>
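To put rough numbers on the paragraph above, here is a small
arithmetic sketch. It assumes 4 listening addresses (reading
"loopback and 3 others" as one address each) and one UDP plus one TCP
listener per worker per address, as described in the message. The
function name is hypothetical and this is not taken from named itself.

```python
def listener_sockets(workers: int, addresses: int) -> int:
    """Listeners if each worker opens one UDP and one TCP socket per address.

    This mirrors the description in the message; it is an assumption,
    not a statement about named's internals.
    """
    return workers * addresses * 2


# 4 addresses is an assumption: loopback plus 3 other subnetworks.
for workers in (40, 10):
    print(f"{workers} workers -> {listener_sockets(workers, 4)} listeners")
```

Under these assumptions, the automatic value gives four times as many
listeners as "-n 10".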
> Because that is too many for my configuration, I decided, on
> intuition, to reduce the number of named workers to 10 with "-n 10".
> Everything has worked without freezing, with the correct resolv.conf,
> for the last 2 weeks.
>
> Afterwards I tried setting "-n 40", the same value named chooses
> automatically. After the restart named froze again. Maybe it was a
> coincidence, but no other setting stopped named from freezing. I
> also noticed that when named works without freezing, "number of
> zones" in the "rndc status" output drops from 9 to 3. It seems that
> named is missing the Samba zones, but resolving records from them
> works fine.
>
> I tried to collect some traces with ktrace and caught the moment of a
> freeze. After the last record from the usual log (when Bind freezes),
> the kdump output starts repeating the following records many times:
>
> 36460 named CALL nanosleep(0x7fffffffea30,0)
> 36460 named RET nanosleep 0
>
> What could be wrong here? How can I narrow the problem down further?
>
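One way to record exactly when each listening address stops answering
is a small external probe. The sketch below is only an illustration:
it sends a plain UDP A query and reports the reply latency, or None on
timeout. The function names, the 2-second timeout, and the addresses
and record name in the usage note are all hypothetical.

```python
import socket
import struct
import time
from typing import Optional


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, timeout: float = 2.0) -> Optional[float]:
    """Return the reply latency in seconds, or None if no reply arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(512)
        except OSError:  # includes socket.timeout
            return None
        return time.monotonic() - start


def watch(servers, name: str, rounds: int, interval: float = 5.0) -> None:
    """Probe every server once per round and print a timestamped line."""
    for _ in range(rounds):
        for server in servers:
            result = probe(server, name)
            status = "NO REPLY" if result is None else f"{result * 1000:.1f} ms"
            print(time.strftime("%H:%M:%S"), server, status)
        time.sleep(interval)
```

For example, calling watch(["127.0.0.1", "192.0.2.10"], "dc1.example.com",
rounds=720) would log each address for about an hour; both the address
192.0.2.10 and the name dc1.example.com are placeholders. Comparing the
timestamps per address against the named log could show whether all
listeners stop at the same moment.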