bind-9.3.1 stops answering queries for nearly a minute/hour

Danny Thomas d.thomas at its.uq.edu.au
Wed Oct 19 00:16:29 UTC 2005


JINMEI Tatuya <jinmei at isl.rdc.toshiba.co.jp> suggested:
> - decrease cleaning-interval.  it will decrease the amount of total
>   work of each cleaning session, so it may relatively mitigate the
>   problem.  However, it may still not be a good solution because named
>   will still consume CPU during the cleaning session, in which
>   responses will still be delayed.  It may even result in worse
>   performance since the periodic cleaning occurs more frequently.  So,
>   whether it helps or not depends on details of your environment
>   (query pattern, TTLs of cached records, etc).
> 
> - decrease DNS_CACHE_CLEANERINCREMENT defined in lib/dns/cache.c (at
>   line 48 for 9.3.1).  It's currently 1000, which means in the worst
>   case the response to a query can be delayed until named examines
>   1000 entries in the cache DB.  You can improve the response time
>   during the cleaning session by decreasing this parameter.  Of
>   course, it does not solve the high CPU usage during the session,
>   which is an inevitable cost at least in the current implementation,
>   and the cleaning period itself will be longer accordingly.
We tried the first suggestion, as it was the simplest.
Changing the cleaning-interval from 60 to 5 minutes produced a big
improvement. The effect seems almost non-linear:

NB . = 0-10ms, : = 10-100ms, * = 101-800ms, T = > 800ms

60 mins: 24 x >800ms + 9 x 101-800ms = more than 19.2 secs of total delay
12:34: ............................................................    2/  5
12:35: ....:............................................**TTT*TTTTT   27/589
12:36: TTTTTTTTTT**TT**TTTT**......................................   79/807
12:37: ............................................................    2/  5
12:38: ..............................................*.......:.....   14/678

5 mins:  0 x >800ms + 6 x <26ms = less than 156ms of total delay
15:12: ............................................................    2/ 10
15:13: ...........::::::......................................:....    4/ 26
15:14: ............................................................    2/  6
15:15: ............................................................    2/  5
15:16: ............................................................    2/  6
15:17: ............................................................    2/  6
15:18: .................::::.::....................................    6/ 78
15:19: ....................T...........:...........................    3/ 56

So while the 5 minute cleaning cycle runs 12 times as often, the total
query delay during each cycle can be about 1% of that during each 60 minute
cleaning cycle. NB the total query delay probably varies more from cycle to
cycle with the 5 minute interval, and sometimes it approaches a second.
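
For anyone wanting to check the sums, here is a minimal Python sketch that
tallies the symbols in the two slow 60 minute rows (12:35 and 12:36) and
works out the ratio quoted above. The names (BAND_FLOOR_MS, tally, floor_ms)
are made up for illustration, and the sketch assumes each symbol stands for
one timed query whose delay is at least the lower edge of its band, so the
results are lower bounds rather than measured totals.

```python
# Lower edge (ms) of each latency band, taken from the key above.
BAND_FLOOR_MS = {".": 0, ":": 10, "*": 101, "T": 800}

# The two slow rows of the 60 minute run, copied from the 12:35 and 12:36 lines.
SIXTY_MIN_ROWS = [
    "....:............................................**TTT*TTTTT",
    "TTTTTTTTTT**TT**TTTT**......................................",
]


def tally(rows):
    """Count how many samples fall into each latency band."""
    counts = {symbol: 0 for symbol in BAND_FLOOR_MS}
    for row in rows:
        for ch in row:
            if ch in counts:
                counts[ch] += 1
    return counts


def floor_ms(counts, symbol):
    """Lower bound on the delay contributed by one band, in ms."""
    return counts[symbol] * BAND_FLOOR_MS[symbol]


counts = tally(SIXTY_MIN_ROWS)
slow_60_ms = floor_ms(counts, "T")  # only the >800ms samples
slow_5_ms = 6 * 26                  # upper bound quoted for the 5 minute run

print("T samples:", counts["T"], " * samples:", counts["*"])
print("60 min lower bound (T only): %d ms" % slow_60_ms)
print("5 min / 60 min ratio: %.2f%%" % (100.0 * slow_5_ms / slow_60_ms))
```

The ratio compares an upper bound for the 5 minute cycle with a lower bound
for the 60 minute cycle, so if anything it understates the improvement.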

Danny

-- 
   d.thomas at its.uq.edu.au    Danny Thomas,                                    
          +61-7-3365-8221    Software Infrastructure,
 http://www.its.uq.edu.au    ITS, The University of Queensland


