bind-9.5.0b1 problem on ppc64 : rbtdb.c:1532: REQUIRE(prev > 0) failed

Tue Feb 5 13:04:15 UTC 2008

On 2/5/08, Res <res at ausics.net> wrote:
>
>
> each remote PoP has a local
> route to a primary end user DNS, IOW, eg: end users are all asigned
> ns3.blah and ns4.blah, ns3, is duplicated in every PoP, all has the same
> IP but each PoP has a route for that IP locally so it doesnt have to
> traverse the country back to our data centre,

Aha. Good setup. I might try this. I believe this is also how the local root
name server in my country was set up.

> ns4 is located in our data
> centre, ns1 and 2 are as well but they are authoritives so end users only
> use them when looking up us, those caches have forwarders etc

Similar setup, but we don't recommend that our users add an authoritative
DNS server to their resolver list.
Is this a best practice? How does bind handle this situation anyway?
E.g. when this happens:
- a user adds an authoritative DNS server as a resolver, as well as a caching one
- the user queries a domain that the authoritative server does not serve
- the authoritative server rejects the query

will the user's resolver automagically try the other DNS servers (in this
case the caching one), or will the query fail?
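
One way to see what the client actually receives in this scenario is to
send the same query to each server and compare the response codes. Below is
a minimal sketch using only the Python standard library. It assumes the
rejection shows up as a REFUSED RCODE, which may not hold for every server
or configuration, and the helper names (build_query, rcode_of, ask) are made
up for illustration. It only shows what each server answers; whether a given
stub resolver then moves on to the next server still depends on that
resolver.

```python
import socket
import struct

# RCODE values from RFC 1035. Assumption: an authoritative-only server
# rejects out-of-zone queries with REFUSED (5); other answers are possible.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def build_query(name, qid=0x1234, qtype=1):
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)

def rcode_of(response):
    """Return the RCODE name held in the low 4 bits of the header flags."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, str(code))

def ask(server, name, timeout=3.0):
    """Send one UDP query to `server` on port 53 and return the RCODE name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    return rcode_of(response)
```

Calling ask() once with the address of the authoritative server and once
with the address of the caching server, for the same out-of-zone name, shows
how the two answers differ.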

> >> primary for an entire state of dial/dsl customers hence why concurrent was
> >> set to 10000, because even at 8000 bind gave errors, and I made the fatal
> >
> > We ended up setting it to 10000 as well.
>
> if your at 10K and having troubles, bind is not the cause. if it works for
> nothing here it should everywhere,

Well, actually bind IS the cause :)
Following Jinmei's suggestion, it works fine now.

> ns3 (results of only 1 box, in my state)  DL360 G4, 4gb ram
> rndc status shows 9718 requests
>
> 22:15:27 up 305 days, 12:46,  1 user,  load average: 0.00, 0.01, 0.01
>
> you dont have logging on do you? the only time i've ever seen Bind use
> any CPU is when querry logging is on, and thats just plain crazy on
> busy networks unless you are testing.

Yes, I do have query logging enabled.
CPU (and disk) is not an issue on our new servers, as usage of both is
still low enough.
When resources get scarce, query logging will be the first to go.

> logging {
>          category lame-servers { null; };

I like this one :)

>
> >> not in an ISP environment when you have 20K dsl users on one PoP :)
> > Try about one million users :)
>
> 1 mil and you only have 10K concurrent? man your users must do SFA :)

They're fine, actually.
As I mentioned earlier, going from power4 -> power5+ and bind 9.3 -> bind
9.5 reduced concurrency (as shown by rndc status) a lot.
It's now usually around 2000, and (rarely) goes up to 5000. During the busy
hour, DNS stats are something like 1400 queries per second (from the query
log), using 6 Mbps of network bandwidth.
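
As a rough sanity check on those two figures, the average traffic per query
can be worked out directly. This is a small sketch that assumes the 6 Mbps
is decimal megabits and covers queries plus responses combined; the function
name is made up for illustration.

```python
def bytes_per_query(bandwidth_mbps, queries_per_second):
    """Average bytes on the wire per query at a given bandwidth and rate."""
    bits_per_second = bandwidth_mbps * 1_000_000  # decimal megabits assumed
    return bits_per_second / 8 / queries_per_second

# 6 Mbps at 1400 queries per second
print(round(bytes_per_query(6, 1400)))  # prints 536
```

So under that assumption each query accounts for roughly half a kilobyte of
traffic on average.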

> > Going from power4 to power5+ in my setup actually reduce concurrent users as
> > shown by "rndc status". Going from bind 9.3 -> 9.5 reduce it even further.
>
> we notice no delays in DNS requests, all lookups are instant, sop faster
> hardware wouldnt have any affect here.

I wouldn't say we have "instant lookup", but "dig" usually shows a line
similar to this:
";; Query time: 221 msec"

> > A P4 isn't sufficient for that (not in my experince, anyway)
>
> well, I dont recommend it, but it shows it can be done if you have nothing
> else available in the middle of the night :) It handled it very well.

Your hardware is obviously enough to serve all your needs then.

Thanks for sharing information about your setup.

Regards,

Fajar