bind problems, 9.7.0 p1

David Ford david at blue-labs.org
Fri Jun 11 16:39:29 UTC 2010


A snippet of the log to start with

11-Jun-2010 06:35:08.959 Postgres driver unable to find available connection after searching 30 times
11-Jun-2010 06:35:08.959 Postgres driver unable to return result set for findzone query

    /*%
     * Loops through the list of DB instances, attempting to lock
     * on the mutex.  If successful, the DBI is reserved for use
     * and the thread can perform queries against the database.
     * If the lock fails, the next one in the list is tried.
     * looping continues until a lock is obtained, or until
     * the list has been searched dbc_search_limit times.
     * This function is only used when the driver is compiled for
     * multithreaded operation.
     */

    static dbinstance_t *
    postgres_find_avail_conn(db_list_t *dblist)
    {
            dbinstance_t *dbi = NULL;
            dbinstance_t *head;
            int count = 0;

            /* get top of list */
            head = dbi = ISC_LIST_HEAD(*dblist);

            /* loop through list */
            while (count < dbc_search_limit) {
                    /* try to lock on the mutex */
                    if (isc_mutex_trylock(&dbi->instance_lock) ==
                        ISC_R_SUCCESS)
                            return dbi; /* success, return the DBI for use. */

                    /* not successful, keep trying */
                    dbi = ISC_LIST_NEXT(dbi, link);

                    /* check to see if we have gone to the top of the list. */
                    if (dbi == NULL) {
                            count++;  
                            dbi = head;
                    }
            }
            isc_log_write(dns_lctx, DNS_LOGCATEGORY_DATABASE,
                          DNS_LOGMODULE_DLZ, ISC_LOG_INFO,
                          "Postgres driver unable to find available connection "
                          "after searching %d times",
                          count);
            return NULL;
    }



11-Jun-2010 06:35:09.080 name.c:2091: REQUIRE(suffixlabels > 0) failed
11-Jun-2010 06:35:09.081 exiting (due to assertion failure)

    void
    dns_name_split(dns_name_t *name, unsigned int suffixlabels,
                   dns_name_t *prefix, dns_name_t *suffix)
    {
            unsigned int splitlabel;

            REQUIRE(VALID_NAME(name));
            REQUIRE(suffixlabels > 0);
            REQUIRE(suffixlabels < name->labels);
            REQUIRE(prefix != NULL || suffix != NULL);
            REQUIRE(prefix == NULL ||
                    (VALID_NAME(prefix) &&
                     prefix->buffer != NULL &&
                     BINDABLE(prefix)));
            REQUIRE(suffix == NULL ||
                    (VALID_NAME(suffix) &&
                     suffix->buffer != NULL &&
                     BINDABLE(suffix)));

            splitlabel = name->labels - suffixlabels;

            if (prefix != NULL)
                    dns_name_getlabelsequence(name, 0, splitlabel, prefix);

            if (suffix != NULL)
                    dns_name_getlabelsequence(name, splitlabel,
                                              suffixlabels, suffix);

            return;
    }




There are two issues here.  a) Why is bind rapid-firing these messages?
And I mean RAPID; the logs are overflowing with them.  Is bind trying to
find a connection whose mutex is free and failing every time?  There
were 14 of these pairs in 3ms, with 80 seconds of silence before and a
minute of silence after.  At 30 searches per message, that is 420 passes
over the connection list in 3ms.

My postgresql logs don't indicate that anything is going on, and the
machine is almost a blank slate for activity; it's entirely idle.
There's no contention for resources on the DB side, so I have to presume
that bind itself has somehow gotten into a full-up state without good
reason.  postgresql normally shows 4 idle connections.  I see maybe one
or two queries per second on average, arriving as small bursts of 4-12
queries in an ~8 second interval.  Maybe a microsleep pause between
search passes would be beneficial.  Better would be a dump showing which
threads were doing what, to figure out why a supposedly idle system is
all tied up.
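
To make the microsleep idea concrete, here is a rough sketch of a search
loop that pauses between passes instead of spinning.  It is not the DLZ
driver's code: conn_t, find_avail_conn, search_limit and pause_ns are
made-up names, and plain pthreads stands in for the isc_mutex wrappers.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the driver's dbinstance_t. */
typedef struct conn {
        pthread_mutex_t lock;
        struct conn *next;
} conn_t;

/*
 * Walk the list trying to lock each connection.  If a full pass finds
 * nothing free, sleep for pause_ns nanoseconds (must be < 1e9) before
 * the next pass.  Returns a locked connection, or NULL after
 * search_limit unsuccessful passes.
 */
static conn_t *
find_avail_conn(conn_t *head, int search_limit, long pause_ns)
{
        struct timespec pause = { 0, pause_ns };

        for (int pass = 0; pass < search_limit; pass++) {
                for (conn_t *c = head; c != NULL; c = c->next) {
                        if (pthread_mutex_trylock(&c->lock) == 0)
                                return c; /* caller must unlock when done */
                }
                /* nothing free on this pass; back off before retrying */
                if (pass + 1 < search_limit)
                        nanosleep(&pause, NULL);
        }
        return NULL;
}
```

With a pause of, say, 1ms, 30 passes would take about 29ms rather than
a fraction of a millisecond, which gives a busy connection some chance
to be released before the search gives up.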

Next, b) named keeps dying with this entirely ambiguous assertion
failure.  I'm sure it's a fault of my own, but without any indication of
where the issue lies, this is like asking to find a leaf in a forest
without knowing what type of leaf you're looking for ^_^.

Why is bind so prone to falling over and dying from typos?  Don't get me
wrong please, I love bind, which is why I've been using it for ~15 years
now.  I've noted that bind has a strong tendency to simply flat out
abort if it encounters zone data it doesn't like, rather than report it
and drop the bad data.  That's not really very reliable.  It's ok for
testing in the lab but really bad manners for production. :>
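
As an illustration of "report it and drop the bad data", the two numeric
preconditions from the dns_name_split() source quoted above could be
tested before the call, so that the offending name gets logged instead
of tripping the REQUIRE.  This is only a sketch under assumptions:
split_check, report_split and split_status_t are hypothetical names, not
part of BIND, and the label counts are passed in as plain integers.

```c
#include <stdio.h>

typedef enum {
        SPLIT_OK,
        SPLIT_NO_SUFFIX,       /* would fail REQUIRE(suffixlabels > 0) */
        SPLIT_SUFFIX_TOO_LONG  /* would fail REQUIRE(suffixlabels < name->labels) */
} split_status_t;

/* Mirror the two numeric REQUIREs without aborting. */
static split_status_t
split_check(unsigned int labels, unsigned int suffixlabels)
{
        if (suffixlabels == 0)
                return SPLIT_NO_SUFFIX;
        if (suffixlabels >= labels)
                return SPLIT_SUFFIX_TOO_LONG;
        return SPLIT_OK;
}

/* Log the offending name and return -1 so the caller can drop it. */
static int
report_split(const char *name_text, unsigned int labels,
             unsigned int suffixlabels)
{
        split_status_t st = split_check(labels, suffixlabels);

        if (st != SPLIT_OK) {
                fprintf(stderr,
                        "cannot split '%s': labels=%u suffixlabels=%u (%s)\n",
                        name_text, labels, suffixlabels,
                        st == SPLIT_NO_SUFFIX ? "empty suffix"
                                              : "suffix not shorter than name");
                return -1;
        }
        return 0;
}
```

A log line naming the record would at least tell me which leaf I'm
looking for.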

A bit of help on these please :)

-david
