Journal errors

Barry Finkel b19141 at achilles.ctd.anl.gov
Thu Apr 27 14:29:12 UTC 2006


"Phillip" <preeves1 at gmail.com> wrote:

>>We have Bind 9.2.3 running on RHE3.   Our redhat servers are slaves to
>>windows servers.  The errors that are piling up in our logs are as
>>follows:
>>
>>named[27177]: malformed transaction: db.mydomain.com.jnl last serial
>>20065 != transaction first serial 20064
>>named[27177]: transfer of 'mydomain.com/IN' from 192.168.100.10 #53:
>>failed while receiving responses: unexpected error
>>named[27177]: transfer of 'mydomain.com/IN' from 192.168.100.10#53: end
>>of transfer
>>named[27177]: zone mydomain.com/IN: transferred serial 20065
>>
>>These are the only errors or warnings that we receive in our logs and
>>we get about 50 a day.  Not a big deal but getting annoying.  I've been
>>searching for the past few weeks for the answer to this question
>>myself.   I read a little and it says that jnl files are created on a
>>BIND server that permits IXFR transfers to reduce network traffic.
>>These files are created when a primary server announces that an update
>>has been made.  The jnl files hold for 15 minutes (I think) waiting on
>>any more updates and then once the time expires BIND attempts to merge
>>the changes into the existing db file on the slave server.  When I
>>monitored the changes I would look at the Windows servers and notice
>>that they might make 2 or 3 changes per 15 minutes, maybe more.  Every
>>change it would update the jnl file with an updated SOA.  All this
>>makes sense to me even the message...
>>"named[27177]: malformed transaction: db.mydomain.com.jnl last serial
>>20065 != transaction first serial 20064"
>>
>>However I don't understand why it errors in the updating of the db
>>record after this and surely it has nothing to do with the diff in
>>SOAs.  I was wondering if this was a bug in BIND 9.2.X


And I replied:

>I am seeing messages like this:
>
>     Apr 18 03:12:44 dns0.anl.gov named[163]: [ID 873579 daemon.error]
>       malformed transaction: cmt.anl.gov.jnl last serial 2001077345 !=
>       transaction first serial 2001077344
>
>In my case, the master for the zone in question is a MS W2k+3 DNS
>Server, with many DDNS updates throughout the day from a MS W2k+3 DHCP
>Server.  I slave the zone internally on BIND dns1.anl.gov.  I want the
>zone on dns0 (a hidden BIND "master") so that I can process it along
>with my other zones via scripts.  For various reasons I cannot have
>dns0 be a slave to the W2k+3 DNS Server, so I use dns1 as the master.
>I think in this case that dns1 starts an IXFR to dns0, and during the
>IXFR an IXFR arrives at dns1 from the real W2k+3 DNS master.  So, this
>error is between two BIND 9.2.4 systems.  I know that I need to upgrade
>to the latest BIND 9.3.x.  As with Phillip, "I was wondering if this was
>a bug in BIND 9.2.X."  I am not seeing any of these errors in the
>AXFR/IXFR from the W2k+3 DNS Server to any of my four BIND slave
>servers.

I got a snoop trace of this, and I combined it with the dns.log file
from my W2k+3 DNS Server.  I am not an expert in decoding DNS packets,
especially IXFR packets.  I can send the trace records and my summary
to anyone who wants to look at this.  I probably will not file a bug
report until I can reproduce it on a more current level of BIND.
I have not looked at the change log in the newer BINDs to see if this
is listed as a known bug that has been resolved.

In my case, I can get around the problem by having the dns1 server
"also-notify" the dns0 server when the zones from the Windows box
are updated.  For a number of technical reasons, I cannot have the
Windows DNS Server notify dns0 when a zone is updated.  The problem
seems to occur when there are a number of updates to the master zone.
That zone gets successfully IXFRed to dns1 after each DDNS update.
But dns0 is not notified.  At the refresh interval, dns0 asks dns1 for
the zone SOA, and dns0 sees an increased zone serial number.  When the
IXFR from dns1 to dns0 occurs, there could be multiple updates included
in the IXFR, and this is where the error is occurring.  A decoding of
the snoop packets would shed light on exactly what is happening.
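
To make the log message concrete, here is a minimal Python sketch of the
kind of consistency check the wording suggests: each incremental change
must start at the serial where the journal currently ends.  This is only
an illustration under my own assumptions; I have not confirmed that it
matches what named does internally.  The function name and the
(first, last) tuple model are hypothetical, and serial-number wraparound
is ignored.

```python
from typing import List, Optional, Tuple

# Each transaction is modeled as (first_serial, last_serial): the SOA serial
# the delta starts from and the serial it ends at. This is a simplified,
# hypothetical model, not BIND's actual journal format or code.
Transaction = Tuple[int, int]


def first_mismatch(journal_last: int,
                   transactions: List[Transaction]) -> Optional[str]:
    """Return a message for the first delta that does not chain, else None."""
    current = journal_last
    for first, last in transactions:
        # A delta can only be appended if it starts where the journal ends.
        if first != current:
            return (f"last serial {current} != "
                    f"transaction first serial {first}")
        current = last
    return None


if __name__ == "__main__":
    # Journal already ends at 20065, but the incoming delta starts at 20064.
    print(first_mismatch(20065, [(20064, 20065)]))
    # Two deltas that chain cleanly from 20064 through 20066.
    print(first_mismatch(20064, [(20064, 20065), (20065, 20066)]))
```

With the serials from Phillip's log, the first call reproduces the
"last serial 20065 != transaction first serial 20064" text, and the
second call reports no mismatch.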
----------------------------------------------------------------------
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory          Phone:    +1 (630) 252-7277
9700 South Cass Avenue               Facsimile:+1 (630) 252-4601
Building 222, Room D209              Internet: BSFinkel at anl.gov
Argonne, IL   60439-4828             IBMMAIL:  I1004994


