ERROR : - writeable file 'data/udalgurijudiciarygov.hosts': already in use: /etc/nicnet2007.govdomain:15424 - loading configuration: failure

Wed Aug 5 04:37:17 UTC 2015

On 2015-08-04 07:14, /dev/rob0 wrote:
> On Mon, Aug 03, 2015 at 10:36:25PM -0500,
>    Lawrence K. Chen, P.Eng. wrote:
>> This unfortunately looks like the thread for me to jump on to....
>> 
>> I missed installing the last two 9.9...-p# patches; the first time, I
>> built everything and was pretty much ready to do it, and then
>> forgot all about it due to health issues.  With the more recent one...I had
> 
> I hope you're well now.
> 

While I have finally gotten a partial diagnosis of a rare disease for which 
there is no treatment or cure (SCA), it has at least lifted that burden (now if 
only I could make all the bills from getting there go away...)

Perhaps at some point I'll see if a more specific identification is possible, 
to look for possible clinical trials...though most I've seen are targeted at 
the more common types, which I'm negative for (not surprising, as those show 
up as a cluster of family members...while I'm alone among my relatives...)

>> got it built for Solaris x64 and was about to work on building it
>> for Solaris SPARC when the most recent one appeared.  This one
>> carried a much stronger push to get things patched (to me at first,
>> then the higher-ups started jumping around...)
> 
> It's good that you have deployed the fix for CVE-2015-5477.  Those
> who are ignorant or foolish would say this shows the problems with
> free software.  But that's opposed to the truth: these security
> reports are the strength of free software.  Anyone can hack at it
> looking for bugs.  And then those bugs get fixed.
> 
> Who knows what bugs lurk inside black-box proprietary solutions?
> Worse, who knows if they'd be fixed?  Security is in openness,
> standing up to the light of scrutiny.
> 
Kind of like a while back, when there was the TLS POODLE CVE that only 
affected F5s.

That was problematic, as support had been allowed to lapse on our older, but 
still only, production F5 (which will reach EoSD at the end of this year...). 
And I was having trouble getting the hotfix to install via the web interface.

I eventually found out on DevCentral how to do updates from the command line, 
and got the HF installed a month before we got the units back on support 
(though just in time for the primary unit to fail...requiring two RMAs to 
resolve...)

I recall back when a CVE pushed me to upgrade from EOL 9.7.7 to 
9.9.2-P1...the day before I was to leave for LISA.  I had thought it odd that 
somebody was asking whether a patch would be released for 9.6; I didn't 
realize at the time that it was an ESV.  Though as I recall, there was 
something that required me to upgrade from 9.6 to 9.7 before going live with 
DNSSEC?

Further recall suggests it was something to do with DLV?  Which I now want 
to figure out how to remove.  I have an insecure delegation that uses a 
wildcard in the subdomain...it's a contracted mass-mailing service, which 
seems to cause it to try the DLV so it can generate NSEC3 records for the 
wildcard?

I forget if I ever finished reporting it... I thought I saw the reports while 
doing the upgrades, but can't locate them now.  The solution was to turn off 
DLV (dnssec-lookaside no;).  A couple of months ago, I finally nuked our DLV 
records (after the compromise incident...in April).  I wonder now if I 
should've published the new KSK that way, as the KSK with our registrar still 
hasn't been updated...and the old KSK is now showing as revoked as it nears 
the end of its life during our normal KSK rollover window (July-ish).

A contractor was working on getting GTM set up to replace parts of 
things....he wanted to copy the private keys from the master server to GTM 
(both are in our datacenter), so I sent him details on how to track them down 
on our master server....or rather, multiple emails of increasing detail on how 
to find them.

Whereupon he copied them into an email and hit reply-all to a large number of 
outside contacts, including the outside consultant who has been trying to 
direct him through the CUI; but he's opened up the CUI to let the outside 
consultant in...I don't know if he also gave him the administrator password or 
not.  Right now I've only changed one letter of it, though I probably should 
put on my creativity cap and come up with a new complex but mnemonic password.

Though recovering my password to our f5configbackupVM has triggered a C2 
response that prevents the GUI side from updating the daemon side's 
database...which is where the F5 admin passwords are stored.  At least it 
still does backups, though it would be nice if it would at least report 
failures...and send certificate reports (usually about old certs I've 
forgotten to remove, though I thought I saw that newer F5s do sync deletions 
now).

The important thing was to have configuration backups of our F5s, since 
there had been a number of times our former onsite contractor had needed 
them, or almost needed them.

Just noticed the variation in timestamps between the generations of 
rrl.log.  Seems I got slammed July 28-29....

>> But, it turned out to be a huge mess to upgrade.
>> 
>> The first time I ran into this error, it was some really old mistakes
>> where the admin had copied and pasted a bunch of similar zones...and
>> missed adjusting some of the files.  Since on the master side they
>> all come from the same file....it probably didn't cause any
>> noticeable problems for the slaves or clients.
>> 
>> However, installing the upgrade on our master server...knocked it out,
>> so I'm here looking to see what the proper fix for my situation is.
> 
> This seems to be a bug fix (not allowing named to share writeable
> files) which has brought a lot of broken configurations out.  Oops.
> 
> Basically, no two slave zones (even nominally the same zone, in a
> different view) should EVER share the same file.  Master zones can
> get away with file sharing, but ONLY if named does not write to the
> file (no allow-update, update-policy, nor auto-dnssec.)
> 
It is documented in our wiki that on the secondary side the files need to be 
unique, even when we've created a bunch of master zones off of the same file.

And, more recently, that they follow a defined naming scheme (so our 
showzone script saves us from remembering the compile-zone incantation...)
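A rough pre-flight check along those lines, as a Python sketch. It flags any file that is named by two or more zones when at least one of those zones is one that named writes to (type slave/secondary, or allow-update, update-policy or auto-dnssec present). That is the rule rob0 describes above. It is a regex-and-brace-counting heuristic, not a real named.conf parser: it does not follow include statements, it strips comments naively, and it treats any allow-update clause as writable even when that clause is { none; }. The function names are made up for illustration.

```python
import re
from collections import defaultdict

# Heuristic patterns; not a full named.conf grammar.
ZONE_HEAD = re.compile(r'\bzone\s+"([^"]+)"[^{;]*\{')
FILE_CLAUSE = re.compile(r'\bfile\s+"([^"]+)"\s*;')
TYPE_CLAUSE = re.compile(r'\btype\s+(\w+)\s*;')
WRITERS = re.compile(r'\b(allow-update|update-policy|auto-dnssec)\b')


def strip_comments(conf):
    """Drop /* */, // and # comments (naively: quoting is not respected)."""
    conf = re.sub(r"/\*.*?\*/", "", conf, flags=re.S)
    return re.sub(r"(//|#).*", "", conf)


def zone_bodies(conf):
    """Yield (zone_name, body) by counting braces after each zone header."""
    for m in ZONE_HEAD.finditer(conf):
        depth, i = 1, m.end()
        while i < len(conf) and depth:
            if conf[i] == "{":
                depth += 1
            elif conf[i] == "}":
                depth -= 1
            i += 1
        yield m.group(1), conf[m.end():i - 1]


def is_writable(body):
    """True for slave/secondary zones, or zones with update/DNSSEC clauses."""
    t = TYPE_CLAUSE.search(body)
    if t and t.group(1) in ("slave", "secondary"):
        return True
    return bool(WRITERS.search(body))


def find_conflicts(conf):
    """Return {file: [zones]} for files used by 2+ zones, one or more writable."""
    users = defaultdict(list)
    writable = set()
    for zone, body in zone_bodies(strip_comments(conf)):
        f = FILE_CLAUSE.search(body)
        if not f:
            continue
        users[f.group(1)].append(zone)
        if is_writable(body):
            writable.add(f.group(1))
    return {p: z for p, z in users.items() if len(z) > 1 and p in writable}
```

Consider a slave zone defined in both views with the same file "sec/example.edu". The sketch should return {"sec/example.edu": ["example.edu", "example.edu"]}. Several master zones without update clauses sharing one file are left alone.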

>> Looking for a valid easy fix here ;) Partly because coming soon
>> they're going to demolish the DNS infrastructure that I got saddled
>> with, and which I feel I've done a pretty good job re-engineering to
>> meet all the demands placed on it.  But, I'm the last legacy unix
>> systems administrator here....
> 
> Sad.  There's nothing "legacy" about Unix, though.  Sounds like the
> salesmen are winning out over the technicians, in terms of getting
> management to set policy.
> 
It should've been a sign when, during an interview session, a candidate 
meeting the admin groups...the Windows Administrators and the Unix 
Administrators...asked where the Linux Administrators were.

But coming soon we're going to be 100% virtualized and running Ubuntu, an 
effort that started two July 1sts ago....maybe we'll be there before the next 
one?  Especially in getting people to move off of the old F5.

Security was surprised that we still have the old, largely unprotected subnets 
into our datacenter that were for our previous F5 (4.x)...pre-FWSM.  These 
were for servers that hadn't been rebuilt in the new (current) network 
architecture when that hardware failed about 3 months after EoTS/EoRMA....and 
having done two RMAs during the final months before those EOL dates 
apparently meant nothing for its life expectancy.

Worse, when some application admins couldn't get something to work as a pair 
behind our F5, even after some conference calls with another University that 
had it working, an admin who had given his 6-8 week notice got their servers 
slapped into the external VLAN with the F5, where the only protection, if 
any, is whatever host-based firewall they set up.  They've stayed largely 
exposed while they try to have their VMs moved into our main cloud (as we've 
been losing servers in our old VM cluster....I think the part that isn't 
doing EVC is down to only one server).  (Meanwhile, I still don't know all 
the details of the new (new) network for our three 7200v's...well, the dev 
one is pretty much done; just the names need to be changed.)

But, I have little access into the systems in the virtual datacenter...

>> Anyways...the problem is because we had turned our existing master
>> server into doing split/stealth (started out stealth...) DNS, while
>> having it continue to serve as slave to delegated subdomains.  So
>> that those subdomains are propagated to our external facing slave
>> servers.
>> 
>> So that's where the problem comes in....the internal authoritative+
>> nameservers having the master collect secondary zone data from
>> them...on the Internal view.  But, then having to send that
>> information to nameservers that hit the external view of the
>> master.
> 
> The way to select a different view on the master is to use TSIG keys.
> 
> https://kb.isc.org/article/AA-00295/
> 

Wow, that seems so obvious now that I've read it....there had been discussion 
of having the GTM completely replace our master...at least as far as DNSSEC 
goes, given the compute needed to do full signings of 6 zones (5 large ones, 
and a small one for what was supposed to be our stealth subdomain....it's 
just a wildcard MX record....so they can get mail to our externally hosted 
system, currently O365)...in under a minute.  (Currently idling on an 
X4170...12 cores, 24 threads...)

Signing was faster on an X4150 than on a T5120, but since the X4150 had 
originally been purchased for a specific project, it got reclaimed when that 
group took a new direction...so an X4170 appeared.  I missed that its 1gig 
NIC was completely saturated with DDoS stuff....not sure how bad it was for 
my other two secondaries....but a full 2G of DDoS traffic was crossing our 
core to the datacenter for only the...  I have two X4170s, which were to be 
the master and one slave.  But during the IP shuffle a 3rd one appeared (and 
got published in a few places, and now everywhere), and things didn't quite 
land in the right places (their hostnames don't match their top-level 
names).  The other two secondaries are a pair of X4100s.

Anyways... the question was how I get a single GTM to transfer both the 
internal and external views.  I had partially set up TSIG for the external 
view (especially since the GTM might be coming from an internal IP...it's 
currently trying to do zone transfers via the 7200v's management interface, 
where there's a firewall that doesn't know if it should allow that....)
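As rob0 says, the view can be selected by TSIG key rather than by source address. Here is a small Python sketch that builds the match-clients line for each view along those lines. The key names (internal-xfer, external-xfer) and the internal-nets ACL are hypothetical. You would still need the key statements themselves, plus the matching keys on the GTM side.

```python
def match_clients(view_keys, view, fallback):
    """Build a match-clients clause for one view (hypothetical helper).

    view_keys maps each view name to the TSIG key name meant to select it.
    Other views' keys are rejected first, then this view's key is accepted,
    then `fallback` (an ACL name, "any", ...) is tried for everything else.
    """
    own = view_keys[view]
    items = [f'!key "{k}";' for v, k in view_keys.items() if v != view]
    items.append(f'key "{own}";')
    items.append(f"{fallback};")
    return "match-clients { " + " ".join(items) + " };"


keys = {"internal": "internal-xfer", "external": "external-xfer"}
for name, fallback in (("internal", "internal-nets"), ("external", "any")):
    # "..." stands for the view's zones and other options.
    print(f'view "{name}" {{ {match_clients(keys, name, fallback)} ... }};')
```

The !key entries go first because address match lists stop at the first element that matches. A request signed with the external key is therefore turned away from the internal view, even when it arrives from an internal address such as the 7200v's management interface.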

>> So, until a few hours ago....it was including a file containing all
>> the delegated (sub)domains in both views....causing both sides to
>> be working off of the same file.
> 
> It would require some reworking of things, but you might be
> interested in the new BIND 9.10 feature of "in-view" zone option.
> This lets you literally include a zone from another view.  See BIND 9
> ARM chapter 6, "zone Statement Definition and Usage", for details.
> 

It's always weird that it's always Bv9ARM, though it likely differs across 
9.x releases.  The only one I've spent time reading has been the one for 9.9, 
mainly chapter 6, trying to recall the details around the dns64 zone (I 
eventually resorted to digging through the archives to find the thread from 
July 2013....)

I had considered whether I wanted to jump from 9.9 to 9.10, mainly for DNS 
cookies to help in the battle against DDoS.  But originally the new F5 was 
to be in full service by July 1st, 2014...so I opted to avoid any issues 
with upgrading.  There were a few when I did the last-minute hop from 9.7 
to 9.9 (though later I met someone from ISC who said it only affected people 
doing dns64, so I probably hadn't needed to upgrade...)

Though during this recent upgrade, I did discover a dns64 block in the 
configs for our datacenter DNS servers.  It was some attempt to deal with old 
kernels that mysteriously do IPv4-mapped-in-IPv6 reverse lookups.  It didn't 
seem to fix the problem, though, as my logs continue to fill up with these 
machines complaining they don't know who ::ffff:<their IP> or 
::ffff:<gateway IP> is.  But they work.

Years ago I had to create my own empty zones (and later fill in some detail 
for the RFC1918 addresses we were using) to solve a performance problem: the 
reverse lookups going out to the roots were timing out.  It seems our network 
operator is blocking those queries from reaching the blackhole servers, 
requiring me to answer them myself (which was a large part of the issues I 
ran into going from 9.7 to 9.9...)

At first the fix was to disable all of the built-in empty zones...but later I 
came back and disabled only select ones.
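For the RFC1918 reverse zones, the slightly tricky block is 172.16.0.0/12. It is not octet-aligned, so it needs sixteen /16 reverse zones. Below is a Python sketch using the standard ipaddress module that lists the octet-aligned in-addr.arpa zones covering a set of prefixes. With the three RFC 1918 blocks it should give the same 18 names that RFC 6303 lists for that space. The function name is made up for illustration.

```python
import ipaddress

RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def reverse_zones(prefixes):
    """List the octet-aligned in-addr.arpa zones that cover each IPv4 prefix."""
    zones = []
    for p in prefixes:
        net = ipaddress.ip_network(p)
        # Round the prefix length up to a whole octet (8, 16, 24 or 32);
        # a /12 therefore becomes sixteen /16 reverse zones.
        bits = -(-net.prefixlen // 8) * 8
        for sub in net.subnets(new_prefix=bits):
            octets = str(sub.network_address).split(".")[: bits // 8]
            zones.append(".".join(reversed(octets)) + ".in-addr.arpa")
    return zones


for name in reverse_zones(RFC1918):
    print(name)
```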

Probably going to stay on 9.9.x for the near term...if abandonment of my 
special builds is still imminent....  In addition to turning RRL on, we also 
make use of 'filter-aaaa' here.

>> Which seemed to work fine, as only one side is getting updates;
>> the other side is just there to feed our outside-facing slaves.
>> Well, this update wouldn't go for that.
>> 
>> So, cloning the file and doing a global search and destroy....the
>> external view is looking for zone files in a directory that is empty,
>> while the internal side continues as is.
>> 
>> To have something for the external nameservers to transfer
>> (hopefully), I'm doing a regular sync of the files in 'sec' to 'ext'.
>> 
>> Not totally sure that's working....but nothing is filling up the logs
>> about it.
>> 
>> So, is what I did something that'll hold...or is there an easy
>> proper solution to this?
> 
> Slave zones should be transferred using DNS.  In a stealth master
> case, you need to populate also-notify lists, but perhaps in your
> case you can share some of that configuration with global or view
> level settings.  (Better than having to set everything per zone.)
> 

At first I was hoping for a solution I could apply at the view level, but 
'directory' can only be set globally, so it was a sed-like edit on the file 
that largely worked (there were two blocks where it didn't, probably because 
they had been added by copy/paste...resulting in no tabs).

The initial regex was something like 's/^\tfile "sec\//\tfile "ext\//'....I 
didn't want to worry about the commented-out zones, and it was specific enough 
to avoid replacing other stuff (though the only other matches would likely be 
embedded comments).  Of course, nothing says the new file needs to look as 
close as possible to the original file, except that it's easier to see that 
it worked using diff.
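Here is the same substitution as a Python sketch, with one change aimed at the two blocks that got missed. It uses [ \t]+ instead of a literal tab, so space-indented lines that came in by copy/paste are also rewritten. The sec/ and ext/ directory names are from this setup; the new_dir parameter is only for illustration. Like the original, it still rewrites indented lines inside /* ... */ block comments.

```python
import re

# Indented 'file "sec/...' clauses only. Lines commented out with // or #
# don't start with whitespace followed by "file", so they are left alone.
SEC_FILE = re.compile(r'^([ \t]+file[ \t]+")sec/', re.MULTILINE)


def retarget(conf_text, new_dir="ext/"):
    """Point every indented file "sec/..." clause at new_dir instead."""
    return SEC_FILE.sub(lambda m: m.group(1) + new_dir, conf_text)
```

Comparing the output against the original with diff is still the easiest way to confirm that it only touched file clauses.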

Later I found a couple of zones within that file that were sharing files, 
due to oversight...

Yeah, I had wondered about doing notifies from my hack (instead of purely 
relying on SOA timing for refreshes...)

Meanwhile...on the issue of also-notify (which started somewhere along 
9.9.x)....it currently clutters up my logs, because I have it set for all the 
zones, and up at the view level, before they get included, I have "notify 
no;" set rather than "notify explicit;" (external view).

I just spotted a problem with my hack....we have AD updating the external 
view, so I should be copying that to the internal view....

>> To hold us/me over until they decide if its going to be
>> BlueCat or Infoblox that replaces everything.
> 
> IIUC both of those are BIND under the hood. :)
> 

I recalled seeing somewhere that both use BIND under the covers....though 
some appliances use Unbound (which I have no knowledge of, but probably will 
need to when I get around to upgrading to FreeBSD 10.x).  Right now I have a 
couple of servers still on FreeBSD 9.1, but the power supply in my poudriere 
server has died, so I can't do the full pkg switch needed to get to 9.3.

>> Sadly, I missed both presentations due to other issues....more sad
>> because I found my "named.iner" shirt, which I was going to wear to
>> the second presentation ;)
> 
> Haha, I have one of those also.  Really cool. :)

Oh, the hard part about the appliance plan....it was a need raised by a now 
former employee to upgrade/replace our ancient DHCP servers.  He had been 
told that there was no budget to get appliances that year, so maybe next 
year.  Well, it's next-next year, and the talk is that next year is when 
whichever is selected will get purchased.

Hopefully the Sun V240s (running Solaris 9 and using dhcpd v3.0.4) will hold 
out until then; they're pretty much as they were when set up over 9 years 
ago....they had been built prior to my start date, but didn't go into 
production until shortly after I started.  They are semi-managed with 
CFEngine...the DHCP part isn't, so it requires them to log into both servers 
and make the same edits when needed.

I have CFEngine managing my two DHCP servers at home (so it's just one file 
to edit, and it's under revision control too...).  At work, though, there'd 
be many files, as it hands out leases to many networks across campus and one 
of our satellite campuses.

We could probably update the boxes if only they would let us.  I did a number 
of builds, from 4.1.11 to 4.2.4-P1, but it pretty much seems they won't allow 
upgrades.  Yet they complain about problems with it now and then, namely back 
when there was an iOS release that stopped renewing its lease while 
continuing to use it, followed by iOS releases that renewed too often.  I'm 
not sure those were the reason behind our DHCP outages, just that they seemed 
to follow what had happened at another University (I have a relative who 
works there...)

It was a major effort to get them to let us replace some failed CPU fans.  
Since then, they all have one failed fan per CPU.  Probably one of the few 
with two failed fans in it, currently....fortunately not both on the same 
CPU?  Not sure what the future holds; the admin who had done all the CPU fan 
replacements has left....as has everybody else who made up the Unix team, 
leaving just me.  Wondering if my loyalty has been misplaced....

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
                                    with LOPSA Professional Recognition.
For: Enterprise Server Technologies (EST) -- & SafeZone Ally