do not stupidly delete ZSK files

Lawrence K. Chen, P.Eng. lkchen at ksu.edu
Fri Aug 7 05:16:08 UTC 2015



On 2015-08-06 19:26, Heiko Richter wrote:

>> Though back then I was still building bind 32-bit, and the hardware
>> was much slower.  A full signing took more than 10x longer than on our
>> current hardware....which can get it done in just under a minute
>> (usually).  The need for speed is that some people expect DNS changes
>> to be near instantaneous.
> 
> So either you have very slow servers, or a really big zone, if it
> takes a whole minute to sign it.
> 
> Just use inline-signing and the changes will be instantaneous. As soon
> as nsupdate delivers a change to the master server, it will sign it
> automatically and send out notifies. Doesn't even take a second, as
> only the changes need to be signed, not the whole zone.
> 

It's big and probably full of a lot of stuff that isn't needed anymore, etc.
Though there's something weird about the zones too.

Our ksu.edu zone has more entries than the k-state.edu one, even though by
policy they should be the same.  Though I just fixed up a delegated subdomain
that was only doing the .ksu.edu form (they also don't list us as secondaries
or allow us to do transfers anymore...which they're supposed to according to
policy, and to ensure external resolution....especially if all their
129.130.x.y addresses become 10.42.x.y or something).

Internally we're probably running out of open blocks of IPv4, especially for
anything that wants a /27 or bigger (such as a /21).  It caused problems when
the first chunk from a reclaimed block was used.  The reclaimed block used to
be our guest wireless network (which is now a growing number of blocks in
10.x.x.x space).  The switch to WPA2 Enterprise versus open guest made it too
tempting to take the easy way to get online, so it was required that campus
resources block access from the guest networks.  There was no notification
that the old guest network wasn't the guest network anymore...and it's been
years now.

But, I often hear that it would be nice if I filled these various network
blocks with generated forward/reverse records....though I'm rarely in the
loop for what and where the blocks are.

Anyways...the odd thing I was getting at with ksu.edu vs k-state.edu...the
raw secondary zone files end up fairly close in size, so I wouldn't expect a
huge difference in viewing the zones.

But, running named-compilezone to convert k-state.edu back into text took a
few seconds, while it took minutes to do ksu.edu.....same machine, etc.  I
wonder why, and wonder to what extent I should investigate.....
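
(For the record, the conversion I'm talking about is just something like the
following; the file names here are made up, not our actual layout:

    # dump the raw-format zone files back to plain text
    named-compilezone -f raw -F text -o k-state.edu.txt k-state.edu k-state.edu.raw
    named-compilezone -f raw -F text -o ksu.edu.txt ksu.edu ksu.edu.raw

Same machine, same command...the second one is the one that takes minutes.)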

But, our master server is a Sun Fire X4170 M2 (dual Xeon E5620s)....it's
bored and a waste most of the time...until a full signing needs to get done.
Though it isn't as fun to watch as when I was using a T5120 (64
threads)....the load average would break 100 and set off all kinds of
monitoring alerts....but it chugged along fine....though the apps (and their
admins) in other containers on it weren't as happy.

Years ago, loads exceeding 100 were often fatal and messy, since they used to
be caused by problems between ZFS and our old SAN (9985)....as much as they
didn't want us to, turning off the ZIL was often the fix to make it not
happen anymore.  The problem went away after we switched to the new SAN
(which isn't so new anymore...as its end is nearing).

I've thought about looking for a solution that I can throw our zone configs
at and have it just work, but I largely haven't had time to do that.  Or I
was hoping to get more backing on enforcing good behavior in my zones (stop
the vanity of wanting 10.x.x.x servers at the same level as your public
subdomain).  Not sure how preprocessing zone files to generate internal /
external (/ guest / dr) versions translates into a free, ready-to-go solution
:)

I commented out the latter two as the first never did what they wanted, and I
heard that the official DR plan was something that got written up back in
2000 and then shelved, to be revisited when there's funding....  Since then
we've gotten secondaries outside of our netblock (we vanished completely a
few times when our Internet connection broke, and by the last major outage
quite a number of sites plus our email were externally hosted)....

During a recent DNS outage, I couldn't send replies to co-workers....our
Office365 tenant said I was an invalid sender :..(  It also apparently
knocked me off of Jabber, and stopped my deskphone from forwarding to my
cellphone....and me from getting SMS notifications of voicemail.....

But, FreeNode continued to work....before Jabber we had a private channel
that we hung out in (while it's been a long time since we ran a node, we
still have...well, maybe not, since the co-workers that had those friends
have all left now....which is probably why ownership of the channel hasn't
transferred to me....)


>> 
>> For those I do have a script that can run after and ssh into all
>> my caching servers to have them flush....
> 
> You don't need to manually sync your servers. Just activate NOTIFY and
> your master will inform all slaves of any zone changes. If you also
> activate IXFR transfers, the slaves will only transfer the records
> that have changed; there's no need to transfer the whole zone.
> Combined with inline-signing your updates will propagate to all
> servers within a second.
> 
Well, we do have our caching servers acting as slaves for some zones, but
frequently it's not reliable for getting our busiest server (the server
that's listed first on our DNS configuration page, and is what DHCP gives out
first) to not continue with its cached answer...  I've made suggestions to
try to get them to spread things out....there are 6 servers....not just
two...and some areas now get the second server first, resulting in the second
listed server being my second busiest.  After that it's a split between
numbers 3 and 5.  We used to list our datacenter DNS as 'backup', though we
had an outage of our student information system due to the datacenter DNS
getting swamped by a few computers across campus (that were getting hammered
by a DDoS attack)....

Number 3 used to be 3rd busiest, but its popularity has gone down...since it
only has a 100M connection, while the others have gigabit.  All the campus
servers used to be only 100M.  But, people that know which is which say it
matters...  It is in the power plant and has one leg on inverter power...the
batteries for the old phone system are there....next to a large empty
room....

Though at the moment, there are no incremental capabilities.... so I can hit
a slave a few times before the transfer finishes and the info updates (just
as I can hit the master server a few times after it does 'rndc reload' after
the signing....before it reflects the change)...
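
(If I ever get around to it, turning on notifies and incremental transfers
should just be something along these lines in named.conf...this is a sketch,
not our actual config:

    # on the master
    options {
        notify yes;          # tell the slaves as soon as a zone changes
        provide-ixfr yes;    # offer incremental transfers from the journal
    };

    # on each slave
    options {
        request-ixfr yes;    # pull only the changed records, not the whole zone
    };

though the master only has a journal to serve IXFR from when the zone is
dynamic or inline-signed.)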

But, it was actually hard getting to the amount of automation that I have
now.... and on occasion people fight the automation (some more than others).



>> 
>> Now if only I could figure out how to do that to the rest of the
>> world to satisfy those other requests.
> 
> It's just a matter of lowering your ttl. Resolvers all over the world
> will cache your records according to your ttl. If you really have
> 86400 set as ttl, any given record will be queried only once per day.
> 
> Just lower the default ttl to a reasonable number and your updates will
> propagate faster to the resolvers. It's just a question of how much
> bandwidth and resources you are willing/able to give to DNS. Lower it
> step-by-step until you either hit the limit of your bandwidth or the
> system resources of your servers.
> 
>> 
>> Recently saw an incident....a department that has full control of
>> their subdomain made a typo on an entry with TTL 86400.  They had
>> fixed the typo, but the world still wasn't seeing the correction.
>> Asked us if we could lower the TTL for it, to maybe 300.
>> 
>> Hmmm... no.
> 
> If they have full control of their subdomain, why don't they just
> change the ttl themselves?
> 
That's basically what my co-worker said.... in responding to the ticket.
But, what they're asking is that we lower the TTL of the already cached
value.
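
(The closest thing we can do on our side is flush the stale name out of our
own recursive servers, which is roughly what my flush script does...something
like this on each caching box, with the hostname here being made up:

    rndc flushname badly-typoed-host.dept.ksu.edu

but that obviously doesn't help the rest of the world's caches.)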

> Setting a ttl of 1 day seems a bit high, but of course it always
> depends on your zone. If the data is static, 1 day is fine, but for
> dynamic zones this is a bit high.
> 

There are lots that seem to feel that 1 day is what things need to be at
except for temporary reasons....though people often forget to have it lowered
in advance of a server upgrade or something.  And, in this case they had made
a typo on where the new server was...so instead of traffic shifting from old
to new as their update spread out....it all disappeared....
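
(What should happen is the record's TTL gets dropped ahead of the move,
something like this in the zone file...the names and addresses here are just
examples:

    $TTL 86400                              ; zone default stays at a day
    www     300     IN      A       192.0.2.10   ; temporarily 5 minutes, ahead of the migration

then put it back to the default once the move has settled.)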

All my domains are static, and I just have forwarding set up to the servers
that have the dynamic subdomains (though I'm also slave to them...which this
new bind has me a bit stumped on what the correct way to go is).
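
(The setup is roughly along these lines, with the subdomain name and address
being placeholders rather than our actual entries:

    zone "dynamic.ksu.edu" {
        type slave;
        masters { 192.0.2.53; };            # the departmental server that takes the updates
        file "slaves/dynamic.ksu.edu.db";
    };

    // and on the caching side, hand queries for it straight to them:
    zone "dynamic.ksu.edu" {
        type forward;
        forward only;
        forwarders { 192.0.2.53; };
    };

and I'm not sure whether carrying both the slave copy and the forwarding is
still the right approach with the new bind.)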

> When you use inline-signing, your updates will be signed on-the-fly,
> as they come in, so you can lower the ttl to a few minutes without any
> problems. This helps much in keeping outdated data out of any
> resolver's cache.
> 

Hopefully a solution will suddenly appear that can replace the scripts I've 
mashed together over the years to do what we do now....
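
(If I ever do get to redo it, a minimal inline-signing setup like Heiko
describes would presumably look something like this...the zone name, paths,
and key name are placeholders:

    zone "example.edu" {
        type master;
        file "masters/example.edu.db";
        key-directory "keys/example.edu";
        inline-signing yes;
        auto-dnssec maintain;               # named re-signs records as they change
        allow-update { key "ddns-key"; };   # hypothetical TSIG key for nsupdate
    };

so named keeps the signed copy up to date itself and sends the notifies,
instead of my scripts doing a full re-sign and 'rndc reload'.)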

I had thought I'd have a solution to our current DNS problem in place by
now....

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
                                    with LOPSA Professional Recognition.
For: Enterprise Server Technologies (EST) -- & SafeZone Ally

