How reliable is RPZ in production? I'm seeing flakiness in testing.

Tue Jan 6 23:43:13 UTC 2015

Hi Anne,

We've been using RPZ in production for over six months and haven't
had any serious issues.  We haven't encountered this specific type of
flakiness, but then again, our configs and BIND versions probably
differ from yours: we do our quarantining at layer 2.

You might start by giving us your BIND version and your
response-policy {} config.  Please also post the exact rules (one or
two examples should suffice) you're using for client quarantining --
that'll help narrow things down.  Finally, how are you publishing to
your client quarantine zones?  Presumably you're using some sort of
DDNS publishing that gets triggered when a client does something
suspicious.
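
For comparing notes on the by-client-IP rules, here is a minimal
sketch of how the owner name of such a rule is typically built,
assuming the rules use the rpz-client-ip trigger (prefix length, then
the address octets reversed).  The function name and the zone name
quarantine.rpz.example are made up for illustration, and the sketch
covers IPv4 only:

```python
import ipaddress


def rpz_client_ip_owner(client: str, rpz_zone: str) -> str:
    """Build the owner name of an rpz-client-ip rule (IPv4 only).

    `client` is a single address ("192.0.2.5") or a CIDR block
    ("192.0.2.0/24").  `rpz_zone` is the policy zone origin; the name
    used in the example below is a placeholder, not a real zone.
    """
    # strict=True rejects blocks with host bits set, e.g. 192.0.2.5/24.
    net = ipaddress.ip_network(client, strict=True)
    if net.version != 4:
        raise ValueError("this sketch handles IPv4 only")

    # Encoding is prefixlength.B4.B3.B2.B1.rpz-client-ip.<zone>.
    reversed_octets = ".".join(reversed(str(net.network_address).split(".")))
    zone = rpz_zone.rstrip(".")
    return f"{net.prefixlen}.{reversed_octets}.rpz-client-ip.{zone}."


if __name__ == "__main__":
    print(rpz_client_ip_owner("192.0.2.5", "quarantine.rpz.example"))
```

If the owner names in your quarantine zone don't look like what this
produces for a given client, that would be a useful detail to include
when you post your rules.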

John
-- 
John Miller
Systems Engineer
Brandeis University
johnmill at brandeis.edu

On Tue, Jan 6, 2015 at 5:52 PM, Anne Bennett <anne at encs.concordia.ca> wrote:
> I'm playing with RPZ with a view to both quarantining internal
> compromised or vulnerable hosts, and capturing attempts at
> communication with known external bad hosts.  I start with a
> fairly extensive whitelist, to avoid "lying" about any of my own
> hosts, and to give truthful answers for patch sites, so that my
> users can patch their systems even when otherwise quarantined.
>
> The masters for my RPZs do not themselves use the zones
> for policy (nor do they recurse on queries).  However the
> nameservers that do recursive resolution for my network are
> slaves for those RPZs, and *do* use them for policy.
>
> My set-up works, but sporadically - it's as though the RPZs wink
> in and out of use for no apparent reason, even when I'm not
> changing the data.  At one point while testing last December,
> my by-client-IP test quarantine rule just stopped matching
> (based on no logged hits, and no redirection of my queries
> from the quarantined host).  Only a restart of named on the
> resolver brought the quarantine back, but then the whitelist
> worked only partially.
>
> I don't know what to make of this; it looks as though the
> technology is several years old, and my experience with ISC
> bind is usually excellent.  Has anyone else encountered this
> type of flakiness?
>
> If not, any advice about how to debug this?