Re: API change advice: Passing plan invalidation info from the rewriter into the planner? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Date
Msg-id 20140616051530.GL2556@tamriel.snowman.net
Whole thread Raw
In response to Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Jun 11, 2014 at 8:59 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > In this case the user-defined code needs to return a boolean.  We don't
> > currently do anything to prevent it from having side-effects, no, but
> > the same is true with views which incorporate functions.  I agree that
> > it makes a difference when compared to column-level privileges, but my
> > point was that we have provided easier ways to do things which were
> > possible using more complicated methods before.  Perhaps the risk with
> > RLS is higher but these issues look managable to me and the level of
> > doubt about our ability to provide this feature in a reasonable and
> > principled way that our users will understand surprises me.
>
> I'm glad the issues look manageable to you, but you haven't really
> explained how to manage them.

There's been a number of suggestions made and it'd be great to get more
feedback on them- running the quals as the table owner, having a GUC
which can be set to either run 'as normal' or either ignore RLS (if the
user has that right) or error out if RLS would happen, and undoubtably
there are other ideas along those same lines to address the pg_dump and
other concerns.

> For my part, I'm mildly surprised that anyone thinks it's a good idea
> to have SELECT * FROM tab to mean different things depending on who is
> typing it.

Realistically, in the RDBMS realm in which we're in and that we're
working to break into- this is absolutely a given and expected.  It's
new to PostgreSQL, certainly, but it's not uncommon or surprising at all
in our industry.

> To me, that seems very confusing; how does an unprivileged
> user with no ability to assume some other role validate that the row
> security policy they've configured works at all and exposes precisely
> the intended set of rows?

While I see what you're getting at, I'm not convinced it's really all
that different from being set up without access to some schema or table
which the administrator setting up accounts didn't include for you.
Sure, in the case of a schema or table, you can get an error back
instead of just not seeing the data, but if you're looking for specific
data, chances are pretty good you'll realize the lack of data quickly
and ask the same question regarding access.

To wit, I've certainly had users ask exactly that question of- "do I
have access to all the data in this table?" even when using PG where
it's a bit tricky to limit such access.  Clearly, the same risk applies
when using views and so the question is understandable.  Perhaps these
were users with more experience in other RDBMS's where it's more common
to have RLS, but there are at least a couple cases which I can think of
where that wouldn't apply.

> Even aside from security exposures, how
> does a non-superuser who runs pg_dump know whether they've got a
> complete backup or a filtered dump that's missing some rows?

This would be addressed with the GUC that's been proposed.  As would the
previous paragraph, though I wanted to apply to that independently.

> I'm not referring to the proposed implementation particularly; or at
> least not that aspect of it.  I don't think trying to run the view
> quals as the defining user is likely to be very appealing, because I
> think it's going to hurt performance, for example by preventing
> function inlining and requiring lots of user-ID switches.

I understand that there are performance implications.  As mentioned to
Tom, realistically, there's zero way to optimized at least some of these
use-cases because they require a completely external module (eg:
SELlinux) to be involved in the decision about who can view what
records.  If we can optimize that, it'd be by a completely different
approach whereby we pull up the qual higher because we know the whole
query only involves leakproof functions or similar, allowing us to only
apply the filter to the final set of records prior to them being
returned to the user.  The point being that such optimizations would
happen independently and regardless of the quals or user-defined
functions involved.  At the end of the day, I can't think of a better
optimization for such a case (where we have to ask an external security
module if a row is acceptable to return to the user) than that.  Is
there something specific you're thinking about that we'd be missing out
on?

> But I'm not
> gonna complain if someone wants to mull it over and make a proposal
> for how to make it work.  Rather, my concern is that all we've got is
> what might be called the core of the feature; the actual guts of it.
> There are a lot of ancillary details that seem to me to be not worked
> out at all yet, or only half-baked.

Perhaps it's just my experience, but I've been focused on the main core
feature for quite some time and it feels like we're really close to
having it there.  I agree that a few additional bits would be nice to
have but these strike me as relatively straight-forward to implement
overtop of this general construct.  I do see value in documenting these
concerns and will see about making that happen, along with what the
general viewpoints and thoughts are about how to address the concern.

> > How about "it's in high demand by our user base"?  In particular, it's
> > being asked for by a *highly* technical section of our user base who
> > uses these capabilities today, with this design, in those other
> > databases.
>
> Sure, that's a valid reason for considering any feature.  But it's not
> an excuse to overlook whatever design problems may exist.

Agreed- improvements in the design, provided it continues to meet the
expectations of the user-base, are absolutely welcome.

> > The current approach allows a nearly unlimited level of flexibility,
> > should the user wish it, by being able to run user-defined code.
> > Perhaps that would be considered 'one policy', but it could certainly
> > take under consideration the calling user, the object being queried
> > (if a function is defined per table, or if we provide a way to get
> > that information in the function), etc.
>
> In theory, that's true.  But in practice, performance will suck unless
> the security qual is easily optimizable.  If your security qual is
> WHERE somecomplexfunction() you're going to have to implement that by
> sequential-scanning the table and evaluating the function for each
> row.

That's not actualy true today, is it?  Given our leak-proof attribute,
if the qual is "WHERE somecomplexfunction() AND leakprooffunctionx()"
then we would be able to push down the leak-proof function and not
necessairly run a straight sequential scan, no?  Even so, though, we've
had users who have tested exactly what this patch implements and they've
been happy with their real-world use-cases.  I'm certainly all for
optimization and would love to see us make this better for everyone, but
I don't view that as a reason to delay this particular feature which is
really just bringing us up to parity with other RDMBS's.

> For example, I once worked at a company where we had a table
> containing information about our customers and potential customers.
> Sales representatives were allowed to see their own accounts, and
> partners were allowed to see accounts associated with that partner.
> These things were independent.  So for a sales rep, the security qual
> was WHERE sales_rep_id = <something> and for a partner the security
> qual was WHERE partner_id = <something>.  Now, you could maybe write
> this as a single qual, something like this:
>
> WHERE sales_rep_id = (SELECT oid FROM pg_authid WHERE rolname =
> current_user AND oid IN (SELECT id FROM person WHERE is_sales_rep)) OR
> partner_id = (SELECT p.org_id FROM pg_authid a, person p WHERE
> a.rolname = current_user and a.oid = p.id)

That looks like it'd work, or a pl/pgsql function which did the same.

> But that's probably not going to perform very well, because to match
> an index on sales_rep_id, or an index on partner_id, that's going to
> have to get simplified a whole lot, and that's probably not going to
> happen.  If we've only got one branch of the OR, I think we'll realize
> we can evaluate the subquery as an InitPlan and then use an index, but
> with two branches I think that will fail.

You're right- we could perform better in such a case.

What solution did you come up with for this case, which performed well
and was also secure..?

> I don't want to overstate the importance of this particular case; but
> I do think scenarios in which it's advantageous to have multiple
> row-level security policies are plausible.

I'm not against this in general.  The question, in my mind, is what
level of granularity we would provide this at.  As I tried to outline
previously, there's a huge number of combinations which we could come up
with to support this under and I'm not 100% sure that it'd actualy end
up being better than the simplicity of a single qual where the user gets
to define any kind of relationship they want between the various
policies; even programatically if they want.

> Another, perhaps-simpler
> example is that you might have a table containing unclassified data,
> classified data, and secret data.  You want to give access to the
> unclassified data only to one category of users; access to the
> unclassified data and the classified data to a second group of
> more-trusted users; and access to all of the data to a third group of
> very highly trusted users.  If the table can only have one security
> policy that applies to everyone who isn't exempt, how will you do
> that?  This sort of use case seems very plausible to me so I think we
> need to give some real thought to what we will recommend to users who
> want to do things like this.  Can the proposed patch handle it?  How?

There are multiple ways this could be implemented- the first, basic, way
would be through a table which maps users to security levels via an enum
where more privileged levels are higher in value and therefore a simple
greater-than could be applied after a join which would implement this
particular policy.

The reality (which I've had discussions with users about..) is actually
much more complicated where an extermal security module makes the
decision about if a given user/connection can have access to a specific
bit of labeled data.  The reason is that things are not classified so
simply as "unclass", "class" and "secret" but rather into much more
granular pieces- user X might have access to A and B, but not C, while
user Y can access B and C.  The absolute levels described above may
exist for less sensetive data but for data beyond that (which I'd hazard
to guess is most of it...), more granularity and control is needed.

> >> What this patch is doing is basically allowing a table to really be a
> >> view over itself.
> >
> > I don't agree with this characterization.  This patch specifically
> > allows filtering the rows returned from the table, and it intentionally
> > does not allow changing the data.
>
> I don't know what to say to this.  What I said is, quite literally,
> what the patch does.  It wraps the patch in an subquery RTE that is
> precisely the same thing you would get if you defined a
> security_barrier view with the security qual in the WHERE clause.

Exactly- it does *not* allow changing the SELECT clause, or adding in a
GROUP BY, or a JOIN, or tossing in a windowing function, etc.

> This is not a question of opinion; the patch either does that or it
> doesn't, and I think it does.

Apologies for not being clearer but my point was that only the WHERE
clause can be modified by this patch, which is quite intentional.  This
separates the concerns of "can I access this data" from "modify the data
to represent it in X way".

> > We are already looking at WITH CHECK OPTION-style support, but I
> > disagree that separate permissions or data changing will ever be a part
> > of RLS because then it's no longer RLS.
>
> What do you mean by "data changing"?  If you mean inserts, updates,
> and deletes, I am very sure people are going to want to perform those
> operations on RLS-enabled tables.

Yes, they'll want to support those operations.  However, they will not
expect RLS to allow them to redefine a columns as "x+10" instead of "x",
which a view does allow.

> Do you find it implausible that someone will want to exempt a certain
> role from RLS on only one table but not on other tables in the system?

No- excempting certain roles from RLS makes sense as a capability.

>  Do you find it implausible that someone will want to allow a certain
> table to bypass RLS when selecting rows, but not when updating or
> deleting them?  I find those scenarios very plausible.

This is also plausible and something which we were anticipating while
developing this patch.  Simon, KaiGai and I specifically discussed
addressing SELECT vs UPDATE/DELETE earlier this year, as I recall.
Providing that level of flexibility is absolutely on the road map, but I
don't know that it all has to exist in 9.5; it may, which would be
great, but I don't view it as required.

> > We have this problem with psql today, as has been discussed.  The fact
> > that pg_dump doesn't happen to have this problem is great but it's no
> > true solution for the problem at hand.
>
> It's true that users can break security by being incautious about the
> queries they type into psql, and I'm all for having better tools to
> manage that.  But a feature that causes currently-safe uses of pg_dump
> to become unsafe is, in my opinion, absolutely not OK.

I don't particularly like it, and would require a way to override it,
but a GUC which pg_dump sets by default that says "give me everything or
error" would work to address this.  I'm open to other thoughts, of
course, but it does seem like a relatively simple solution (which is a
good thing when it comes to security concerns, imv).

> I do agree with your argument that things like adding and removing
> columns, or changing their data types, could be simpler with RLS than
> in the view-over-table model - because in the view-over-table model,
> we don't really know whether the user would like a new column to
> cascade to the view, whereas in the RLS model, we can automatically do
> the right thing.

Agreed.
Thanks,
    Stephen

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Proposal for CSN based snapshots
Next
From: Stephen Frost
Date:
Subject: Re: API change advice: Passing plan invalidation info from the rewriter into the planner?