Re: API change advice: Passing plan invalidation info from the rewriter into the planner? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Date
Msg-id CA+TgmoZu76cRBQEKNNHKRMmGhMfns0VWvD4HX+KHP-PgpVvmfQ@mail.gmail.com
Whole thread Raw
In response to Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Kevin Grittner <kgrittn@ymail.com>)
Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Wed, Jun 11, 2014 at 8:59 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> Row-level security is not a chance for the system to deny access; it's
>> a chance for user-defined code to take control and perform arbitrary
>> operations.  So the scope of what we're contemplating for row-level
>> security is really far, far more invasive than what we did for
>> column-level privileges.
>
> In this case the user-defined code needs to return a boolean.  We don't
> currently do anything to prevent it from having side-effects, no, but
> the same is true with views which incorporate functions.  I agree that
> it makes a difference when compared to column-level privileges, but my
> point was that we have provided easier ways to do things which were
> possible using more complicated methods before.  Perhaps the risk with
> RLS is higher but these issues look managable to me and the level of
> doubt about our ability to provide this feature in a reasonable and
> principled way that our users will understand surprises me.

I'm glad the issues look manageable to you, but you haven't really
explained how to manage them.  The way to dispel doubt is to come up
with specific technical proposals that address the technical issues
that have been raised.  I accept that you are surprised that someone
might not think we are on the right course here, but it's entirely
appropriate for me to express my doubts about this or any other patch,
much as many people do in regards to many patches that are posted here
- generally for good and valid reasons.

For my part, I'm mildly surprised that anyone thinks it's a good idea
to have SELECT * FROM tab to mean different things depending on who is
typing it.  To me, that seems very confusing; how does an unprivileged
user with no ability to assume some other role validate that the row
security policy they've configured works at all and exposes precisely
the intended set of rows?  Even aside from security exposures, how
does a non-superuser who runs pg_dump know whether they've got a
complete backup or a filtered dump that's missing some rows?  A
filtered dump might not even be restorable if foreign keys are
involved.  I think those are serious issues that deserve serious
thought and consideration, not just a vague assurance that the issues
are probably manageable.

>> Because we don't have a good design.
>
> We're using a design that's found in multiple other RDBMS's and used
> extensively by certain industries which use those RDBMS's today.  I'm
> certainly open to improving what is found in other systems for PG but I
> have a hard time seeing this approach as a bad design.  Perhaps you're
> referring to our implementation, in which case I might agree and things
> like running the quals as the table owner is something which should be
> considered (I don't know how the other RDBMS's operate in this regard
> offhand- it'd be good to find out).

I'm not referring to the proposed implementation particularly; or at
least not that aspect of it.  I don't think trying to run the view
quals as the defining user is likely to be very appealing, because I
think it's going to hurt performance, for example by preventing
function inlining and requiring lots of user-ID switches.  But I'm not
gonna complain if someone wants to mull it over and make a proposal
for how to make it work.  Rather, my concern is that all we've got is
what might be called the core of the feature; the actual guts of it.
There are a lot of ancillary details that seem to me to be not worked
out at all yet, or only half-baked.

>> I'm not categorically opposed to adding more RLS features to
>> PostgreSQL and never have been; in fact, I was deeply involved in the
>> original design of security barrier views and committed the original
>> patch to add that functionality to PostgreSQL, without which none of
>> what we're talking about here would be possible.  But the
>> currently-proposed design is very unappealing to me, for the reasons
>> that I've explained.  The right answer to "this feature doesn't
>> provide anything that we don't already have and will introduce major
>> new security exposures that haven't been adequate thought" is
>> debatable, but "well other people have this so we should too" is
>> definitely not it.
>
> How about "it's in high demand by our user base"?  In particular, it's
> being asked for by a *highly* technical section of our user base who
> uses these capabilities today, with this design, in those other
> databases.

Sure, that's a valid reason for considering any feature.  But it's not
an excuse to overlook whatever design problems may exist.

>> Are you equally unimpressed with the idea that RLS as proposed can't
>> support more than one security policy right now *at all*?  Because it
>> seems to me that either you think multiple RLS policies on a single
>> table is important (in which case the current patch is inadequate) or
>> you think it's not important (in which case we need not argue about
>> whether doing it with multiple views over the same underlying table is
>> awkward).
>
> The current approach allows a nearly unlimited level of flexibility,
> should the user wish it, by being able to run user-defined code.
> Perhaps that would be considered 'one policy', but it could certainly
> take under consideration the calling user, the object being queried
> (if a function is defined per table, or if we provide a way to get
> that information in the function), etc.

In theory, that's true.  But in practice, performance will suck unless
the security qual is easily optimizable.  If your security qual is
WHERE somecomplexfunction() you're going to have to implement that by
sequential-scanning the table and evaluating the function for each
row.

For example, I once worked at a company where we had a table
containing information about our customers and potential customers.
Sales representatives were allowed to see their own accounts, and
partners were allowed to see accounts associated with that partner.
These things were independent.  So for a sales rep, the security qual
was WHERE sales_rep_id = <something> and for a partner the security
qual was WHERE partner_id = <something>.  Now, you could maybe write
this as a single qual, something like this:

WHERE sales_rep_id = (SELECT oid FROM pg_authid WHERE rolname =
current_user AND oid IN (SELECT id FROM person WHERE is_sales_rep)) OR
partner_id = (SELECT p.org_id FROM pg_authid a, person p WHERE
a.rolname = current_user and a.oid = p.id)

But that's probably not going to perform very well, because to match
an index on sales_rep_id, or an index on partner_id, that's going to
have to get simplified a whole lot, and that's probably not going to
happen.  If we've only got one branch of the OR, I think we'll realize
we can evaluate the subquery as an InitPlan and then use an index, but
with two branches I think that will fail.

I don't want to overstate the importance of this particular case; but
I do think scenarios in which it's advantageous to have multiple
row-level security policies are plausible.  Another, perhaps-simpler
example is that you might have a table containing unclassified data,
classified data, and secret data.  You want to give access to the
unclassified data only to one category of users; access to the
unclassified data and the classified data to a second group of
more-trusted users; and access to all of the data to a third group of
very highly trusted users.  If the table can only have one security
policy that applies to everyone who isn't exempt, how will you do
that?  This sort of use case seems very plausible to me so I think we
need to give some real thought to what we will recommend to users who
want to do things like this.  Can the proposed patch handle it?  How?

>> > I don't feel that RLS will, or even *should*, have the same level of
>> > flexibility that you can achieve with views and/or security definer
>> > functions.  I expect that, over time, we will add more capabilities to
>> > it, but it's never going to be able to redefine the contents of a column
>> > as a view can, nor will it be able to add columns to a table as views
>> > can.  I don't see those as reasons against having support for RLS.
>>
>> What this patch is doing is basically allowing a table to really be a
>> view over itself.
>
> I don't agree with this characterization.  This patch specifically
> allows filtering the rows returned from the table, and it intentionally
> does not allow changing the data.

I don't know what to say to this.  What I said is, quite literally,
what the patch does.  It wraps the patch in an subquery RTE that is
precisely the same thing you would get if you defined a
security_barrier view with the security qual in the WHERE clause.
This is not a question of opinion; the patch either does that or it
doesn't, and I think it does.

>> If we choose to support that, I think it is
>> absolutely inevitable that people are going to want all the same
>> options that they would have if they really made a separate view -
>> separate permissions, WITH CHECK OPTION, all of it.
>
> We are already looking at WITH CHECK OPTION-style support, but I
> disagree that separate permissions or data changing will ever be a part
> of RLS because then it's no longer RLS.

What do you mean by "data changing"?  If you mean inserts, updates,
and deletes, I am very sure people are going to want to perform those
operations on RLS-enabled tables.

Do you find it implausible that someone will want to exempt a certain
role from RLS on only one table but not on other tables in the system?Do you find it implausible that someone will want
toallow a certain
 
table to bypass RLS when selecting rows, but not when updating or
deleting them?  I find those scenarios very plausible.

>> It addresses running pg_dump *as the superuser*, but not as a database
>> owner or just a regular users.  If unprivileged user A runs pg_dump -t
>> some_table_owned_by_user_b, and falls victim to a Trojan horse, that
>> is going to get reported as a security defect in PostgreSQL.  Telling
>> the person who reports that issue that it's design behavior is not
>> going to make them happy, or result in good press coverage for
>> PostgreSQL.
>
> We have this problem with psql today, as has been discussed.  The fact
> that pg_dump doesn't happen to have this problem is great but it's no
> true solution for the problem at hand.

It's true that users can break security by being incautious about the
queries they type into psql, and I'm all for having better tools to
manage that.  But a feature that causes currently-safe uses of pg_dump
to become unsafe is, in my opinion, absolutely not OK.

I do agree with your argument that things like adding and removing
columns, or changing their data types, could be simpler with RLS than
in the view-over-table model - because in the view-over-table model,
we don't really know whether the user would like a new column to
cascade to the view, whereas in the RLS model, we can automatically do
the right thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: wrapping in extended mode doesn't work well with default pager
Next
From: Amit Kapila
Date:
Subject: Few observations in replication slots related code