Re: API change advice: Passing plan invalidation info from the rewriter into the planner? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Date
Msg-id 20140611234656.GC2556@tamriel.snowman.net
Whole thread Raw
In response to Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> >> I'm really concerned about the security implications of this patch.  I
> >> think we're setting ourselves up for a whole lot of hurt for somewhat
> >> unclear gain.
>
> > I'm certainly of a different opinion and, for the most part, I feel that
> > if there are security concerns then they need to be addressed- and
> > better by us than by asking users to use some other mechanism to
> > implement RLS.
>
> TBH, I found Robert's argument pretty persuasive.  The idea that
> "SELECT * FROM table" might invoke arbitrary processing ought to scare
> anyone who's concerned about security, because that's going to completely
> break any assumptions about pg_dump being safe for instance, as well as
> force top-to-bottom rethinking of many other security assumptions.

SELECT triggers for a wide variety of use-cases are pretty commonly
asked for here and are something I'd like to see us support also.  There
are also quite a few ways in which a select can end up executing code.
Today it requires more than 'select * from table;', but not very
much..  I agree that it'd be good if we had a way to address that but I
continue to view that as an independent issue.

What I haven't heard any comments on, yet found interesting, was the
idea of having the RLS quals run as the owner of the table.  Would that
address these concerns?  I seem to recall wondering why we didn't do
that for views in the first place, though I doubt we could change it
now even if we wanted to (and I'm guessing the spec has something to say
about this, though I haven't gone and looked and don't remember
offhand).  It's certainly rather curious that functions called under a
view are run as the calling user while permissions checks on relations
referred to by the view are as the view owner.

Hopefully that will make the rest of this discussion less relevant, but
I'll respond with my feelings anyway.

> > ...  That commit was
> > the ground-work to allow us to finally get proper RLS and I'm very
> > disappointed to hear that the mechanical pieces around making RLS easy
> > for users to use (and getting that check-box taken care of in a wide
> > variety of fields that we are being exposed to now, see the PGConf.NYC
> > keynote speakers...) is receiving such push-back.
>
> If this is being sold as merely "ease of use", then it is probably going
> to get rejected.  In order to get some extra ease of use for the minority
> of users who need RLS, you are going to significantly complicate the lives
> of all Postgres users.  That's not a net win in any sane calculation of
> ease of use.

I don't view this as being at all accurate- how is this complicating the
lives of all Postgres users?  If they are worried about running user
defined code then they *already* have a lot to worry about.

While the users of RLS might be less than 50% and therefore the
minority, I expect it will have quite a bit of up-take in certain
industries and I know that our lack of any RLS is currently preventing
use of Postgres in some rather important cases.

As for it being ease-of-use, again, there are ways in which column level
privileges could have been dealt with using views, rules, security
definer functions, etc, but that doesn't mean we don't want that
feature.  I certainly view RLS (and have for quite some time..) as a much
needed capability, even if it can be done today using a bunch of user
written code that must be security audited.

> Maybe the right thing to think about is how we can make it easier to set
> up table + view combinations according to the pattern Robert described.

While this sounds interesting, I don't see adding columns or redefining
them as being in the perview of RLS.  The current approach of
allowing a boolean expression to be defined is both extremely flexible
while also being simple when the requirement is simple.  Having to
create, manage, update, etc, an independent object would add unnecessary
complexity.

Perhaps having it be a boolean expression is too much flexibility but
the alternatives that I can think of aren't terribly attractive to me
and the boolean expression approach is what folks coming from other
RDBMS's will be familiar with and understand how to build their
applications around.  We may need to provide some additional pieces
around this (perhaps a trigger-like function type which also gets
information about the object being queried, etc) but the point is to
have a straight-forward and simply reasoned about way of limiting what
data is returned.

> I wouldn't have a problem with some more-or-less-automated support for
> doing that.  (Consider SERIAL as a possible precedent here: it's basically
> a table creation macro.)

Perhaps there's a way to make that work, but personally it looks like a
whole bunch more work and I don't see the gain.  How would adding RLS to
an existing table work?  It's worse than the SERIAL case as at least
a default clause can be added later without impacting the application
code.  Would the functions referenced through such a view run as the
user of the view?

> > You're suggesting that we use views instead, which clearly could run
> > someone else's code.  Perhaps the user will notice that they're
> > selecting from a view instead of a table, but I've never seen a security
> > design around making sure that what is being select'd from is a table
> > vs. a view.
>
> pg_dump is a sufficient counterexample to that statement.

No, it isn't.  pg_dump's defined purpose is explicitly to pull out the
data contents underneath, or the definition of the object, which means
it happens to issue explicit select * from table's (or COPY commands)
for tables and pull the view definition for views.

There's no way to even ask it to dump out the contents of a view (rather
than the definition of it).  I don't consider that a security design
which checks if the object *that we're asking to select the contents of*
is checked to see if it's a view or a table, in order to avoid calling
user-defined code.

I agree that pg_dump takes many precautions to avoid running user code
in a way which could be dangerous, both to avoid security issues and
because its goal is to reproduce the system exactly as it was, and
running user code would likely cause problems for that.

I still do not buy this argument that individuals or applications pay
much more attention to selecting from views than they do selecting from
tables, or generally go out of their way to try and avoid running user
defined code (indeed, much of the point is to be able to add such things
without having to change the application around..).

We care about these issues a great deal in pg_dump, rightfully, but
psql, pgAdmin3, Perl DBD/DBI, libpq-using application, etc, etc, have no
mechanism to say "give me just the data and only the data and don't run
any user-defined code".  Adding that capability might be interesting if
we can figure out how exactly to define it but it's still an orthogonal
issue to RLS, imv.
Thanks,
    Stephen

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [GENERAL] Question about partial functional indexes and the query planner
Next
From: Noah Misch
Date:
Subject: Re: [patch] build issues on Win32