Re: API change advice: Passing plan invalidation info from the rewriter into the planner? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Date
Msg-id 20140612005917.GD2556@tamriel.snowman.net
Whole thread Raw
In response to Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Jun 11, 2014 at 12:23 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > This argument could have been made for column-level privileges also, no?
>
> Not really.  First of all, we didn't have security_barrier views at
> that time, let alone security barrier views that are auto-updateable.

We had security definer functions, even set-returning ones, along with
rules and triggers.

> That's a really important piece of technology which makes filtering
> access via views feasible in ways that really were not feasible in the
> past.  Secondly, column-level permissions - like every other
> currently-existing type of permissions - are declarative.  They are an
> additional opportunity for the system to say "no" to something it
> otherwise would have allowed, but no user-defined code is executed.

We could try to avoid calling user-defined code for RLS, but it'd add a
whole lot of complexity and as far as I can see and your proposed
solution isn't avoiding the user-defined code anyway, so I'm not sure
why this solution should be required to meet that.

> Row-level security is not a chance for the system to deny access; it's
> a chance for user-defined code to take control and perform arbitrary
> operations.  So the scope of what we're contemplating for row-level
> security is really far, far more invasive than what we did for
> column-level privileges.

In this case the user-defined code needs to return a boolean.  We don't
currently do anything to prevent it from having side-effects, no, but
the same is true with views which incorporate functions.  I agree that
it makes a difference when compared to column-level privileges, but my
point was that we have provided easier ways to do things which were
possible using more complicated methods before.  Perhaps the risk with
RLS is higher but these issues look managable to me and the level of
doubt about our ability to provide this feature in a reasonable and
principled way that our users will understand surprises me.

> > I agree that views, or even security-definer functions, offer a great
> > deal of flexibility, and that may be necessary in some use-cases, but I
> > fail to see why that means we should avoid providing the mechanics to
> > achieve simple and usable RLS akin to what other major RDBMS's have.
>
> Because we don't have a good design.

We're using a design that's found in multiple other RDBMS's and used
extensively by certain industries which use those RDBMS's today.  I'm
certainly open to improving what is found in other systems for PG but I
have a hard time seeing this approach as a bad design.  Perhaps you're
referring to our implementation, in which case I might agree and things
like running the quals as the table owner is something which should be
considered (I don't know how the other RDBMS's operate in this regard
offhand- it'd be good to find out).

> I'm not categorically opposed to adding more RLS features to
> PostgreSQL and never have been; in fact, I was deeply involved in the
> original design of security barrier views and committed the original
> patch to add that functionality to PostgreSQL, without which none of
> what we're talking about here would be possible.  But the
> currently-proposed design is very unappealing to me, for the reasons
> that I've explained.  The right answer to "this feature doesn't
> provide anything that we don't already have and will introduce major
> new security exposures that haven't been adequate thought" is
> debatable, but "well other people have this so we should too" is
> definitely not it.

How about "it's in high demand by our user base"?  In particular, it's
being asked for by a *highly* technical section of our user base who
uses these capabilities today, with this design, in those other
databases.

> Craig's patch really hasn't grappled with any of
> these thorny definition and security issues; it's just about making
> the basic functionality work.  That's fine for a POC, but it's not
> enough for a feature that the project would be committing to maintain
> for the indefinite future.

Improving the patch is exactly what I'd like to do, but throwing out the
notion that RLS can't be allowed to execute user-defined code is cutting
the legs out of the feature completely- particularly with our system
where users can create all manner of objects in the system with their
own code being run.

> >> That's mighty useful for debugging, at least IMHO.  And, if you want
> >> to have several row-level security policies for different classes of
> >> users, just create more than one view and grant different privileges
> >> on each.
> >
> > I'm really not impressed with the idea that RLS should be done with
> > multiple different views of the same underlying table.
>
> Are you equally unimpressed with the idea that RLS as proposed can't
> support more than one security policy right now *at all*?  Because it
> seems to me that either you think multiple RLS policies on a single
> table is important (in which case the current patch is inadequate) or
> you think it's not important (in which case we need not argue about
> whether doing it with multiple views over the same underlying table is
> awkward).

The current approach allows a nearly unlimited level of flexibility,
should the user wish it, by being able to run user-defined code.
Perhaps that would be considered 'one policy', but it could certainly
take under consideration the calling user, the object being queried
(if a function is defined per table, or if we provide a way to get
that information in the function), etc.  What it wouldn't require is
the same object to be queried through different object names, which is
what I was principally objecting to.  What would it mean to have
mutliple RLS policies for a given object?  There would have to be some
criteria to distinguish which one would be applied, yet that can be
handled with the existing design by the user already, if they wish to.

Were we to preclude users from being able to have user-defined functions
called, then there's quite a bit of additional complexity we'd need to
replicate.  Per-user policies, per-role policies, a definition of which
one applies when, per-source-IP, per-connection-type (SSL vs. non-SSL),
per-security-label, etc..

> >> By contrast, it seems to me that every design so far proposed for
> >> something that is actually called row-level security - as opposed to
> >> commit 842faa714c0454d67e523f5a0b6df6500e9bc1a5, which *really is*
> >> row-level security, is extremely limited.  Look back at all the things
> >> listed in the previous paragraph; can you do those things easily with
> >> the designs that have been proposed?  As far as I can see, not really.
> >
> > I don't feel that RLS will, or even *should*, have the same level of
> > flexibility that you can achieve with views and/or security definer
> > functions.  I expect that, over time, we will add more capabilities to
> > it, but it's never going to be able to redefine the contents of a column
> > as a view can, nor will it be able to add columns to a table as views
> > can.  I don't see those as reasons against having support for RLS.
>
> What this patch is doing is basically allowing a table to really be a
> view over itself.

I don't agree with this characterization.  This patch specifically
allows filtering the rows returned from the table, and it intentionally
does not allow changing the data.

> If we choose to support that, I think it is
> absolutely inevitable that people are going to want all the same
> options that they would have if they really made a separate view -
> separate permissions, WITH CHECK OPTION, all of it.

We are already looking at WITH CHECK OPTION-style support, but I
disagree that separate permissions or data changing will ever be a part
of RLS because then it's no longer RLS.

> I find the
> contrary argument - that people will only want X amount and no more -
> simply not plausible.

I'm not sure where you are seeing the requests for this feature from,
but where I have heard them it's been to match what exists in other
RDBMS's which do not have the capabilities that you're describing users
will want- yet RLS is heavily used in those organizations.  For the use
cases that I've had in the past, RLS-as-defined would be the feature
that I want for most tables, with views for joins and data-changing
operations.

> If it's valuable to have some of those
> capabilities in an RLS framework, somebody's going to want all of
> them.  There's no bright line to divide the things that are valuable
> in that context from those that aren't.

I see the line quite clearly- RLS is about having a filtering mechanism
and that's it.  If it isn't filtering the rows (meaning giving back a
'true' or 'false' result for each row) then it's beyond RLS.

> > I'm glad to hear your thoughts on the level of granularity which might
> > be nice to have with RLS.  What would be great is to spend a bit more
> > time reviewing what other systems provide in this area and considering
> > what makes sense for us.  This will also be a feature and an area which
> > we will be improving for a long time to come, but we do need this
> > capability and we have to start somewhere.
>
> I think this definitely important.  I also think that we should be
> careful to study the deficiencies in those other systems and to
> clearly call out what value the capabilities we're thinking of adding
> to PostgreSQL 9.5 have over the status quo in PostgreSQL 9.4.  I'm not
> so much arguing that we shouldn't have row-level security as that, in
> every way that's really meaningful, we already do.

This is not the feeling that the users which I have been working with
have, nor does it match my feelings about this.  As mentioned in my
email to Tom just now, having another object to deal with adds
unnecessary complexity and will require application changes potentially
to implement over existing tables.

> >> There's no way for users who are RLS exempt to turn
> >> off their exemption for testing purposes, let alone on a per-table
> >> basis.
> >
> > I don't follow this argument entirely- users can't turn off the existing
> > permissions system for testing either, unless an authorized user with
> > the correct permissions makes the change to allow it- or the user bumps
> > themselves up to superuser, or to a role which has broader permissions,
> > both of which would also be possible to do with RLS.
>
> Sure, but in the existing system, the query either returns the same
> results for everybody, or it fails outright with an error.  It's
> certainly possible to screw up the existing permissions, but this new
> thing that's being proposed is much more complicated, because it's not
> just whether it works that's at issue, but what results you actually
> get.

I agree that we'll need to make sure we return the correct answer.
There is complexity there, but hopefully we've addressed much or all of
that with what we have in 9.4 and this is just adding a simpler and
often requested way to use that capability without the need to create
and manage another object in the system.

> >> There's no way to have multiple RLS policies on a single
> >> table.  All of those are things that we get "for free" in the
> >> view-over-table model, and implementing formal RLS basically requires
> >> us to either invent a new RLS-specific way of doing each of those
> >> things, or suffer along with a subset of the functionality.  Yuck.
> >
> > What would probably be good is to review the use-cases which the current
> > patch already addresses- and we've had good responses from actual users
> > who are already playing with the patch and are hearing that it is
> > addressing their requirements.
>
> Yes.  And in particular, I think we should have a much clearer
> statement than we currently do about the use cases in which it falls
> short.

I'm happy to have that discussion with the users who are asking for this
but in the conversations that I've had to date, updatable s.b. views are
not RLS to them and I have to agree- having to maintain twice as many
objects in the system which have to be named differently and have
permissions which can be distinct from each other (which is something
that could be a *problem* if it isn't intended), must both be updated
when adding or removing columns, etc, makes that solution quite
unappealing.

> > You're suggesting that we use views instead, which clearly could run
> > someone else's code.  Perhaps the user will notice that they're
> > selecting from a view instead of a table, but I've never seen a security
> > design around making sure that what is being select'd from is a table
> > vs. a view.  Have you seen applications which implement such a check
> > prior to running a query?
>
> Yes.  pg_dump, to name one really important one.  I wouldn't be
> surprised if graphical clients did something similar - display the
> table data for a table, or the view definition for a view.

I'm quite sure you can select back the data from a view in every
graphical client that exists- and without any warning popping up that
you might be running code that someone else wrote.  Yes, you can also
get the definition of the view in many cases and you can tell if what
you're selecting is a view or a table but that doesn't mean people are
actively being paranoid about that distinction or worrying about the
other cases where user-defined code might be run, even when selecting
from a table, in general.

> But I
> admit to not having checked that.  More than that, if I were a DBA,
> I'd certainly be darn careful about selecting from untrusted views,
> but I expect to be able to read a table, or run pg_dump, without
> getting my account hacked.

I'd love to hear how you decide which views are trusted and which are
not.  Last I checked, most serious attacks still come from internal
individuals rather than external ones.  Don't get me wrong- we
definitely have an issue here that it'd be great to find a solution to,
as has been discussed extensively, but I don't see RLS as making that
problem particularly worse, and really, excluding superusers and having
the option for other users to be excluded goes above what we've done to
date in other areas.

> I don't think restricting what can go into an RLS policy is the right
> answer; that to me misses the point.  What needs to be restricted is
> the possibility that a user will inadvertently run code they didn't
> mean to run.

I'm glad that you agree that restricting the RLS policy isn't the right
answer.  I agree that we want to come up with a way to prevent users
from running code that isn't safe or isn't intended.  I still don't see
RLS as making that particularly worse.  The system is really nearly
unusable in any interactive way if you restrict yourself to operations
which can't possibly run any user-defined code today.  There have been
discussions about ways to possibly improve that, and those ways would
need to address the RLS case in addition to the other already existing
cases but I don't see that as a signifigant increase in the amount of
work required to address that problem (which is already quite large..).

> > I don't see this as being an insurmountable issue.  I agree that having
> > a way for pg_dump to run safely is important and the superuser check
> > does address that, given that we don't have a "read-only (and
> > everything)" capability today.  Once we do (and I surely hope that will
> > come sooner rather than later), such a role should also have the 'no
> > RLS' bit, as it wouldn't make any sense for such a role anyway.  The
> > lack of that is not a strike against RLS though.
>
> It addresses running pg_dump *as the superuser*, but not as a database
> owner or just a regular users.  If unprivileged user A runs pg_dump -t
> some_table_owned_by_user_b, and falls victim to a Trojan horse, that
> is going to get reported as a security defect in PostgreSQL.  Telling
> the person who reports that issue that it's design behavior is not
> going to make them happy, or result in good press coverage for
> PostgreSQL.

We have this problem with psql today, as has been discussed.  The fact
that pg_dump doesn't happen to have this problem is great but it's no
true solution for the problem at hand.

> >> 2. There's a danger that the functionality available in the two models
> >> will diverge, so that certain things can only be done in one world or
> >> the other.
> >
> > They will always be distinct, intentionally so.
>
> I think that's an absolutely terrible idea.  We do not want to be in
> the business of having two parallel systems with slightly different
> capabilities and syntax that are providing the same fundamental
> functionality.  And they are: the proposal for RLS is to make it work
> just like a security_barrier view, sharing a common implementation.

While RLS could be viewed as providing a subset of what updatable sb
views provide, I can see a clear line between the two and, for my part,
we should allow users to make their own decision about if they want the
complexity involved with maintaining another object in the system to
provide the filtering or if they want to implement the filtering and the
data manipulation, joins, etc, independently.

That's really another big point to be made here- there's value in
separating these concerns.  Security is a big enough concern and a big
enough issue that being able to address it explicitly and with a simple
syntax is extremely valuable.  RLS as we've been discussing it allows
that, while having to include it in more complicated view definitions
could make it much more difficult to reason about.  I suppose one could
define a view for just the filtering and then another view for the data
manipulation and joining over top of the other views, but, again, that
adds another level of complexity that isn't needed- and you can't be
100% sure that the only thing the supposedly filtering view is doing is
*just* filtering unless you audit it regularly.

> >> 4. Making SELECT * FROM tab ambiguous seems likely to be a security minefield.
> >
> > While I agree that we need to consider this, I don't think it will be a
> > "minefield", but rather something we need to document and educate our
> > users about.  If you'd like a "disable-all-RLS" GUC, I'm all for it.
>
> I would definitely like that.  I have proposed it in the past.

Great.
Thanks,
    Stephen

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [patch] build issues on Win32
Next
From: Noah Misch
Date:
Subject: Re: wrapping in extended mode doesn't work well with default pager