Re: API change advice: Passing plan invalidation info from the rewriter into the planner? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Date
Msg-id 20140618022513.GH16098@tamriel.snowman.net
Whole thread Raw
In response to Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Jun 16, 2014 at 1:15 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > I understand that there are performance implications.  As mentioned to
> > Tom, realistically, there's zero way to optimized at least some of these
> > use-cases because they require a completely external module (eg:
> > SELlinux) to be involved in the decision about who can view what
> > records.  If we can optimize that, it'd be by a completely different
> > approach whereby we pull up the qual higher because we know the whole
> > query only involves leakproof functions or similar, allowing us to only
> > apply the filter to the final set of records prior to them being
> > returned to the user.  The point being that such optimizations would
> > happen independently and regardless of the quals or user-defined
> > functions involved.  At the end of the day, I can't think of a better
> > optimization for such a case (where we have to ask an external security
> > module if a row is acceptable to return to the user) than that.  Is
> > there something specific you're thinking about that we'd be missing out
> > on?
>
> Yeah, if we have to ask an external security module a question for
> each row, there's little hope of any real optimization.  However, I
> think there will be a significant number of cases where people will
> want filtering clauses that can be realized by doing an index scan
> instead of a sequential scan, and if we end up forcing a sequential
> scan anyway, the feature will be useless to those people.

I agree that we want to support that, if we can do so reasonably.  What
I was trying to get at is simply this- don't we provide that already
with the leakproof attribute and functions?  If we don't have enough
there to allow index scans then we should be looking to add more, I'm
thinking.

> > Perhaps it's just my experience, but I've been focused on the main core
> > feature for quite some time and it feels like we're really close to
> > having it there.  I agree that a few additional bits would be nice to
> > have but these strike me as relatively straight-forward to implement
> > overtop of this general construct.  I do see value in documenting these
> > concerns and will see about making that happen, along with what the
> > general viewpoints and thoughts are about how to address the concern.
>
> I feel like there's quite a bit of work left to do around these
> issues.  The technical bits may not be too hard, but deciding what we
> want will take some thought and discussion.

I agree on this point, but I'm still hopeful that we'll be able to get a
good feature into 9.5.  There are quite a few resources available for
the 'just programming' part, so the long pole in the tent here is
absolutely hashing out what we want and how it should function.

I'd be happy to host or participate in a conference call or similar if
that would be useful to move this along- or we can continue to
communicate via email.  There's a bit of a lull in conferences to which
I'm going to right now, so in person is unlikely, unless folks want to
get together somewhere on the east coast (I'd be happy to travel to
Philly, Pittsburgh, NYC, etc, if it'd help..).

> > That's not actualy true today, is it?  Given our leak-proof attribute,
> > if the qual is "WHERE somecomplexfunction() AND leakprooffunctionx()"
> > then we would be able to push down the leak-proof function and not
> > necessairly run a straight sequential scan, no?  Even so, though, we've
> > had users who have tested exactly what this patch implements and they've
> > been happy with their real-world use-cases.  I'm certainly all for
> > optimization and would love to see us make this better for everyone, but
> > I don't view that as a reason to delay this particular feature which is
> > really just bringing us up to parity with other RDMBS's.
>
> I'm a bit confused here, because your example seems to be totally
> different from my example.  In my example, somecomplexfunction() will
> get pushed down because it's the security qual; that needs to be
> inside the security_barrier view, or a malicious user can subvert the
> system by getting some other qual evaluated first.  In your example,
> you seem to be imagining WHERE somecomplexfunction() AND
> leakprooffunctionx() as queries sent by the untrusted user, in which
> case, yet, the leak-proof one will get pushed down and the other one
> will not.

Right- my point there was that the leakproof one might allow an index
scan to be run.  This is all pretty hand-wavey, I admit, so I'll see if
I can get more details about how the currently-proposed patch is
performing for the users who are testing it and what kind of plans
they're seeing.  If that falls through, I'll try and build up my own set
of realistic-looking (to myself and the users who are testing) example.

> > What solution did you come up with for this case, which performed well
> > and was also secure..?
>
> I put the logic in the client.  :-(

Well, that's not helpful here. ;)

> >> I don't want to overstate the importance of this particular case; but
> >> I do think scenarios in which it's advantageous to have multiple
> >> row-level security policies are plausible.
> >
> > I'm not against this in general.  The question, in my mind, is what
> > level of granularity we would provide this at.  As I tried to outline
> > previously, there's a huge number of combinations which we could come up
> > with to support this under and I'm not 100% sure that it'd actualy end
> > up being better than the simplicity of a single qual where the user gets
> > to define any kind of relationship they want between the various
> > policies; even programatically if they want.
>
> I agree.  That's why I think we need some more design work in this
> area.  Perhaps it's OK to allow only one RLS-qual per table at most,
> and tell people that if you want more than that, you need to use
> security-barrier views as wrappers instead.

Note that my suggestion would be to simply put a pl/pgsql call (perhaps
a security definer one) into the RLS definition- not to say "use views".

> But I'm not sure; that
> feels like it's giving something up that might be important.  And I
> think that the kinds of syntax we're discussing won't support leaving
> that out of the initial version and adding it later, so if we commit
> to this syntax, we're stuck with that behavior.  To avoid that, we'd
> need something like this:
>
> ALTER TABLE tab ADD POLICY polname WHERE quals;
> GRANT SELECT (polname) ON TABLE tab TO role;

Right, if we were to support multiple policies on a given table then we
would have to support adding and removing them individually, as well as
specify when they are to be applied- and what if that "when" overlaps?
Do we apply both and only a row which passed them all gets sent to the
user?  Essentially we'd be defining the RLS policies to be AND'd
together, right?  Would we want to support both AND-based and OR-based,
and allow users to pick what set of conditionals they want applied to
their various overlapping RLS policies?

Sounds all rather painful and much better done programatically by the
user in a language which is suited to that task- eg: pl/pgsql, perl, C,
or something besides our ALTER syntax + catalog representation.

> >> What do you mean by "data changing"?  If you mean inserts, updates,
> >> and deletes, I am very sure people are going to want to perform those
> >> operations on RLS-enabled tables.
> >
> > Yes, they'll want to support those operations.  However, they will not
> > expect RLS to allow them to redefine a columns as "x+10" instead of "x",
> > which a view does allow.
>
> Hmm, I think some users do want to do things like this.  There are
> previous discussions of wanting to fuzz a set of coordinates, for
> example, or blank out a certain list of columns.

Absolutely they'll want to be able to do this- but that's going to be a
case which I (and others, I think) feel comfortable going back and
saying "use views for that".  I'm trying to draw that line in the ground
between what is RLS and what are views and keeping RLS to the WHERE
clause strikes me as a good line to draw (and one which matches up with
existing expectations in this space).

> > This is also plausible and something which we were anticipating while
> > developing this patch.  Simon, KaiGai and I specifically discussed
> > addressing SELECT vs UPDATE/DELETE earlier this year, as I recall.
> > Providing that level of flexibility is absolutely on the road map, but I
> > don't know that it all has to exist in 9.5; it may, which would be
> > great, but I don't view it as required.
>
> I think we at least need to have a clear design for it before
> committing anything.  Otherwise we may find that we've committed to
> syntax which backs us into a corner.

Fair enough.  There was some support for this idea in the original patch
by Craig, but we can further develop this syntax (and what it may look
like for 9.5, if it ends up not covering all cases).
Thanks!
    Stephen

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Next
From: Tom Lane
Date:
Subject: Re: BUG #10680 - ldapbindpasswd leaks to postgresql log