Re: API change advice: Passing plan invalidation info from the rewriter into the planner? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Date
Msg-id CA+TgmoZc2VUDNuPTjG193SZMPV_+63+EhzveCh6MC+4gRK+wAw@mail.gmail.com
Whole thread Raw
In response to Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: API change advice: Passing plan invalidation info from the rewriter into the planner?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Mon, Jun 16, 2014 at 1:15 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> I'm not referring to the proposed implementation particularly; or at
>> least not that aspect of it.  I don't think trying to run the view
>> quals as the defining user is likely to be very appealing, because I
>> think it's going to hurt performance, for example by preventing
>> function inlining and requiring lots of user-ID switches.
>
> I understand that there are performance implications.  As mentioned to
> Tom, realistically, there's zero way to optimized at least some of these
> use-cases because they require a completely external module (eg:
> SELlinux) to be involved in the decision about who can view what
> records.  If we can optimize that, it'd be by a completely different
> approach whereby we pull up the qual higher because we know the whole
> query only involves leakproof functions or similar, allowing us to only
> apply the filter to the final set of records prior to them being
> returned to the user.  The point being that such optimizations would
> happen independently and regardless of the quals or user-defined
> functions involved.  At the end of the day, I can't think of a better
> optimization for such a case (where we have to ask an external security
> module if a row is acceptable to return to the user) than that.  Is
> there something specific you're thinking about that we'd be missing out
> on?

Yeah, if we have to ask an external security module a question for
each row, there's little hope of any real optimization.  However, I
think there will be a significant number of cases where people will
want filtering clauses that can be realized by doing an index scan
instead of a sequential scan, and if we end up forcing a sequential
scan anyway, the feature will be useless to those people.

>> But I'm not
>> gonna complain if someone wants to mull it over and make a proposal
>> for how to make it work.  Rather, my concern is that all we've got is
>> what might be called the core of the feature; the actual guts of it.
>> There are a lot of ancillary details that seem to me to be not worked
>> out at all yet, or only half-baked.
>
> Perhaps it's just my experience, but I've been focused on the main core
> feature for quite some time and it feels like we're really close to
> having it there.  I agree that a few additional bits would be nice to
> have but these strike me as relatively straight-forward to implement
> overtop of this general construct.  I do see value in documenting these
> concerns and will see about making that happen, along with what the
> general viewpoints and thoughts are about how to address the concern.

I feel like there's quite a bit of work left to do around these
issues.  The technical bits may not be too hard, but deciding what we
want will take some thought and discussion.

>> > The current approach allows a nearly unlimited level of flexibility,
>> > should the user wish it, by being able to run user-defined code.
>> > Perhaps that would be considered 'one policy', but it could certainly
>> > take under consideration the calling user, the object being queried
>> > (if a function is defined per table, or if we provide a way to get
>> > that information in the function), etc.
>>
>> In theory, that's true.  But in practice, performance will suck unless
>> the security qual is easily optimizable.  If your security qual is
>> WHERE somecomplexfunction() you're going to have to implement that by
>> sequential-scanning the table and evaluating the function for each
>> row.
>
> That's not actualy true today, is it?  Given our leak-proof attribute,
> if the qual is "WHERE somecomplexfunction() AND leakprooffunctionx()"
> then we would be able to push down the leak-proof function and not
> necessairly run a straight sequential scan, no?  Even so, though, we've
> had users who have tested exactly what this patch implements and they've
> been happy with their real-world use-cases.  I'm certainly all for
> optimization and would love to see us make this better for everyone, but
> I don't view that as a reason to delay this particular feature which is
> really just bringing us up to parity with other RDMBS's.

I'm a bit confused here, because your example seems to be totally
different from my example.  In my example, somecomplexfunction() will
get pushed down because it's the security qual; that needs to be
inside the security_barrier view, or a malicious user can subvert the
system by getting some other qual evaluated first.  In your example,
you seem to be imagining WHERE somecomplexfunction() AND
leakprooffunctionx() as queries sent by the untrusted user, in which
case, yet, the leak-proof one will get pushed down and the other one
will not.

>> But that's probably not going to perform very well, because to match
>> an index on sales_rep_id, or an index on partner_id, that's going to
>> have to get simplified a whole lot, and that's probably not going to
>> happen.  If we've only got one branch of the OR, I think we'll realize
>> we can evaluate the subquery as an InitPlan and then use an index, but
>> with two branches I think that will fail.
>
> You're right- we could perform better in such a case.
>
> What solution did you come up with for this case, which performed well
> and was also secure..?

I put the logic in the client.  :-(

>> I don't want to overstate the importance of this particular case; but
>> I do think scenarios in which it's advantageous to have multiple
>> row-level security policies are plausible.
>
> I'm not against this in general.  The question, in my mind, is what
> level of granularity we would provide this at.  As I tried to outline
> previously, there's a huge number of combinations which we could come up
> with to support this under and I'm not 100% sure that it'd actualy end
> up being better than the simplicity of a single qual where the user gets
> to define any kind of relationship they want between the various
> policies; even programatically if they want.

I agree.  That's why I think we need some more design work in this
area.  Perhaps it's OK to allow only one RLS-qual per table at most,
and tell people that if you want more than that, you need to use
security-barrier views as wrappers instead.  But I'm not sure; that
feels like it's giving something up that might be important.  And I
think that the kinds of syntax we're discussing won't support leaving
that out of the initial version and adding it later, so if we commit
to this syntax, we're stuck with that behavior.  To avoid that, we'd
need something like this:

ALTER TABLE tab ADD POLICY polname WHERE quals;
GRANT SELECT (polname) ON TABLE tab TO role;

>> Another, perhaps-simpler
>> example is that you might have a table containing unclassified data,
>> classified data, and secret data.  You want to give access to the
>> unclassified data only to one category of users; access to the
>> unclassified data and the classified data to a second group of
>> more-trusted users; and access to all of the data to a third group of
>> very highly trusted users.  If the table can only have one security
>> policy that applies to everyone who isn't exempt, how will you do
>> that?  This sort of use case seems very plausible to me so I think we
>> need to give some real thought to what we will recommend to users who
>> want to do things like this.  Can the proposed patch handle it?  How?
>
> There are multiple ways this could be implemented- the first, basic, way
> would be through a table which maps users to security levels via an enum
> where more privileged levels are higher in value and therefore a simple
> greater-than could be applied after a join which would implement this
> particular policy.

Interesting.

>> What do you mean by "data changing"?  If you mean inserts, updates,
>> and deletes, I am very sure people are going to want to perform those
>> operations on RLS-enabled tables.
>
> Yes, they'll want to support those operations.  However, they will not
> expect RLS to allow them to redefine a columns as "x+10" instead of "x",
> which a view does allow.

Hmm, I think some users do want to do things like this.  There are
previous discussions of wanting to fuzz a set of coordinates, for
example, or blank out a certain list of columns.

>>  Do you find it implausible that someone will want to allow a certain
>> table to bypass RLS when selecting rows, but not when updating or
>> deleting them?  I find those scenarios very plausible.
>
> This is also plausible and something which we were anticipating while
> developing this patch.  Simon, KaiGai and I specifically discussed
> addressing SELECT vs UPDATE/DELETE earlier this year, as I recall.
> Providing that level of flexibility is absolutely on the road map, but I
> don't know that it all has to exist in 9.5; it may, which would be
> great, but I don't view it as required.

I think we at least need to have a clear design for it before
committing anything.  Otherwise we may find that we've committed to
syntax which backs us into a corner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Next
From: Alvaro Herrera
Date:
Subject: Re: pg_control is missing a field for LOBLKSIZE