Thread: Improving RLS qual pushdown
A while ago [1] I proposed an enhancement to the way qual pushdown safety is decided in RLS / security barrier views. Currently we just test for the presence of leaky functions in the qual, but it is possible to do better than that, by further testing if the leaky function is actually being passed information that we don't want to be leaked. Attached is a patch that does that, allowing restriction clause quals to be pushed down into subquery RTEs if they contain leaky functions, provided that the arglists of those leaky functions contain no Vars (such Vars would necessarily refer to output columns of the subquery, which is the data that must not be leaked). An example of the sort of query this will help optimise is: SELECT * FROM table_with_rls WHERE created > now() - '1 hour'::interval; where currently this qual cannot be pushed down because neither now() nor timestamptz_mi_interval() are leakproof, but since they are not being passed any data from the table, they can't actually leak anything, so the qual can be safely pushed down, allowing indexes to be used if available. In fact the majority of builtin functions aren't marked leakproof, and probably most user functions aren't either, so this could potentially be useful in a wide range of real-world queries, where it is common to write quals of the form <column> <operator> <expression>, and the expression may contain leaky functions. Regards, Dean [1] http://www.postgresql.org/message-id/CAEZATCWKcPfWiLoCnmfMSzKLgoaBz7AXKmGZ-Mk83Gd3JG8u1w@mail.gmail.com
Attachment
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > A while ago [1] I proposed an enhancement to the way qual pushdown > safety is decided in RLS / security barrier views. Currently we just > test for the presence of leaky functions in the qual, but it is > possible to do better than that, by further testing if the leaky > function is actually being passed information that we don't want to be > leaked. This certainly sounds reasonable to me. > In fact the majority of builtin functions aren't marked leakproof, and > probably most user functions aren't either, so this could potentially > be useful in a wide range of real-world queries, where it is common to > write quals of the form <column> <operator> <expression>, and the > expression may contain leaky functions. Agreed. Looks like you've already added it to the next commitfest, which is great. I'm definitely interested but probably won't get to it right away as I have a few other things to address. Thanks! Stephen
On Fri, Jan 9, 2015 at 7:54 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > A while ago [1] I proposed an enhancement to the way qual pushdown > safety is decided in RLS / security barrier views. Currently we just > test for the presence of leaky functions in the qual, but it is > possible to do better than that, by further testing if the leaky > function is actually being passed information that we don't want to be > leaked. > > Attached is a patch that does that, allowing restriction clause quals > to be pushed down into subquery RTEs if they contain leaky functions, > provided that the arglists of those leaky functions contain no Vars > (such Vars would necessarily refer to output columns of the subquery, > which is the data that must not be leaked). > > An example of the sort of query this will help optimise is: > > SELECT * FROM table_with_rls WHERE created > now() - '1 hour'::interval; > > where currently this qual cannot be pushed down because neither now() > nor timestamptz_mi_interval() are leakproof, but since they are not > being passed any data from the table, they can't actually leak > anything, so the qual can be safely pushed down, allowing indexes to > be used if available. One thing they could still leak is the number of times they got called, and thus possibly the number of unseen rows. Now if the expressions get constant-folded away that won't be an issue, but a clever user can probably avoid that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote: > One thing they could still leak is the number of times they got > called, and thus possibly the number of unseen rows. Now if the > expressions get constant-folded away that won't be an issue, but a > clever user can probably avoid that. > Right now, EXPLAIN ANALYSE can be used to tell you the number of unseen rows. Is that something that people are concerned about, and are there any plans to change it? Regards, Dean
On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote: >> One thing they could still leak is the number of times they got >> called, and thus possibly the number of unseen rows. Now if the >> expressions get constant-folded away that won't be an issue, but a >> clever user can probably avoid that. > > Right now, EXPLAIN ANALYSE can be used to tell you the number of > unseen rows. Is that something that people are concerned about, and > are there any plans to change it? Interesting question. I don't know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote: > >> One thing they could still leak is the number of times they got > >> called, and thus possibly the number of unseen rows. Now if the > >> expressions get constant-folded away that won't be an issue, but a > >> clever user can probably avoid that. > > > > Right now, EXPLAIN ANALYSE can be used to tell you the number of > > unseen rows. Is that something that people are concerned about, and > > are there any plans to change it? > > Interesting question. I don't know. Wasn't this part of the "covert channel" discussion that took place way before RLS was committed? As I recall, it was argued that such covert channels are acceptable as long as their bandwidth is low. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > Robert Haas wrote: > > On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote: > > >> One thing they could still leak is the number of times they got > > >> called, and thus possibly the number of unseen rows. Now if the > > >> expressions get constant-folded away that won't be an issue, but a > > >> clever user can probably avoid that. > > > > > > Right now, EXPLAIN ANALYSE can be used to tell you the number of > > > unseen rows. Is that something that people are concerned about, and > > > are there any plans to change it? > > > > Interesting question. I don't know. > > Wasn't this part of the "covert channel" discussion that took place way > before RLS was committed? As I recall, it was argued that such covert > channels are acceptable as long as their bandwidth is low. Yes, it was part of the discussion and no, there's no plans to try and hide row counts in explain analyze, nor to deal with things like unique constraint or foreign key reference violations. There are other areas which need improvement which will help address covert channel activity (better auditing, better control over what actions are allowed to diffferent users when it comes to creating objects, modifying permissions and policies, throttling, etc). Thanks, Stephen
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > Attached is a patch that does that, allowing restriction clause quals > to be pushed down into subquery RTEs if they contain leaky functions, > provided that the arglists of those leaky functions contain no Vars > (such Vars would necessarily refer to output columns of the subquery, > which is the data that must not be leaked). > --- 1982,1989 ---- > * 2. If unsafeVolatile is set, the qual must not contain any volatile > * functions. > * > ! * 3. If unsafeLeaky is set, the qual must not contain any leaky functions > ! * that might reveal values from the subquery as side effects. I'd probably extend this comment to note that the way we figure that out is by looking for Vars being passed to non-leakproof functions. > --- 1318,1347 ---- > } > > /***************************************************************************** > ! * Check clauses for non-leakproof functions that might leak Vars > *****************************************************************************/ Might leak data in Vars (or somethhing along those lines...) > /* > ! * contain_leaked_vars > ! * Recursively scan a clause to discover whether it contains any Var > ! * nodes (of the current query level) that are passed as arguments to > ! * leaky functions. > * > ! * Returns true if any function call with side effects may be present in the > ! * clause and that function's arguments contain any Var nodes of the current > ! * query level. Qualifiers from outside a security_barrier view that leak > ! * Var nodes in this way should not be pushed down into the view, lest the > ! * contents of tuples intended to be filtered out be revealed via the side > ! * effects. > */ We don't actually know anything about the function and if it actually has any side effects or not, so it might better to avoid discussing that here. How about: Returns true if any non-leakproof function is passed in data through a Var node as that function might then leak see sensetive information. Only leakproof functions are allowed to see data prior to the qualifiers which are defined in the security_barrier view and therefore we can only push down qualifiers if they are either leakproof or if they aren't passed any Vars from this query level (and therefore they are not able to see any of the data in the tuple, even if they are pushed down). > --- 1408,1428 ---- > CoerceViaIO *expr = (CoerceViaIO *) node; > Oid funcid; > Oid ioparam; > + bool leaky; > bool varlena; > > getTypeInputInfo(exprType((Node *) expr->arg), > &funcid, &ioparam); > ! leaky = !get_func_leakproof(funcid); > > ! if (!leaky) > ! { > ! getTypeOutputInfo(expr->resulttype, &funcid, &varlena); > ! leaky = !get_func_leakproof(funcid); > ! } > ! > ! if (leaky && > ! contain_var_clause((Node *) expr->arg)) > return true; > } > break; That seems slightly convoluted to me. Why not simply do: bool in_leakproof, out_leakproof; getTypeInputInfo(exprType((Node *) expr->arg), &funcid, &ioparam); in_leakproof = get_func_leakproof(funcid); getTypeOutputInfo(expr->resulttype, &funcid, &varlena); out_leakproof = get_func_leakproof(funcid); if (!(in_leakproof && out_leakproof) && contain_var_clause((Node *)))return true; ... > --- 1432,1452 ---- > ArrayCoerceExpr *expr = (ArrayCoerceExpr *) node; > Oid funcid; > Oid ioparam; > + bool leaky; > bool varlena; > > getTypeInputInfo(exprType((Node *) expr->arg), > &funcid, &ioparam); > ! leaky = !get_func_leakproof(funcid); > ! > ! if (!leaky) > ! { > ! getTypeOutputInfo(expr->resulttype, &funcid, &varlena); > ! leaky = !get_func_leakproof(funcid); > ! } > ! > ! if (leaky && > ! contain_var_clause((Node *) expr->arg)) > return true; > } > break; Ditto here. > *************** contain_leaky_functions_walker(Node *nod > *** 1435,1446 **** > { > RowCompareExpr *rcexpr = (RowCompareExpr *) node; > ListCell *opid; > > ! foreach(opid, rcexpr->opnos) > { > Oid funcid = get_opcode(lfirst_oid(opid)); > > ! if (!get_func_leakproof(funcid)) > return true; > } > } > --- 1455,1472 ---- > { > RowCompareExpr *rcexpr = (RowCompareExpr *) node; > ListCell *opid; > + ListCell *larg; > + ListCell *rarg; > > ! forthree(opid, rcexpr->opnos, > ! larg, rcexpr->largs, > ! rarg, rcexpr->rargs) > { > Oid funcid = get_opcode(lfirst_oid(opid)); > > ! if (!get_func_leakproof(funcid) && > ! (contain_var_clause((Node *) lfirst(larg)) || > ! contain_var_clause((Node *) lfirst(rarg)))) > return true; > } > } Might look a bit cleaner as: /* Leakproof functions are ok */ if (get_func_leakproof(funcid))continue; /* Not leakproof, check if there are any Vars passed in */ if (contain_var_clause((Node *) lfirst(larg)) ||contain_var_clause((Node *) lfirst(rarg)))return true; The rest looked good to me. Thanks! Stephen
Thanks for looking at this. On 28 February 2015 at 04:25, Stephen Frost <sfrost@snowman.net> wrote: > Dean, > > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: >> Attached is a patch that does that, allowing restriction clause quals >> to be pushed down into subquery RTEs if they contain leaky functions, >> provided that the arglists of those leaky functions contain no Vars >> (such Vars would necessarily refer to output columns of the subquery, >> which is the data that must not be leaked). > >> --- 1982,1989 ---- >> * 2. If unsafeVolatile is set, the qual must not contain any volatile >> * functions. >> * >> ! * 3. If unsafeLeaky is set, the qual must not contain any leaky functions >> ! * that might reveal values from the subquery as side effects. > > I'd probably extend this comment to note that the way we figure that out > is by looking for Vars being passed to non-leakproof functions. > OK fair enough. >> --- 1318,1347 ---- >> } >> >> /***************************************************************************** >> ! * Check clauses for non-leakproof functions that might leak Vars >> *****************************************************************************/ > > Might leak data in Vars (or somethhing along those lines...) > Perhaps just "Check clauses for Vars passed to non-leakproof functions". The more detailed explanation below is probably then sufficient to cover why that's significant. >> /* >> ! * contain_leaked_vars >> ! * Recursively scan a clause to discover whether it contains any Var >> ! * nodes (of the current query level) that are passed as arguments to >> ! * leaky functions. >> * >> ! * Returns true if any function call with side effects may be present in the >> ! * clause and that function's arguments contain any Var nodes of the current >> ! * query level. Qualifiers from outside a security_barrier view that leak >> ! * Var nodes in this way should not be pushed down into the view, lest the >> ! * contents of tuples intended to be filtered out be revealed via the side >> ! * effects. >> */ > > We don't actually know anything about the function and if it actually > has any side effects or not, so it might better to avoid discussing > that here. How about: > > Returns true if any non-leakproof function is passed in data through a > Var node as that function might then leak see sensetive information. > Only leakproof functions are allowed to see data prior to the qualifiers > which are defined in the security_barrier view and therefore we can only > push down qualifiers if they are either leakproof or if they aren't > passed any Vars from this query level (and therefore they are not able > to see any of the data in the tuple, even if they are pushed down). > That sounds OK (except s/sensetive/sensitive). >> --- 1408,1428 ---- >> CoerceViaIO *expr = (CoerceViaIO *) node; >> Oid funcid; >> Oid ioparam; >> + bool leaky; >> bool varlena; >> >> getTypeInputInfo(exprType((Node *) expr->arg), >> &funcid, &ioparam); >> ! leaky = !get_func_leakproof(funcid); >> >> ! if (!leaky) >> ! { >> ! getTypeOutputInfo(expr->resulttype, &funcid, &varlena); >> ! leaky = !get_func_leakproof(funcid); >> ! } >> ! >> ! if (leaky && >> ! contain_var_clause((Node *) expr->arg)) >> return true; >> } >> break; > > That seems slightly convoluted to me. Why not simply do: > > bool in_leakproof, out_leakproof; > > getTypeInputInfo(exprType((Node *) expr->arg), &funcid, &ioparam); > in_leakproof = get_func_leakproof(funcid); > > getTypeOutputInfo(expr->resulttype, &funcid, &varlena); > out_leakproof = get_func_leakproof(funcid); > > if (!(in_leakproof && out_leakproof) && contain_var_clause((Node *))) > return true; > I'm not sure that's really much clearer, because of the 2 double-negatives in the final if-clause, and it's less efficient in the case where the input function is leaky, when it's not nessecary to check the output function (which involves 2 cache lookups). > ... > >> --- 1432,1452 ---- >> ArrayCoerceExpr *expr = (ArrayCoerceExpr *) node; >> Oid funcid; >> Oid ioparam; >> + bool leaky; >> bool varlena; >> >> getTypeInputInfo(exprType((Node *) expr->arg), >> &funcid, &ioparam); >> ! leaky = !get_func_leakproof(funcid); >> ! >> ! if (!leaky) >> ! { >> ! getTypeOutputInfo(expr->resulttype, &funcid, &varlena); >> ! leaky = !get_func_leakproof(funcid); >> ! } >> ! >> ! if (leaky && >> ! contain_var_clause((Node *) expr->arg)) >> return true; >> } >> break; > > Ditto here. > >> *************** contain_leaky_functions_walker(Node *nod >> *** 1435,1446 **** >> { >> RowCompareExpr *rcexpr = (RowCompareExpr *) node; >> ListCell *opid; >> >> ! foreach(opid, rcexpr->opnos) >> { >> Oid funcid = get_opcode(lfirst_oid(opid)); >> >> ! if (!get_func_leakproof(funcid)) >> return true; >> } >> } >> --- 1455,1472 ---- >> { >> RowCompareExpr *rcexpr = (RowCompareExpr *) node; >> ListCell *opid; >> + ListCell *larg; >> + ListCell *rarg; >> >> ! forthree(opid, rcexpr->opnos, >> ! larg, rcexpr->largs, >> ! rarg, rcexpr->rargs) >> { >> Oid funcid = get_opcode(lfirst_oid(opid)); >> >> ! if (!get_func_leakproof(funcid) && >> ! (contain_var_clause((Node *) lfirst(larg)) || >> ! contain_var_clause((Node *) lfirst(rarg)))) >> return true; >> } >> } > > Might look a bit cleaner as: > > /* Leakproof functions are ok */ > if (get_func_leakproof(funcid)) > continue; > > /* Not leakproof, check if there are any Vars passed in */ > if (contain_var_clause((Node *) lfirst(larg)) || > contain_var_clause((Node *) lfirst(rarg))) > return true; > Shrug. Not sure it makes much difference either way. > The rest looked good to me. > Thanks. Do you want me to post an update, or are you going to hack on it? Regards, Dean
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > Thanks for looking at this. Thanks for working on it. :) > On 28 February 2015 at 04:25, Stephen Frost <sfrost@snowman.net> wrote: > > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > >> --- 1318,1347 ---- > >> } > >> > >> /***************************************************************************** > >> ! * Check clauses for non-leakproof functions that might leak Vars > >> *****************************************************************************/ > > > > Might leak data in Vars (or somethhing along those lines...) > > > > Perhaps just "Check clauses for Vars passed to non-leakproof > functions". The more detailed explanation below is probably then > sufficient to cover why that's significant. Agreed, that looks fine to me. > That sounds OK (except s/sensetive/sensitive). I'm horrible with spelling. :) Thanks for catching it. > I'm not sure that's really much clearer, because of the 2 > double-negatives in the final if-clause, and it's less efficient in > the case where the input function is leaky, when it's not nessecary to > check the output function (which involves 2 cache lookups). Yeah, it is a bit less performant. > Shrug. Not sure it makes much difference either way. Alright. > > The rest looked good to me. > > > > Thanks. Do you want me to post an update, or are you going to hack on it? Either works for me, though I'm happy to handle the modifications to this if it means you have time to look at the other patches.. :) Thanks! Stephen
On 1 March 2015 at 18:23, Stephen Frost <sfrost@snowman.net> wrote: >> Thanks. Do you want me to post an update, or are you going to hack on it? > > Either works for me, though I'm happy to handle the modifications to > this if it means you have time to look at the other patches.. :) > OK, I'll continue testing the other patches. I want to think about RLS and inheritance a bit more. Regards, Dean
I took another look at this and came up with some alternate comment rewording. I also added a couple of additional comments that should hopefully clarify the code a bit. Regards, Dean
Attachment
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > I took another look at this and came up with some alternate comment > rewording. I also added a couple of additional comments that should > hopefully clarify the code a bit. Finally got back to this. Made a few additional changes that made things look clearer, to me at least, and added documentation and comments. Also added a bit to the regression tests (in particular, an explicit check on the RLS side of this, not that it should be different, but just in case things diverge in the future). Please review and let me know if you see any issues. Thanks! Stephen
On 27 April 2015 at 17:33, Stephen Frost <sfrost@snowman.net> wrote: > Dean, > > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: >> I took another look at this and came up with some alternate comment >> rewording. I also added a couple of additional comments that should >> hopefully clarify the code a bit. > > Finally got back to this. Made a few additional changes that made > things look clearer, to me at least, and added documentation and > comments. Also added a bit to the regression tests (in particular, an > explicit check on the RLS side of this, not that it should be different, > but just in case things diverge in the future). Please review and let > me know if you see any issues. > Thanks, that all looks very good. Regards, Dean