Thread: Improving RLS qual pushdown

Improving RLS qual pushdown

From
Dean Rasheed
Date:
A while ago [1] I proposed an enhancement to the way qual pushdown
safety is decided in RLS / security barrier views. Currently we just
test for the presence of leaky functions in the qual, but it is
possible to do better than that, by further testing if the leaky
function is actually being passed information that we don't want to be
leaked.

Attached is a patch that does that, allowing restriction clause quals
to be pushed down into subquery RTEs if they contain leaky functions,
provided that the arglists of those leaky functions contain no Vars
(such Vars would necessarily refer to output columns of the subquery,
which is the data that must not be leaked).

An example of the sort of query this will help optimise is:

SELECT * FROM table_with_rls WHERE created > now() - '1 hour'::interval;

where currently this qual cannot be pushed down because neither now()
nor timestamptz_mi_interval() are leakproof, but since they are not
being passed any data from the table, they can't actually leak
anything, so the qual can be safely pushed down, allowing indexes to
be used if available.

In fact the majority of builtin functions aren't marked leakproof, and
probably most user functions aren't either, so this could potentially
be useful in a wide range of real-world queries, where it is common to
write quals of the form <column> <operator> <expression>, and the
expression may contain leaky functions.

Regards,
Dean

[1] http://www.postgresql.org/message-id/CAEZATCWKcPfWiLoCnmfMSzKLgoaBz7AXKmGZ-Mk83Gd3JG8u1w@mail.gmail.com

Attachment

Re: Improving RLS qual pushdown

From
Stephen Frost
Date:
Dean,

* Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
> A while ago [1] I proposed an enhancement to the way qual pushdown
> safety is decided in RLS / security barrier views. Currently we just
> test for the presence of leaky functions in the qual, but it is
> possible to do better than that, by further testing if the leaky
> function is actually being passed information that we don't want to be
> leaked.

This certainly sounds reasonable to me.

> In fact the majority of builtin functions aren't marked leakproof, and
> probably most user functions aren't either, so this could potentially
> be useful in a wide range of real-world queries, where it is common to
> write quals of the form <column> <operator> <expression>, and the
> expression may contain leaky functions.

Agreed.

Looks like you've already added it to the next commitfest, which is
great.  I'm definitely interested but probably won't get to it right
away as I have a few other things to address.
Thanks!
    Stephen

Re: Improving RLS qual pushdown

From
Robert Haas
Date:
On Fri, Jan 9, 2015 at 7:54 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> A while ago [1] I proposed an enhancement to the way qual pushdown
> safety is decided in RLS / security barrier views. Currently we just
> test for the presence of leaky functions in the qual, but it is
> possible to do better than that, by further testing if the leaky
> function is actually being passed information that we don't want to be
> leaked.
>
> Attached is a patch that does that, allowing restriction clause quals
> to be pushed down into subquery RTEs if they contain leaky functions,
> provided that the arglists of those leaky functions contain no Vars
> (such Vars would necessarily refer to output columns of the subquery,
> which is the data that must not be leaked).
>
> An example of the sort of query this will help optimise is:
>
> SELECT * FROM table_with_rls WHERE created > now() - '1 hour'::interval;
>
> where currently this qual cannot be pushed down because neither now()
> nor timestamptz_mi_interval() are leakproof, but since they are not
> being passed any data from the table, they can't actually leak
> anything, so the qual can be safely pushed down, allowing indexes to
> be used if available.

One thing they could still leak is the number of times they got
called, and thus possibly the number of unseen rows.  Now if the
expressions get constant-folded away that won't be an issue, but a
clever user can probably avoid that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improving RLS qual pushdown

From
Dean Rasheed
Date:
On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote:
> One thing they could still leak is the number of times they got
> called, and thus possibly the number of unseen rows.  Now if the
> expressions get constant-folded away that won't be an issue, but a
> clever user can probably avoid that.
>

Right now, EXPLAIN ANALYSE can be used to tell you the number of
unseen rows. Is that something that people are concerned about, and
are there any plans to change it?

Regards,
Dean



Re: Improving RLS qual pushdown

From
Robert Haas
Date:
On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote:
>> One thing they could still leak is the number of times they got
>> called, and thus possibly the number of unseen rows.  Now if the
>> expressions get constant-folded away that won't be an issue, but a
>> clever user can probably avoid that.
>
> Right now, EXPLAIN ANALYSE can be used to tell you the number of
> unseen rows. Is that something that people are concerned about, and
> are there any plans to change it?

Interesting question.  I don't know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improving RLS qual pushdown

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> > On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote:
> >> One thing they could still leak is the number of times they got
> >> called, and thus possibly the number of unseen rows.  Now if the
> >> expressions get constant-folded away that won't be an issue, but a
> >> clever user can probably avoid that.
> >
> > Right now, EXPLAIN ANALYSE can be used to tell you the number of
> > unseen rows. Is that something that people are concerned about, and
> > are there any plans to change it?
> 
> Interesting question.  I don't know.

Wasn't this part of the "covert channel" discussion that took place way
before RLS was committed?  As I recall, it was argued that such covert
channels are acceptable as long as their bandwidth is low.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improving RLS qual pushdown

From
Stephen Frost
Date:
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Robert Haas wrote:
> > On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> > > On 14 January 2015 at 13:29, Robert Haas <robertmhaas@gmail.com> wrote:
> > >> One thing they could still leak is the number of times they got
> > >> called, and thus possibly the number of unseen rows.  Now if the
> > >> expressions get constant-folded away that won't be an issue, but a
> > >> clever user can probably avoid that.
> > >
> > > Right now, EXPLAIN ANALYSE can be used to tell you the number of
> > > unseen rows. Is that something that people are concerned about, and
> > > are there any plans to change it?
> >
> > Interesting question.  I don't know.
>
> Wasn't this part of the "covert channel" discussion that took place way
> before RLS was committed?  As I recall, it was argued that such covert
> channels are acceptable as long as their bandwidth is low.

Yes, it was part of the discussion and no, there's no plans to try and
hide row counts in explain analyze, nor to deal with things like unique
constraint or foreign key reference violations.

There are other areas which need improvement which will help address
covert channel activity (better auditing, better control over what
actions are allowed to diffferent users when it comes to creating
objects, modifying permissions and policies, throttling, etc).
Thanks,
    Stephen

Re: Improving RLS qual pushdown

From
Stephen Frost
Date:
Dean,

* Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
> Attached is a patch that does that, allowing restriction clause quals
> to be pushed down into subquery RTEs if they contain leaky functions,
> provided that the arglists of those leaky functions contain no Vars
> (such Vars would necessarily refer to output columns of the subquery,
> which is the data that must not be leaked).

> --- 1982,1989 ----
>    * 2. If unsafeVolatile is set, the qual must not contain any volatile
>    * functions.
>    *
> !  * 3. If unsafeLeaky is set, the qual must not contain any leaky functions
> !  * that might reveal values from the subquery as side effects.

I'd probably extend this comment to note that the way we figure that out
is by looking for Vars being passed to non-leakproof functions.

> --- 1318,1347 ----
>   }
>
>   /*****************************************************************************
> !  *          Check clauses for non-leakproof functions that might leak Vars
>    *****************************************************************************/

Might leak data in Vars (or somethhing along those lines...)

>   /*
> !  * contain_leaked_vars
> !  *        Recursively scan a clause to discover whether it contains any Var
> !  *        nodes (of the current query level) that are passed as arguments to
> !  *        leaky functions.
>    *
> !  * Returns true if any function call with side effects may be present in the
> !  * clause and that function's arguments contain any Var nodes of the current
> !  * query level.  Qualifiers from outside a security_barrier view that leak
> !  * Var nodes in this way should not be pushed down into the view, lest the
> !  * contents of tuples intended to be filtered out be revealed via the side
> !  * effects.
>    */

We don't actually know anything about the function and if it actually
has any side effects or not, so it might better to avoid discussing
that here.  How about:

Returns true if any non-leakproof function is passed in data through a
Var node as that function might then leak see sensetive information.
Only leakproof functions are allowed to see data prior to the qualifiers
which are defined in the security_barrier view and therefore we can only
push down qualifiers if they are either leakproof or if they aren't
passed any Vars from this query level (and therefore they are not able
to see any of the data in the tuple, even if they are pushed down).

> --- 1408,1428 ----
>                   CoerceViaIO *expr = (CoerceViaIO *) node;
>                   Oid            funcid;
>                   Oid            ioparam;
> +                 bool        leaky;
>                   bool        varlena;
>
>                   getTypeInputInfo(exprType((Node *) expr->arg),
>                                    &funcid, &ioparam);
> !                 leaky = !get_func_leakproof(funcid);
>
> !                 if (!leaky)
> !                 {
> !                     getTypeOutputInfo(expr->resulttype, &funcid, &varlena);
> !                     leaky = !get_func_leakproof(funcid);
> !                 }
> !
> !                 if (leaky &&
> !                     contain_var_clause((Node *) expr->arg))
>                       return true;
>               }
>               break;

That seems slightly convoluted to me.  Why not simply do:

bool in_leakproof, out_leakproof;

getTypeInputInfo(exprType((Node *) expr->arg), &funcid, &ioparam);
in_leakproof = get_func_leakproof(funcid);

getTypeOutputInfo(expr->resulttype, &funcid, &varlena);
out_leakproof = get_func_leakproof(funcid);

if (!(in_leakproof && out_leakproof) && contain_var_clause((Node *)))return true;

...

> --- 1432,1452 ----
>                   ArrayCoerceExpr *expr = (ArrayCoerceExpr *) node;
>                   Oid            funcid;
>                   Oid            ioparam;
> +                 bool        leaky;
>                   bool        varlena;
>
>                   getTypeInputInfo(exprType((Node *) expr->arg),
>                                    &funcid, &ioparam);
> !                 leaky = !get_func_leakproof(funcid);
> !
> !                 if (!leaky)
> !                 {
> !                     getTypeOutputInfo(expr->resulttype, &funcid, &varlena);
> !                     leaky = !get_func_leakproof(funcid);
> !                 }
> !
> !                 if (leaky &&
> !                     contain_var_clause((Node *) expr->arg))
>                       return true;
>               }
>               break;

Ditto here.

> *************** contain_leaky_functions_walker(Node *nod
> *** 1435,1446 ****
>               {
>                   RowCompareExpr *rcexpr = (RowCompareExpr *) node;
>                   ListCell   *opid;
>
> !                 foreach(opid, rcexpr->opnos)
>                   {
>                       Oid            funcid = get_opcode(lfirst_oid(opid));
>
> !                     if (!get_func_leakproof(funcid))
>                           return true;
>                   }
>               }
> --- 1455,1472 ----
>               {
>                   RowCompareExpr *rcexpr = (RowCompareExpr *) node;
>                   ListCell   *opid;
> +                 ListCell   *larg;
> +                 ListCell   *rarg;
>
> !                 forthree(opid, rcexpr->opnos,
> !                          larg, rcexpr->largs,
> !                          rarg, rcexpr->rargs)
>                   {
>                       Oid            funcid = get_opcode(lfirst_oid(opid));
>
> !                     if (!get_func_leakproof(funcid) &&
> !                         (contain_var_clause((Node *) lfirst(larg)) ||
> !                          contain_var_clause((Node *) lfirst(rarg))))
>                           return true;
>                   }
>               }

Might look a bit cleaner as:

/* Leakproof functions are ok */
if (get_func_leakproof(funcid))continue;

/* Not leakproof, check if there are any Vars passed in */
if (contain_var_clause((Node *) lfirst(larg)) ||contain_var_clause((Node *) lfirst(rarg)))return true;

The rest looked good to me.
Thanks!
    Stephen

Re: Improving RLS qual pushdown

From
Dean Rasheed
Date:
Thanks for looking at this.

On 28 February 2015 at 04:25, Stephen Frost <sfrost@snowman.net> wrote:
> Dean,
>
> * Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
>> Attached is a patch that does that, allowing restriction clause quals
>> to be pushed down into subquery RTEs if they contain leaky functions,
>> provided that the arglists of those leaky functions contain no Vars
>> (such Vars would necessarily refer to output columns of the subquery,
>> which is the data that must not be leaked).
>
>> --- 1982,1989 ----
>>    * 2. If unsafeVolatile is set, the qual must not contain any volatile
>>    * functions.
>>    *
>> !  * 3. If unsafeLeaky is set, the qual must not contain any leaky functions
>> !  * that might reveal values from the subquery as side effects.
>
> I'd probably extend this comment to note that the way we figure that out
> is by looking for Vars being passed to non-leakproof functions.
>

OK fair enough.

>> --- 1318,1347 ----
>>   }
>>
>>   /*****************************************************************************
>> !  *            Check clauses for non-leakproof functions that might leak Vars
>>    *****************************************************************************/
>
> Might leak data in Vars (or somethhing along those lines...)
>

Perhaps just "Check clauses for Vars passed to non-leakproof
functions". The more detailed explanation below is probably then
sufficient to cover why that's significant.

>>   /*
>> !  * contain_leaked_vars
>> !  *          Recursively scan a clause to discover whether it contains any Var
>> !  *          nodes (of the current query level) that are passed as arguments to
>> !  *          leaky functions.
>>    *
>> !  * Returns true if any function call with side effects may be present in the
>> !  * clause and that function's arguments contain any Var nodes of the current
>> !  * query level.  Qualifiers from outside a security_barrier view that leak
>> !  * Var nodes in this way should not be pushed down into the view, lest the
>> !  * contents of tuples intended to be filtered out be revealed via the side
>> !  * effects.
>>    */
>
> We don't actually know anything about the function and if it actually
> has any side effects or not, so it might better to avoid discussing
> that here.  How about:
>
> Returns true if any non-leakproof function is passed in data through a
> Var node as that function might then leak see sensetive information.
> Only leakproof functions are allowed to see data prior to the qualifiers
> which are defined in the security_barrier view and therefore we can only
> push down qualifiers if they are either leakproof or if they aren't
> passed any Vars from this query level (and therefore they are not able
> to see any of the data in the tuple, even if they are pushed down).
>

That sounds OK (except s/sensetive/sensitive).

>> --- 1408,1428 ----
>>                               CoerceViaIO *expr = (CoerceViaIO *) node;
>>                               Oid                     funcid;
>>                               Oid                     ioparam;
>> +                             bool            leaky;
>>                               bool            varlena;
>>
>>                               getTypeInputInfo(exprType((Node *) expr->arg),
>>                                                                &funcid, &ioparam);
>> !                             leaky = !get_func_leakproof(funcid);
>>
>> !                             if (!leaky)
>> !                             {
>> !                                     getTypeOutputInfo(expr->resulttype, &funcid, &varlena);
>> !                                     leaky = !get_func_leakproof(funcid);
>> !                             }
>> !
>> !                             if (leaky &&
>> !                                     contain_var_clause((Node *) expr->arg))
>>                                       return true;
>>                       }
>>                       break;
>
> That seems slightly convoluted to me.  Why not simply do:
>
> bool in_leakproof, out_leakproof;
>
> getTypeInputInfo(exprType((Node *) expr->arg), &funcid, &ioparam);
> in_leakproof = get_func_leakproof(funcid);
>
> getTypeOutputInfo(expr->resulttype, &funcid, &varlena);
> out_leakproof = get_func_leakproof(funcid);
>
> if (!(in_leakproof && out_leakproof) && contain_var_clause((Node *)))
>         return true;
>

I'm not sure that's really much clearer, because of the 2
double-negatives in the final if-clause, and it's less efficient in
the case where the input function is leaky, when it's not nessecary to
check the output function (which involves 2 cache lookups).

> ...
>
>> --- 1432,1452 ----
>>                               ArrayCoerceExpr *expr = (ArrayCoerceExpr *) node;
>>                               Oid                     funcid;
>>                               Oid                     ioparam;
>> +                             bool            leaky;
>>                               bool            varlena;
>>
>>                               getTypeInputInfo(exprType((Node *) expr->arg),
>>                                                                &funcid, &ioparam);
>> !                             leaky = !get_func_leakproof(funcid);
>> !
>> !                             if (!leaky)
>> !                             {
>> !                                     getTypeOutputInfo(expr->resulttype, &funcid, &varlena);
>> !                                     leaky = !get_func_leakproof(funcid);
>> !                             }
>> !
>> !                             if (leaky &&
>> !                                     contain_var_clause((Node *) expr->arg))
>>                                       return true;
>>                       }
>>                       break;
>
> Ditto here.
>
>> *************** contain_leaky_functions_walker(Node *nod
>> *** 1435,1446 ****
>>                       {
>>                               RowCompareExpr *rcexpr = (RowCompareExpr *) node;
>>                               ListCell   *opid;
>>
>> !                             foreach(opid, rcexpr->opnos)
>>                               {
>>                                       Oid                     funcid = get_opcode(lfirst_oid(opid));
>>
>> !                                     if (!get_func_leakproof(funcid))
>>                                               return true;
>>                               }
>>                       }
>> --- 1455,1472 ----
>>                       {
>>                               RowCompareExpr *rcexpr = (RowCompareExpr *) node;
>>                               ListCell   *opid;
>> +                             ListCell   *larg;
>> +                             ListCell   *rarg;
>>
>> !                             forthree(opid, rcexpr->opnos,
>> !                                              larg, rcexpr->largs,
>> !                                              rarg, rcexpr->rargs)
>>                               {
>>                                       Oid                     funcid = get_opcode(lfirst_oid(opid));
>>
>> !                                     if (!get_func_leakproof(funcid) &&
>> !                                             (contain_var_clause((Node *) lfirst(larg)) ||
>> !                                              contain_var_clause((Node *) lfirst(rarg))))
>>                                               return true;
>>                               }
>>                       }
>
> Might look a bit cleaner as:
>
> /* Leakproof functions are ok */
> if (get_func_leakproof(funcid))
>         continue;
>
> /* Not leakproof, check if there are any Vars passed in */
> if (contain_var_clause((Node *) lfirst(larg)) ||
>         contain_var_clause((Node *) lfirst(rarg)))
>         return true;
>

Shrug. Not sure it makes much difference either way.

> The rest looked good to me.
>

Thanks. Do you want me to post an update, or are you going to hack on it?

Regards,
Dean



Re: Improving RLS qual pushdown

From
Stephen Frost
Date:
Dean,

* Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
> Thanks for looking at this.

Thanks for working on it. :)

> On 28 February 2015 at 04:25, Stephen Frost <sfrost@snowman.net> wrote:
> > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
> >> --- 1318,1347 ----
> >>   }
> >>
> >>   /*****************************************************************************
> >> !  *            Check clauses for non-leakproof functions that might leak Vars
> >>    *****************************************************************************/
> >
> > Might leak data in Vars (or somethhing along those lines...)
> >
>
> Perhaps just "Check clauses for Vars passed to non-leakproof
> functions". The more detailed explanation below is probably then
> sufficient to cover why that's significant.

Agreed, that looks fine to me.

> That sounds OK (except s/sensetive/sensitive).

I'm horrible with spelling. :)  Thanks for catching it.

> I'm not sure that's really much clearer, because of the 2
> double-negatives in the final if-clause, and it's less efficient in
> the case where the input function is leaky, when it's not nessecary to
> check the output function (which involves 2 cache lookups).

Yeah, it is a bit less performant.

> Shrug. Not sure it makes much difference either way.

Alright.

> > The rest looked good to me.
> >
>
> Thanks. Do you want me to post an update, or are you going to hack on it?

Either works for me, though I'm happy to handle the modifications to
this if it means you have time to look at the other patches.. :)
Thanks!
    Stephen

Re: Improving RLS qual pushdown

From
Dean Rasheed
Date:
On 1 March 2015 at 18:23, Stephen Frost <sfrost@snowman.net> wrote:
>> Thanks. Do you want me to post an update, or are you going to hack on it?
>
> Either works for me, though I'm happy to handle the modifications to
> this if it means you have time to look at the other patches.. :)
>

OK, I'll continue testing the other patches. I want to think about RLS
and inheritance a bit more.

Regards,
Dean



Re: Improving RLS qual pushdown

From
Dean Rasheed
Date:
I took another look at this and came up with some alternate comment
rewording. I also added a couple of additional comments that should
hopefully clarify the code a bit.

Regards,
Dean

Attachment

Re: Improving RLS qual pushdown

From
Stephen Frost
Date:
Dean,

* Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
> I took another look at this and came up with some alternate comment
> rewording. I also added a couple of additional comments that should
> hopefully clarify the code a bit.

Finally got back to this.  Made a few additional changes that made
things look clearer, to me at least, and added documentation and
comments.  Also added a bit to the regression tests (in particular, an
explicit check on the RLS side of this, not that it should be different,
but just in case things diverge in the future).  Please review and let
me know if you see any issues.
Thanks!
    Stephen

Re: Improving RLS qual pushdown

From
Dean Rasheed
Date:
On 27 April 2015 at 17:33, Stephen Frost <sfrost@snowman.net> wrote:
> Dean,
>
> * Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
>> I took another look at this and came up with some alternate comment
>> rewording. I also added a couple of additional comments that should
>> hopefully clarify the code a bit.
>
> Finally got back to this.  Made a few additional changes that made
> things look clearer, to me at least, and added documentation and
> comments.  Also added a bit to the regression tests (in particular, an
> explicit check on the RLS side of this, not that it should be different,
> but just in case things diverge in the future).  Please review and let
> me know if you see any issues.
>

Thanks, that all looks very good.

Regards,
Dean