Thread: policies with security definer option for allowing inline optimization

policies with security definer option for allowing inline optimization

From
Dan Lynch
Date:
RLS policies quals/checks are optimized inline, and so I generally avoid writing a separate procedure so the optimizer can do it's thing.

However, if you need a security definer to avoid recursive RLS if you're doing a more complex query say, on a join table, anyone wish there was a flag on the policy itself to specify that `WITH CHECK` or `USING` expression could be run via security definer?

The main reason for this is to avoid writing a separate security definer function so you can benefit from the optimizer. 

Is this possible? Would this be worth a feature request to postgres core?

Cheers!

Dan

Re: policies with security definer option for allowing inline optimization

From
Isaac Morland
Date:
On Fri, 2 Apr 2021 at 01:44, Dan Lynch <pyramation@gmail.com> wrote:
RLS policies quals/checks are optimized inline, and so I generally avoid writing a separate procedure so the optimizer can do it's thing.

However, if you need a security definer to avoid recursive RLS if you're doing a more complex query say, on a join table, anyone wish there was a flag on the policy itself to specify that `WITH CHECK` or `USING` expression could be run via security definer?

The main reason for this is to avoid writing a separate security definer function so you can benefit from the optimizer. 

Is this possible? Would this be worth a feature request to postgres core?

If we're going to do this we should do the same for triggers as well.

It's easy to imagine a situation in which RLS policies need to refer to information which should not be accessible to the role using the table, and similarly it's easy to imagine a situation in which a trigger needs to write to another table which should not be accessible to the role using the table which has the trigger.

Re: policies with security definer option for allowing inline optimization

From
Stephen Frost
Date:
Greetings,

* Isaac Morland (isaac.morland@gmail.com) wrote:
> On Fri, 2 Apr 2021 at 01:44, Dan Lynch <pyramation@gmail.com> wrote:
> > RLS policies quals/checks are optimized inline, and so I generally avoid
> > writing a separate procedure so the optimizer can do it's thing.
> >
> > However, if you need a security definer to avoid recursive RLS if you're
> > doing a more complex query say, on a join table, anyone wish there was a
> > flag on the policy itself to specify that `WITH CHECK` or `USING`
> > expression could be run via security definer?
> >
> > The main reason for this is to avoid writing a separate security definer
> > function so you can benefit from the optimizer.
> >
> > Is this possible? Would this be worth a feature request to postgres core?
>
> If we're going to do this we should do the same for triggers as well.

... and views.

> It's easy to imagine a situation in which RLS policies need to refer to
> information which should not be accessible to the role using the table, and
> similarly it's easy to imagine a situation in which a trigger needs to
> write to another table which should not be accessible to the role using the
> table which has the trigger.

I'm generally +1 on adding the ability for the DBA to choose which user
various things run as.  There's definitely use-cases for both in my
experience.  Also would be great to add the ability to have policies on
views too which would probably help address some of these cases.

Thanks,

Stephen

Attachment

Re: policies with security definer option for allowing inline optimization

From
Chapman Flack
Date:
On 04/02/21 09:09, Isaac Morland wrote:
> If we're going to do this we should do the same for triggers as well.
> 
> ... it's easy to imagine a situation in which a trigger needs to
> write to another table which should not be accessible to the role using the
> table which has the trigger.

Triggers seem to be an area of long-standing weirdness[1].

Regards,
-Chap


[1]
https://www.postgresql.org/message-id/b1be2d05-b9fd-b9db-ea7f-38253e4e4bab%40anastigmatix.net



Re: policies with security definer option for allowing inline optimization

From
Isaac Morland
Date:
On Fri, 2 Apr 2021 at 09:30, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Isaac Morland (isaac.morland@gmail.com) wrote:
> On Fri, 2 Apr 2021 at 01:44, Dan Lynch <pyramation@gmail.com> wrote:
> > RLS policies quals/checks are optimized inline, and so I generally avoid
> > writing a separate procedure so the optimizer can do it's thing.
> >
> > However, if you need a security definer to avoid recursive RLS if you're
> > doing a more complex query say, on a join table, anyone wish there was a
> > flag on the policy itself to specify that `WITH CHECK` or `USING`
> > expression could be run via security definer?
> >
> > The main reason for this is to avoid writing a separate security definer
> > function so you can benefit from the optimizer.
> >
> > Is this possible? Would this be worth a feature request to postgres core?
>
> If we're going to do this we should do the same for triggers as well.

... and views.

Views already run security definer, allowing them to be used for some of the same information-hiding purposes as RLS. But I just found something strange: current_user/_role returns the user's role, not the view owner's role:

postgres=# create table tt as select 5;
SELECT 1
postgres=# create view tv as select *, current_user from tt;
CREATE VIEW
postgres=# table tt;
 ?column? 
----------
        5
(1 row)

postgres=# table tv;
 ?column? | current_user 
----------+--------------
        5 | postgres
(1 row)

postgres=# set role to t1;
SET
postgres=> table tt;
ERROR:  permission denied for table tt
postgres=> table tv;
ERROR:  permission denied for view tv
postgres=> set role to postgres;
SET
postgres=# grant select on tv to public;
GRANT
postgres=# set role to t1;
SET
postgres=> table tt;
ERROR:  permission denied for table tt
postgres=> table tv;
 ?column? | current_user 
----------+--------------
        5 | t1
(1 row)

postgres=> 

Note that even though current_user is t1 "inside" the view, it is still able to see the contents of table tt. Shouldn't current_user/_role return the view owner in this situation? By contrast security definer functions work properly:

postgres=# create function get_current_user_sd () returns name security definer language sql as $$ select current_user $$;
CREATE FUNCTION
postgres=# select get_current_user_sd ();
 get_current_user_sd 
---------------------
 postgres
(1 row)

postgres=# set role t1;
SET
postgres=> select get_current_user_sd ();
 get_current_user_sd 
---------------------
 postgres
(1 row)

postgres=> 

Re: policies with security definer option for allowing inline optimization

From
Isaac Morland
Date:
On Fri, 2 Apr 2021 at 09:44, Chapman Flack <chap@anastigmatix.net> wrote:
On 04/02/21 09:09, Isaac Morland wrote:
> If we're going to do this we should do the same for triggers as well.
>
> ... it's easy to imagine a situation in which a trigger needs to
> write to another table which should not be accessible to the role using the
> table which has the trigger.

Triggers seem to be an area of long-standing weirdness[1].

Thanks for that reference. That has convinced me that I was wrong in a previous discussion to say that triggers should run as the table owner: instead, they should run as the trigger owner (implying that triggers should have owners). Of course at this point the change could only be made as an option in order to avoid a backward compatibility break.

[1]
https://www.postgresql.org/message-id/b1be2d05-b9fd-b9db-ea7f-38253e4e4bab%40anastigmatix.net
On 4/2/21 9:57 AM, Isaac Morland wrote:
> Views already run security definer, allowing them to be used for some of the 
> same information-hiding purposes as RLS. But I just found something strange: 
> current_user/_role returns the user's role, not the view owner's role:

> postgres=# set role to t1;
> SET
> postgres=> table tt;
> ERROR:  permission denied for table tt
> postgres=> table tv;
>   ?column? | current_user
> ----------+--------------
>          5 | t1
> (1 row)
> 
> postgres=>
> 
> Note that even though current_user is t1 "inside" the view, it is still able to 
> see the contents of table tt. Shouldn't current_user/_role return the view owner 
> in this situation? By contrast security definer functions work properly:

That is because while VIEWs are effectively SECURITY DEFINER for table access, 
functions running as part of the view are still SECURITY INVOKER if they were 
defined that way. And "current_user" is essentially just a special grammatical 
interface to a SECURITY INVOKER function:

postgres=# \df+ current_user
List of functions
-[ RECORD 1 ]-------+------------------
Schema              | pg_catalog
Name                | current_user
Result data type    | name
Argument data types |
Type                | func
Volatility          | stable
Parallel            | safe
Owner               | postgres
Security            | invoker
Access privileges   |
Language            | internal
Source code         | current_user
Description         | current user name

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: policies with security definer option for allowing inline optimization

From
Stephen Frost
Date:
Greetings,

* Joe Conway (mail@joeconway.com) wrote:
> On 4/2/21 9:57 AM, Isaac Morland wrote:
> >Views already run security definer, allowing them to be used for some of
> >the same information-hiding purposes as RLS. But I just found something
> >strange: current_user/_role returns the user's role, not the view owner's
> >role:
>
> >postgres=# set role to t1;
> >SET
> >postgres=> table tt;
> >ERROR:  permission denied for table tt
> >postgres=> table tv;
> >  ?column? | current_user
> >----------+--------------
> >         5 | t1
> >(1 row)
> >
> >postgres=>
> >
> >Note that even though current_user is t1 "inside" the view, it is still
> >able to see the contents of table tt. Shouldn't current_user/_role return
> >the view owner in this situation? By contrast security definer functions
> >work properly:
>
> That is because while VIEWs are effectively SECURITY DEFINER for table
> access, functions running as part of the view are still SECURITY INVOKER if
> they were defined that way. And "current_user" is essentially just a special
> grammatical interface to a SECURITY INVOKER function:

Right- and what I was really getting at is that it'd sometimes be nice
to have the view run as 'security invoker' for table access.  In
general, it seems like it'd be useful to be able to control each piece
and define if it's to be security invoker or security definer.  We're
able to do that for functions, but not other parts of the system.

Thanks,

Stephen

Attachment
On 4/2/21 10:23 AM, Stephen Frost wrote:
> Greetings,
> 
> * Joe Conway (mail@joeconway.com) wrote:
>> On 4/2/21 9:57 AM, Isaac Morland wrote:
>> >Views already run security definer, allowing them to be used for some of
>> >the same information-hiding purposes as RLS. But I just found something
>> >strange: current_user/_role returns the user's role, not the view owner's
>> >role:
>> 
>> >postgres=# set role to t1;
>> >SET
>> >postgres=> table tt;
>> >ERROR:  permission denied for table tt
>> >postgres=> table tv;
>> >  ?column? | current_user
>> >----------+--------------
>> >         5 | t1
>> >(1 row)
>> >
>> >postgres=>
>> >
>> >Note that even though current_user is t1 "inside" the view, it is still
>> >able to see the contents of table tt. Shouldn't current_user/_role return
>> >the view owner in this situation? By contrast security definer functions
>> >work properly:
>> 
>> That is because while VIEWs are effectively SECURITY DEFINER for table
>> access, functions running as part of the view are still SECURITY INVOKER if
>> they were defined that way. And "current_user" is essentially just a special
>> grammatical interface to a SECURITY INVOKER function:
> 
> Right- and what I was really getting at is that it'd sometimes be nice
> to have the view run as 'security invoker' for table access.  In
> general, it seems like it'd be useful to be able to control each piece
> and define if it's to be security invoker or security definer.  We're
> able to do that for functions, but not other parts of the system.

+1

Agreed -- I have opined similarly in the past

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



My goal is to use RLS for everything, including SELECTs, so it's super important to consider every performance tweak possible. Appreciate any insights or comments. I'm also hoping to document this better for application developers who want to use postgres and RLS.

Does anyone know details of, or where to find more information about the implications of the optimizer on the quals/checks for the policies being functions vs inline? 

It seems that the solution today may be that we have to write functions with security definer. I also saw Joe's linked in share regarding an article using inline functions in the qual/checks to solve a policy size issue, but also wondering the performance implications of inline vs functions:

Imagine you need to do a JOIN to check an owned table against an acl table

(group_id = ANY (ARRAY (
            SELECT
                acl.entity_id
            FROM
                org_memberships_acl acl
                JOIN app_groups obj ON acl.entity_id = obj.owner_id
            WHERE
                acl.actor_id = current_user_id())))


you could wrap that query into a function (so we can apply SECURITY DEFINER to the tables involved to avoid nested RLS lookups)

(group_id = ANY (ARRAY (
        get_group_ids_of_current_user()            
)))

Does anyone here know how the optimizer would handle this? I suppose if the get_group_ids_of_current_user() function is marked as STABLE, would the optimizer cache this value for every row in a SELECT that returned multiple rows? Is it possible that if the function is sql vs plpgsql it makes a difference? 

Am I splitting hairs here, and maybe this is a trivial nuance that shouldn't really matter for performance? If it's true that inline functions would perform bretter, then definitely this thread and potentially feature request seems pretty important. 


Other important RLS Performance Optimizations

I also want to share my research from online so it's documented somewhere. I would love to get more information to formally document these optimizations. Here are the two articles I've found to be useful for how to structure RLS policies performantly:

https://cazzer.medium.com/designing-the-most-performant-row-level-security-strategy-in-postgres-a06084f31945


https://medium.com/@ethanresnick/there-are-a-few-faster-ways-that-i-know-of-to-handle-the-third-case-with-rls-9d22eaa890e5



The 2nd article was particularly useful (which was written in response to this article), highlighting an important detail that should probably be more explicit for folks writing policies, especially for SELECT policies. Essentially it boils down to not passing properties from the rows into the functions used to check security, but instead inverting the logic and instead returning the identifiers as an array and checking if the row's owned key matches one of those identifiers.


For example, 

a GOOD qual/check expr 

owner_id = ANY ( function_that_gets_current_users_organization_ids() )

a BAD qual/check expr 

can_user_access_object(owner_id) 

The main benefit of the first expr is that if function_that_gets_current_users_organization_ids is STABLE, the optimizer can run this once for all rows, and thus for SELECTs should actually run fast. The 2nd expr takes as an argument the column, which would have to run for every single row making SELECTs run very slow depending on the function.

This actually is pretty intuitive once you look at it. Reversing the logic and returning IDs makes sense when you imagine what PG has to do in order to check rows, I suppose there are limitations depending on the cardinality of the IDs returned and postgres's ability to check some_id = ANY (array) for large arrays. 

Dan Lynch
(734) 657-4483


On Fri, Apr 2, 2021 at 7:47 AM Joe Conway <mail@joeconway.com> wrote:
On 4/2/21 10:23 AM, Stephen Frost wrote:
> Greetings,
>
> * Joe Conway (mail@joeconway.com) wrote:
>> On 4/2/21 9:57 AM, Isaac Morland wrote:
>> >Views already run security definer, allowing them to be used for some of
>> >the same information-hiding purposes as RLS. But I just found something
>> >strange: current_user/_role returns the user's role, not the view owner's
>> >role:
>>
>> >postgres=# set role to t1;
>> >SET
>> >postgres=> table tt;
>> >ERROR:  permission denied for table tt
>> >postgres=> table tv;
>> >  ?column? | current_user
>> >----------+--------------
>> >         5 | t1
>> >(1 row)
>> >
>> >postgres=>
>> >
>> >Note that even though current_user is t1 "inside" the view, it is still
>> >able to see the contents of table tt. Shouldn't current_user/_role return
>> >the view owner in this situation? By contrast security definer functions
>> >work properly:
>>
>> That is because while VIEWs are effectively SECURITY DEFINER for table
>> access, functions running as part of the view are still SECURITY INVOKER if
>> they were defined that way. And "current_user" is essentially just a special
>> grammatical interface to a SECURITY INVOKER function:
>
> Right- and what I was really getting at is that it'd sometimes be nice
> to have the view run as 'security invoker' for table access.  In
> general, it seems like it'd be useful to be able to control each piece
> and define if it's to be security invoker or security definer.  We're
> able to do that for functions, but not other parts of the system.

+1

Agreed -- I have opined similarly in the past

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Fri, Apr 02, 2021 at 02:24:59PM -0700, Dan Lynch wrote:
> Does anyone know details of, or where to find more information about the
> implications of the optimizer on the quals/checks for the policies being
> functions vs inline?

Roughly, the PostgreSQL optimizer treats LANGUAGE SQL functions like a C
compiler treats "extern inline" functions.  Other PostgreSQL functions behave
like C functions in a shared library.  Non-SQL functions can do arbitrary
things, and the optimizer knows only facts like their volatility and the value
given in CREATE FUNCTION ... COST.

> I suppose if the
> get_group_ids_of_current_user() function is marked as STABLE, would the
> optimizer cache this value for every row in a SELECT that returned
> multiple rows?

While there was a patch to implement caching, it never finished.  The
optimizer is allowed to, and sometimes does, choose plan shapes that reduce
the number of function calls.

> Is it possible that if the function is sql vs plpgsql it
> makes a difference?

Yes; see inline_function() in the PostgreSQL source.  The hard part of
$SUBJECT is creating the infrastructure to inline across a SECURITY DEFINER
boundary.  Currently, a single optimizable statement operates under just one
user identity.  Somehow, the optimizer would need to translate the SECURITY
DEFINER call into a list of moments where the executor shall switch user ID,
then maintain that list across further optimization steps.  security_barrier
views are the most-similar thing, but as Joe Conway mentioned, views differ
from SECURITY DEFINER in crucial ways.



This is great, thanks! It's great to have somewhere in the source to read about the optimizer! very cool!
 

> I suppose if the
> get_group_ids_of_current_user() function is marked as STABLE, would the
> optimizer cache this value for every row in a SELECT that returned
> multiple rows?

While there was a patch to implement caching, it never finished.  The
optimizer is allowed to, and sometimes does, choose plan shapes that reduce
the number of function calls.

So for multiple rows, it's possible that the same query could happen for each row? Even if it's clearly stable and only a read operation is happening?

I suppose if the possibility exists that this could happen, perhaps using RLS for selects is not quite "production ready"? Or perhaps if the RLS qual/check is written well-enough, then maybe the performance hit wouldn't be noticed? 
On Mon, Apr 05, 2021 at 07:51:46PM -0700, Dan Lynch wrote:
> > > I suppose if the
> > > get_group_ids_of_current_user() function is marked as STABLE, would the
> > > optimizer cache this value for every row in a SELECT that returned
> > > multiple rows?
> >
> > While there was a patch to implement caching, it never finished.  The
> > optimizer is allowed to, and sometimes does, choose plan shapes that reduce
> > the number of function calls.
> 
> So for multiple rows, it's possible that the same query could happen for
> each row? Even if it's clearly stable and only a read operation is
> happening?

Yes.  The caching patch thread gives some example queries:
https://postgr.es/m/flat/CABRT9RA-RomVS-yzQ2wUtZ%3Dm-eV61LcbrL1P1J3jydPStTfc6Q%40mail.gmail.com

> I suppose if the possibility exists that this could happen, perhaps using
> RLS for selects is not quite "production ready"?

I would not draw that conclusion.

> Or perhaps if the RLS
> qual/check is written well-enough, then maybe the performance hit wouldn't
> be noticed?

Yes.




> I suppose if the possibility exists that this could happen, perhaps using
> RLS for selects is not quite "production ready"?

I would not draw that conclusion.


This is great to hear! I'm betting a lot on RLS and have been investing a lot into it. 
 
> Or perhaps if the RLS
> qual/check is written well-enough, then maybe the performance hit wouldn't
> be noticed?

Yes.

Amazing to hear. Sounds like the path I'm on is good to go and will only improve over time :)  

Final question: do you think using procedures vs writing inline queries for RLS quals/checks has a big difference in performance (assuming functions are sql)?

Appreciate your info here!
On Tue, Apr 06, 2021 at 01:16:16PM -0700, Dan Lynch wrote:
> Final question: do you think using procedures vs writing inline queries for
> RLS quals/checks has a big difference in performance (assuming functions
> are sql)?

If the function meets the criteria for inlining (see inline_function()),
there's negligible performance difference.  Otherwise, the performance
difference may be large.