Re: policies with security definer option for allowing inline optimization - Mailing list pgsql-hackers

From Dan Lynch
Subject Re: policies with security definer option for allowing inline optimization
Date
Msg-id CA+_muLEz6dBjn_Y9FHkovJXHYL8-VZkaGGg4eGqDG4aNu9aziQ@mail.gmail.com
Whole thread Raw
In response to Re: policies with security definer option for allowing inline optimization  (Joe Conway <mail@joeconway.com>)
Responses Re: policies with security definer option for allowing inline optimization  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
My goal is to use RLS for everything, including SELECTs, so it's super important to consider every performance tweak possible. Appreciate any insights or comments. I'm also hoping to document this better for application developers who want to use postgres and RLS.

Does anyone know details of, or where to find more information about the implications of the optimizer on the quals/checks for the policies being functions vs inline? 

It seems that the solution today may be that we have to write functions with security definer. I also saw Joe's linked in share regarding an article using inline functions in the qual/checks to solve a policy size issue, but also wondering the performance implications of inline vs functions:

Imagine you need to do a JOIN to check an owned table against an acl table

(group_id = ANY (ARRAY (
            SELECT
                acl.entity_id
            FROM
                org_memberships_acl acl
                JOIN app_groups obj ON acl.entity_id = obj.owner_id
            WHERE
                acl.actor_id = current_user_id())))


you could wrap that query into a function (so we can apply SECURITY DEFINER to the tables involved to avoid nested RLS lookups)

(group_id = ANY (ARRAY (
        get_group_ids_of_current_user()            
)))

Does anyone here know how the optimizer would handle this? I suppose if the get_group_ids_of_current_user() function is marked as STABLE, would the optimizer cache this value for every row in a SELECT that returned multiple rows? Is it possible that if the function is sql vs plpgsql it makes a difference? 

Am I splitting hairs here, and maybe this is a trivial nuance that shouldn't really matter for performance? If it's true that inline functions would perform bretter, then definitely this thread and potentially feature request seems pretty important. 


Other important RLS Performance Optimizations

I also want to share my research from online so it's documented somewhere. I would love to get more information to formally document these optimizations. Here are the two articles I've found to be useful for how to structure RLS policies performantly:

https://cazzer.medium.com/designing-the-most-performant-row-level-security-strategy-in-postgres-a06084f31945


https://medium.com/@ethanresnick/there-are-a-few-faster-ways-that-i-know-of-to-handle-the-third-case-with-rls-9d22eaa890e5



The 2nd article was particularly useful (which was written in response to this article), highlighting an important detail that should probably be more explicit for folks writing policies, especially for SELECT policies. Essentially it boils down to not passing properties from the rows into the functions used to check security, but instead inverting the logic and instead returning the identifiers as an array and checking if the row's owned key matches one of those identifiers.


For example, 

a GOOD qual/check expr 

owner_id = ANY ( function_that_gets_current_users_organization_ids() )

a BAD qual/check expr 

can_user_access_object(owner_id) 

The main benefit of the first expr is that if function_that_gets_current_users_organization_ids is STABLE, the optimizer can run this once for all rows, and thus for SELECTs should actually run fast. The 2nd expr takes as an argument the column, which would have to run for every single row making SELECTs run very slow depending on the function.

This actually is pretty intuitive once you look at it. Reversing the logic and returning IDs makes sense when you imagine what PG has to do in order to check rows, I suppose there are limitations depending on the cardinality of the IDs returned and postgres's ability to check some_id = ANY (array) for large arrays. 

Dan Lynch
(734) 657-4483


On Fri, Apr 2, 2021 at 7:47 AM Joe Conway <mail@joeconway.com> wrote:
On 4/2/21 10:23 AM, Stephen Frost wrote:
> Greetings,
>
> * Joe Conway (mail@joeconway.com) wrote:
>> On 4/2/21 9:57 AM, Isaac Morland wrote:
>> >Views already run security definer, allowing them to be used for some of
>> >the same information-hiding purposes as RLS. But I just found something
>> >strange: current_user/_role returns the user's role, not the view owner's
>> >role:
>>
>> >postgres=# set role to t1;
>> >SET
>> >postgres=> table tt;
>> >ERROR:  permission denied for table tt
>> >postgres=> table tv;
>> >  ?column? | current_user
>> >----------+--------------
>> >         5 | t1
>> >(1 row)
>> >
>> >postgres=>
>> >
>> >Note that even though current_user is t1 "inside" the view, it is still
>> >able to see the contents of table tt. Shouldn't current_user/_role return
>> >the view owner in this situation? By contrast security definer functions
>> >work properly:
>>
>> That is because while VIEWs are effectively SECURITY DEFINER for table
>> access, functions running as part of the view are still SECURITY INVOKER if
>> they were defined that way. And "current_user" is essentially just a special
>> grammatical interface to a SECURITY INVOKER function:
>
> Right- and what I was really getting at is that it'd sometimes be nice
> to have the view run as 'security invoker' for table access.  In
> general, it seems like it'd be useful to be able to control each piece
> and define if it's to be security invoker or security definer.  We're
> able to do that for functions, but not other parts of the system.

+1

Agreed -- I have opined similarly in the past

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

pgsql-hackers by date:

Previous
From: Marko Tiikkaja
Date:
Subject: Re: [PATCH] Implement motd for PostgreSQL
Next
From: Peter Geoghegan
Date:
Subject: Re: Using COPY FREEZE in pgbench