Hi hackers,
Row-level security is an awesome feature. I was originally won over by
the simple mental model of implicitly adding WHERE clauses to all
queries, and that it generally comes for free when they can be
satisfied by Index Conds.
I've recently helped implement row-level security in a fairly big
multi-tenant application with about 100 tables. RLS is being used
"voluntarily" by the application, which switches to a role that is
subject to RLS as the first thing after starting a transaction. I
realize that this doesn't offer bulletproof tenant isolation, but it
still makes a lot of sense as a defense-in-depth measure.
It took some time to work through various performance regressions
caused by selectivity estimates changing due to "WHERE tenantId = ..."
implicitly being added to all relations in all queries. Extended
statistics (dependencies) helped with that, as tenantId is highly
correlated with the other id columns that we were already querying and
joining on.
The roughest edge turned out to be the well-known challenge with the
LEAKPROOFness of built-in functions, previously discussed in earlier
threads [1], in blog posts [2], on Stack Overflow [3] and pointed out
in the docs for Row Security Policies [4].
I'm aware that the usual workaround is to ALTER FUNCTION ... LEAKPROOF
for each of the relevant built-in functions after making an evaluation
of whether that's acceptable for the given use case. We weren't able to
do this because we're using a hosted Postgres solution where it's not
permitted. We've tried to talk them (AWS) into offering a way to do it,
but they haven't budged yet.
As in the earlier thread we were affected by enum_eq not being
leakproof, which caused our indexes involving enum columns to not be
fully utilized while RLS is active (because WHERE clauses on the enum
columns cannot be implemented as Index Conds). We mostly overcame that
by replacing each index with a series of partial ones, each with
WHERE enum_column = ...
We also had some functional indexes that either explicitly used a
non-leakproof built-in function, or implicitly exercised a non-leakproof
conversion function such as textin or int8out. Here we ended up cheating
the system by adding LEAKPROOF wrappers around the built-in functions,
similar to [5].
The most difficult thing to overcome was GIN indexes on bigint[]
columns not being used because the built-in arrayoverlap/arraycontained/
array_eq/... functions aren't LEAKPROOF. We eventually figured out that
we could make our own version of gin_array_ops which used functions
marked as LEAKPROOF, and then recreate all the GIN indices with that.
My hunch is that there's a big class of applications like ours where
enforcing LEAKPROOFness isn't really important. When application code is
interacting with the database via its own role, you're not that
concerned with invisible tuples leaking out via error messages or
elaborate timing attacks. That seems more relevant when your
adversaries have direct access to the database and are able to create
their own functions and execute arbitrary queries. I think it would be
extremely useful to be able to forefeit the LEAKPROOF guarantees when
the database is being accessed by an application in exchange for
avoiding the footguns described above.
I dug into the code and noticed that restrictinfo->leakproof is only
being checked in two places (createplan.c and equivclass.c), so it seems
fairly easy to only selectively enforce it. Then there's the question of
how to configure it. I can think of a few possible ways:
1) Add a BYPASSLEAKPROOF role attribute that can only be granted by a
superuser, similar to the BYPASSRLS flag.
2) Add a session variable, eg. enable_security_leakproof, that can only
be set or granted to another role by a superuser.
3) Make it a property of the individual POLICY that grants access to the
table. This would be a bit more granular than a global switch, but
there'd be some ambiguity when multiple policies are involved.
I took a stab at implementing solution 1) above, mostly to understand if
I had found the right places in the code and how hard it'd be. This is
my first time hacking on Postgres, and I'm not really sure if it's OK to
introduce this extra check in a possibly hot code path. I thought about
instead adding a "does the current user have bypassleakproof" flag in
some struct available to the planner, but this is beyond my current
abilities. Also, it's probably better to get some feedback on the idea
before spending too much time optimizing.
What do you think?
Best regards,
Andreas Lind
[1] https://www.postgresql.org/message-id/2811772.0XtDgEdalL%40peanuts2
[2] https://pganalyze.com/blog/5mins-postgres-row-level-security-bypassrls-security-invoker-views-leakproof-functions
[3]
https://stackoverflow.com/questions/63008838/postgresql-ignores-pg-trgm-gin-indexes-when-row-level-security-is-enabled-and-no
[4] https://www.postgresql.org/docs/current/ddl-rowsecurity.html#DDL-ROWSECURITY
[5] https://news.ycombinator.com/item?id=20066350