Thread: Poor plan choice with partial unique indexes on jsonb column andsimple RLS policy (with test script)

Hi all,

I have recently encountered a strange poor query plan choice after implementing RLS.

My table has a number of partial indexes on a jsonb column and the query went from low number of milliseconds to several seconds as the planner chose a different index.

Simply stated, in the jsonb column case, "using ( (select true) )" instead of "using (true)" produces a bad plan, illustrated below:

postgres=# create policy testing_s ON testing for select to testing_user using (
postgres(#     true
postgres(# );

postgres=# set role testing_user;
SET

postgres=> explain (analyze) select * from testing where data->'value' = to_jsonb(10) and type_id = 10 and latest is true;
                                                  QUERY PLAN                                                  
---------------------------------------------------------------------------------------------------------------
 Index Scan using i_10 on testing  (cost=0.15..8.17 rows=1 width=49) (actual time=0.007..0.008 rows=1 loops=1)
   Index Cond: ((data -> 'value'::text) = to_jsonb(10))
 Planning Time: 0.221 ms
 Execution Time: 0.017 ms
(4 rows)

postgres=# alter policy testing_s ON testing to testing_user using (
postgres(#     (select true)
postgres(# );

postgres=> explain (analyze) select * from testing where data->'value' = to_jsonb(10) and type_id = 10 and latest is true;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testing  (cost=9.16..17582.89 rows=1 width=49) (actual time=0.088..0.877 rows=1 loops=1)
   Recheck Cond: ((type_id = 10) AND (latest IS TRUE))
   Filter: ($0 AND ((data -> 'value'::text) = to_jsonb(10)))
   Rows Removed by Filter: 199
   Heap Blocks: exact=185
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Bitmap Index Scan on i_10  (cost=0.00..9.14 rows=7500 width=0) (actual time=0.012..0.012 rows=200 loops=1)
 Planning Time: 0.306 ms
 Execution Time: 0.894 ms
(10 rows)

Tested on PostgreSQL 11.1 (Debian 11.1-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

My two full tests cases are linked below, the first works as expected, the second produces a bad plan:


This problem seems to make row level security unusable for me, I am missing something in regards to RLS and indexes on jsonb?

Alastair
Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> I have recently encountered a strange poor query plan choice after implementing RLS.
> My table has a number of partial indexes on a jsonb column and the query went from low number of milliseconds to
severalseconds as the planner chose a different index. 
> Simply stated, in the jsonb column case, "using ( (select true) )" instead of "using (true)" produces a bad plan,
illustratedbelow: 

If the planner isn't sure you have access to all rows in the table,
that disables some of its ability to estimate where-clause selectivity.
In particular it can't run "leaky" where-clauses against all values in
the table's statistics entries to see how many pass, because a nefarious
user could use that to glean info about what's in the table.  Eyeing your
test query, it looks like the issue is that jsonb "->" isn't leakproof,
so that clause falls back to a default selectivity estimate, and you
get a bad plan as a result.

            regards, tom lane



Hi Tom,

Thank you for having a look at this.  In the interim I discovered that I could trigger the issue by creating a security barrier view, whereas a regular view worked fine, so I think that also points to your conclusion about leakyness?

I attempted to workaround the issue with a leakproof function, so far with no success.

I tried a leakproof function as below:

create or replace function jsonb_select(obj jsonb,keys text[]) returns jsonb as
$$
    select jsonb_agg(value order by key) from jsonb_each(obj) where keys @> ARRAY[key];
$$ language sql immutable strict leakproof;

And created the expression indexes:

create unique index i_10 on testing ((jsonb_select(data,'{value}'))) where type_id = 10 and latest is true;

But my query still produces a bad plan (bitmap heap scan) with rls or a security barrier view enabled:

explain (analyze) select * from testing where (jsonb_select(data,'{value}')) = to_jsonb(10) and type_id = 10 and latest is true;

Is this an approach that could fundamentally work?

Best regards,

Alastair


From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 04 March 2020 00:04
To: Alastair McKinley <a.mckinley@analyticsengines.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Poor plan choice with partial unique indexes on jsonb column and simple RLS policy (with test script)
 
Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> I have recently encountered a strange poor query plan choice after implementing RLS.
> My table has a number of partial indexes on a jsonb column and the query went from low number of milliseconds to several seconds as the planner chose a different index.
> Simply stated, in the jsonb column case, "using ( (select true) )" instead of "using (true)" produces a bad plan, illustrated below:

If the planner isn't sure you have access to all rows in the table,
that disables some of its ability to estimate where-clause selectivity.
In particular it can't run "leaky" where-clauses against all values in
the table's statistics entries to see how many pass, because a nefarious
user could use that to glean info about what's in the table.  Eyeing your
test query, it looks like the issue is that jsonb "->" isn't leakproof,
so that clause falls back to a default selectivity estimate, and you
get a bad plan as a result.

                        regards, tom lane
Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> Thank you for having a look at this.  In the interim I discovered that I could trigger the issue by creating a
securitybarrier view, whereas a regular view worked fine, so I think that also points to your conclusion about
leakyness?
> I attempted to workaround the issue with a leakproof function, so far with no success.
> ...
> Is this an approach that could fundamentally work?

Forcing the expression to be considered leakproof should work.
I'm not sure that your partial index is OK for the purpose of
collecting stats, though -- does it help if you make a non-partial
index on that function expression?  Otherwise, it's possible that
I guessed wrong about which part of the WHERE clause is problematic.
You could try doing EXPLAINs with different portions of the WHERE
to see how the rowcount estimate changes.

BTW, just marking something "leakproof" when it isn't really so
is possibly a security problem.  You should think twice about
what threat model you're hoping RLS will protect against.

            regards, tom lane



Hi Tom,

Thanks once again for your time looking at this.  I have a resolution but didn't exactly get to the bottom of what was going on.

Forcing the function used in the index to be leakproof did not work.  I guessed, but am not certain, that this is because either to_jsonb() or jsonb_eq operator are not leakproof as well?

During my testing of a solution (which basically was not to use jsonb for this) I saw this message while using RLS in an unrelated query.

DEBUG:  not using statistics because function "enum_eq" is not leak-proof

I did not see a message like this using my jsonb indexes, even though it seems like a related issue. 

Is there another effect potentially going on here or incomplete debugging messages?

Best regards,

Alastair

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 04 March 2020 04:22
To: Alastair McKinley <a.mckinley@analyticsengines.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Poor plan choice with partial unique indexes on jsonb column and simple RLS policy (with test script)
 
Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> Thank you for having a look at this.  In the interim I discovered that I could trigger the issue by creating a security barrier view, whereas a regular view worked fine, so I think that also points to your conclusion about leakyness?
> I attempted to workaround the issue with a leakproof function, so far with no success.
> ...
> Is this an approach that could fundamentally work?

Forcing the expression to be considered leakproof should work.
I'm not sure that your partial index is OK for the purpose of
collecting stats, though -- does it help if you make a non-partial
index on that function expression?  Otherwise, it's possible that
I guessed wrong about which part of the WHERE clause is problematic.
You could try doing EXPLAINs with different portions of the WHERE
to see how the rowcount estimate changes.

BTW, just marking something "leakproof" when it isn't really so
is possibly a security problem.  You should think twice about
what threat model you're hoping RLS will protect against.

                        regards, tom lane