Re: [PATCH] Fix leaky VIEWs for RLS - Mailing list pgsql-hackers

From KaiGai Kohei
Subject Re: [PATCH] Fix leaky VIEWs for RLS
Date
Msg-id 4C084983.6080301@ak.jp.nec.com
Whole thread Raw
In response to Re: [pgsql-hackers] Daily digest v1.10705 (13 messages)  (Marc Munro <marc@bloodnok.com>)
List pgsql-hackers
I fixed up the subject.

(2010/06/04 2:23), Marc Munro wrote:
> On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-owner@postgresql.org
> wrote:
>> [ . . . ]
>>
>> In my current idea, when a qual-node that contains FuncExpr tries to
>> reference a part of relations within a security view, its referencing
>> relations will be expanded to whole of the security view at
>> distribute_qual_to_rels().
>> [ . . . ]
> 
> I may be missing something here but this seems a bit too simplistic and,
> I think, fails to deal with an important use case.
> 
> Security views, that do anything useful at all, tend to introduce
> performance issues.  I am concerned that placing a conceptual barrier
> between the secured and unsecured parts of queries is going to
> unnecessarily restrict what the optimiser can do.
> 
> For example consider that we have three secured views, each of the form:
> 
>    create view s_x as select * from x where i_can_see(x.key);
> 
> and consider the query:
> 
>    select stuff from s_x
>      inner join s_y on s_y.key = s_x.key
>      inner join s_z on s_z.key = s_x.key
>    where fn(s_x.a) = 3;
> 
> The optimiser ought to be able to spot the fact that i_can_see() need
> only be called once for each joined result.  By placing a barrier (if I
> understand your proposal correctly) between the outermost joins and the
> inner views, doesn't this optimisation become impossible?
> 
It seems to me incorrect.

If i_can_see() can filter a part of tuples within table: x, the optimizer
prefers to call i_can_see() on scanning each tables rather than result of
join, because it effectively reduce the size of scan.

In fact, the existing optimizer make the following plan:
 postgres=# CREATE FUNCTION i_can_see(int) RETURNS bool IMMUTABLE     AS 'begin return $1 % 1 = 1; end;' LANGUAGE
'plpgsql';CREATE FUNCTION postgres=# CREATE VIEW v1 as select * from t1 where i_can_see(a); CREATE VIEW postgres=#
CREATEVIEW v2 as select * from t2 where i_can_see(x); CREATE VIEW postgres=# CREATE VIEW v3 as select * from t3 where
i_can_see(s);CREATE VIEW
 
 postgres=# EXPLAIN SELECT * FROM (v1 JOIN v2 ON v1.a=v2.x) JOIN v3 on v1.a=v3.s;
QUERYPLAN -------------------------------------------------------------------------------  Nested Loop
(cost=0.00..860.47rows=410 width=108)    ->  Nested Loop  (cost=0.00..595.13 rows=410 width=72)          ->  Seq Scan
ont1  (cost=0.00..329.80 rows=410 width=36)                Filter: i_can_see(a)          ->  Index Scan using t2_pkey
ont2  (cost=0.00..0.63 rows=1 width=36)                Index Cond: (t2.x = t1.a)                Filter: i_can_see(t2.x)
  ->  Index Scan using t3_pkey on t3  (cost=0.00..0.63 rows=1 width=36)          Index Cond: (t3.s = t1.a)
Filter:i_can_see(t3.s) (10 rows)
 

Sorry, I may miss something your point.

> I think a simpler solution may be possible here.  If you can tag the
> function i_can_see() as a security function, at least in the context of
> its use in the security views, and then create the rule that security
> functions are always considered to be lower cost than user-defined
> non-security functions, don't we achieve the result of preventing the
> insecure function from seeing rows that it shouldn't?
> 
Sorry, it seems to me you mixed up different issues.

My patch tries to tackle the problem that optimizer distributes outermost
functions into inside of the view when the function takes arguments only
from a side of join, even if security views.
This issue is independent from cost of functions.

On the other hand, when a scan plan has multiple qualifiers to filter
fetched tuples, we have to pay attention on the order to be called.
If outermost functions are called prior to view's policy functions,
it may cause unexpected leaks.

In my opinion, we should consider the nestlevel of the function where
it is originally put. But it is not concluded yet, and the patch does
not tackle anything about this issue.

> I guess my concern is that a query may be constructed a=out of secured
> and unsecured parts and the optimiser should be free to group all of the
> secured parts together before considering the unsecured parts.
> 
Yes, it is an opinion, but it may cause unnecessary performance regression
when user given condition is obviously harmless.

E.g, If table: t has primary key t.a, and a security view is defined as:
 CREATE VIEW v AS SELECT * FROM t WHERE f_policy(t.b);

When user gives the following query:
 SELECT * FROM v WHERE a = 100;

If we strictly separate secured and unsecure part prior to optimizer,
it will break a chance to scan the table: t with index.

I also think security is not free, so it may take a certain level of
performance degradation. But it should not bring down more than necessity.

Thanks,
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: Exposing the Xact commit order to the user
Next
From: Takahiro Itagaki
Date:
Subject: Open item: slave to standby in docs