Re: leaky views, yet again - Mailing list pgsql-hackers

From KaiGai Kohei
Subject Re: leaky views, yet again
Date
Msg-id 4C7E10E5.7050004@ak.jp.nec.com
Whole thread Raw
In response to Re: leaky views, yet again  (KaiGai Kohei <kaigai@kaigai.gr.jp>)
Responses Re: leaky views, yet again
List pgsql-hackers
(2010/07/21 19:35), KaiGai Kohei wrote:
> (2010/07/21 19:26), Robert Haas wrote:
>> 2010/7/21 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>>>> On the other hand, if it's enough from a performance
>>>> point of view to review and mark only a few built-in functions like
>>>> index operators, maybe it's ok.
>>>>
>>> I also think it is a worthful idea to try as a proof-of-concept.
>>
>> Yeah. So, should we mark this patch as Returned with Feedback, and
>> you can submit a proof-of-concept patch for the next CF?
>>
> Yes, it's fair enough.
>
The attached patch is a proof-of-concept one according to the suggestion
from Heikki before.

Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.

This patch modifies the logic that the planner decides where the
given qualifier clause should be distributed.

If the clause does not contain any "leakable" functions, nothing
were changed. In this case, the clause shall be pushed down into
inside of the join, if it depends on one-side of the join.

Elsewhere, if the clause contains any "leakable" functions, this
patch prevents to push down the clause into join loop, because
the clause need to be evaluated after the condition of join.
If it would not be prevented, the "leakable" function may expose
the contents to be invisible for users; due to the VIEWs for
row-level security purpose.

Example
--------------------------------------------
postgres=# CREATE OR REPLACE FUNCTION f_policy(int)
               RETURNS bool LANGUAGE 'plpgsql'
               AS 'begin return $1 % 2 = 0; end;';
CREATE FUNCTION
postgres=# CREATE OR REPLACE FUNCTION f_malicious(text)
               RETURNS bool LANGUAGE 'plpgsql' COST 0.001
               AS 'begin raise notice ''leak: %'', $1; return true; end';
CREATE FUNCTION
postgres=# CREATE TABLE t1 (a int primary key, b text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
CREATE TABLE
postgres=# CREATE TABLE t2 (x int primary key, y text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey" for table "t2"
CREATE TABLE
postgres=# INSERT INTO t1 VALUES (1,'aaa'), (2,'bbb'), (3,'ccc');
INSERT 0 3
postgres=# INSERT INTO t2 VALUES (2,'xxx'), (3,'yyy'), (4,'zzz');
INSERT 0 3
postgres=# CREATE VIEW v AS SELECT * FROM t1 JOIN t2 ON f_policy(a+x);
CREATE VIEW

* SELECT * FROM v WHERE f_malicious(b);

[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b);
                            QUERY PLAN
------------------------------------------------------------------
 Nested Loop  (cost=0.00..133685.13 rows=168100 width=72)
   Join Filter: f_policy((t1.a + t2.x))
   ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
   ->  Materialize  (cost=0.00..24.35 rows=410 width=36)
         ->  Seq Scan on t1  (cost=0.00..22.30 rows=410 width=36)
               Filter: f_malicious(b)
(6 rows)
 ==> f_malicious() may be raise a notice about invisible tuples.

[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b);
                            QUERY PLAN
-------------------------------------------------------------------
 Nested Loop  (cost=0.00..400969.96 rows=168100 width=72)
   Join Filter: (f_malicious(t1.b) AND f_policy((t1.a + t2.x)))
   ->  Seq Scan on t1  (cost=0.00..22.30 rows=1230 width=36)
   ->  Materialize  (cost=0.00..28.45 rows=1230 width=36)
         ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
(5 rows)
 ==> f_malicious() is moved to outside of the join.
     It is evaluated earlier than f_policy() in same level due to
     the function cost, but it is another matter.


* SELECT * FROM v WHERE a = 2;
[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a = 2;
                               QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop  (cost=0.00..353.44 rows=410 width=72)
   Join Filter: f_policy((t1.a + t2.x))
   ->  Index Scan using t1_pkey on t1  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (a = 2)
   ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
(5 rows)

[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a = 2;
                               QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop  (cost=0.00..353.44 rows=410 width=72)
   Join Filter: f_policy((t1.a + t2.x))
   ->  Index Scan using t1_pkey on t1  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (a = 2)
   ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
(5 rows)

  ==> "a = 2" is a built-in operator, so we assume it is safe.
      This clause was pushed down into the join loop, then utilized to
      index scan.


* SELECT * FROM v WHERE a::text = 'a';

[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2';
                           QUERY PLAN
----------------------------------------------------------------
 Nested Loop  (cost=0.00..2009.54 rows=2460 width=72)
   Join Filter: f_policy((t1.a + t2.x))
   ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
   ->  Materialize  (cost=0.00..31.55 rows=6 width=36)
         ->  Seq Scan on t1  (cost=0.00..31.52 rows=6 width=36)
               Filter: ((a)::text = '2'::text)
(6 rows)
  ==> "a::text = 'a'" was pushed down into the join loop.

[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2';
                               QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop  (cost=0.00..412312.92 rows=2521 width=72)
   Join Filter: (((t1.a)::text = '2'::text) AND f_policy((t1.a + t2.x)))
   ->  Seq Scan on t1  (cost=0.00..22.30 rows=1230 width=36)
   ->  Materialize  (cost=0.00..28.45 rows=1230 width=36)
         ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
(5 rows)
  ==> The "a::text = '2'" clause contains CoerceViaIO node, so this patch
      assumes it is not safe.


Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.
To fix this problem definitely, we also need to mark nest-level
of the clause being originally supplied, and need to order
the clauses by the nest-level with higher priority than cost.

If we found f_malicious() and f_policy() in same level at the
above example, f_policy() should be executed earlier because
it was originally supplied in the deeper nest level.

Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

pgsql-hackers by date:

Previous
From: Thom Brown
Date:
Subject: Re: array_agg() NULL Handling
Next
From: Itagaki Takahiro
Date:
Subject: Re: I: About "Our CLUSTER implementation is pessimal" patch