Re: leaky views, yet again - Mailing list pgsql-hackers
From | KaiGai Kohei |
---|---|
Subject | Re: leaky views, yet again |
Date | |
Msg-id | 4C7E10E5.7050004@ak.jp.nec.com Whole thread Raw |
In response to | Re: leaky views, yet again (KaiGai Kohei <kaigai@kaigai.gr.jp>) |
Responses |
Re: leaky views, yet again
|
List | pgsql-hackers |
(2010/07/21 19:35), KaiGai Kohei wrote: > (2010/07/21 19:26), Robert Haas wrote: >> 2010/7/21 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>>> On the other hand, if it's enough from a performance >>>> point of view to review and mark only a few built-in functions like >>>> index operators, maybe it's ok. >>>> >>> I also think it is a worthful idea to try as a proof-of-concept. >> >> Yeah. So, should we mark this patch as Returned with Feedback, and >> you can submit a proof-of-concept patch for the next CF? >> > Yes, it's fair enough. > The attached patch is a proof-of-concept one according to the suggestion from Heikki before. Right now, it stands on a strict assumption that considers operators implemented with built-in functions are safe; it does not have no possibility to leak supplied arguments anywhere. This patch modifies the logic that the planner decides where the given qualifier clause should be distributed. If the clause does not contain any "leakable" functions, nothing were changed. In this case, the clause shall be pushed down into inside of the join, if it depends on one-side of the join. Elsewhere, if the clause contains any "leakable" functions, this patch prevents to push down the clause into join loop, because the clause need to be evaluated after the condition of join. If it would not be prevented, the "leakable" function may expose the contents to be invisible for users; due to the VIEWs for row-level security purpose. Example -------------------------------------------- postgres=# CREATE OR REPLACE FUNCTION f_policy(int) RETURNS bool LANGUAGE 'plpgsql' AS 'begin return $1 % 2 = 0; end;'; CREATE FUNCTION postgres=# CREATE OR REPLACE FUNCTION f_malicious(text) RETURNS bool LANGUAGE 'plpgsql' COST 0.001 AS 'begin raise notice ''leak: %'', $1; return true; end'; CREATE FUNCTION postgres=# CREATE TABLE t1 (a int primary key, b text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1" CREATE TABLE postgres=# CREATE TABLE t2 (x int primary key, y text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey" for table "t2" CREATE TABLE postgres=# INSERT INTO t1 VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'); INSERT 0 3 postgres=# INSERT INTO t2 VALUES (2,'xxx'), (3,'yyy'), (4,'zzz'); INSERT 0 3 postgres=# CREATE VIEW v AS SELECT * FROM t1 JOIN t2 ON f_policy(a+x); CREATE VIEW * SELECT * FROM v WHERE f_malicious(b); [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------------------ Nested Loop (cost=0.00..133685.13 rows=168100 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..24.35 rows=410 width=36) -> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36) Filter: f_malicious(b) (6 rows) ==> f_malicious() may be raise a notice about invisible tuples. [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------------------- Nested Loop (cost=0.00..400969.96 rows=168100 width=72) Join Filter: (f_malicious(t1.b) AND f_policy((t1.a + t2.x))) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..28.45 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> f_malicious() is moved to outside of the join. It is evaluated earlier than f_policy() in same level due to the function cost, but it is another matter. * SELECT * FROM v WHERE a = 2; [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..353.44 rows=410 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..353.44 rows=410 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> "a = 2" is a built-in operator, so we assume it is safe. This clause was pushed down into the join loop, then utilized to index scan. * SELECT * FROM v WHERE a::text = 'a'; [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2'; QUERY PLAN ---------------------------------------------------------------- Nested Loop (cost=0.00..2009.54 rows=2460 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..31.55 rows=6 width=36) -> Seq Scan on t1 (cost=0.00..31.52 rows=6 width=36) Filter: ((a)::text = '2'::text) (6 rows) ==> "a::text = 'a'" was pushed down into the join loop. [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2'; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..412312.92 rows=2521 width=72) Join Filter: (((t1.a)::text = '2'::text) AND f_policy((t1.a + t2.x))) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..28.45 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> The "a::text = '2'" clause contains CoerceViaIO node, so this patch assumes it is not safe. Please note that this patch does not case about a case when a function inside a view and a function outside a view are distributed into same level and the later function has lower cost value. To fix this problem definitely, we also need to mark nest-level of the clause being originally supplied, and need to order the clauses by the nest-level with higher priority than cost. If we found f_malicious() and f_policy() in same level at the above example, f_policy() should be executed earlier because it was originally supplied in the deeper nest level. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
pgsql-hackers by date: