Thread: leaky views, yet again
I think we pretty much have conceptual agreement on the shape of the solution to this problem: when a view is created with CREATE SECURITY VIEW, restrict functions and operators that might disclose tuple data from being pushed down into view (unless, perhaps, the user invoking the view has sufficient privileges to execute the underlying query anyhow, e.g. superuser or view owner). What we have not resolved is exactly how we're going to decide which functions and operators might do such a dastardly thing. I think the viable options are as follows: 1. Adopt Heikki's proposal of treating indexable operators as non-leaky. http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php Or, perhaps, and even more restrictively, treat only hashable/mergeable operators as non-leaky. 2. Add an explicit flag to pg_proc, which can only be set by superusers (and which is cleared if a non-superuser modifies it in any way), allowing a function to be tagged as non-leaky. I believe that it would be reasonable to set this flag for all of our built-in indexable operators (though I'd read the code before doing it), but it would remove the need for the assumption that third-party operators are equally sane. CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT 42$$ IMMUTABLE SEAWORTHY; -- doesn't leak This problem is not going away, so we should try to decide on something. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/07/08 22:08), Robert Haas wrote: > I think we pretty much have conceptual agreement on the shape of the > solution to this problem: when a view is created with CREATE SECURITY > VIEW, restrict functions and operators that might disclose tuple data > from being pushed down into view (unless, perhaps, the user invoking > the view has sufficient privileges to execute the underlying query > anyhow, e.g. superuser or view owner). What we have not resolved is > exactly how we're going to decide which functions and operators might > do such a dastardly thing. I think the viable options are as follows: > > 1. Adopt Heikki's proposal of treating indexable operators as non-leaky. > > http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php > > Or, perhaps, and even more restrictively, treat only > hashable/mergeable operators as non-leaky. > > 2. Add an explicit flag to pg_proc, which can only be set by > superusers (and which is cleared if a non-superuser modifies it in any > way), allowing a function to be tagged as non-leaky. I believe that > it would be reasonable to set this flag for all of our built-in > indexable operators (though I'd read the code before doing it), but it > would remove the need for the assumption that third-party operators > are equally sane. > > CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT > 42$$ IMMUTABLE SEAWORTHY; -- doesn't leak > > This problem is not going away, so we should try to decide on something. > I'd like to vote the second option, because this approach will be also applied on another aspect of leaky views. When leaky and non-leaky functions are chained within a WHERE clause, it will be ordered by the cost of functions. So, we have possibility that leaky functions are executed earlier than non-leaky functions. It will not be an easy work to mark built-in functions as either leaky or non-leaky, but it seems to me the most straight forward solution. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/7/8 KaiGai Kohei <kaigai@ak.jp.nec.com>: > (2010/07/08 22:08), Robert Haas wrote: >> I think we pretty much have conceptual agreement on the shape of the >> solution to this problem: when a view is created with CREATE SECURITY >> VIEW, restrict functions and operators that might disclose tuple data >> from being pushed down into view (unless, perhaps, the user invoking >> the view has sufficient privileges to execute the underlying query >> anyhow, e.g. superuser or view owner). What we have not resolved is >> exactly how we're going to decide which functions and operators might >> do such a dastardly thing. I think the viable options are as follows: >> >> 1. Adopt Heikki's proposal of treating indexable operators as non-leaky. >> >> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php >> >> Or, perhaps, and even more restrictively, treat only >> hashable/mergeable operators as non-leaky. >> >> 2. Add an explicit flag to pg_proc, which can only be set by >> superusers (and which is cleared if a non-superuser modifies it in any >> way), allowing a function to be tagged as non-leaky. I believe that >> it would be reasonable to set this flag for all of our built-in >> indexable operators (though I'd read the code before doing it), but it >> would remove the need for the assumption that third-party operators >> are equally sane. >> >> CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT >> 42$$ IMMUTABLE SEAWORTHY; -- doesn't leak >> >> This problem is not going away, so we should try to decide on something. >> > I'd like to vote the second option, because this approach will be also > applied on another aspect of leaky views. > > When leaky and non-leaky functions are chained within a WHERE clause, > it will be ordered by the cost of functions. So, we have possibility > that leaky functions are executed earlier than non-leaky functions. > > It will not be an easy work to mark built-in functions as either leaky > or non-leaky, but it seems to me the most straight forward solution. Does anyone else have an opinion on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 09/07/10 06:47, KaiGai Kohei wrote: > When leaky and non-leaky functions are chained within a WHERE clause, > it will be ordered by the cost of functions. So, we have possibility > that leaky functions are executed earlier than non-leaky functions. No, that needs to be forbidden as part of the fix. Leaky functions must not be executed before all the quals from the view are evaluated. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 19/07/10 20:08, Robert Haas wrote: > 2010/7/8 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> (2010/07/08 22:08), Robert Haas wrote: >>> I think we pretty much have conceptual agreement on the shape of the >>> solution to this problem: when a view is created with CREATE SECURITY >>> VIEW, restrict functions and operators that might disclose tuple data >>> from being pushed down into view (unless, perhaps, the user invoking >>> the view has sufficient privileges to execute the underlying query >>> anyhow, e.g. superuser or view owner). What we have not resolved is >>> exactly how we're going to decide which functions and operators might >>> do such a dastardly thing. I think the viable options are as follows: >>> >>> 1. Adopt Heikki's proposal of treating indexable operators as non-leaky. >>> >>> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php >>> >>> Or, perhaps, and even more restrictively, treat only >>> hashable/mergeable operators as non-leaky. >>> >>> 2. Add an explicit flag to pg_proc, which can only be set by >>> superusers (and which is cleared if a non-superuser modifies it in any >>> way), allowing a function to be tagged as non-leaky. I believe that >>> it would be reasonable to set this flag for all of our built-in >>> indexable operators (though I'd read the code before doing it), but it >>> would remove the need for the assumption that third-party operators >>> are equally sane. >>> >>> CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT >>> 42$$ IMMUTABLE SEAWORTHY; -- doesn't leak >>> >>> This problem is not going away, so we should try to decide on something. >>> >> I'd like to vote the second option, because this approach will be also >> applied on another aspect of leaky views. >> >> When leaky and non-leaky functions are chained within a WHERE clause, >> it will be ordered by the cost of functions. So, we have possibility >> that leaky functions are executed earlier than non-leaky functions. >> >> It will not be an easy work to mark built-in functions as either leaky >> or non-leaky, but it seems to me the most straight forward solution. > > Does anyone else have an opinion on this? I have a bad feeling that marking functions explicitly as seaworthy is going to be hard to get right, and every time you get that wrong it's a security issue. On the other hand, if it's enough from a performance point of view to review and mark only a few built-in functions like index operators, maybe it's ok. It would be easier to assess this if we had a patch to play with that contained all the planner changes to keep track of the seaworthiness of functions and apply the quals in right order. You could then try out different scenarios to see what the performance is like. I guess what I'm saying is "write a patch, and I'll shoot it down when you're done" :-/. But hopefully the planner changes don't depend much on how we deduce which quals are leaky. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Jul 19, 2010 at 1:19 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I guess what I'm saying is "write a patch, and I'll shoot it down when > you're done" :-/. But hopefully the planner changes don't depend much on how > we deduce which quals are leaky. Oh, great... I love that. /me rolls eyes -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/07/20 2:13), Heikki Linnakangas wrote: > On 09/07/10 06:47, KaiGai Kohei wrote: >> When leaky and non-leaky functions are chained within a WHERE clause, >> it will be ordered by the cost of functions. So, we have possibility >> that leaky functions are executed earlier than non-leaky functions. > > No, that needs to be forbidden as part of the fix. Leaky functions must > not be executed before all the quals from the view are evaluated. > IIUC, a view is extracted to a subquery in the rewriter phase, then it can be pulled up to join clause at pull_up_subqueries(). In this case, WHERE clause may have the quals come from different origins, isn't it? E.g) SELECT * FROM v1 WHERE f_malicious(v1.a); At the rewriter: -> SELECT v1.* FROM (SELECT * FROM t1 WHERE f_policy(t1.b)) v1 WHERE f_malicious(v1.a); At the pull_up_subqueries() -> SELECT * FROM t1 WHERE f_policy(t1.b) AND f_malicious(t1.a); ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^ cost = 100 cost = 0.0001 Apart from an idea of secure/leaky function mark, isn't it necessary any mechanism to enforce f_policy() shall be executed earlier than f_malicious()? Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/07/20 2:19), Heikki Linnakangas wrote: > On 19/07/10 20:08, Robert Haas wrote: >> 2010/7/8 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> (2010/07/08 22:08), Robert Haas wrote: >>>> I think we pretty much have conceptual agreement on the shape of the >>>> solution to this problem: when a view is created with CREATE SECURITY >>>> VIEW, restrict functions and operators that might disclose tuple data >>>> from being pushed down into view (unless, perhaps, the user invoking >>>> the view has sufficient privileges to execute the underlying query >>>> anyhow, e.g. superuser or view owner). What we have not resolved is >>>> exactly how we're going to decide which functions and operators might >>>> do such a dastardly thing. I think the viable options are as follows: >>>> >>>> 1. Adopt Heikki's proposal of treating indexable operators as >>>> non-leaky. >>>> >>>> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php >>>> >>>> Or, perhaps, and even more restrictively, treat only >>>> hashable/mergeable operators as non-leaky. >>>> >>>> 2. Add an explicit flag to pg_proc, which can only be set by >>>> superusers (and which is cleared if a non-superuser modifies it in any >>>> way), allowing a function to be tagged as non-leaky. I believe that >>>> it would be reasonable to set this flag for all of our built-in >>>> indexable operators (though I'd read the code before doing it), but it >>>> would remove the need for the assumption that third-party operators >>>> are equally sane. >>>> >>>> CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT >>>> 42$$ IMMUTABLE SEAWORTHY; -- doesn't leak >>>> >>>> This problem is not going away, so we should try to decide on >>>> something. >>>> >>> I'd like to vote the second option, because this approach will be also >>> applied on another aspect of leaky views. >>> >>> When leaky and non-leaky functions are chained within a WHERE clause, >>> it will be ordered by the cost of functions. So, we have possibility >>> that leaky functions are executed earlier than non-leaky functions. >>> >>> It will not be an easy work to mark built-in functions as either leaky >>> or non-leaky, but it seems to me the most straight forward solution. >> >> Does anyone else have an opinion on this? > > I have a bad feeling that marking functions explicitly as seaworthy is > going to be hard to get right, and every time you get that wrong it's a > security issue. > Yes, it is indeed a uphill work to mark either secure or leaky on the functions correctly. :( > On the other hand, if it's enough from a performance > point of view to review and mark only a few built-in functions like > index operators, maybe it's ok. > I also think it is a worthful idea to try as a proof-of-concept. If we only allow to push down indexable operators, do you have any idea to distinguish between a purely indexable operator qual and others? At the distribute_qual_to_rels() stage, how can we find out a qual which is consist of only indexable operators? > It would be easier to assess this if we had a patch to play with that > contained all the planner changes to keep track of the seaworthiness of > functions and apply the quals in right order. You could then try out > different scenarios to see what the performance is like. > > I guess what I'm saying is "write a patch, and I'll shoot it down when > you're done" :-/. But hopefully the planner changes don't depend much on > how we deduce which quals are leaky. > I agree. The decision to mark either secure or leaky on functions should be done independently from the planner code, like contain_volatile_functions(). Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/7/21 KaiGai Kohei <kaigai@ak.jp.nec.com>: > (2010/07/20 2:13), Heikki Linnakangas wrote: >> On 09/07/10 06:47, KaiGai Kohei wrote: >>> When leaky and non-leaky functions are chained within a WHERE clause, >>> it will be ordered by the cost of functions. So, we have possibility >>> that leaky functions are executed earlier than non-leaky functions. >> >> No, that needs to be forbidden as part of the fix. Leaky functions must >> not be executed before all the quals from the view are evaluated. >> > > IIUC, a view is extracted to a subquery in the rewriter phase, then it > can be pulled up to join clause at pull_up_subqueries(). In this case, > WHERE clause may have the quals come from different origins, isn't it? > > E.g) > SELECT * FROM v1 WHERE f_malicious(v1.a); > > At the rewriter: > -> SELECT v1.* FROM (SELECT * FROM t1 WHERE f_policy(t1.b)) v1 WHERE f_malicious(v1.a); > > At the pull_up_subqueries() > -> SELECT * FROM t1 WHERE f_policy(t1.b) AND f_malicious(t1.a); > ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^ > cost = 100 cost = 0.0001 > > Apart from an idea of secure/leaky function mark, isn't it necessary any > mechanism to enforce f_policy() shall be executed earlier than f_malicious()? I think you guys are in fact agreeing with each other. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
2010/7/21 KaiGai Kohei <kaigai@ak.jp.nec.com>: >> On the other hand, if it's enough from a performance >> point of view to review and mark only a few built-in functions like >> index operators, maybe it's ok. >> > I also think it is a worthful idea to try as a proof-of-concept. Yeah. So, should we mark this patch as Returned with Feedback, and you can submit a proof-of-concept patch for the next CF? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/07/21 19:26), Robert Haas wrote: > 2010/7/21 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> On the other hand, if it's enough from a performance >>> point of view to review and mark only a few built-in functions like >>> index operators, maybe it's ok. >>> >> I also think it is a worthful idea to try as a proof-of-concept. > > Yeah. So, should we mark this patch as Returned with Feedback, and > you can submit a proof-of-concept patch for the next CF? > Yes, it's fair enough. -- KaiGai Kohei <kaigai@kaigai.gr.jp>
(2010/07/21 19:35), KaiGai Kohei wrote: > (2010/07/21 19:26), Robert Haas wrote: >> 2010/7/21 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>>> On the other hand, if it's enough from a performance >>>> point of view to review and mark only a few built-in functions like >>>> index operators, maybe it's ok. >>>> >>> I also think it is a worthful idea to try as a proof-of-concept. >> >> Yeah. So, should we mark this patch as Returned with Feedback, and >> you can submit a proof-of-concept patch for the next CF? >> > Yes, it's fair enough. > The attached patch is a proof-of-concept one according to the suggestion from Heikki before. Right now, it stands on a strict assumption that considers operators implemented with built-in functions are safe; it does not have no possibility to leak supplied arguments anywhere. This patch modifies the logic that the planner decides where the given qualifier clause should be distributed. If the clause does not contain any "leakable" functions, nothing were changed. In this case, the clause shall be pushed down into inside of the join, if it depends on one-side of the join. Elsewhere, if the clause contains any "leakable" functions, this patch prevents to push down the clause into join loop, because the clause need to be evaluated after the condition of join. If it would not be prevented, the "leakable" function may expose the contents to be invisible for users; due to the VIEWs for row-level security purpose. Example -------------------------------------------- postgres=# CREATE OR REPLACE FUNCTION f_policy(int) RETURNS bool LANGUAGE 'plpgsql' AS 'begin return $1 % 2 = 0; end;'; CREATE FUNCTION postgres=# CREATE OR REPLACE FUNCTION f_malicious(text) RETURNS bool LANGUAGE 'plpgsql' COST 0.001 AS 'begin raise notice ''leak: %'', $1; return true; end'; CREATE FUNCTION postgres=# CREATE TABLE t1 (a int primary key, b text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1" CREATE TABLE postgres=# CREATE TABLE t2 (x int primary key, y text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey" for table "t2" CREATE TABLE postgres=# INSERT INTO t1 VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'); INSERT 0 3 postgres=# INSERT INTO t2 VALUES (2,'xxx'), (3,'yyy'), (4,'zzz'); INSERT 0 3 postgres=# CREATE VIEW v AS SELECT * FROM t1 JOIN t2 ON f_policy(a+x); CREATE VIEW * SELECT * FROM v WHERE f_malicious(b); [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------------------ Nested Loop (cost=0.00..133685.13 rows=168100 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..24.35 rows=410 width=36) -> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36) Filter: f_malicious(b) (6 rows) ==> f_malicious() may be raise a notice about invisible tuples. [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------------------- Nested Loop (cost=0.00..400969.96 rows=168100 width=72) Join Filter: (f_malicious(t1.b) AND f_policy((t1.a + t2.x))) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..28.45 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> f_malicious() is moved to outside of the join. It is evaluated earlier than f_policy() in same level due to the function cost, but it is another matter. * SELECT * FROM v WHERE a = 2; [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..353.44 rows=410 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..353.44 rows=410 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> "a = 2" is a built-in operator, so we assume it is safe. This clause was pushed down into the join loop, then utilized to index scan. * SELECT * FROM v WHERE a::text = 'a'; [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2'; QUERY PLAN ---------------------------------------------------------------- Nested Loop (cost=0.00..2009.54 rows=2460 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..31.55 rows=6 width=36) -> Seq Scan on t1 (cost=0.00..31.52 rows=6 width=36) Filter: ((a)::text = '2'::text) (6 rows) ==> "a::text = 'a'" was pushed down into the join loop. [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2'; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..412312.92 rows=2521 width=72) Join Filter: (((t1.a)::text = '2'::text) AND f_policy((t1.a + t2.x))) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..28.45 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> The "a::text = '2'" clause contains CoerceViaIO node, so this patch assumes it is not safe. Please note that this patch does not case about a case when a function inside a view and a function outside a view are distributed into same level and the later function has lower cost value. To fix this problem definitely, we also need to mark nest-level of the clause being originally supplied, and need to order the clauses by the nest-level with higher priority than cost. If we found f_malicious() and f_policy() in same level at the above example, f_policy() should be executed earlier because it was originally supplied in the deeper nest level. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
2010/9/1 KaiGai Kohei <kaigai@ak.jp.nec.com>: > Right now, it stands on a strict assumption that considers operators > implemented with built-in functions are safe; it does not have no > possibility to leak supplied arguments anywhere. > > Please note that this patch does not case about a case when > a function inside a view and a function outside a view are > distributed into same level and the later function has lower > cost value. Without making some attempt to address these two points, I don't see the point of this patch. Also, I believe we decided previously do this deoptimization only in case the user requests it with CREATE SECURITY VIEW. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/09/02 11:57), Robert Haas wrote: > 2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> Right now, it stands on a strict assumption that considers operators >> implemented with built-in functions are safe; it does not have no >> possibility to leak supplied arguments anywhere. >> >> Please note that this patch does not case about a case when >> a function inside a view and a function outside a view are >> distributed into same level and the later function has lower >> cost value. > > Without making some attempt to address these two points, I don't see > the point of this patch. > > Also, I believe we decided previously do this deoptimization only in > case the user requests it with CREATE SECURITY VIEW. > Perhaps, I remember the previous discussion incorrectly. If we have a hint about whether the supplied view is intended to security purpose, or not, it seems to me it is a reliable method to prevent pulling up the subqueries come from security views. Is it too much deoptimization? If we keep security views as subqueries, the later point shall be solved implicitly. Even if we try to optimize security views in a certain level, it is not difficult to solve, as I demonstrated before. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/9/1 KaiGai Kohei <kaigai@ak.jp.nec.com>: > (2010/09/02 11:57), Robert Haas wrote: >> 2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> Right now, it stands on a strict assumption that considers operators >>> implemented with built-in functions are safe; it does not have no >>> possibility to leak supplied arguments anywhere. >>> >>> Please note that this patch does not case about a case when >>> a function inside a view and a function outside a view are >>> distributed into same level and the later function has lower >>> cost value. >> >> Without making some attempt to address these two points, I don't see >> the point of this patch. >> >> Also, I believe we decided previously do this deoptimization only in >> case the user requests it with CREATE SECURITY VIEW. >> > Perhaps, I remember the previous discussion incorrectly. > > If we have a hint about whether the supplied view is intended to security > purpose, or not, it seems to me it is a reliable method to prevent pulling > up the subqueries come from security views. > Is it too much deoptimization? Well, that'd prevent something like id = 3 from getting pushed down, which seems a bit harsh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/09/02 12:38), Robert Haas wrote: > 2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> (2010/09/02 11:57), Robert Haas wrote: >>> 2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>>> Right now, it stands on a strict assumption that considers operators >>>> implemented with built-in functions are safe; it does not have no >>>> possibility to leak supplied arguments anywhere. >>>> >>>> Please note that this patch does not case about a case when >>>> a function inside a view and a function outside a view are >>>> distributed into same level and the later function has lower >>>> cost value. >>> >>> Without making some attempt to address these two points, I don't see >>> the point of this patch. >>> >>> Also, I believe we decided previously do this deoptimization only in >>> case the user requests it with CREATE SECURITY VIEW. >>> >> Perhaps, I remember the previous discussion incorrectly. >> >> If we have a hint about whether the supplied view is intended to security >> purpose, or not, it seems to me it is a reliable method to prevent pulling >> up the subqueries come from security views. >> Is it too much deoptimization? > > Well, that'd prevent something like id = 3 from getting pushed down, > which seems a bit harsh. > Hmm. If so, we need to remember what FromExpr was come from subqueries of security views, and what were not. Then, we need to prevent leakable clause will be distributed to inside of them. In addition, we also need to care about the order of function calls in same level, because it is not implicitly solved. At first, let's consider top-half of the matter. When views are expanded into subqueries in query-rewriter, Query tree lost an information OID of the view, because RangeTblEntry does not have OID of the relation when it is RTE_SUBQUERY. So, we need to patch here to mark a flag whether the supplied view is security focused, or not. Then, pull_up_simple_subquery() pulls up a supplied subquery into normal join, if possible. In this case, FromExpr is chained into the upper level. Of course, FromExpr does not have a flag to show its origin, so we also need to copy the new flag in RangeTblEntry to FromExpr. Then, when distribute_qual_to_rels() is called, the caller also provides a Bitmapset of relation-Ids which are contained under the FromExpr with the flag saying it came from the security views. Even if the supplied clause references a part of the Bitmapset, we need to prevent the clause being pushed down into the relations came from security views, except for ones we can make sure these are safe. Perhaps, it is not impossible.... Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/09/02 13:30), KaiGai Kohei wrote: > (2010/09/02 12:38), Robert Haas wrote: >> 2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> (2010/09/02 11:57), Robert Haas wrote: >>>> 2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>>>> Right now, it stands on a strict assumption that considers operators >>>>> implemented with built-in functions are safe; it does not have no >>>>> possibility to leak supplied arguments anywhere. >>>>> >>>>> Please note that this patch does not case about a case when >>>>> a function inside a view and a function outside a view are >>>>> distributed into same level and the later function has lower >>>>> cost value. >>>> >>>> Without making some attempt to address these two points, I don't see >>>> the point of this patch. >>>> >>>> Also, I believe we decided previously do this deoptimization only in >>>> case the user requests it with CREATE SECURITY VIEW. >>>> >>> Perhaps, I remember the previous discussion incorrectly. >>> >>> If we have a hint about whether the supplied view is intended to security >>> purpose, or not, it seems to me it is a reliable method to prevent pulling >>> up the subqueries come from security views. >>> Is it too much deoptimization? >> >> Well, that'd prevent something like id = 3 from getting pushed down, >> which seems a bit harsh. >> I've tried to implement a proof of the concept patch according to the following logic. (For the quick hack, it does not include statement support. It just considers view with name begun from "s" as security views.) > Hmm. If so, we need to remember what FromExpr was come from subqueries > of security views, and what were not. Then, we need to prevent leakable > clause will be distributed to inside of them. > In addition, we also need to care about the order of function calls in > same level, because it is not implicitly solved. > > At first, let's consider top-half of the matter. > > When views are expanded into subqueries in query-rewriter, Query tree > lost an information OID of the view, because RangeTblEntry does not have > OID of the relation when it is RTE_SUBQUERY. So, we need to patch here > to mark a flag whether the supplied view is security focused, or not. > This patch added 'security_view' flag into RangeTblEntry. It shall be set when the query rewriter expands security views. > Then, pull_up_simple_subquery() pulls up a supplied subquery into normal > join, if possible. In this case, FromExpr is chained into the upper level. > Of course, FromExpr does not have a flag to show its origin, so we also > need to copy the new flag in RangeTblEntry to FromExpr. > This patch also added 'security_view' flag into FromExpr. It shall be set when the pull_up_simple_subquery() pulled up a RangeTblEntry with security_view = true. > Then, when distribute_qual_to_rels() is called, the caller also provides > a Bitmapset of relation-Ids which are contained under the FromExpr with > the flag saying it came from the security views. > Even if the supplied clause references a part of the Bitmapset, we need > to prevent the clause being pushed down into the relations came from > security views, except for ones we can make sure these are safe. > Just before distribute_qual_to_rels(), deconstruct_recurse() is invoked. It walks on the supplied join-tree to collect what relations are appeared under the current FromExpr/JoinExpr. The deconstruct_recurse() was modified to take two new arguments of 'bool below_sec_barriers' and 'Relids *sec_barriers'. The first one means the current recursion is under the FromExpr with security_view being true. At that time, it set appeared relations on the sec_barriers, then returns. In the result, the 'sec_barriers' shall become a bitmapset of relations being under the FromExpr which is originated by security views. Then, 'sec_barriers' shall be delivered to distribute_qual_to_rels(). If the supplied qualifier references a part of 'sec_barriers' and contains possibly leakable functions, it appends whole of the sec_barriers to the bitmapset of relations on which the clause is depending. In the result, it shall not be pushed down into the security view. Example) testdb=# CREATE VIEW n_view AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x; CREATE VIEW testdb=# CREATE VIEW s_view AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x; CREATE VIEW testdb=# EXPLAIN SELECT * FROM n_view WHERE f_malicious(y); QUERY PLAN ------------------------------------------------------------------- Hash Join (cost=334.93..365.94 rows=410 width=72) Hash Cond: (t1.a = t2.x) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Hash (cost=329.80..329.80 rows=410 width=36) -> Seq Scan on t2 (cost=0.00..329.80 rows=410 width=36) Filter: f_malicious(y) (6 rows) testdb=# EXPLAIN SELECT * FROM s_view WHERE f_malicious(y); QUERY PLAN ------------------------------------------------------------------- Hash Join (cost=37.68..384.39 rows=410 width=72) Hash Cond: (t1.a = t2.x) Join Filter: f_malicious(t2.y) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Hash (cost=22.30..22.30 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (6 rows) ==> f_malicious() was moved to outside of the join loop testdb=# EXPLAIN SELECT * FROM n_view WHERE f_malicious(y) AND a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..16.80 rows=1 width=72) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Index Scan using t2_pkey on t2 (cost=0.00..8.52 rows=1 width=36) Index Cond: (x = 2) Filter: f_malicious(y) (6 rows) testdb=# EXPLAIN SELECT * FROM s_view WHERE f_malicious(y) AND a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..16.80 rows=1 width=72) Join Filter: f_malicious(t2.y) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=36) Index Cond: (x = 2) (6 rows) ==> Because 'a = 2' is not leakable, this clause was pushed down into the join loop. But f_malicious() is not in 's_view' example. testdb=# CREATE VIEW n_view2 AS SELECT * FROM t1 WHERE f_policy(a); CREATE VIEW testdb=# CREATE VIEW s_view2 AS SELECT * FROM t1 WHERE f_policy(a); CREATE VIEW testdb=# EXPLAIN SELECT * FROM n_view2 WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------- Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36) Filter: (f_malicious(b) AND f_policy(a)) (2 rows) testdb=# EXPLAIN SELECT * FROM s_view2 WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------- Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36) Filter: (f_malicious(b) AND f_policy(a)) (2 rows) ==> But it does not affect anything on the case when a leakable function from outside of the view is chained to same level with security policy function of the view. In this case, we need to mark functions the original nest level, then sort by the nest level rather than cost on the order_qual_clauses(). Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
(2010/09/02 11:57), Robert Haas wrote: > 2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> Right now, it stands on a strict assumption that considers operators >> implemented with built-in functions are safe; it does not have no >> possibility to leak supplied arguments anywhere. >> >> Please note that this patch does not case about a case when >> a function inside a view and a function outside a view are >> distributed into same level and the later function has lower >> cost value. > > Without making some attempt to address these two points, I don't see > the point of this patch. > > Also, I believe we decided previously do this deoptimization only in > case the user requests it with CREATE SECURITY VIEW. > The attached patch tackles both of the above two points, and allows to control this deoptimization when CREATE SECURITY VIEW is used. Example --------- CREATE FUNCTION f_policy(int) RETURNS bool LANGUAGE 'plpgsql' AS 'begin return $1 % 2 = 0; end'; CREATE FUNCTION f_malicious(text) RETURNS bool COST 0.0001 LANGUAGE 'plpgsql' AS 'begin raise notice ''leak: %'', $1; return true; end'; CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(x); CREATE OR REPLACE VIEW v2 AS SELECT * FROM t1 WHERE f_policy(a); In this case, we don't want to invoke f_malicious() with argument come from the tuples which should be filtered inside of the views, even if people provides it on the WHERE clause. Without this patch ==================== postgres=# EXPLAIN select * from v1 where f_malicious(b); QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..287.63 rows=410 width=72) -> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36) Filter: f_malicious(b) -> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36) Index Cond: (x = t1.a) Filter: f_policy(x) (6 rows) ==> f_policy(x) was chained inside of the join loop, so it enables to something to be filtered out. postgres=# select * from v1 where f_malicious(b); NOTICE: leak: aaa NOTICE: leak: bbb NOTICE: leak: ccc NOTICE: leak: ddd a | b | x | y ---+-----+---+----- 2 | bbb | 2 | xxx 4 | ddd | 4 | zzz (2 rows) postgres=# EXPLAIN select * from v2 where f_malicious(b); QUERY PLAN ------------------------------------------------------- Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36) Filter: (f_malicious(b) AND f_policy(a)) (2 rows) ==> The view of v2 is pulled up to the upper level, because this subquery is enough simple. In this case, f_malicious() and f_policy() are handled in a same level, and ordered by its cost to execution. Since f_malicious() has lower cost (0.001) the f_policy(), so it shall be executed earlier, although f_policy() is used in more deep depth. With this patch ================= postgres=# EXPLAIN select * from v1 where f_malicious(b); QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..287.63 rows=410 width=72) -> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36) Filter: f_malicious(b) -> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36) Index Cond: (x = t1.a) Filter: f_policy(x) (6 rows) ==> It has same result of the case when without this patch, because this view is declared without "SECURITY" postgres=# CREATE OR REPLACE SECURITY VIEW v1 AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(x); CREATE VIEW postgres=# EXPLAIN select * from v1 where f_malicious(b); QUERY PLAN ------------------------------------------------------------------- Hash Join (cost=37.68..384.39 rows=137 width=72) Hash Cond: (t1.a = t2.x) Join Filter: (f_policy(t2.x) AND f_malicious(t1.b)) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Hash (cost=22.30..22.30 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (6 rows) ==> After we replaced this view with SECURITY option, it prevents f_malicious() being come from outside of the view into inside of the join loop. postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..16.80 rows=1 width=72) Join Filter: (f_policy(t2.x) AND f_malicious(t1.b)) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=36) Index Cond: (x = 2) (6 rows) ==> We assume operators implemented with built-in functions are safe. So, we don't prevent this pushing-down, if the clause does not contains something leakable. postgres=# EXPLAIN select * from v2 where f_malicious(b); QUERY PLAN ------------------------------------------------------- Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36) Filter: (f_policy(a) AND f_malicious(b)) (2 rows) ==> In addition, when functions come from different nest levels are distributed a certain scan filter, these are ordered by its original nest level, not only function cost. Introduction of the patch =========================== This patch tries to tackle the two matters. The first one is pushing down leakable functions into inside of join-loops being originated from security view. The distribute_qual_to_rels() distributes qualifiers of WHERE/JOIN ... ON clauses to appropriate set of relations. Even if a qualifier is depending on a part of join-loop inside of security view, we need to prevent it to be pushed down. This patch informs distribute_qual_to_rels() what relations being come from security views using Bitmapset of 'sec_barriers'. If a qualifier overlaps with any members of security_barriers, its dependency shall be expanded to the union set of 'sec_barriers'. In this case, this qualifier depends on whole of the relations inside of the security view, we cannot push it down anymore. Views are expanded to sub-queries at query rewriter. This patch add a flag to FromExpr to distinguish whether this sub-query was expanded from security view, or not. When deconstruct_recurse() walks on the supplied join-tree, it construct a bitmapset of the relations under the FromExpr with security_view = true, then, it shall be delivered to the distribute_qual_to_rels(). Instead of modifying the structure of pg_class, this patch stores a flag of security view on the reloption. So, it needed to modify routines about reloptions because it is the first case to store reloptions of views. The other matter is priority to order qualifiers of a certain scanning filter. As I mentioned above, the planner pull up simple subqueries to join, so a function originated from inside of view and outside of view are distributed to same scan plan. Then, these are ordered by cost value. It means functions come from outside of views (maybe leakable) are invoked with arguments come from tuples to be filtered out with functions come from inside of the views. So, I added 'depth' field to FuncExpr and so on. It shall be set just after query rewriter, then referenced in order_qual_clauses(). If the supplied plan multiple qualifiers, these are ordered by the depth of qualifier first, then ordered by the cost when here are multiple qualifiers with same depth. It makes sure qualifiers being originated inside of views are executed earlier than others. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
2010/9/6 KaiGai Kohei <kaigai@ak.jp.nec.com>: > The attached patch tackles both of the above two points, and allows to > control this deoptimization when CREATE SECURITY VIEW is used. I'll send a review for the patch. Contents & Purpose ================== The patch adds a new SECURITY option for VIEWs. Views defined with CREATE SECURITY VIEW command are not merged with external WHERE-clauses including security-leakable functions in queries. However, since internal functions and operators are not considered as leakable functions, the planner still works well for security views unless we use user-defined functions in the WHERE-clause. Initial Run =========== * Because of the delayed review, the patch has one hunk: 1 out of 5 hunks FAILED -- saving rejects to file src/backend/commands/tablecmds.c.rej but it is not serious at all, and can be easily fixed. * It can be compiled, but has two warnings: rewriteHandler.c: In function 'MarkFuncExprDepthWalker': rewriteHandler.c:1155: warning: cast from pointer to integer of different size rewriteHandler.c:1227: warning: cast to pointer from integer of different size They should be harmless, but casting intptr_t is often used to avoid warnings: - L1155: (int) (intptr_t) context; - L1227:(void *) (intptr_t) (depth + 1) * It passes all regression tests, but doesn't have own new tests. Remaining works =============== The patch might include only core code, but I'll let you know additional sub-works are still required to complete the feature. Also, I've not measured the performance yet, though performance might not be so critical for the patch. * Regression tests and documentation for the feature are required. * pg_dump must support to dump SECURITY VIEWs. They are dumped as normal views for now. * pg_views can have "security" column. * psql's \dv can show security attributes of views. Questions ========= > postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2; > ==> We assume operators implemented with built-in functions are safe. > So, we don't prevent this pushing-down, if the clause does not > contains something leakable. The term "built-in functions" means functions written in INTERNAL language here. But we also have SQL functions by default. Some of them are just a wrapper to internal functions. I'm not sure the checking of INTERNAL language is the best way for the purpose. Did you compare it with other methods? For example, "oid < FirstNormalObjectId" looks workable for me. > Instead of modifying the structure of pg_class, this patch stores a flag of > security view on the reloption. So, it needed to modify routines about > reloptions because it is the first case to store reloptions of views. Why did you need to extend StdRdOptions strucuture? Since other fields in the structure are not used by views at all, I think adding a new structure struct ViewOptions { vl_len, security_view } would be better than extending StdRdOptions. -- Itagaki Takahiro
(2010/10/05 18:01), Itagaki Takahiro wrote: > 2010/9/6 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> The attached patch tackles both of the above two points, and allows to >> control this deoptimization when CREATE SECURITY VIEW is used. > > I'll send a review for the patch. > Thanks for your reviewing. > Contents& Purpose > ================== > The patch adds a new SECURITY option for VIEWs. Views defined with > CREATE SECURITY > VIEW command are not merged with external WHERE-clauses including > security-leakable > functions in queries. However, since internal functions and operators are not > considered as leakable functions, the planner still works well for > security views > unless we use user-defined functions in the WHERE-clause. > > Initial Run > =========== > * Because of the delayed review, the patch has one hunk: > 1 out of 5 hunks FAILED -- saving rejects to file > src/backend/commands/tablecmds.c.rej > but it is not serious at all, and can be easily fixed. > The patch was submitted a month ago. I'll fix it. > * It can be compiled, but has two warnings: > rewriteHandler.c: In function 'MarkFuncExprDepthWalker': > rewriteHandler.c:1155: warning: cast from pointer to integer of different size > rewriteHandler.c:1227: warning: cast to pointer from integer of different size > > They should be harmless, but casting intptr_t is often used to avoid warnings: > - L1155: (int) (intptr_t) context; > - L1227: (void *) (intptr_t) (depth + 1) > Thanks for the good idea. > * It passes all regression tests, but doesn't have own new tests. > OK, I'll add it. Perhaps, EXPLAIN enables to show us differences between security views and others. > Remaining works > =============== > The patch might include only core code, but I'll let you know additional > sub-works are still required to complete the feature. Also, I've not measured > the performance yet, though performance might not be so critical for the patch. > > * Regression tests and documentation for the feature are required. > * pg_dump must support to dump SECURITY VIEWs. > They are dumped as normal views for now. > * pg_views can have "security" column. > * psql's \dv can show security attributes of views. > All are fair enough for me. > Questions > ========= >> postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2; >> ==> We assume operators implemented with built-in functions are safe. >> So, we don't prevent this pushing-down, if the clause does not >> contains something leakable. > > The term "built-in functions" means functions written in INTERNAL language > here. But we also have SQL functions by default. Some of them are just a > wrapper to internal functions. I'm not sure the checking of INTERNAL language > is the best way for the purpose. Did you compare it with other methods? > For example, "oid< FirstNormalObjectId" looks workable for me. > Hmm. I'm not sure they can be used for index-scans. If these operators are not corresponding to index-scans, I want to keep the logic to check INTERNAL language, because these have obviously no side effects (= not leakable anything). kaigai=# select oprname, oprleft::regtype, oprright::regtype, prosrc from pg_operator o join pg_proc p on o.oprcode= p.oid where prolang <> 12;oprname | oprleft | oprright | prosrc ---------+------------------------+-----------------------------+------------------------------------@> | path | point | select pg_catalog.on_ppath($2, $1)+ | time without time zone | date | select ($2 + $1)+ | time with time zone | date | select ($2 + $1)+ | bigint | inet | select $2 + $1+ | interval | timewithout time zone | select $2 + $1+ | interval | date | select $2 + $1+ | interval | time with time zone | select $2 + $1+ | interval | timestampwithout time zone | select $2 + $1+ | interval | timestamp with time zone | select $2 + $1+ | integer | date | select $2 + $1|| | text | anynonarray | select $1 || $2::pg_catalog.text|| | anynonarray | text | select $1::pg_catalog.text || $2~ | path | point | select pg_catalog.on_ppath($2,$1) (13 rows) >> Instead of modifying the structure of pg_class, this patch stores a flag of >> security view on the reloption. So, it needed to modify routines about >> reloptions because it is the first case to store reloptions of views. > > Why did you need to extend StdRdOptions strucuture? Since other fields in > the structure are not used by views at all, I think adding a new structure > struct ViewOptions { vl_len, security_view } > would be better than extending StdRdOptions. > Hmm. It might be better than ad-hoc enhancement of StdRdOptions. BTW, which is more preference to store the flag of security view: reloption of the view or a new bool variable in the pg_class? I tried to store this flag within reloptions to minimize the patch size, but it seems to me reloption support for views makes the patch size larger in the result. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/10/5 KaiGai Kohei <kaigai@ak.jp.nec.com>: >> The term "built-in functions" means functions written in INTERNAL language >> here. But we also have SQL functions by default. Some of them are just a >> wrapper to internal functions. I'm not sure the checking of INTERNAL language >> is the best way for the purpose. Did you compare it with other methods? >> For example, "oid< FirstNormalObjectId" looks workable for me. >> > Hmm. I'm not sure they can be used for index-scans. If these operators are not > corresponding to index-scans, I want to keep the logic to check INTERNAL language, > because these have obviously no side effects (= not leakable anything). I think the idea that all internal operators are safe has been thoroughly discredited already. > Hmm. It might be better than ad-hoc enhancement of StdRdOptions. > BTW, which is more preference to store the flag of security view: > reloption of the view or a new bool variable in the pg_class? > > I tried to store this flag within reloptions to minimize the patch > size, but it seems to me reloption support for views makes the patch > size larger in the result. I think a boolean in pg_class is the way to go. Is there a padding byte we can steal, to avoid making the fixed-size portion larger? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/10/05 23:16), Robert Haas wrote: > 2010/10/5 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> The term "built-in functions" means functions written in INTERNAL language >>> here. But we also have SQL functions by default. Some of them are just a >>> wrapper to internal functions. I'm not sure the checking of INTERNAL language >>> is the best way for the purpose. Did you compare it with other methods? >>> For example, "oid< FirstNormalObjectId" looks workable for me. >>> >> Hmm. I'm not sure they can be used for index-scans. If these operators are not >> corresponding to index-scans, I want to keep the logic to check INTERNAL language, >> because these have obviously no side effects (= not leakable anything). > > I think the idea that all internal operators are safe has been > thoroughly discredited already. > How can we distinguish an internal operator and others? Because pg_operator does not have a property that we can use to distinguish them, I tried to check function of the operators... >> Hmm. It might be better than ad-hoc enhancement of StdRdOptions. >> BTW, which is more preference to store the flag of security view: >> reloption of the view or a new bool variable in the pg_class? >> >> I tried to store this flag within reloptions to minimize the patch >> size, but it seems to me reloption support for views makes the patch >> size larger in the result. > > I think a boolean in pg_class is the way to go. Is there a padding > byte we can steal, to avoid making the fixed-size portion larger? > OK, I'll add a boolean in pg_class. Perhaps, 'relissecview'? Thanks -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Tue, Oct 5, 2010 at 10:27 AM, KaiGai Kohei <kaigai@kaigai.gr.jp> wrote: > (2010/10/05 23:16), Robert Haas wrote: >> >> 2010/10/5 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>>> >>>> The term "built-in functions" means functions written in INTERNAL >>>> language >>>> here. But we also have SQL functions by default. Some of them are just a >>>> wrapper to internal functions. I'm not sure the checking of INTERNAL >>>> language >>>> is the best way for the purpose. Did you compare it with other methods? >>>> For example, "oid< FirstNormalObjectId" looks workable for me. >>>> >>> Hmm. I'm not sure they can be used for index-scans. If these operators >>> are not >>> corresponding to index-scans, I want to keep the logic to check INTERNAL >>> language, >>> because these have obviously no side effects (= not leakable anything). >> >> I think the idea that all internal operators are safe has been >> thoroughly discredited already. >> > How can we distinguish an internal operator and others? > Because pg_operator does not have a property that we can use to > distinguish them, I tried to check function of the operators... Checking the functions of the operators is the right thing to do, but assuming that internal = safe does not work. For example, pushing down arithmetic operators allows you to probe for any given value in a hidden row by testing whether 1 / (x - constant) throws a division by zero error. I believe we need a flag in pg_proc, which we should probably set only for a very limited set of operators to start. Really, the only things that seem important to get right for the first version are things that might allow useful indexing, and even then we should err on the side of caution. I think that for review purposes this should be split into two or three patches. First, a patch to add the flag to pg_proc (with SQL support, pg_dump support, psql support, etc.); second, a patch to add CREATE SECURITY VIEW (with SQL support, pg_dump support, psql support, etc.); third (or possibly this can be done together with the previous item), a patch to make the necessary behavior changes for security views. We should have all the patches in hand before we start committing anything, but it will be easier to review in phases. >>> Hmm. It might be better than ad-hoc enhancement of StdRdOptions. >>> BTW, which is more preference to store the flag of security view: >>> reloption of the view or a new bool variable in the pg_class? >>> >>> I tried to store this flag within reloptions to minimize the patch >>> size, but it seems to me reloption support for views makes the patch >>> size larger in the result. >> >> I think a boolean in pg_class is the way to go. Is there a padding >> byte we can steal, to avoid making the fixed-size portion larger? >> > OK, I'll add a boolean in pg_class. Perhaps, 'relissecview'? Doesn't sound bad, but I just had another thought. How about defining a new relkind? It'll make the patch a little more complex but I think if it avoids adding a column to pg_class it's probably worth it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > Checking the functions of the operators is the right thing to do, but > assuming that internal = safe does not work. For example, pushing > down arithmetic operators allows you to probe for any given value in a > hidden row by testing whether 1 / (x - constant) throws a division by > zero error. Well, if the goal is "make it impossible to tell whether such-and-such a value exists", I think this approach can't meet it at all. There are too many side channels. You're focusing here on the error-report side channel, but there's also performance, ie how long did the query take. (BTW, is the intent to somehow lobotomize EXPLAIN so you can't use that to see what happened?) Personally I think this is a dead end that we shouldn't be wasting any more time on. > Doesn't sound bad, but I just had another thought. How about defining > a new relkind? It'll make the patch a little more complex but I think > if it avoids adding a column to pg_class it's probably worth it. New relkind would make the patch affect a lot of stuff that shouldn't need to care, including client-side code. regards, tom lane
On Tue, Oct 5, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Personally I think this is a dead end that we shouldn't be wasting > any more time on. But you haven't proposed a reasonable alternative. As far as I can see, there are only two ways to go here. Option #1: Remove all mention from the documentation of using views for security purposes. Don't allow views to have explicit permissions attached to them; they are merely shorthand for a SELECT, for which you either do or do not have privileges. Option #2: Define a standard for what constitutes acceptable information leakage and what does not. Then write the code to try to meet that standard. The status quo, whereby we advise people to security their data by doing something that doesn't actually work, is, to use the non-technical term, dumb. We need to decide what we're going to do about it, not whether we're going to do anything about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/10/05 23:56), Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> Checking the functions of the operators is the right thing to do, but >> assuming that internal = safe does not work. For example, pushing >> down arithmetic operators allows you to probe for any given value in a >> hidden row by testing whether 1 / (x - constant) throws a division by >> zero error. > > Well, if the goal is "make it impossible to tell whether such-and-such a > value exists", I think this approach can't meet it at all. There are > too many side channels. You're focusing here on the error-report side > channel, but there's also performance, ie how long did the query take. > (BTW, is the intent to somehow lobotomize EXPLAIN so you can't use that > to see what happened?) > Good point. Major commercial RDBMS with row-level access control (such as Oracle VPD) does not care about any side channels that allows us to infer existence of a certain value. Their features focus on control regular data channels. It allows malicious users to transfer contents of invisible tuples into others unexpectedly. It corresponds to a user defined function which insert the supplied argument into temporary table in my example. So, if we should catch up same level earlier, I think we need to ignore such kind of side channel attacks. If so, the matter become much simple. We need to consider whether contents of the error messages are delivered via main-channel or side-channel. If we consider it is a side-channel, we can trust all the buili-in functions because nothing tries to write out the supplied argument into somewhere. If we consider it is a regular-channel, we need to distinguish safe and unsafe functions based on a certain criteria, maybe, 'safe' flag in pg_proc. In my opinion, I like to categorize error messages as side-channel, because its through-put is much less than regular-channels, so scale of the threat is relatively small. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
(2010/10/06 0:33), KaiGai Kohei wrote: > (2010/10/05 23:56), Tom Lane wrote: >> Robert Haas<robertmhaas@gmail.com> writes: >>> Checking the functions of the operators is the right thing to do, but >>> assuming that internal = safe does not work. For example, pushing >>> down arithmetic operators allows you to probe for any given value in a >>> hidden row by testing whether 1 / (x - constant) throws a division by >>> zero error. >> >> Well, if the goal is "make it impossible to tell whether such-and-such a >> value exists", I think this approach can't meet it at all. There are >> too many side channels. You're focusing here on the error-report side >> channel, but there's also performance, ie how long did the query take. >> (BTW, is the intent to somehow lobotomize EXPLAIN so you can't use that >> to see what happened?) >> > Good point. Major commercial RDBMS with row-level access control > (such as Oracle VPD) does not care about any side channels that > allows us to infer existence of a certain value. > > Their features focus on control regular data channels. It allows Sorry, prevents ^^^^^^ > malicious users to transfer contents of invisible tuples into others > unexpectedly. It corresponds to a user defined function which insert > the supplied argument into temporary table in my example. > > So, if we should catch up same level earlier, I think we need to > ignore such kind of side channel attacks. > > If so, the matter become much simple. We need to consider whether > contents of the error messages are delivered via main-channel or > side-channel. > If we consider it is a side-channel, we can trust all the buili-in > functions because nothing tries to write out the supplied argument > into somewhere. > If we consider it is a regular-channel, we need to distinguish safe and > unsafe functions based on a certain criteria, maybe, 'safe' flag in > pg_proc. > > In my opinion, I like to categorize error messages as side-channel, > because its through-put is much less than regular-channels, so scale > of the threat is relatively small. > > Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 5, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Personally I think this is a dead end that we shouldn't be wasting >> any more time on. > But you haven't proposed a reasonable alternative. Tom: "This problem is insoluble." Robert: "You can't claim that without offering a solution." Sorry ... > Option #1: Remove all mention from the documentation of using views > for security purposes. Don't allow views to have explicit permissions > attached to them; they are merely shorthand for a SELECT, for which > you either do or do not have privileges. The SQL standard requires us to attach permissions to views. The standard makes no claims whatsoever about how leak-proof views should be; it only says that you can't call a view without the appropriate permissions. I do think it's reasonable for the docs to point out that views that do row-filtering should not be presumed to be absolutely bulletproof. That doesn't make permissions on them useless, so you're attacking a straw man. regards, tom lane
On Tue, 2010-10-05 at 12:27 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Tue, Oct 5, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Personally I think this is a dead end that we shouldn't be wasting > >> any more time on. > > > But you haven't proposed a reasonable alternative. > > Tom: "This problem is insoluble." > Robert: "You can't claim that without offering a solution." > > Sorry ... > > > Option #1: Remove all mention from the documentation of using views > > for security purposes. Don't allow views to have explicit permissions > > attached to them; they are merely shorthand for a SELECT, for which > > you either do or do not have privileges. > > The SQL standard requires us to attach permissions to views. The > standard makes no claims whatsoever about how leak-proof views should > be; it only says that you can't call a view without the appropriate > permissions. > > I do think it's reasonable for the docs to point out that views that do > row-filtering should not be presumed to be absolutely bulletproof. > That doesn't make permissions on them useless, so you're attacking a > straw man. +1 JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Tue, Oct 5, 2010 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Option #1: Remove all mention from the documentation of using views >> for security purposes. Don't allow views to have explicit permissions >> attached to them; they are merely shorthand for a SELECT, for which >> you either do or do not have privileges. > > The SQL standard requires us to attach permissions to views. The > standard makes no claims whatsoever about how leak-proof views should > be; it only says that you can't call a view without the appropriate > permissions. > > I do think it's reasonable for the docs to point out that views that do > row-filtering should not be presumed to be absolutely bulletproof. > That doesn't make permissions on them useless, so you're attacking a > straw man. Really? I'm confused. What is the use case for the status quo? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 5, 2010 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That doesn't make permissions on them useless, so you're attacking a >> straw man. > Really? I'm confused. What is the use case for the status quo? Access to tables that you have no direct privileges on, mainly. (This is significantly more useful when you consider updates on views than when you consider selects alone.) Even if it were totally useless from the point of view you're choosing to adopt, you'd have a hard row to hoe to convince us to take out a SQL-standard feature. The SQL facility *is* leak-proof AFAICS for column filtering. Row filtering, not so much. We could make it leak-proof (by making the view into a hard optimization boundary) but I think everyone's agreed that the performance consequences would be so bad as to make such a feature useless. Unfortunately I don't see any design that avoids the performance penalty while still being guaranteed leak-proof. Once you realize that performance itself can be a leak channel, it may well be that that's simply a contradiction in terms. And I don't see a lot of use in plugging some side channels while others remain open. At least, not without a much more explicit threat model than anyone's actually offered for this patch, which would explain exactly why we need to close these particular side channels and not others. regards, tom lane
Robert Haas <robertmhaas@gmail.com> wrote: > What is the use case for the status quo? Much simplified: create table party ( countyno smallint not null, caseno varchar(14) not null, partyno smallint not null, name text not null, address text, isaddrsealedboolean not null, primary key (countyno, caseno, partyno) ); create table sealedaddrauth ( userid text not null primary key ); create view partyview as select countyno, caseno, partyno, case when isaddrsealed and not exists (select * fromsealedaddrauth where userid = current_user) then '*** SEALED ***' else address end as address, isaddrsealedfrom party ; insert into party values (1,'2010FA000123',1,'Jane Doe', '123 Victim Ave., Anytown, WI 53599',true); insert into party values (1,'2010FA000123',2,'John Doe', '123 Stalker St., Hometown, WI 53666',false); -- Kevin
On Tue, Oct 5, 2010 at 1:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Oct 5, 2010 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That doesn't make permissions on them useless, so you're attacking a >>> straw man. > >> Really? I'm confused. What is the use case for the status quo? > > Access to tables that you have no direct privileges on, mainly. > (This is significantly more useful when you consider updates on > views than when you consider selects alone.) Even if it were totally > useless from the point of view you're choosing to adopt, you'd have > a hard row to hoe to convince us to take out a SQL-standard feature. > > The SQL facility *is* leak-proof AFAICS for column filtering. > Row filtering, not so much. OK, that's true. I guess if you only want to filter columns, and not rows, then the way it's done right now works. I didn't think of that. > We could make it leak-proof (by making > the view into a hard optimization boundary) but I think everyone's > agreed that the performance consequences would be so bad as to make > such a feature useless. Unfortunately I don't see any design that > avoids the performance penalty while still being guaranteed > leak-proof. Once you realize that performance itself can be a leak > channel, it may well be that that's simply a contradiction in terms. > And I don't see a lot of use in plugging some side channels while > others remain open. At least, not without a much more explicit > threat model than anyone's actually offered for this patch, which > would explain exactly why we need to close these particular side > channels and not others. Well, the only thing I've ever wanted to do this for was to allow sales reps to see their own customers but not the customers of other sales reps (because if they could pull our complete customer list, then once they left and went to work for $COMPETITOR they'd start trying to pick off our customers; of course, we couldn't prevent them from maintaining a list of their own customers, and no doubt they knew who some of the other customers were, but they couldn't just dump out the complete list from the database). I agree it's hopeless to prevent all side-channel leaks, but I'd describe the goal like this: Prevent access to the actual tuple contents of the hidden rows. Failing to solve this problem at the database level doesn't remove the business requirement. I've solved this problem in the past by ensuring that only trusted users have access to the database, and forcing everyone else to go through an application that restricts the set of queries they can issue. That doesn't eliminate the side-channel leak, though: they can still pull out a stopwatch and attempt to infer the size of the table from the query execution time. But in the above example, the total number of our customers isn't particularly secret, just the identities of those customers (and related information like who the main contact is and when their contract term is up and how much they're paying and all those sorts of things). I think this is a common pattern; and it seems to me that if I can enforce this security policy by writing an application wrapper around the database, I should also be able to have the database itself enforce it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Tue, Oct 5, 2010 at 11:01 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Well, the only thing I've ever wanted to do this for was to allow > sales reps to see their own customers but not the customers of other > sales reps (because if they could pull our complete customer list, > then once they left and went to work for $COMPETITOR they'd start > trying to pick off our customers; of course, we couldn't prevent them > from maintaining a list of their own customers, and no doubt they knew > who some of the other customers were, but they couldn't just dump out > the complete list from the database). I agree it's hopeless to > prevent all side-channel leaks, but I'd describe the goal like this: > > Prevent access to the actual tuple contents of the hidden rows. Though I find it unlikely the sales people would have direct access to run arbitrary SQL -- let alone create custom functions. If the users that have select access on the view don't have DDL access doesn't that make them leak-proof for those users? -- greg
* Greg Stark (gsstark@mit.edu) wrote: > Though I find it unlikely the sales people would have direct access to > run arbitrary SQL -- let alone create custom functions. I'm not really sure why..? Perhaps not quite the same, but I've got quite a few users who have direct SQL access (though they use ODBC on their side, typically, there's nothing which forces them to) and I'd certainly like to have the views that I've created which do row-level filtering work correctly. It's not to the point where I've started using set-returning functions, but it's really not a situation I like being in. :/ > If the users that have select access on the view don't have DDL access > doesn't that make them leak-proof for those users? Yeah.. I feel like we 'fixed' that whole problem with DO in any case.. Thanks, Stephen
On 05.10.2010 21:08, Greg Stark wrote: > If the users that have select access on the view don't have DDL access > doesn't that make them leak-proof for those users? No. You can use built-in functions for leaking data as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > ... I agree it's hopeless to > prevent all side-channel leaks, but I'd describe the goal like this: > Prevent access to the actual tuple contents of the hidden rows. > Failing to solve this problem at the database level doesn't remove the > business requirement. I've solved this problem in the past by > ensuring that only trusted users have access to the database, and > forcing everyone else to go through an application that restricts the > set of queries they can issue. That doesn't eliminate the > side-channel leak, though: they can still pull out a stopwatch and > attempt to infer the size of the table from the query execution time. I think you were missing the point of my comment about performance. If the goal is "prevent users from inferring whether value X is present in the table", I believe this patch cannot fix it because it's possible (in some cases) to infer that from performance measurements, ie how long does it take to execute a query that mentions X versus one that mentions Y. I agree that it's unlikely to be practical to extract values that you don't already have a clue about, but broad claims like "prevent all access" are untenable. I believe that we might be able to solve your case of ensuring that a user can't trivially extract the entire table contents, but I don't believe we can solve Kevin's version of the problem, which is whether a stalker can verify the address of a victim that he's not supposed to be able to see. So we need a pretty clear description of exactly what it is we're going to be able to prevent and why such a facility is worth the mess (and future security bugs) it's going to result in. BTW, I thought Kevin's example view was mighty interesting, because it applies the security check in a totally different way than what we've all been implicitly assuming. Ie, instead ofselect * from underlying_table where security_check(); he didselect security_wrapper(underlying_col) from underlying_table; Offhand these approaches seem to have quite different properties. regards, tom lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 05.10.2010 21:08, Greg Stark wrote: >> If the users that have select access on the view don't have DDL access >> doesn't that make them leak-proof for those users? > No. You can use built-in functions for leaking data as well. There's a difference between "can be used to extract data wholesale" and "can be used to probe for the existence of a specific value". We might need to start using more specific terminology than "leak". regards, tom lane
On Tue, Oct 5, 2010 at 2:08 PM, Greg Stark <gsstark@mit.edu> wrote: > Though I find it unlikely the sales people would have direct access to > run arbitrary SQL -- let alone create custom functions. I have definitely seen shops where virtually everyone has SQL-level access to the database. Several of them. Most of them were pretty insecure, but it certainly doesn't help anything when the database has no capability to do anything better. Now, I will grant you that not everyone in those organizations was actually smart enough to do meaningful things with the access they had, but I never found that very comforting. > If the users that have select access on the view don't have DDL access > doesn't that make them leak-proof for those users? Depends what they can do with pre-existing, or built-in, functions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Tue, 2010-10-05 at 14:49 -0400, Robert Haas wrote: > On Tue, Oct 5, 2010 at 2:08 PM, Greg Stark <gsstark@mit.edu> wrote: > > Though I find it unlikely the sales people would have direct access to > > run arbitrary SQL -- let alone create custom functions. > > I have definitely seen shops where virtually everyone has SQL-level > access to the database. Uhh... yeah it is very common to point access at the database and say go for it. Very common. > Several of them. Most of them were pretty > insecure, but it certainly doesn't help anything when the database has > no capability to do anything better. Now, I will grant you that not > everyone in those organizations was actually smart enough to do > meaningful things with the access they had, but I never found that > very comforting. The better argument here is, the majority (by far, just google it) of espionage is done IN HOUSE. It doesn't matter if it is a sales person. It could be a disgruntled DBA. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Tue, Oct 5, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> On 05.10.2010 21:08, Greg Stark wrote: >>> If the users that have select access on the view don't have DDL access >>> doesn't that make them leak-proof for those users? > >> No. You can use built-in functions for leaking data as well. > > There's a difference between "can be used to extract data wholesale" > and "can be used to probe for the existence of a specific value". > We might need to start using more specific terminology than "leak". Yeah. There are a lot of cases. The worst is if you can (1a) dump the underlying table wholesale, or maybe (1b) extract it one row at a time or something like that. Not quite as bad is if you can (2) infer the presence or absence of particular values in particular columns, e.g. via division-by-zero. This is still pretty bad though, because you can probably just keep guessing until you eventually can enumerate everything in that column. If it's a text field or a UUID that may be pretty hard, but if the range of interesting values for that column is limited to, say, a million or so, then you can just iterate through them until you find everything. A related problem is where you can (3) infer the frequency of a value based on the plan choice, either by viewing the EXPLAIN output directly or by timing attacks; and then there's (4) the ability to infer something about the overall amount of data in the underlying table. Any others? Of those, I'm inclined to think that it's possible to close off (1) and (2) pretty thoroughly with sufficient care, but the best you'd be able to do for (3) and (4) is refuse to EXPLAIN to a user without sufficient privileges; the timing attacks seem intractable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 05.10.2010 21:48, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> On 05.10.2010 21:08, Greg Stark wrote: >>> If the users that have select access on the view don't have DDL access >>> doesn't that make them leak-proof for those users? > >> No. You can use built-in functions for leaking data as well. > > There's a difference between "can be used to extract data wholesale" > and "can be used to probe for the existence of a specific value". Define wholesale. I'm also not too worried about probing attacks, where you can ask "does this value exist?", but there are plenty of built-in fúnctions that can regurgitate the argument verbatim in an error message. Using that, you can build a script to fetch the value for every row in the table, one row at a time. It's orders of magnitude slower than "SELECT * FROM table", but orders of magnitude faster than probing for every possible value for every row. > We might need to start using more specific terminology than "leak". Yeah. There are many different categories of leakage: 1. Wholesale retrieval of all rows. For example, a custom function that emits the argument with a NOTICE, used in a WHERE clause. 2. Retrieval of a given value in an arbitrary attacker-chosen row. Using a function like text to integer cast that emits the argument in an ERROR falls into this category. You can access any value in the table, but only one at a time because ERROR causes a rollback. 3. Retrieval of some values, not attacker-chosen. For example, statistics might leak the top 100 values. 4. Probing attacks. Let's you check if a row with given value exists. 5. Leakage of statistical information. Lets you know roughly how many rows there are in a table, for example. There's some fuzziness in those too, you might be able to probe for values in an indexed column but not others, for example. Or you might be able to retrieve all values in one column, or all values in another column, but not put them together to form the original rows in the table. IMHO we don't need to protect against 5. Probing attacks can be nasty, but it's next to impossible to protect from them completely. And for many applications, probing attacks are a non-issue. For example, imagine table of usernames and passwords, with a view that lets you see your own password. Probing for other people's passwords would be useless, you might as well try to log in with the guessed password directly. Retrieval of some non-attacker chosen rows, like from statistics, would be nice to avoid where feasible, but I won't lose my sleep over it. I do think we need to protect against 1 and 2. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't believe we can solve Kevin's version of the problem, which > is whether a stalker can verify the address of a victim that he's > not supposed to be able to see. I'm surprised; I thought that we were already there. If someone has SELECT rights on that view, how would they be able to verify an address? More importantly, do you see a way to find out what a particular party's address is when it is unknown? I'm getting the unsettling feeling that I've been missing something important.... By the way, I didn't mean to leave the name column out of the view, but I guess I inadvertently demonstrated another way in which I think the current view implementation adds security. If the column isn't exposed to the view at all, I don't see how access to the view can leak much about the omitted column, but perhaps I'm missing something there, too? > BTW, I thought Kevin's example view was mighty interesting, > because it applies the security check in a totally different way > than what we've all been implicitly assuming. Ie, instead of > select * from underlying_table where security_check(); > he did > select security_wrapper(underlying_col) from underlying_table; > Offhand these approaches seem to have quite different properties. We do both (sometimes in the same query), but obfuscating detail about a database object (case, party, address) is much more common than hiding the existence of these objects. The obfuscated columns are usually not indexed or usable as search criteria. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't believe we can solve Kevin's version of the problem, which >> is whether a stalker can verify the address of a victim that he's >> not supposed to be able to see. > I'm surprised; I thought that we were already there. Well, the approach you suggested of putting a security wrapper around the output column might be bulletproof against that; I'm not entirely sure, but I don't see a hole in it at the moment. The trouble with it is that it's pretty bad from a performance point of view, at least for columns that people are supposed to be able to use in WHERE clauses. You couldn't index the wrapper expression either. So I'm not seeing a universal solution there. > By the way, I didn't mean to leave the name column out of the view, > but I guess I inadvertently demonstrated another way in which I > think the current view implementation adds security. If the column > isn't exposed to the view at all, I don't see how access to the view > can leak much about the omitted column, but perhaps I'm missing > something there, too? Right, *column* filtering seems easy and entirely secure. The angst here is about row filtering. Can we have a view in which users can see the values of a column for some rows, with perfect security that they can't identify values for the hidden rows? The stronger form is that they shouldn't even be able to tell that hidden rows exist, which is something your view doesn't try to do; but there are at least some applications where that would be desirable. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, the approach you suggested of putting a security wrapper > around the output column might be bulletproof against that; I'm > not entirely sure, but I don't see a hole in it at the moment. > The trouble with it is that it's pretty bad from a performance > point of view, at least for columns that people are supposed to be > able to use in WHERE clauses. OK. We don't really intend for such columns to be used for selection criteria or sorting, so I think we're fine. :-) Thanks for the clarification. > The angst here is about row filtering. OK, I think we're all back on the same page. I only posted because Robert questioned whether there was any security use case for current views. > The stronger form is that they shouldn't even be able to tell that > hidden rows exist, which is something your view doesn't try to do; > but there are at least some applications where that would be > desirable. I can understand that, but from what I've read on the topic, it seems that even some of the most security-conscious government and military users concede that point and just go with meaningless identifiers for inter-table references, so that what leaks is meaningless. <digression> The courts have needs which are a bit different than some other security users; it generally comes down to balancing privacy rights against the rights of the public to access matters of public record. Through the continuing efforts of various committees recommendations, supreme court rules, and legislation, we have one view of the data which is available on the Internet, a more generous view of a county's data if you actually walk into that county's courthouse and use a public access workstation, another level for all parties on certain case types (with the ability to secure some of the data about one party from others if ordered by the court), and then it gets really complicated when you get to what different court staff are allowed to see or modify. Then there can be special exceptions for some business partners, like children's services or police agencies. Right now this is managed by query classes in our Java applications, but as we're moving to a variety of new and different technologies it's getting harder for the DBAs to ensure that nothing is leaking to inappropriate recipients. :-( I think we're going to need to move more of the enforcement to database views and/or security restrictions based on database roles. Some of this seems to fit fairly neatly with the general direction in which KaiGai has been pushing; some of it maybe not so much, because we don't operate on something as simple as "secret", "top secret", etc. But we could use some of the features which seem to be considered pretty esoteric -- like showing different versions of a row to people with different security. For example, on a court calendar, as available to the public in the courthouse for cases scheduled on the current date, a juvenile case would show the child's initials ("In the Interest of J.D."), while the case would not show for the public on the Internet, but the judge involved and the deputy clerks of court who deal with juvenile cases would see the entire name. </digression> Sorry for drifting off topic.... -Kevin
On Tue, Oct 5, 2010 at 1:25 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Right now this is managed by query classes in our Java applications, > but as we're moving to a variety of new and different technologies > it's getting harder for the DBAs to ensure that nothing is leaking > to inappropriate recipients. :-( I think we're going to need to > move more of the enforcement to database views and/or security > restrictions based on database roles. Does Veil cover some of those needs? http://veil.projects.postgresql.org/curdocs/index.html I've never used it, but from what I recall hearing about it, it did something similar (I thought).
On Tue, Oct 5, 2010 at 4:25 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> The stronger form is that they shouldn't even be able to tell that >> hidden rows exist, which is something your view doesn't try to do; >> but there are at least some applications where that would be >> desirable. > > I can understand that, but from what I've read on the topic, it > seems that even some of the most security-conscious government and > military users concede that point and just go with meaningless > identifiers for inter-table references, so that what leaks is > meaningless. Even apart from inter-table references, you can potentially infer things like table sizes from query response times. But I think that stuff is just intractable as a database problem. In real world situations, you can handle it by interjecting massive latency. For example, if some semi-trusted party asks the US "do you have any nuclear submarines that are currently near this lat/long?", you can give them the answer back, say, the next day. At that point it's pretty hard for them to infer anything about how long it took you to search your database of nuclear-submarine-locations, which is information you might want to keep secret for a variety of reasons. But the amount of latency that you need to insert to provide a safety valve is going to be highly application-dependent, and in many cases it's basically "none", as in the sales rep/customer database I mentioned earlier. > Some of this seems to fit fairly neatly with the general direction > in which KaiGai has been pushing; some of it maybe not so much, > because we don't operate on something as simple as "secret", "top > secret", etc. I wouldn't get the particular issue of leaky views confused with SE-Linux integration. There is definitely a use case out there for label-based mandatory access control, but I don't think anyone would deny that it's a small subset of our total user base. Being able to use views to hide a subset of the rows in some table is a much more generally useful thing to be able to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
bricklen <bricklen@gmail.com> writes: > Does Veil cover some of those needs? > http://veil.projects.postgresql.org/curdocs/index.html This entire discussion arose from the desire to plug the holes in Veil... regards, tom lane
bricklen <bricklen@gmail.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Right now this is managed by query classes in our Java >> applications, but as we're moving to a variety of new and >> different technologies it's getting harder for the DBAs to ensure >> that nothing is leaking to inappropriate recipients. :-( I >> think we're going to need to move more of the enforcement to >> database views and/or security restrictions based on database >> roles. > > Does Veil cover some of those needs? > http://veil.projects.postgresql.org/curdocs/index.html > I've never used it, but from what I recall hearing about it, it > did something similar (I thought). Possibly. The general tone of references to it, as well as the fact that it's still classified as being in beta testing, with a release number less than one, have put me off from looking at it closely. Is anyone using it in a situation where they have three thousand directly connected users? Or on a database backing a web site with five million hits a day? On a schema with over 300 tables (normalized, no partitioning)? If so, I'd be interested in hearing about it. -Kevin
On Tue, 2010-10-05 at 12:27 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Tue, Oct 5, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Personally I think this is a dead end that we shouldn't be wasting > >> any more time on. > > > But you haven't proposed a reasonable alternative. > > Tom: "This problem is insoluble." > Robert: "You can't claim that without offering a solution." > > Sorry ... > > > Option #1: Remove all mention from the documentation of using views > > for security purposes. Don't allow views to have explicit permissions > > attached to them; they are merely shorthand for a SELECT, for which > > you either do or do not have privileges. > > The SQL standard requires us to attach permissions to views. The > standard makes no claims whatsoever about how leak-proof views should > be; it only says that you can't call a view without the appropriate > permissions. > > I do think it's reasonable for the docs to point out that views that do > row-filtering should not be presumed to be absolutely bulletproof. > That doesn't make permissions on them useless, so you're attacking a > straw man. +1 JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
(2010/10/06 4:06), Robert Haas wrote: > On Tue, Oct 5, 2010 at 2:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >>> On 05.10.2010 21:08, Greg Stark wrote: >>>> If the users that have select access on the view don't have DDL access >>>> doesn't that make them leak-proof for those users? >> >>> No. You can use built-in functions for leaking data as well. >> >> There's a difference between "can be used to extract data wholesale" >> and "can be used to probe for the existence of a specific value". >> We might need to start using more specific terminology than "leak". > > Yeah. There are a lot of cases. The worst is if you can (1a) dump > the underlying table wholesale, or maybe (1b) extract it one row at a > time or something like that. Not quite as bad is if you can (2) infer > the presence or absence of particular values in particular columns, > e.g. via division-by-zero. This is still pretty bad though, because > you can probably just keep guessing until you eventually can enumerate > everything in that column. If it's a text field or a UUID that may be > pretty hard, but if the range of interesting values for that column is > limited to, say, a million or so, then you can just iterate through > them until you find everything. A related problem is where you can > (3) infer the frequency of a value based on the plan choice, either by > viewing the EXPLAIN output directly or by timing attacks; and then > there's (4) the ability to infer something about the overall amount of > data in the underlying table. Any others? > > Of those, I'm inclined to think that it's possible to close off (1) > and (2) pretty thoroughly with sufficient care, but the best you'd be > able to do for (3) and (4) is refuse to EXPLAIN to a user without > sufficient privileges; the timing attacks seem intractable. > Thanks for good summarize. I also think the case (1) should be closed off soon, because it allows to expose hidden data-contents without any inference of attacker; and its throughput is unignorably fast, so its degree of threat is relatively higher than other cases. <side-note> The idea of throughput is not my own idea. It come from the classic of security evaluation criteria: Trusted Computer System Evaluation Criteria (TCSEC, 1985) See the page.80 of: http://csrc.nist.gov/publications/history/dod85.pdf | From a security perspective, covert channels with low bandwidths represent a | lower threat than those with high bandwidths. However, for many types of | covert channels, techniques used to reduce the bandwidth below a certain rate | (which depends on the specific channel mechanism and the system architecture) | also have the effect of degrading the performance provided to legitimate | system users. Hence, a trade-off between system performance and covert | channel bandwidth must be made. </side-node> I also think we should care about a part of (2) cases. Could you separate the (2) into two cases. The (2a) allows people to see hidden value using error message. In this case, people can see direct value to be hidden, but thorough-put is not fast. The (2b) allows people to infer existence or absence of a certain value using PK or UNIQUE conflicts. (2a) is the reason why my patch allows to push down only operators with internal functions, because these are not obviously leakable. However, I don't think (2b) is the case we should fix up here, because no commercial RDBMSs with RLS don't handle such kind of side-channel attacks using key conflicts. And, it seems to me the cost will be too expensive to care about the case (3) and (4). So, I think it is worthless to fix up them. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/10/06 4:15), Heikki Linnakangas wrote: > On 05.10.2010 21:48, Tom Lane wrote: >> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >>> On 05.10.2010 21:08, Greg Stark wrote: >>>> If the users that have select access on the view don't have DDL access >>>> doesn't that make them leak-proof for those users? >> >>> No. You can use built-in functions for leaking data as well. >> >> There's a difference between "can be used to extract data wholesale" >> and "can be used to probe for the existence of a specific value". > > Define wholesale. I'm also not too worried about probing attacks, where you can ask "does this value exist?", but thereare plenty of built-in fúnctions that can regurgitate the argument verbatim in an error message. Using that, you canbuild a script to > fetch the value for every row in the table, one row at a time. It's orders of magnitude slower than "SELECT * FROM table",but orders of magnitude faster than probing for every possible value for every row. > > > We might need to start using more specific terminology than "leak". > > Yeah. There are many different categories of leakage: > > 1. Wholesale retrieval of all rows. For example, a custom function that> emits the argument with a NOTICE, used in aWHERE clause. It contains a custom function with side-effect; such as INSERT the supplied arguments into private tables. > 2. Retrieval of a given value in an arbitrary attacker-chosen row. Using> a function like text to integer cast that emitsthe argument in an ERROR> falls into this category. You can access any value in the table, but only> one at a time becauseERROR causes a rollback. > > 3. Retrieval of some values, not attacker-chosen. For example, statistics> might leak the top 100 values. > > 4. Probing attacks. Let's you check if a row with given value exists. > > 5. Leakage of statistical information. Lets you know roughly how many rows> there are in a table, for example. > > There's some fuzziness in those too, you might be able to probe for values> in an indexed column but not others, for example.Or you might be able to> retrieve all values in one column, or all values in another column, but not> put them togetherto form the original rows in the table. > > IMHO we don't need to protect against 5. Probing attacks can be nasty, but> it's next to impossible to protect from themcompletely. And for many> applications, probing attacks are a non-issue. For example, imagine table> of usernames andpasswords, with a view that lets you see your own password.> Probing for other people's passwords would be useless, youmight as well> try to log in with the guessed password directly. > > Retrieval of some non-attacker chosen rows, like from statistics, would be> nice to avoid where feasible, but I won't losemy sleep over it. I do think> we need to protect against 1 and 2. > I also agree with this attitude. The case 1 and 2 have differences from others. It allows to expose hidden values, then people may be able to see these values without any inference. So, their through-put is relatively faster than others. It means degree of threat is also higher. If we try to tacker to the matter 1 and 2, suggestions from Itagaki-san are still available, because this patch was designed to prevent people to see hidden data without inferences. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
On Tue, 2010-10-05 at 14:49 -0400, Robert Haas wrote: > On Tue, Oct 5, 2010 at 2:08 PM, Greg Stark <gsstark@mit.edu> wrote: > > Though I find it unlikely the sales people would have direct access to > > run arbitrary SQL -- let alone create custom functions. > > I have definitely seen shops where virtually everyone has SQL-level > access to the database. Uhh... yeah it is very common to point access at the database and say go for it. Very common. > Several of them. Most of them were pretty > insecure, but it certainly doesn't help anything when the database has > no capability to do anything better. Now, I will grant you that not > everyone in those organizations was actually smart enough to do > meaningful things with the access they had, but I never found that > very comforting. The better argument here is, the majority (by far, just google it) of espionage is done IN HOUSE. It doesn't matter if it is a sales person. It could be a disgruntled DBA. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
The attached patch is a revised version of the fix-leaky-view patch. List of updates: - It was rebased to the latest git master. - It revised pointer castings in rewriteHandler.c. - It added a new regression test: 'security_views'. - It added SECURITY VIEW support to pg_dump. - It modified the definition of pg_views to show whether it is a security view, or not. - It revised psql to show SECURITY VIEW attribute of views in verbose mode. - I tried to add a new field to pg_class, but scale of code changes are eventually similar to previous version. So, all I revised was to define StdViewOptions, instead of abuse of StdRdOptions. To avoid confusion, I'd like to make clear what is the matter this patch tries to tackle. As we have discussed for a long time, an interaction between functions with side-effects and query optimization on views may cause information leaks, even if owner of the views intend to use them for row-level security. In this context, we define the term of 'leak' as system can expose content of tuples to be invisible unexpectedly, then people can see the content directly without any inferences. Thus, if and when people supplied a function that writes out given arguments into other tables on the WHERE clause of views, it should be invoked after all the conditions within view definition. E.g) CREATE OR REPLACE FUNCTION f_malicious(text) RETURNS bool COST 0.01 LANGUAGE 'sql' AS 'INSERT INTO my_private VALUES($1); SELECT true;'; SELECT * FROM secret_view v WHERE f_malicious(v); In addition, some of functions (including built-in functions) raise an error with message that may contain given arguments. It also allows to expose content of invisible tuples (although throughput of the channel is relatively slow), if it would be pushed down into inside of join-loop. E.g) postgres=# select * from v2; a | b | x | y ---+-----+---+----- 2 | bbb | 2 | www 4 | eee | 4 | yyy (2 rows) postgres=# select * from v2 where y::int = 1; ERROR: invalid input syntax for integer: "vvv" However, it is too expensive to prohibit pushing down all the function calls into join-loops. So, we need allow to push down a part of harmless functions. In this patch, I don't change the logic to decide it. Thus, only operators implemented by functions with INTERNALlanguage are allowed to push down into SECURITY VIEWs. On the other hand, we can infer a certain value of hidden tuple with iteration of probes. In this scenario, people never see the hidden values directly. So, it is not a matter this patch tries to tackle. E.g) postgres=# select * from v1; a | b ---+----- 2 | bbb 4 | eee (2 rows) postgres=# insert into t1 values (3, 'xyz'); ERROR: duplicate key value violates unique constraint "t1_pkey" DETAIL: Key (a)=(3) already exists. postgres=# select * from v1 where 1/(3-a) > 0; ERROR: division by zero As long as I know, commercial RDBMSs with RLS also don't tackle such kind of information leaks via side-channels. At least, I don't think it is a matter that we should tackle with first priority. Thanks, (2010/10/05 18:01), Itagaki Takahiro wrote: > 2010/9/6 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> The attached patch tackles both of the above two points, and allows to >> control this deoptimization when CREATE SECURITY VIEW is used. > > I'll send a review for the patch. > > Contents& Purpose > ================== > The patch adds a new SECURITY option for VIEWs. Views defined with > CREATE SECURITY > VIEW command are not merged with external WHERE-clauses including > security-leakable > functions in queries. However, since internal functions and operators are not > considered as leakable functions, the planner still works well for > security views > unless we use user-defined functions in the WHERE-clause. > > Initial Run > =========== > * Because of the delayed review, the patch has one hunk: > 1 out of 5 hunks FAILED -- saving rejects to file > src/backend/commands/tablecmds.c.rej > but it is not serious at all, and can be easily fixed. > > * It can be compiled, but has two warnings: > rewriteHandler.c: In function 'MarkFuncExprDepthWalker': > rewriteHandler.c:1155: warning: cast from pointer to integer of different size > rewriteHandler.c:1227: warning: cast to pointer from integer of different size > > They should be harmless, but casting intptr_t is often used to avoid warnings: > - L1155: (int) (intptr_t) context; > - L1227: (void *) (intptr_t) (depth + 1) > > * It passes all regression tests, but doesn't have own new tests. > > Remaining works > =============== > The patch might include only core code, but I'll let you know additional > sub-works are still required to complete the feature. Also, I've not measured > the performance yet, though performance might not be so critical for the patch. > > * Regression tests and documentation for the feature are required. > * pg_dump must support to dump SECURITY VIEWs. > They are dumped as normal views for now. > * pg_views can have "security" column. > * psql's \dv can show security attributes of views. > > Questions > ========= >> postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2; >> ==> We assume operators implemented with built-in functions are safe. >> So, we don't prevent this pushing-down, if the clause does not >> contains something leakable. > > The term "built-in functions" means functions written in INTERNAL language > here. But we also have SQL functions by default. Some of them are just a > wrapper to internal functions. I'm not sure the checking of INTERNAL language > is the best way for the purpose. Did you compare it with other methods? > For example, "oid< FirstNormalObjectId" looks workable for me. > >> Instead of modifying the structure of pg_class, this patch stores a flag of >> security view on the reloption. So, it needed to modify routines about >> reloptions because it is the first case to store reloptions of views. > > Why did you need to extend StdRdOptions strucuture? Since other fields in > the structure are not used by views at all, I think adding a new structure > struct ViewOptions { vl_len, security_view } > would be better than extending StdRdOptions. > -- KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On Tue, Oct 5, 2010 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Right, *column* filtering seems easy and entirely secure. The angst > here is about row filtering. Can we have a view in which users can see > the values of a column for some rows, with perfect security that they > can't identify values for the hidden rows? The stronger form is that > they shouldn't even be able to tell that hidden rows exist, which is > something your view doesn't try to do; but there are at least some > applications where that would be desirable. I took a crack at documenting the current behavior; see attached. It turns out that a view which only uses boolean operators in the WHERE clause is not obviously subvertable, because we judge those operations to have no cost. (It seems unwise to rely on this for security, though.) Anything more complicated - that does row filtering - is easily hacked. See within for details. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Attachment
On 07.10.2010 06:39, Robert Haas wrote: > On Tue, Oct 5, 2010 at 3:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Right, *column* filtering seems easy and entirely secure. The angst >> here is about row filtering. Can we have a view in which users can see >> the values of a column for some rows, with perfect security that they >> can't identify values for the hidden rows? The stronger form is that >> they shouldn't even be able to tell that hidden rows exist, which is >> something your view doesn't try to do; but there are at least some >> applications where that would be desirable. > > I took a crack at documenting the current behavior; see attached. Looks good. It gives the impression that you need to be able to a create custom function to exploit, though. It would be good to mention that internal functions can be used too, revoking access to CREATE FUNCTION does not make you safe. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Oct 7, 2010 at 2:02 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 07.10.2010 06:39, Robert Haas wrote: >> >> On Tue, Oct 5, 2010 at 3:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> >>> Right, *column* filtering seems easy and entirely secure. The angst >>> here is about row filtering. Can we have a view in which users can see >>> the values of a column for some rows, with perfect security that they >>> can't identify values for the hidden rows? The stronger form is that >>> they shouldn't even be able to tell that hidden rows exist, which is >>> something your view doesn't try to do; but there are at least some >>> applications where that would be desirable. >> >> I took a crack at documenting the current behavior; see attached. > > Looks good. It gives the impression that you need to be able to a create > custom function to exploit, though. It would be good to mention that > internal functions can be used too, revoking access to CREATE FUNCTION does > not make you safe. OK, second try attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Attachment
* Robert Haas (robertmhaas@gmail.com) wrote: > On Thu, Oct 7, 2010 at 2:02 AM, Heikki Linnakangas > > Looks good. It gives the impression that you need to be able to a create > > custom function to exploit, though. It would be good to mention that > > internal functions can be used too, revoking access to CREATE FUNCTION does > > not make you safe. > > OK, second try attached. This might be overly pedantic, but I don't think 'tampering' gives the right impression. Also, there's a marked difference between viewing data by using built-ins such as casting (since you'll only get to see the first value in a column that fails the cast) and being able to write a function that pulls out every row of the table and dumps it into another table. I think it'd have a much bigger impression if you went ahead and changed the 'raise notice' to an 'insert into table x;'. Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now. Revoking usage rights on all languages should prevent both though. Thanks, Stephen
On 07.10.2010 16:10, Stephen Frost wrote: > Also, even if you can't create functions (due to lack of create > privileges on any schema), you could use DO clauses now. There's no way to shoehorn a DO clause into a SELECT, you can't do: SELECT data FROM view WHERE (DO $$ RAISE NOTICE argument; $$) = 1 -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Oct 7, 2010 at 9:10 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Thu, Oct 7, 2010 at 2:02 AM, Heikki Linnakangas >> > Looks good. It gives the impression that you need to be able to a create >> > custom function to exploit, though. It would be good to mention that >> > internal functions can be used too, revoking access to CREATE FUNCTION does >> > not make you safe. >> >> OK, second try attached. > > This might be overly pedantic, but I don't think 'tampering' gives the > right impression. I'm open to suggestions. > Also, there's a marked difference between viewing > data by using built-ins such as casting (since you'll only get to see > the first value in a column that fails the cast) and being able to write > a function that pulls out every row of the table and dumps it into > another table. Well, that's why I give the more serious example first. Even with casting failures, there's a good chance you can probe a bunch of different rows by throwing random filter conditions into the clause. (function_that_returns_false() OR (name = 'some_constant' AND number::box) > I think it'd have a much bigger impression if you went > ahead and changed the 'raise notice' to an 'insert into table x;'. Possibly, but it makes the example slightly longer, and I think it's clear enough as-is. > Also, even if you can't create functions (due to lack of create > privileges on any schema), you could use DO clauses now. Revoking > usage rights on all languages should prevent both though. I don't see how to make it work with a DO clause. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
* Heikki Linnakangas (heikki.linnakangas@enterprisedb.com) wrote: > On 07.10.2010 16:10, Stephen Frost wrote: >> Also, even if you can't create functions (due to lack of create >> privileges on any schema), you could use DO clauses now. > > There's no way to shoehorn a DO clause into a SELECT, you can't do: > > SELECT data FROM view WHERE (DO $$ RAISE NOTICE argument; $$) = 1 Wow, I just kind of assumed you could; I'm not really sure why. Perhaps it'll be possible in the future though, so might be something to think about if/when it happens. Can't see a way to abuse the view from inside a DO or in a function in the same way either. Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > On Thu, Oct 7, 2010 at 9:10 AM, Stephen Frost <sfrost@snowman.net> wrote: > > This might be overly pedantic, but I don't think 'tampering' gives the > > right impression. > > I'm open to suggestions. Yeah, wasn't coming up with a better word myself. :/ Maybe something with "circumvented"? > > Also, even if you can't create functions (due to lack of create > > privileges on any schema), you could use DO clauses now. Revoking > > usage rights on all languages should prevent both though. > > I don't see how to make it work with a DO clause. Yeah, Heikki pointed out that I was assuming PG could work more magic than it can today. :) Sorry for the noise. Stephen
Hi, Stephen Frost <sfrost@snowman.net> writes: > Wow, I just kind of assumed you could; I'm not really sure why. Perhaps > it'll be possible in the future though, so might be something to think > about if/when it happens. Can't see a way to abuse the view from inside > a DO or in a function in the same way either. It took me some time and reviewing the patch to think about it this way, so maybe that would help some readers here: DO is a utility command, not a query. That also explains why it does not get parameters. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
I noticed the previous patch has an obvious conflict to the latest git mater, and it does not have any documentation updates. So, I rebased the patch, and added descriptions about secure views. Rest of parts are unchanged. Thanks, (2010/10/06 17:01), KaiGai Kohei wrote: > The attached patch is a revised version of the fix-leaky-view patch. > > List of updates: > - It was rebased to the latest git master. > - It revised pointer castings in rewriteHandler.c. > - It added a new regression test: 'security_views'. > - It added SECURITY VIEW support to pg_dump. > - It modified the definition of pg_views to show whether it is > a security view, or not. > - It revised psql to show SECURITY VIEW attribute of views in > verbose mode. > - I tried to add a new field to pg_class, but scale of code changes > are eventually similar to previous version. So, all I revised was > to define StdViewOptions, instead of abuse of StdRdOptions. > > > To avoid confusion, I'd like to make clear what is the matter this > patch tries to tackle. > As we have discussed for a long time, an interaction between functions > with side-effects and query optimization on views may cause information > leaks, even if owner of the views intend to use them for row-level > security. > In this context, we define the term of 'leak' as system can expose > content of tuples to be invisible unexpectedly, then people can see > the content directly without any inferences. > > Thus, if and when people supplied a function that writes out given > arguments into other tables on the WHERE clause of views, it should > be invoked after all the conditions within view definition. > > E.g) CREATE OR REPLACE FUNCTION f_malicious(text) > RETURNS bool COST 0.01 LANGUAGE 'sql' > AS 'INSERT INTO my_private VALUES($1); SELECT true;'; > SELECT * FROM secret_view v WHERE f_malicious(v); > > In addition, some of functions (including built-in functions) raise > an error with message that may contain given arguments. It also allows > to expose content of invisible tuples (although throughput of the > channel is relatively slow), if it would be pushed down into inside > of join-loop. > > E.g) postgres=# select * from v2; > a | b | x | y > ---+-----+---+----- > 2 | bbb | 2 | www > 4 | eee | 4 | yyy > (2 rows) > > postgres=# select * from v2 where y::int = 1; > ERROR: invalid input syntax for integer: "vvv" > > However, it is too expensive to prohibit pushing down all the function > calls into join-loops. So, we need allow to push down a part of harmless > functions. In this patch, I don't change the logic to decide it. > Thus, only operators implemented by functions with INTERNALlanguage > are allowed to push down into SECURITY VIEWs. > > On the other hand, we can infer a certain value of hidden tuple with > iteration of probes. In this scenario, people never see the hidden > values directly. So, it is not a matter this patch tries to tackle. > > E.g) postgres=# select * from v1; > a | b > ---+----- > 2 | bbb > 4 | eee > (2 rows) > > postgres=# insert into t1 values (3, 'xyz'); > ERROR: duplicate key value violates unique constraint "t1_pkey" > DETAIL: Key (a)=(3) already exists. > postgres=# select * from v1 where 1/(3-a)> 0; > ERROR: division by zero > > As long as I know, commercial RDBMSs with RLS also don't tackle such > kind of information leaks via side-channels. At least, I don't think > it is a matter that we should tackle with first priority. > > Thanks, > > (2010/10/05 18:01), Itagaki Takahiro wrote: >> 2010/9/6 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> The attached patch tackles both of the above two points, and allows to >>> control this deoptimization when CREATE SECURITY VIEW is used. >> >> I'll send a review for the patch. >> >> Contents& Purpose >> ================== >> The patch adds a new SECURITY option for VIEWs. Views defined with >> CREATE SECURITY >> VIEW command are not merged with external WHERE-clauses including >> security-leakable >> functions in queries. However, since internal functions and operators are not >> considered as leakable functions, the planner still works well for >> security views >> unless we use user-defined functions in the WHERE-clause. >> >> Initial Run >> =========== >> * Because of the delayed review, the patch has one hunk: >> 1 out of 5 hunks FAILED -- saving rejects to file >> src/backend/commands/tablecmds.c.rej >> but it is not serious at all, and can be easily fixed. >> >> * It can be compiled, but has two warnings: >> rewriteHandler.c: In function 'MarkFuncExprDepthWalker': >> rewriteHandler.c:1155: warning: cast from pointer to integer of different size >> rewriteHandler.c:1227: warning: cast to pointer from integer of different size >> >> They should be harmless, but casting intptr_t is often used to avoid warnings: >> - L1155: (int) (intptr_t) context; >> - L1227: (void *) (intptr_t) (depth + 1) >> >> * It passes all regression tests, but doesn't have own new tests. >> >> Remaining works >> =============== >> The patch might include only core code, but I'll let you know additional >> sub-works are still required to complete the feature. Also, I've not measured >> the performance yet, though performance might not be so critical for the patch. >> >> * Regression tests and documentation for the feature are required. >> * pg_dump must support to dump SECURITY VIEWs. >> They are dumped as normal views for now. >> * pg_views can have "security" column. >> * psql's \dv can show security attributes of views. >> >> Questions >> ========= >>> postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2; >>> ==> We assume operators implemented with built-in functions are safe. >>> So, we don't prevent this pushing-down, if the clause does not >>> contains something leakable. >> >> The term "built-in functions" means functions written in INTERNAL language >> here. But we also have SQL functions by default. Some of them are just a >> wrapper to internal functions. I'm not sure the checking of INTERNAL language >> is the best way for the purpose. Did you compare it with other methods? >> For example, "oid< FirstNormalObjectId" looks workable for me. >> >>> Instead of modifying the structure of pg_class, this patch stores a flag of >>> security view on the reloption. So, it needed to modify routines about >>> reloptions because it is the first case to store reloptions of views. >> >> Why did you need to extend StdRdOptions strucuture? Since other fields in >> the structure are not used by views at all, I think adding a new structure >> struct ViewOptions { vl_len, security_view } >> would be better than extending StdRdOptions. >> > > > > > -- KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
2010/10/12 KaiGai Kohei <kaigai@ak.jp.nec.com>: > I noticed the previous patch has an obvious conflict to the latest > git mater, and it does not have any documentation updates. > > So, I rebased the patch, and added descriptions about secure views. > Rest of parts are unchanged. It seems that we have rough agreement that the existing VIEW feature is adequate for column filtering but not for row filtering. The attack vector is that the planner might reorder quals such that a value not intended to be visible to the user is passed to a function which has a side-effect that can expose the value passed to it: either overtly (as by a user-defined function that writes it to the table or prints it using RAISE NOTICE) or in some more subtle way (as in the case where division by zero exposes throws an exception when passed zero, but not some other value). With the possible exception of Tom, everyone seems to agree that it would be a good step forward to provide a way of plugging these holes, even if it didn't cover subtler information leaks such as by reading the EXPLAIN output or timing query execution. 1. Does anyone wish to argue (or continue arguing) that plugging these more overt information leaks is not worthwhile? 2. Supposing that the answer to question #1 is in the negative, does anyone wish to argue that this patch as currently written is an adequate solution to this problem? It seems obvious to me that it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > With the possible exception of Tom, > everyone seems to agree that it would be a good step forward to > provide a way of plugging these holes, even if it didn't cover subtler > information leaks such as by reading the EXPLAIN output or timing > query execution. > 1. Does anyone wish to argue (or continue arguing) that plugging these > more overt information leaks is not worthwhile? Yeah, I will. Plugging an "overt" information leak without plugging other channels in the same area isn't a security improvement. It's merely PR, and rather lame PR at that. An attacker is not bound to use only the attack methods you'd like him to. This would only be a security improvement if there were plausible attack scenarios in which the attacker would have access to the plugged channel and not access to the other known channels. Now, perhaps that's the case, but no one has put forward an argument showing it. I think the burden of proof is on those who favor the patch to put forward that argument, not for those who don't favor it to try to prove that no such scenario exists. > 2. Supposing that the answer to question #1 is in the negative, does > anyone wish to argue that this patch as currently written is an > adequate solution to this problem? It seems obvious to me that it > isn't. In that case, one's opinion about #1 hardly matters does it? regards, tom lane
(2010/10/13 22:43), Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> With the possible exception of Tom, >> everyone seems to agree that it would be a good step forward to >> provide a way of plugging these holes, even if it didn't cover subtler >> information leaks such as by reading the EXPLAIN output or timing >> query execution. > >> 1. Does anyone wish to argue (or continue arguing) that plugging these >> more overt information leaks is not worthwhile? > > Yeah, I will. Plugging an "overt" information leak without plugging > other channels in the same area isn't a security improvement. It's > merely PR, and rather lame PR at that. An attacker is not bound to > use only the attack methods you'd like him to. > It seems to me an extreme opinion, and different from the standard point of security view. It is a quotation from the classic of security evaluation criteria. Trusted Computer System Evaluation Criteria (TCSEC, DoD) says in the chapter of "A GUIDELINE ON COVERT CHANNELS" as follows: http://csrc.nist.gov/publications/history/dod85.pdf | From a security perspective, covert channels with low bandwidths represent a | lower threat than those with high bandwidths. However, for many types of | covert channels, techniques used to reduce the bandwidth below a certain rate | (which depends on the specific channel mechanism and the system architecture) | also have the effect of degrading the performance provided to legitimate | system users. Hence, a trade-off between system performance and covert | channel bandwidth must be made The "overt" channels has a capability to leak massive invisible information, so we need to consider them as a serious threat to be fixed up in higher priority. However, it is doubtful whether the rest of channels provides enough bandwidth as actual threat. It also means degree of the threat is relatively small than the "overt" channels. Previous security researcher pointed out security is trading-off, not all-or-nothing. If we can plug most part of the threat with reasonable performance degrading, it is worthwhile to fix up. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
KaiGai Kohei <kaigai@kaigai.gr.jp> wrote: > Previous security researcher pointed out security is trading-off, > not all-or-nothing. If we can plug most part of the threat with > reasonable performance degrading, it is worthwhile to fix up. I had the pleasure of hearing Admiral Grace Hopper[1] speak at an ACM luncheon once. When she discussed security, she asserted that there was no such thing as security which could not be breached. The goal of security efforts should not be to make it perfect, because you can't; any time you convince yourself you have that you are simply fooling yourself and missing the vulnerabilities. In her view the goal was to make the costs of breaching security higher to the perpetrator than the benefits. Each obstacle in their way helps tip the scales in your favor. -Kevin http://en.wikipedia.org/wiki/Grace_Hopper
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I had the pleasure of hearing Admiral Grace Hopper[1] speak at an > ACM luncheon once. When she discussed security, she asserted that > there was no such thing as security which could not be breached. > The goal of security efforts should not be to make it perfect, > because you can't; any time you convince yourself you have that you > are simply fooling yourself and missing the vulnerabilities. In her > view the goal was to make the costs of breaching security higher to > the perpetrator than the benefits. Each obstacle in their way helps > tip the scales in your favor. That's all true, but you have to consider how much the obstacle actually gets in their way versus how painful it is on your end to create and maintain the obstacle. I don't think this proposed patch measures up very well on either end of that tradeoff. regards, tom lane
On Wed, Oct 13, 2010 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> With the possible exception of Tom, >> everyone seems to agree that it would be a good step forward to >> provide a way of plugging these holes, even if it didn't cover subtler >> information leaks such as by reading the EXPLAIN output or timing >> query execution. > >> 1. Does anyone wish to argue (or continue arguing) that plugging these >> more overt information leaks is not worthwhile? > > Yeah, I will. Plugging an "overt" information leak without plugging > other channels in the same area isn't a security improvement.It's > merely PR, and rather lame PR at that. An attacker is not bound to > use only the attack methods you'd like him to. You may as well argue that I shouldn't bother locking the doors to my house because a determined attacker could simply break the windows. They certainly could, and yet I am altogether convinced that a habit of locking my doors when I am away reduces the chances that my house will be burgled. Breaking the windows would be altogether more obvious and more likely to be result in the police being summoned. Locking the doors won't protect me against someone who is bound and determined to rob my house specifically, but it provides fairly good protection against someone who walks around the neighborhood and robs each unlocked house, which is not an unrealistic threat model. But let us suppose that I went a step further and purchased the best burglar alarm money can buy, reinforced steel doors, and an expensive alarm monitoring service. Further, let us suppose that I retained a 24x7 armed guard. This would likely be a waste of money because there's not a whole lot in my house worth stealing (and if there were I wouldn't post the details to a public mailing list) but let us suppose that I did it anyway. I would now be about as secure against burglary as one can hope to be, and yet I'm still pretty sure that the CIA could manage to clandestinely remove something from my home against my will, were they of a mind to do so. In other words - true, an attacker isn't bound to use only the attack methods I'd like him to - but on the other hand, I'm not bound to care about protecting myself against every type of attack. I set a goal for what level of security I'd like to achieve, and then I try to meet it. You seem to believe that being able to infer the total size of a table or the frequency of some particular key in the table is equivalent to being able to trivially read every row of it. That seems off the wall to me, and I'd like to see you justify that position. I and others have already posted examples of situations where this is not the case, such as my example of allowing sales reps to view only their own customers. I believe Kevin has posted similar examples: the number of cases is not a secret, but details of individual ones may be. These cases are taken from real business situations and I don't understand, what, if anything, you find implausible about them. http://archives.postgresql.org/pgsql-hackers/2010-10/msg00299.php > This would only be a security improvement if there were plausible attack > scenarios in which the attacker would have access to the plugged channel > and not access to the other known channels. Now, perhaps that's the > case, but no one has put forward an argument showing it. I think the > burden of proof is on those who favor the patch to put forward that > argument, not for those who don't favor it to try to prove that no such > scenario exists. I don't favor this particular patch, but I think you're the only person arguing that there is no subset of this problem which is both tractable and useful. As far as I can tell, Heikki, KaiGai, Stephen Frost, Kevin Grittner, and myself are all on approximately the same page about where a meaningful dividing line can be drawn. The question is not really whether the attacker would have access to the un-plugged channels but how much and what type of information they actually leak. AIUI, the vectors that the proposed approach doesn't block are basically EXPLAIN and query response times. What can you infer from those? AFAICS, you're not going to be able to do better than inferring whether a given value is an MCV, which is of a totally different order than the wholesale data leakage which can be trivially created using today's system. And even that can be made much harder by blocking EXPLAIN, which can be done quite easily using ProcessUtility_hook, and therefore doesn't need to be addressed by the patch. But even suppose someone can reliably infer MCVs. It does not follow that because I'm worried about someone enumerating the contents of my entire table that I am also worried about them learning the MCVs of those columns which have non-unique indices. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> I had the pleasure of hearing Admiral Grace Hopper[1] speak at an >> ACM luncheon once. When she discussed security, she asserted that >> there was no such thing as security which could not be breached. >> The goal of security efforts should not be to make it perfect, >> because you can't; any time you convince yourself you have that you >> are simply fooling yourself and missing the vulnerabilities. In her >> view the goal was to make the costs of breaching security higher to >> the perpetrator than the benefits. Each obstacle in their way helps >> tip the scales in your favor. > > That's all true, but you have to consider how much the obstacle actually > gets in their way versus how painful it is on your end to create and > maintain the obstacle. I don't think this proposed patch measures up > very well on either end of that tradeoff. I think it would behoove us to try to separate concerns about this particular patch from concerns about the viability of the whole approach. Whether or not it's useful to do X is a different question than whether it can be done with few enough lines of code and/or whether this patch actually does it well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That's all true, but you have to consider how much the obstacle actually >> gets in their way versus how painful it is on your end to create and >> maintain the obstacle. �I don't think this proposed patch measures up >> very well on either end of that tradeoff. > I think it would behoove us to try to separate concerns about this > particular patch from concerns about the viability of the whole > approach. Whether or not it's useful to do X is a different question > than whether it can be done with few enough lines of code and/or > whether this patch actually does it well. I think I left the wrong impression: I'm concerned about the whole approach. I haven't even read this particular patch lately. I think that trying to guarantee that the planner applies independent constraints in a particular order will be expensive, fragile, and prone to recurring security bugs no matter how it's implemented --- unless you do it by lobotomizing query pullup/pushdown, which seems unacceptable from a performance standpoint. Just to give one example of what this patch misses (probably; as I said I haven't read it lately), what about selectivity estimation? One of the things we like to do when we have an otherwise-unknown function is to try it on all the histogram members and see what percentage yield true. That might already represent enough of an information leak to be objectionable ... and yet, if we don't do it, the consequences for performance could be pretty horrid because we'll have to work without any reasonable selectivity estimate at all. There are cases where this technique isn't applied at the moment but probably should be, which is why I characterize the leak-prevention idea as creating future security issues: doing that is a constraint that will have to be accounted for in every future planner change, and we are certain to miss the implications sometimes. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > You seem to believe that being able to infer the total size of a > table or the frequency of some particular key in the table is > equivalent to being able to trivially read every row of it. I don't say that they're equivalent. I do say that what this patch is mostly trying to do is solve a PR problem, and from the PR standpoint it doesn't help: the "OMG Postgres exposes my information" crowd is not going to distinguish leaks that only expose MCVs from those that trivially allow sucking out the entire table. There are furthermore plenty of situations where statistical information *is* of interest to attackers; the traditional example is obtaining the min and max of a salary column to infer something about what particular people are getting paid. So I think if we accept this patch or something like it, we are going to spend a large part of the next ten years trying to close other holes of the same ilk, and that's not a development plan I'm willing to buy into. I am much happier just making the statement that we don't try to prevent that type of leak than giving people the impression that we are committed to trying to prevent it. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > the "OMG Postgres exposes my information" crowd is not going to > distinguish leaks that only expose MCVs from those that trivially > allow sucking out the entire table. Well, I'd be in the crowd that would go "OMG" over one but not the other. At least in our case management software I can't think of any MCVs which would be a problem, while exposing entire tables would be a big problem. If you get the name, address, birth date, or even the social security number in isolation, it doesn't mean much. If you get all of those for one party, it does. I suppose that if you could find that a particular name was used somewhere in the Party table but it was not visible in the public record, you could guess that someone by that name (which is certainly not guaranteed to be unique!) was somehow involved in some role in a juvenile, mental commitment, adoption, sealed, or other confidential case -- but what role in what kind of case would still be a complete mystery, making it much less of a leak than the row in its entirety, much less the entire table (which could expose, for example, who adopted whom -- information not available from a single row). If you are arguing that the ability of someone to know that someone, somewhere, who has had contact with the Wisconsin court system has social security number 987-65-4321 is the same as knowing who has that social security number, and all the demographics about that person, you're dangerously mistaken. -Kevin
(2010/10/14 1:52), Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> That's all true, but you have to consider how much the obstacle actually >>> gets in their way versus how painful it is on your end to create and >>> maintain the obstacle. �I don't think this proposed patch measures up >>> very well on either end of that tradeoff. > >> I think it would behoove us to try to separate concerns about this >> particular patch from concerns about the viability of the whole >> approach. Whether or not it's useful to do X is a different question >> than whether it can be done with few enough lines of code and/or >> whether this patch actually does it well. +1. If the patch I proposed is not enough elegant to commit immediately, we can discuss how we can get the patch fixing the problem commitable. > I think I left the wrong impression: I'm concerned about the whole > approach. I haven't even read this particular patch lately. I think > that trying to guarantee that the planner applies independent > constraints in a particular order will be expensive, fragile, and prone > to recurring security bugs no matter how it's implemented --- unless you > do it by lobotomizing query pullup/pushdown, which seems unacceptable > from a performance standpoint. > It can be controllable by users. Unless specifying "SECURITY VIEW" flag, it does not restrain a part of optimizations that allows to execute functions which may leak the supplied arguments unexpectedly earlier than the functions in deeper level. Also, it does not ignore performance point of view. Even if we apply this patch, a part of functions that we believe they don't leak any supplied arguments can be pushed down inside of the join-loop. Eventually, these invocations shall be utilized for index-scanning. The reason why I, Robert and Heikki had a discussion what type of functions should be allowed to push down is to decide a principle to bypass this security barrier. > Just to give one example of what this patch misses (probably; as I said > I haven't read it lately), what about selectivity estimation? One of > the things we like to do when we have an otherwise-unknown function is > to try it on all the histogram members and see what percentage yield > true. That might already represent enough of an information leak to be > objectionable ... and yet, if we don't do it, the consequences for > performance could be pretty horrid because we'll have to work without > any reasonable selectivity estimate at all. It is a bit unclear for me where is the point of your concerns. If user wants to generate a histogram from result set of a view that filters tuples to be invisible, it should just generate the histogram from the filtered data set. Even if possible malicious functions are appended to WHERE clause, the optimizer does not execute them prior to deeper level functions, as long as is has "SECURITY VIEW" flag. Of course, here is a performance trade-off, as earlier researcher pointed out, but user can inform on which does he give higher priority. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Oct 18, 2010 at 5:02 AM, KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: >> Just to give one example of what this patch misses (probably; as I said >> I haven't read it lately), what about selectivity estimation? One of >> the things we like to do when we have an otherwise-unknown function is >> to try it on all the histogram members and see what percentage yield >> true. That might already represent enough of an information leak to be >> objectionable ... and yet, if we don't do it, the consequences for >> performance could be pretty horrid because we'll have to work without >> any reasonable selectivity estimate at all. > > It is a bit unclear for me where is the point of your concerns. > If user wants to generate a histogram from result set of a view that > filters tuples to be invisible, it should just generate the histogram > from the filtered data set. > Even if possible malicious functions are appended to WHERE clause, > the optimizer does not execute them prior to deeper level functions, > as long as is has "SECURITY VIEW" flag. > Of course, here is a performance trade-off, as earlier researcher > pointed out, but user can inform on which does he give higher priority. You need to go back and reread Tom's email until you understand what he's complaining about, because it's a very serious problem. I hope that there is a way around it, because we're not going to be able to implement any form of row-level security - EVER - unless we figure out how to address it. I've been mulling it over a bit and so far I'm stumped (which is why I haven't replied). Unfortunately, I don't have any more time to devote to this right now, so I haven't studied the code in detail, but at the moment I'd say we're dead in the water. I am going to mark this patch Returned with Feedback. Even were the issue raised by Tom not a problem, it's pretty clear that this patch as written is still going to allow an unacceptable amount of information leakage, and wouldn't be committable anyway. But the problem Tom raises is so severe that there's no point in writing any more code unless and until we have a good idea what we're going to do about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
(2010/10/14 1:52), Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> That's all true, but you have to consider how much the obstacle actually >>> gets in their way versus how painful it is on your end to create and >>> maintain the obstacle. �I don't think this proposed patch measures up >>> very well on either end of that tradeoff. > >> I think it would behoove us to try to separate concerns about this >> particular patch from concerns about the viability of the whole >> approach. Whether or not it's useful to do X is a different question >> than whether it can be done with few enough lines of code and/or >> whether this patch actually does it well. > > I think I left the wrong impression: I'm concerned about the whole > approach. I haven't even read this particular patch lately. I think > that trying to guarantee that the planner applies independent > constraints in a particular order will be expensive, fragile, and prone > to recurring security bugs no matter how it's implemented --- unless you > do it by lobotomizing query pullup/pushdown, which seems unacceptable > from a performance standpoint. > > Just to give one example of what this patch misses (probably; as I said > I haven't read it lately), what about selectivity estimation? One of > the things we like to do when we have an otherwise-unknown function is > to try it on all the histogram members and see what percentage yield > true. That might already represent enough of an information leak to be > objectionable ... and yet, if we don't do it, the consequences for > performance could be pretty horrid because we'll have to work without > any reasonable selectivity estimate at all. There are cases where this > technique isn't applied at the moment but probably should be, which is > why I characterize the leak-prevention idea as creating future security > issues: doing that is a constraint that will have to be accounted for in > every future planner change, and we are certain to miss the implications > sometimes. > Sorry, I might misread what you pointed out. Do you see oprrest/oprjoin being internally invoked as a problem? Well, I also think it is a problem, as long as they can be installed by non-superusers. But, it is a separated problem from the row-level security issues. In my opinion, origin of the matter is incorrect checks on creation of operators. It allows non-superusers to define a new operator with restriction and join estimator as long as he has EXECUTE privilege on the functions. (not EXECUTE privilege of actual user who invokes this function on estimation phase!) Then, the optimizer may internally invoke these functions without any privilege checks on either the function or the table to be estimated. If a malicious one tries to make other users to invoke a trojan-horse, he can define a trappy operator with malicious estimator functions, cannot it? I think it is a situation that we should check superuser privilege which means the specified functions have no malicious intention and equivalent to the privilege to grant 'EXECUTE' to public. Here is similar problem at conversion functions. If malicious one want to install a fake conversion function that records all the stream between server and client, it seems to me not impossible. Well, I'd like to suggest to revise the specifications of permission checks on these commands. If we can ensure these functions are not malicious, no need to care about information leaks via untrusted functions internally used. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
On Oct 19, 2010, at 4:34 AM, KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > (2010/10/14 1:52), Tom Lane wrote: >> Robert Haas<robertmhaas@gmail.com> writes: >>> On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>> That's all true, but you have to consider how much the obstacle actually >>>> gets in their way versus how painful it is on your end to create and >>>> maintain the obstacle. �I don't think this proposed patch measures up >>>> very well on either end of that tradeoff. >> >>> I think it would behoove us to try to separate concerns about this >>> particular patch from concerns about the viability of the whole >>> approach. Whether or not it's useful to do X is a different question >>> than whether it can be done with few enough lines of code and/or >>> whether this patch actually does it well. >> >> I think I left the wrong impression: I'm concerned about the whole >> approach. I haven't even read this particular patch lately. I think >> that trying to guarantee that the planner applies independent >> constraints in a particular order will be expensive, fragile, and prone >> to recurring security bugs no matter how it's implemented --- unless you >> do it by lobotomizing query pullup/pushdown, which seems unacceptable >> from a performance standpoint. >> >> Just to give one example of what this patch misses (probably; as I said >> I haven't read it lately), what about selectivity estimation? One of >> the things we like to do when we have an otherwise-unknown function is >> to try it on all the histogram members and see what percentage yield >> true. That might already represent enough of an information leak to be >> objectionable ... and yet, if we don't do it, the consequences for >> performance could be pretty horrid because we'll have to work without >> any reasonable selectivity estimate at all. There are cases where this >> technique isn't applied at the moment but probably should be, which is >> why I characterize the leak-prevention idea as creating future security >> issues: doing that is a constraint that will have to be accounted for in >> every future planner change, and we are certain to miss the implications >> sometimes. >> > Sorry, I might misread what you pointed out. I think you're still misreading it. Want to try a third time? > Do you see oprrest/oprjoin being internally invoked as a problem? > Well, I also think it is a problem, as long as they can be installed > by non-superusers. But, it is a separated problem from the row-level > security issues. > > In my opinion, origin of the matter is incorrect checks on creation > of operators. It allows non-superusers to define a new operator with > restriction and join estimator as long as he has EXECUTE privilege > on the functions. (not EXECUTE privilege of actual user who invokes > this function on estimation phase!) > Then, the optimizer may internally invoke these functions without > any privilege checks on either the function or the table to be > estimated. If a malicious one tries to make other users to invoke > a trojan-horse, he can define a trappy operator with malicious > estimator functions, cannot it? This is a pretty poor excuse for a Trojan horse attack. > ...Robert
(2010/10/19 21:31), Robert Haas wrote: > On Oct 19, 2010, at 4:34 AM, KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: >> (2010/10/14 1:52), Tom Lane wrote: >>> Robert Haas<robertmhaas@gmail.com> writes: >>>> On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>>> That's all true, but you have to consider how much the obstacle actually >>>>> gets in their way versus how painful it is on your end to create and >>>>> maintain the obstacle. �I don't think this proposed patch measures up >>>>> very well on either end of that tradeoff. >>> >>>> I think it would behoove us to try to separate concerns about this >>>> particular patch from concerns about the viability of the whole >>>> approach. Whether or not it's useful to do X is a different question >>>> than whether it can be done with few enough lines of code and/or >>>> whether this patch actually does it well. >>> >>> I think I left the wrong impression: I'm concerned about the whole >>> approach. I haven't even read this particular patch lately. I think >>> that trying to guarantee that the planner applies independent >>> constraints in a particular order will be expensive, fragile, and prone >>> to recurring security bugs no matter how it's implemented --- unless you >>> do it by lobotomizing query pullup/pushdown, which seems unacceptable >>> from a performance standpoint. >>> >>> Just to give one example of what this patch misses (probably; as I said >>> I haven't read it lately), what about selectivity estimation? One of >>> the things we like to do when we have an otherwise-unknown function is >>> to try it on all the histogram members and see what percentage yield >>> true. That might already represent enough of an information leak to be >>> objectionable ... and yet, if we don't do it, the consequences for >>> performance could be pretty horrid because we'll have to work without >>> any reasonable selectivity estimate at all. There are cases where this >>> technique isn't applied at the moment but probably should be, which is >>> why I characterize the leak-prevention idea as creating future security >>> issues: doing that is a constraint that will have to be accounted for in >>> every future planner change, and we are certain to miss the implications >>> sometimes. >>> >> Sorry, I might misread what you pointed out. > > I think you're still misreading it. Hmm. In my understanding, it seems to me he concerned about possible leaky estimate functions, so he mentioned the horrible performance degrading, if we don't allow to execute these functions. So, I suggested an idea that enforces all estimate functions being installed by superusers; it enables us to assume they are enough safe. > Want to try a third time? However, actually, it is still unclear for me... :-( >> Do you see oprrest/oprjoin being internally invoked as a problem? >> Well, I also think it is a problem, as long as they can be installed >> by non-superusers. But, it is a separated problem from the row-level >> security issues. >> >> In my opinion, origin of the matter is incorrect checks on creation >> of operators. It allows non-superusers to define a new operator with >> restriction and join estimator as long as he has EXECUTE privilege >> on the functions. (not EXECUTE privilege of actual user who invokes >> this function on estimation phase!) >> Then, the optimizer may internally invoke these functions without >> any privilege checks on either the function or the table to be >> estimated. If a malicious one tries to make other users to invoke >> a trojan-horse, he can define a trappy operator with malicious >> estimator functions, cannot it? > > This is a pretty poor excuse for a Trojan horse attack. > >> > > ...Robert > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
KaiGai Kohei <kaigai@kaigai.gr.jp> writes: > (2010/10/19 21:31), Robert Haas wrote: >> I think you're still misreading it. > Hmm. In my understanding, it seems to me he concerned about possible leaky > estimate functions, so he mentioned the horrible performance degrading, if > we don't allow to execute these functions. No, it wasn't about estimator functions at all, it was about what we do in the absence of an estimator. > So, I suggested an idea that enforces all estimate functions being installed > by superusers; it enables us to assume they are enough safe. Even if we were willing to accept a limitation as stringent as that, preventing people from installing estimators is hardly going to cure the problem. regards, tom lane
On Wed, Oct 13, 2010 at 12:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That's all true, but you have to consider how much the obstacle actually >>> gets in their way versus how painful it is on your end to create and >>> maintain the obstacle. I don't think this proposed patch measures up >>> very well on either end of that tradeoff. > >> I think it would behoove us to try to separate concerns about this >> particular patch from concerns about the viability of the whole >> approach. Whether or not it's useful to do X is a different question >> than whether it can be done with few enough lines of code and/or >> whether this patch actually does it well. > > I think I left the wrong impression: I'm concerned about the whole > approach. I haven't even read this particular patch lately. I think > that trying to guarantee that the planner applies independent > constraints in a particular order will be expensive, fragile, and prone > to recurring security bugs no matter how it's implemented --- unless you > do it by lobotomizing query pullup/pushdown, which seems unacceptable > from a performance standpoint. > > Just to give one example of what this patch misses (probably; as I said > I haven't read it lately), what about selectivity estimation? One of > the things we like to do when we have an otherwise-unknown function is > to try it on all the histogram members and see what percentage yield > true. That might already represent enough of an information leak to be > objectionable ... and yet, if we don't do it, the consequences for > performance could be pretty horrid because we'll have to work without > any reasonable selectivity estimate at all. There are cases where this > technique isn't applied at the moment but probably should be, which is > why I characterize the leak-prevention idea as creating future security > issues: doing that is a constraint that will have to be accounted for in > every future planner change, and we are certain to miss the implications > sometimes. My concern here is that I think row-level security is important to the future of PostgreSQL. Much of the mind-share that we have comes from being feature-rich, and there seems to be no shortage of people who are looking for row-level security. Only some of those are specifically interested in SE-Linux integration; only some of those are interested in using views as an RLS mechanism; but from a high level RLS seems to be one of those things that just keeps bubbling back up, and I think we need to find a way to have it. Unfortunately, I get the impression that you think that there's a problem not only with the approach but with any approach whatsoever to that underlying problem. I actually can't think of a mechanism for attacking RLS that doesn't involve some kind of planner hackery along the lines discussed here, but if you have a better idea (other than giving up) I'm all ears. With respect to selectivity estimation, do we have a live bug there now? I'm not exactly sure in what case we do this, but I observe that I can do EXPLAIN on a query containing a function that I don't actually have permission to call. So if the planner might go off and call that function on some data (even data that I *am* allowed to see), that seems bad. After all the function might be initiate_self_destruct_sequence(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I get the impression that you think that there's a problem not only > with the approach but with any approach whatsoever to that underlying > problem. Let's just say that the approaches proposed so far have performance and/or functionality and/or code maintenance penalties that are utterly unacceptable from the standpoint of people who don't need RLS. I don't know if there is a workable solution, but I do know I've not seen one. > With respect to selectivity estimation, do we have a live bug there > now? No, I don't believe so. Given that you'd like to get the planner to call function XYZ, you could create an operator using XYZ and attach to it one of the estimation functions that will actually call the underlying function --- but you have to have call permission on the function in order to create the operator. regards, tom lane
On Wed, Oct 20, 2010 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I get the impression that you think that there's a problem not only >> with the approach but with any approach whatsoever to that underlying >> problem. > > Let's just say that the approaches proposed so far have performance > and/or functionality and/or code maintenance penalties that are utterly > unacceptable from the standpoint of people who don't need RLS. I don't > know if there is a workable solution, but I do know I've not seen one. > >> With respect to selectivity estimation, do we have a live bug there >> now? > > No, I don't believe so. Given that you'd like to get the planner to > call function XYZ, you could create an operator using XYZ and attach to > it one of the estimation functions that will actually call the > underlying function --- but you have to have call permission on the > function in order to create the operator. But suppose the operator already exists, but I don't have permission to call the underlying function... can I get the planner to call the function for me by EXPLAIN-ing the right query? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company