Thread: [RFC] A tackle to the leaky VIEWs for RLS
As it was reported before, we have an open item about leaky VIEWs for RLS. On the talk at Ottawa, Robert suggested me to post my idea prior to submit a patch. So, I'd like to explain my idea at first. Actually I'm not familiar to optimizar details, so it needs any helps from experts of optimizar. The problem was ... * Using views for row-level access control is leaky http://archives.postgresql.org/pgsql-hackers/2009-10/msg01346.php Even if a table is unvisible from certain users without views that filter a part of tuples, it can leak to users as long as they can define their own functions. It seems to me the problem can be divided into two major parts. See the following sample tables, views and functions. postgres=# CREATE TABLE t1 (a int primary key, b text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey"for table "t1" CREATE TABLE postgres=# CREATE TABLE t2 (x int primary key, y text); NOTICE: CREATE TABLE / PRIMARYKEY will create implicit index "t2_pkey" for table "t2" CREATE TABLE postgres=# INSERT INTO t1 VALUES (1, 'aaa'),(2, 'bbb'), (3, 'ccc'); INSERT 0 3 postgres=# INSERT INTO t2 VALUES (1, 'xxx'), (2, 'yyy'), (3, 'zzz'); INSERT 0 3 -- We assume the security policy function needs the given integer key -- is odd number to be visible for users. -- postgres=#CREATE OR REPLACE FUNCTION f_policy(int) RETURNS bool AS 'BEGIN RETURN $1 % 2 = 1; END' LANGUAGEplpgsql; CREATE FUNCTION -- We assume a malicious user defined function raises a notice with -- given arguments. It may be possible to insert itother temp tables. -- postgres=# CREATE OR REPLACE FUNCTION f_malicious(text) RETURNS bool COST 0.0001 AS'BEGIN RAISE NOTICE ''f_malicious: %'', $1; RETURN true; END;' LANGUAGE plpgsql; CREATE FUNCTION [1] The order of scan filters to be evaluated ---------------------------------------------- The first problem is an inversion of evaluation of scan filters. postgres=# CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 WHERE f_policy(a); CREATE VIEW postgres=# EXPLAIN SELECT * FROM v1 WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------- Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36) Filter:(f_malicious(b) AND f_policy(a)) (2 rows) postgres=# SELECT * FROM v1 WHERE f_malicious(b); NOTICE: f_malicious: aaa NOTICE: f_malicious: bbb <-- leaky contentsNOTICE: f_malicious: ccc a | b ---+----- 1 | aaa 3 | ccc (2 rows) In this case, owner of the view expects tuples within t1 shall be filtered by f_policy() functions, so tuples with even-number shall be invisible. However, the optimizar reorders evaluation of scan filters based on the cost parameter of functions and others, then f_malicious() was invoked prior to f_policy(). It is a right approach, if functions are not malicious. But user may define a malicious purpose function. The given query is internally rewritten, then subquery will be pulled up in the optimizar logic. SELECT * FROM v1 WHERE f_malicious(b); -> SELECT * FROM (SELECT * FROM t1 WHERE f_policy(a)) v1 WHERE f_malicious(b) ->SELECT * FROM t1 WHERE f_policy(a) AND f_malicious(b) During we create a scan plan, the order_qual_clauses() computes the best order to evaluate the given WHERE clause based on the cost estimation. In this case, f_malicious() has very small cost, so order_qual_clauses() decides the f_malicious() should be invoked earlier than f_policy(). In the result, ExecScan() invokes f_malicious() with contents of scanned tuples to be invisible. I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember where is originally put in the query, and prevent reordering over the nest level of subqueries. In above example, f_malicious() has nestlevel=0 because it is put on the top level. But f_policy() has nestlevel=1 because it is originally put on the second level subquery. Then, the order_qual_clauses() will check nestlevel of the scan filter prior to reorder them based on the cost estimation. Even if we have multiple nestlevels, solution will be same. A FuncExpr with larger nestlevel shall be invoked earlier than others. Please note that we only focus on user defined functions. For example, it is worth to choose index-scans instead of seq-scans, when a user provides conditions which can be indexed, as follows: SELECT * FROM v1 WHERE a = 100; -> SELECT * FROM (SELECT * FROM t1 WHERE f_policy(a)) v1 WHERE a = 100; In this case, we should scan the t1 using index with the condition of 'a = 100' prior to evaluation of f_policy(). Any operators eventually invokes a function being correctly installed, but an assumption is that we can trust operators, index access method, type input/output methods, conversions and so on, because these features have to be installed by DBA (or initdb). [2] Unexpected distribution of scan filter ------------------------------------------- Here is one other situation of leaky VIEWs for RLS. postgres=# CREATE OR REPLACE VIEW v2 AS SELECT * FROM t1 JOIN t2 ON a = x WHERE f_policy(a); CREATE VIEWpostgres=# SELECT * FROM v2; a | b | x | y ---+-----+---+----- 1 | aaa | 1 | xxx 3 | ccc | 3 | zzz (2 rows) This view intends to provide a joined virtual relation with a restriction using f_policy(). In fact, it filters out tuples with even-number key. However, we can leak information of the filtered tuples with different scenarios. postgres=# SELECT * FROM v2 WHERE f_malicious(y); NOTICE: f_malicious: xxx NOTICE: f_malicious: yyy <-- leaky contentsNOTICE: f_malicious: zzz a | b | x | y ---+-----+---+----- 1 | aaa | 1 | xxx 3 | ccc | 3 | zzz (2 rows) postgres=# EXPLAIN SELECT * FROM v2 WHERE f_malicious(y); QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..287.63 rows=410 width=72) -> Seq Scan on t2 (cost=0.00..22.30 rows=410 width=36) Filter: f_malicious(y) -> Index Scan usingt1_pkey on t1 (cost=0.00..0.63 rows=1 width=36) Index Cond: (t1.a = t2.x) Filter: f_policy(t1.a)(6 rows) We can see the f_malicious() was distributed to the Seq-Scan on t2, not the Nested-Loop, because its arguments only depends on the t2, so the optimizar tries to distribute the scan filter into the least unit of scan. IIUC, the distribute_qual_to_rels() determines what scan-plan should have what scan-filters based on dependency of the function call. For example, the second qualifier depends on both of t1 and t2, it was distributed to outside of the join. postgres=# EXPLAIN SELECT * FROM v2 WHERE f_malicious(y) and b || y != 'aaaxxx'; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..289.68 rows=408 width=72) Join Filter: ((t1.b || t2.y) <> 'aaaxxx'::text) -> Seq Scan on t2 (cost=0.00..22.30rows=410 width=36) Filter: f_malicious(y) -> Index Scan using t1_pkey on t1 (cost=0.00..0.63rows=1 width=36) Index Cond: (t1.a = t2.x) Filter: f_policy(t1.a) (7 rows) My idea is similar to what I proposed at [1]. It adds a new field into RelOptInfo (or other structure?) to remember the original nestlevel of the scan, then it will be compared to nestlevel of the FuncExpr. If nestlevel of the FuncExpr is smaller than nestlevel of the RelOptInfo, it prevents to distribute the FuncExpr onto the RelOptInfo, even if the function depends on only the relation of RelOptInfo. The way to handle trusted nodes are same as [1]. If user given operators depend on only one-side of join, this idea does not prevent anything. [3] Issues of the approach --------------------------- Can we find out any other scenario that malicious user defined function allows us to leak invisible tuples to be filtered out. Of course, we can never prove software being bug-free. What I want to say is that "Please point out, if I'm missing something significant scenario". How much performance impact? Is it reasonable, or not? I'm not sure whether it is really worse in performance, or not, because it affects only user defined functions, not operators. So, it seems to me the idea does not prevent plans to boost using index-scan. It seems to me the idea on [1] does not make significant regressions, because it does not change scale of the plan. But the idea on [2] may affect to scale of the plan, if user defined function filters most of tuples within one-side tables of the join. IIRC, I was suggested to distinguish VIEWs with/without security purpose. It also seems to me an option. If a certain VIEW has its priority on row level security, it seems to me reasonable to disable a part of optimization which I introduced above. Sorry for the long description. Please any comments. -- KaiGai Kohei <kaigai@ak.jp.nec.com>
On 01/06/10 11:39, KaiGai Kohei wrote: > Any operators eventually invokes a function > being correctly installed, but an assumption is that we can trust operators, > index access method, type input/output methods, conversions and so on, because > these features have to be installed by DBA (or initdb). Operators can be created by regular users. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
(2010/06/01 18:08), Heikki Linnakangas wrote: > On 01/06/10 11:39, KaiGai Kohei wrote: >> Any operators eventually invokes a function >> being correctly installed, but an assumption is that we can trust operators, >> index access method, type input/output methods, conversions and so on, because >> these features have to be installed by DBA (or initdb). > > Operators can be created by regular users. > Oops, I missed it. Indeed, operator function is not limited to C-language functions, so regular users can create it. Apart from the topic, does it seem to you reasonable direction to tackle to the leaky VIEWs problem? Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On 01/06/10 13:04, KaiGai Kohei wrote: > Oops, I missed it. Indeed, operator function is not limited to C-language > functions, so regular users can create it. > > Apart from the topic, does it seem to you reasonable direction to tackle to > the leaky VIEWs problem? Yeah, I guess it is. The general problem is that it seems like a nightmare to maintain this throughout the planner. Who knows what optimizations this affects, and do we need to hide things like row-counts in EXPLAIN output? If we try to be very strict, we can expect a stream of CVEs and security releases in the future while we find holes and plug them. On the other hand, using views to restrict access to underlying tables is a very useful feature, so I'd hate to just give up. We need to decide what level of isolation we try to accomplish. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Jun 1, 2010, at 10:39 , KaiGai Kohei wrote: > I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember > where is originally put in the query, and prevent reordering over the nest > level of subqueries. > In above example, f_malicious() has nestlevel=0 because it is put on the top > level. > But f_policy() has nestlevel=1 because it is originally put on the second > level subquery. Then, the order_qual_clauses() will check nestlevel of the > scan filter prior to reorder them based on the cost estimation. > Even if we have multiple nestlevels, solution will be same. A FuncExpr with > larger nestlevel shall be invoked earlier than others. Wouldn't the information leak go away if you stuck "OFFSET 0" at the end of the view? IIRC, that is the semi-offical wayto create barriers for subquery flattening and such. best regards, Florian Pflug
2010/6/1 KaiGai Kohei <kaigai@ak.jp.nec.com>: > I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember > where is originally put in the query, and prevent reordering over the nest > level of subqueries. > In above example, f_malicious() has nestlevel=0 because it is put on the top > level. > But f_policy() has nestlevel=1 because it is originally put on the second > level subquery. Then, the order_qual_clauses() will check nestlevel of the > scan filter prior to reorder them based on the cost estimation. > Even if we have multiple nestlevels, solution will be same. A FuncExpr with > larger nestlevel shall be invoked earlier than others. [...] > My idea is similar to what I proposed at [1]. It adds a new field into > RelOptInfo (or other structure?) to remember the original nestlevel of > the scan, then it will be compared to nestlevel of the FuncExpr. > If nestlevel of the FuncExpr is smaller than nestlevel of the RelOptInfo, > it prevents to distribute the FuncExpr onto the RelOptInfo, even if the > function depends on only the relation of RelOptInfo. Keep in mind that users who are NOT using a view as a security barrier don't expect it to kill performance. This approach, and particularly the second part, about preventing quals from being pushed through joins, has the potential to be a performance disaster. So I think it's absolutely critical that we don't do that except when it's necessary to prevent a security issue. On the technical side, I am pretty doubtful that the approach of adding a nestlevel to FuncExpr and RelOptInfo is the right way to go. I believe we have existing code (to handle left joins) that prevents quals from being pushed down too far by fudging the set of relations that are supposedly needed to evaluate the qual. I suspect a similar approach would work here. I think the steps here are: 1. Decide whether the view is a security barrier (perhaps, check whether the user has sufficient privs to execute the underlying query; or we could add an explicit setting). If not, stop. 2. Decide whether each qual executes potentially untrusted code (algorithm?). 3. Prevent any untrusted quals from being pushed down into view that is a security barrier. We should have a design for each of these before we start coding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
2010/6/1 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>: > On 01/06/10 11:39, KaiGai Kohei wrote: >> Any operators eventually invokes a function >> being correctly installed, but an assumption is that we can trust operators, >> index access method, type input/output methods, conversions and so on, because >> these features have to be installed by DBA (or initdb). > > Operators can be created by regular users. So I think we don't actually have to worry about operators and functions which allow us to use an index scan. If they're used in an index definition then the definer of those functions can see the entire table anyways. The only place where this matters, at least to a first degree, is on the filter operations applied to a scan. If the view isn't owned by the current user then all the filters of the view have to be enforced first then the query filters. Heikki's point is still valid though. Consider if it's not a matter of filter ordering but rather that a filter is being pushed down inside a join. If the join is from the view then it would be unsafe to filter the rows before seeing which rows match the join... unless we can prove all the rows survive... It would really suck not to do this optimization too if for example you have a filter which filters all but a single row and then join against a large table... -- greg
2010/6/1 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>: > The general problem is that it seems like a nightmare to maintain this > throughout the planner. Who knows what optimizations this affects, and > do we need to hide things like row-counts in EXPLAIN output? If we try > to be very strict, we can expect a stream of CVEs and security releases > in the future while we find holes and plug them. On the other hand, > using views to restrict access to underlying tables is a very useful > feature, so I'd hate to just give up. We need to decide what level of > isolation we try to accomplish. I'm entirely uninspired by the idea of trying to prevent all possible leakage of information via side-channels. Even if we tried to obfuscate the explain output, the user can still potentially get some information by timing how long queries take to execute. I think if we have a hook function that can prevent certain users from running EXPLAIN altogether (and I believe this may already be the case) that's about the appropriate level of worrying about that case. I think the only thing that we can realistically prevent is allowing users to make off with the actual tuples. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Greg Stark <gsstark@mit.edu> writes: > Heikki's point is still valid though. Consider if it's not a matter of > filter ordering but rather that a filter is being pushed down inside a > join. If the join is from the view then it would be unsafe to filter > the rows before seeing which rows match the join... unless we can > prove all the rows survive... It would really suck not to do this > optimization too if for example you have a filter which filters all > but a single row and then join against a large table... Well, more generally, any restriction whatsoever that is placed on the current planner behavior in the name of security will result in catastrophic performance degradation for some queries. I agree with Robert's nearby comments that we need to be selective about which views we do this to and which functions we distrust. CREATE SECURITY VIEW, anyone? regards, tom lane
Also incidentally I'm having trouble imagining a scenario where this really matters. For it to be an issue you would have to simultaneously have a user which can't access all the data and must go through views which limit the data he can access -- and has privileges to issue DDL to create functions and operators. That seems like an unlikely combination. I've seen views used before to restrict the role accounts used by front-end applications but those accounts have no DDL privileges.
* Greg Stark (gsstark@mit.edu) wrote: > Also incidentally I'm having trouble imagining a scenario where this > really matters. For it to be an issue you would have to simultaneously > have a user which can't access all the data and must go through views > which limit the data he can access -- and has privileges to issue DDL > to create functions and operators. That seems like an unlikely > combination. I've seen views used before to restrict the role accounts > used by front-end applications but those accounts have no DDL > privileges. Erm, I have to disagree with this in general.. We don't all just build web apps. On multi-user databases, this really isn't that uncommon. I'm not saying it's an everyday kind of thing, but I don't think this issue is something we can just ignore either. Thanks, Stephen
On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > CREATE SECURITY VIEW, anyone? That may be the best approach, but I think it needs more than one line of exposition. The approach I proposed was to test whether the user has privileges to execute the underlying query directly without going through the view. If so, we needn't be concerned. If not, then we start thinking about which functions/operators we trust. The only disadvantage to that approach that I see is that it introduces some extra permission-checking overhead. If having an explicit notion of which views are intended to be security views enables us to reduce or eliminate that overhead, it may be worth doing. But I'm not sure that it does. A blanket rule that says "don't push untrusted quals into security views" is not going to be too satisfactory - we probably want to do that only when the view is queried by an untrusted user - the superuser, or someone else with adequate permissions, will presumably still want his quals to get pushed down. Perhaps there is some value to having a knob that goes the opposite directions and essentially says "I don't really care whether this view is leaky from a security perspective". But presumably we don't want to deliver that behavior by default and require the user to ask for a SECURITY VIEW to get something else - if anything, we'd want CREATE VIEW to create the normal (secure) version and add CREATE LEAKY VIEW to do the other thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> CREATE SECURITY VIEW, anyone? > That may be the best approach, but I think it needs more than one line > of exposition. The approach I proposed was to test whether the user > has privileges to execute the underlying query directly without going > through the view. If so, we needn't be concerned. If not, then we > start thinking about which functions/operators we trust. Ummm ... that makes semantics dependent on the permissions available at plan time, whereas what should matter is the permissions that exist at execution time. Maybe that's all right for this context but it doesn't seem tremendously desirable. > Perhaps there is some value to having a knob that goes the opposite > directions and essentially says "I don't really care whether this view > is leaky from a security perspective". But presumably we don't want > to deliver that behavior by default and require the user to ask for a > SECURITY VIEW to get something else - if anything, we'd want CREATE > VIEW to create the normal (secure) version and add CREATE LEAKY VIEW > to do the other thing. -1 on that. We will get far more pushback from people whose application performance suddenly went to hell than we will ever get approval from people who actually need the feature. Considering that we've survived this long with leaky views, that should definitely remain the default behavior. regards, tom lane
On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> CREATE SECURITY VIEW, anyone? > >> That may be the best approach, but I think it needs more than one line >> of exposition. The approach I proposed was to test whether the user >> has privileges to execute the underlying query directly without going >> through the view. If so, we needn't be concerned. If not, then we >> start thinking about which functions/operators we trust. > > Ummm ... that makes semantics dependent on the permissions available at > plan time, whereas what should matter is the permissions that exist at > execution time. Maybe that's all right for this context but it doesn't > seem tremendously desirable. Ugh. I hope there's a way around that problem because AFAICS the alternative is a world of hurt. If we're not allowed to take the security context into account during planning, then we're going to have to make worst-case assumptions, which sounds really unpleasant. >> Perhaps there is some value to having a knob that goes the opposite >> directions and essentially says "I don't really care whether this view >> is leaky from a security perspective". But presumably we don't want >> to deliver that behavior by default and require the user to ask for a >> SECURITY VIEW to get something else - if anything, we'd want CREATE >> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW >> to do the other thing. > > -1 on that. We will get far more pushback from people whose application > performance suddenly went to hell than we will ever get approval from > people who actually need the feature. Considering that we've survived > this long with leaky views, that should definitely remain the default > behavior. Eh, if that's the consensus, it doesn't bother me that much, but it doesn't really answer the question, either: supposing we add an explicit concept of a security view, what should its semantics be? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Tue, Jun 1, 2010 at 1:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> CREATE SECURITY VIEW, anyone? >> >>> That may be the best approach, but I think it needs more than one line >>> of exposition. The approach I proposed was to test whether the user >>> has privileges to execute the underlying query directly without going >>> through the view. If so, we needn't be concerned. If not, then we >>> start thinking about which functions/operators we trust. >> >> Ummm ... that makes semantics dependent on the permissions available at >> plan time, whereas what should matter is the permissions that exist at >> execution time. Maybe that's all right for this context but it doesn't >> seem tremendously desirable. > > Ugh. I hope there's a way around that problem because AFAICS the > alternative is a world of hurt. If we're not allowed to take the > security context into account during planning, then we're going to > have to make worst-case assumptions, which sounds really unpleasant. > >>> Perhaps there is some value to having a knob that goes the opposite >>> directions and essentially says "I don't really care whether this view >>> is leaky from a security perspective". But presumably we don't want >>> to deliver that behavior by default and require the user to ask for a >>> SECURITY VIEW to get something else - if anything, we'd want CREATE >>> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW >>> to do the other thing. >> >> -1 on that. We will get far more pushback from people whose application >> performance suddenly went to hell than we will ever get approval from >> people who actually need the feature. Considering that we've survived >> this long with leaky views, that should definitely remain the default >> behavior. > > Eh, if that's the consensus, it doesn't bother me that much, but it > doesn't really answer the question, either: supposing we add an > explicit concept of a security view, what should its semantics be? have you ruled out: 'create function'? :-) merlin
On Tue, Jun 1, 2010 at 4:10 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Jun 1, 2010 at 1:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> CREATE SECURITY VIEW, anyone? >>> >>>> That may be the best approach, but I think it needs more than one line >>>> of exposition. The approach I proposed was to test whether the user >>>> has privileges to execute the underlying query directly without going >>>> through the view. If so, we needn't be concerned. If not, then we >>>> start thinking about which functions/operators we trust. >>> >>> Ummm ... that makes semantics dependent on the permissions available at >>> plan time, whereas what should matter is the permissions that exist at >>> execution time. Maybe that's all right for this context but it doesn't >>> seem tremendously desirable. >> >> Ugh. I hope there's a way around that problem because AFAICS the >> alternative is a world of hurt. If we're not allowed to take the >> security context into account during planning, then we're going to >> have to make worst-case assumptions, which sounds really unpleasant. >> >>>> Perhaps there is some value to having a knob that goes the opposite >>>> directions and essentially says "I don't really care whether this view >>>> is leaky from a security perspective". But presumably we don't want >>>> to deliver that behavior by default and require the user to ask for a >>>> SECURITY VIEW to get something else - if anything, we'd want CREATE >>>> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW >>>> to do the other thing. >>> >>> -1 on that. We will get far more pushback from people whose application >>> performance suddenly went to hell than we will ever get approval from >>> people who actually need the feature. Considering that we've survived >>> this long with leaky views, that should definitely remain the default >>> behavior. >> >> Eh, if that's the consensus, it doesn't bother me that much, but it >> doesn't really answer the question, either: supposing we add an >> explicit concept of a security view, what should its semantics be? > > have you ruled out: 'create function'? :-) You lost me... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Tue, Jun 1, 2010 at 4:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 1, 2010 at 4:10 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> have you ruled out: 'create function'? :-) > > You lost me... Well, as noted by the OP, using views for security in postgres is simply wishful thinking. This is part of a family of issues (generally not evil nor fixable) under the category of 'there is no real control over when functions in a query fire'. My point was that in cases where users expect this behavior, why not encourage them to use functions instead of views? Is there any formal expectation that views can be used to hide data in this way? Does this really have to be fixed, and if so should it be in light of the fact that our rule system is basically understood to be broken? merlin
(2010/06/01 22:16), Robert Haas wrote: > 2010/6/1 Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>: >> The general problem is that it seems like a nightmare to maintain this >> throughout the planner. Who knows what optimizations this affects, and >> do we need to hide things like row-counts in EXPLAIN output? If we try >> to be very strict, we can expect a stream of CVEs and security releases >> in the future while we find holes and plug them. On the other hand, >> using views to restrict access to underlying tables is a very useful >> feature, so I'd hate to just give up. We need to decide what level of >> isolation we try to accomplish. > > I'm entirely uninspired by the idea of trying to prevent all possible > leakage of information via side-channels. Even if we tried to > obfuscate the explain output, the user can still potentially get some > information by timing how long queries take to execute. I think if we > have a hook function that can prevent certain users from running > EXPLAIN altogether (and I believe this may already be the case) that's > about the appropriate level of worrying about that case. I think the > only thing that we can realistically prevent is allowing users to make > off with the actual tuples. > It is a good point, I think. Even if we can guess or estimate something from circumstances, it does not allow unprivileged users to read information directly. In fact, we cannot find such a functionality to prevent side-channel leaks from the certification reports of commercial RDBMS with RLS (e.g; Oracle Label Security). However, the leaky VIEWs has a different characteristic. It unintentionally allows to fetch contents of invisible tuples and move into into other tables. It means here is a data flow channel (not side channel), but it breaks restriction of security views. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/06/01 22:09), Robert Haas wrote: > 2010/6/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember >> where is originally put in the query, and prevent reordering over the nest >> level of subqueries. >> In above example, f_malicious() has nestlevel=0 because it is put on the top >> level. >> But f_policy() has nestlevel=1 because it is originally put on the second >> level subquery. Then, the order_qual_clauses() will check nestlevel of the >> scan filter prior to reorder them based on the cost estimation. >> Even if we have multiple nestlevels, solution will be same. A FuncExpr with >> larger nestlevel shall be invoked earlier than others. > [...] >> My idea is similar to what I proposed at [1]. It adds a new field into >> RelOptInfo (or other structure?) to remember the original nestlevel of >> the scan, then it will be compared to nestlevel of the FuncExpr. >> If nestlevel of the FuncExpr is smaller than nestlevel of the RelOptInfo, >> it prevents to distribute the FuncExpr onto the RelOptInfo, even if the >> function depends on only the relation of RelOptInfo. > > Keep in mind that users who are NOT using a view as a security barrier > don't expect it to kill performance. This approach, and particularly > the second part, about preventing quals from being pushed through > joins, has the potential to be a performance disaster. So I think > it's absolutely critical that we don't do that except when it's > necessary to prevent a security issue. > Yes, I agree. It is necessary to distinguish security conscious views and others. In general, only creator of the view knows its intention correctly, so I think it is reasonable suggestion to provide a hint whether we should prevent a part of optimization, or not. > On the technical side, I am pretty doubtful that the approach of > adding a nestlevel to FuncExpr and RelOptInfo is the right way to go. > I believe we have existing code (to handle left joins) that prevents > quals from being pushed down too far by fudging the set of relations > that are supposedly needed to evaluate the qual. I suspect a similar > approach would work here. > > I think the steps here are: > > 1. Decide whether the view is a security barrier (perhaps, check > whether the user has sufficient privs to execute the underlying query; > or we could add an explicit setting). If not, stop. I'm a fun of an explicit setting. Queries are optimized prior to execution stage. > 2. Decide whether each qual executes potentially untrusted code (algorithm?). A simple idea is to assume all the FuncExpr being potentially untrusted as a starting up of the fix. (Can we trust all the built-in functions? It needs to ensure they don't have any side-effects; in future versions also.) > 3. Prevent any untrusted quals from being pushed down into view that > is a security barrier. > > We should have a design for each of these before we start coding. > Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/06/02 2:28), Robert Haas wrote: > On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Robert Haas<robertmhaas@gmail.com> writes: >>> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>> CREATE SECURITY VIEW, anyone? >> >>> That may be the best approach, but I think it needs more than one line >>> of exposition. The approach I proposed was to test whether the user >>> has privileges to execute the underlying query directly without going >>> through the view. If so, we needn't be concerned. If not, then we >>> start thinking about which functions/operators we trust. >> >> Ummm ... that makes semantics dependent on the permissions available at >> plan time, whereas what should matter is the permissions that exist at >> execution time. Maybe that's all right for this context but it doesn't >> seem tremendously desirable. > > Ugh. I hope there's a way around that problem because AFAICS the > alternative is a world of hurt. If we're not allowed to take the > security context into account during planning, then we're going to > have to make worst-case assumptions, which sounds really unpleasant. > I was reminded that inline_set_returning_function() tries to extract a given RangeTblEntry with RTE_FUNCTION into a subquery when a few conditions are satisfied. The conditions include whether user has privileges to execute the function. It seems to me planner checks permissions, and going to have worst case assumptions, if access privilege violations. As long as we can resolve the problem that I described at [1] (order of evaluation of scan filters), it seems to me a reasonable fallback. (Although I mentioned that queries are optimized prior to execution stage...) >>> Perhaps there is some value to having a knob that goes the opposite >>> directions and essentially says "I don't really care whether this view >>> is leaky from a security perspective". But presumably we don't want >>> to deliver that behavior by default and require the user to ask for a >>> SECURITY VIEW to get something else - if anything, we'd want CREATE >>> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW >>> to do the other thing. >> >> -1 on that. We will get far more pushback from people whose application >> performance suddenly went to hell than we will ever get approval from >> people who actually need the feature. Considering that we've survived >> this long with leaky views, that should definitely remain the default >> behavior. > > Eh, if that's the consensus, it doesn't bother me that much, but it > doesn't really answer the question, either: supposing we add an > explicit concept of a security view, what should its semantics be? > How about a GUC option to provide the default, like default_with_oids? Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/6/1 KaiGai Kohei <kaigai@ak.jp.nec.com>: >> Eh, if that's the consensus, it doesn't bother me that much, but it >> doesn't really answer the question, either: supposing we add an >> explicit concept of a security view, what should its semantics be? > > How about a GUC option to provide the default, like default_with_oids? Bad idea. We already have enough problems with GUCs that can create security problems if they're unexpectedly set to the wrong value. We don't need any more. Anyhow, that's trivia. The real thing we need to decide here is to design the security mechanism. We can change the syntax to whatever we want very easily. Here's another thought. If we're leaning toward explicit syntax to designate security views (and I do mean IF, since only one person has signed on to that, even if it is Tom Lane!), then maybe we should think about ripping out the logic that causes regular views to be evaluated using the credentials of the view owner rather than the person selecting from it. A security view would still use that logic, plus whatever additional stuff we come up with to prevent leakage. Perhaps this would be viewed as a nasty backward compatibility break, but the upside is that we'd then be being absolutely clear that a non-security view isn't and can never be trusted to be a security barrier. Right now we're shipping something that purports to act as a barrier but really doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
(2010/06/02 10:52), Robert Haas wrote: > 2010/6/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> Eh, if that's the consensus, it doesn't bother me that much, but it >>> doesn't really answer the question, either: supposing we add an >>> explicit concept of a security view, what should its semantics be? >> >> How about a GUC option to provide the default, like default_with_oids? > > Bad idea. We already have enough problems with GUCs that can create > security problems if they're unexpectedly set to the wrong value. We > don't need any more. Anyhow, that's trivia. The real thing we need > to decide here is to design the security mechanism. We can change the > syntax to whatever we want very easily. > Indeed, syntax will be decided according to the logic. > Here's another thought. If we're leaning toward explicit syntax to > designate security views (and I do mean IF, since only one person has > signed on to that, even if it is Tom Lane!), then maybe we should > think about ripping out the logic that causes regular views to be > evaluated using the credentials of the view owner rather than the > person selecting from it. A security view would still use that logic, > plus whatever additional stuff we come up with to prevent leakage. > Perhaps this would be viewed as a nasty backward compatibility break, > but the upside is that we'd then be being absolutely clear that a > non-security view isn't and can never be trusted to be a security > barrier. Right now we're shipping something that purports to act as a > barrier but really doesn't. > Sorry, should we make clear the purpose of explicit syntax for security views being issued now? In my understanding, if security views, the planner tries to checks privileges of the person selecting it to reference underlaying tables without any ereport. If violated, the planner prevents user given quals (perhaps FuncExpr only?) to come into inside of the join scan. Otherwise, if regular views, the planner works as is. Right? I don't think we need whatever additional user visible stuff to prevent leakage except for fully optimized query plan. (Of course, it can make performance regression.) It seems to me the issue is just an order to execute user defined functions and qualifier of security views to restrict visible tuples, so I don't know whether it breaks any backward compatibility. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/06/02 12:02), KaiGai Kohei wrote: >> Here's another thought. If we're leaning toward explicit syntax to >> designate security views (and I do mean IF, since only one person has >> signed on to that, even if it is Tom Lane!), then maybe we should >> think about ripping out the logic that causes regular views to be >> evaluated using the credentials of the view owner rather than the >> person selecting from it. A security view would still use that logic, >> plus whatever additional stuff we come up with to prevent leakage. >> Perhaps this would be viewed as a nasty backward compatibility break, >> but the upside is that we'd then be being absolutely clear that a >> non-security view isn't and can never be trusted to be a security >> barrier. Right now we're shipping something that purports to act as a >> barrier but really doesn't. >> > > Sorry, should we make clear the purpose of explicit syntax for security > views being issued now? > In my understanding, if security views, the planner tries to checks > privileges of the person selecting it to reference underlaying tables > without any ereport. If violated, the planner prevents user given > quals (perhaps FuncExpr only?) to come into inside of the join scan. > Otherwise, if regular views, the planner works as is. Right? > I'd like to check up the problem in details. We can sort out a view into three types, as follows: (1) A view which cannot be pulled up (2) A view which can be pulled up, but does not contain any joins (3) A view which can be pulled up, and contains joins. For the type (1), we don't need to handle anything special, because no external quals are unavailable to come into. For the type (2), we also don't need to handle anything special except for the evaluation order matter in ExecQual(), because it is impossible to distribute external quals into a part of relations. So, the type (3) is all we have to consider here. Right? Then, where should we distinguish a security view and others? At least, it should be decided at pull_up_subqueries() or earlier, because it merges subqueries into the upper query. In subquery_planner(), the pull_up_subqueries() is called just after inline_set_returning_functions() which makes RTE_FUNCTION entry flatten if available. It seems to me not a strange logic to check privileges on underlaying relations in pull_up_subqueries(), because inline_set_returning_functions() also checks ACL_EXECUTE here on the functions to be flatten. Then, once we can identify what is a security view and not, it shall be marked on somewhere. Maybe, FromExpr of the pulled-up subquery? In my current idea, when a qual-node that contains FuncExpr tries to reference a part of relations within a security view, its referencing relations will be expanded to whole of the security view at distribute_qual_to_rels(). Then, it will prevent a user defined function to come into inside of security views. At least, it seems to me reasonable to implement. Shall I launch to develop a proof of concept patch? Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
I summarized the series of discussion at: http://wiki.postgresql.org/wiki/RLS#Issue:_A_leaky_VIEWs_for_RLS Please point out, if I missed or misunderstood something. Thanks, (2010/06/03 11:36), KaiGai Kohei wrote: > (2010/06/02 12:02), KaiGai Kohei wrote: >>> Here's another thought. If we're leaning toward explicit syntax to >>> designate security views (and I do mean IF, since only one person has >>> signed on to that, even if it is Tom Lane!), then maybe we should >>> think about ripping out the logic that causes regular views to be >>> evaluated using the credentials of the view owner rather than the >>> person selecting from it. A security view would still use that logic, >>> plus whatever additional stuff we come up with to prevent leakage. >>> Perhaps this would be viewed as a nasty backward compatibility break, >>> but the upside is that we'd then be being absolutely clear that a >>> non-security view isn't and can never be trusted to be a security >>> barrier. Right now we're shipping something that purports to act as a >>> barrier but really doesn't. >>> >> >> Sorry, should we make clear the purpose of explicit syntax for security >> views being issued now? >> In my understanding, if security views, the planner tries to checks >> privileges of the person selecting it to reference underlaying tables >> without any ereport. If violated, the planner prevents user given >> quals (perhaps FuncExpr only?) to come into inside of the join scan. >> Otherwise, if regular views, the planner works as is. Right? >> > > I'd like to check up the problem in details. > > We can sort out a view into three types, as follows: > (1) A view which cannot be pulled up > (2) A view which can be pulled up, but does not contain any joins > (3) A view which can be pulled up, and contains joins. > > For the type (1), we don't need to handle anything special, because > no external quals are unavailable to come into. > > For the type (2), we also don't need to handle anything special > except for the evaluation order matter in ExecQual(), because it is > impossible to distribute external quals into a part of relations. > > So, the type (3) is all we have to consider here. Right? > > > Then, where should we distinguish a security view and others? > At least, it should be decided at pull_up_subqueries() or earlier, > because it merges subqueries into the upper query. > In subquery_planner(), the pull_up_subqueries() is called just after > inline_set_returning_functions() which makes RTE_FUNCTION entry flatten > if available. It seems to me not a strange logic to check privileges > on underlaying relations in pull_up_subqueries(), because > inline_set_returning_functions() also checks ACL_EXECUTE here on the > functions to be flatten. > > > Then, once we can identify what is a security view and not, it shall > be marked on somewhere. Maybe, FromExpr of the pulled-up subquery? > > > In my current idea, when a qual-node that contains FuncExpr tries to > reference a part of relations within a security view, its referencing > relations will be expanded to whole of the security view at > distribute_qual_to_rels(). > Then, it will prevent a user defined function to come into inside of > security views. > > > At least, it seems to me reasonable to implement. > Shall I launch to develop a proof of concept patch? > > Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>