Thread: [RFC] A tackle to the leaky VIEWs for RLS

[RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
As it was reported before, we have an open item about leaky VIEWs for RLS.

On the talk at Ottawa, Robert suggested me to post my idea prior to submit
a patch. So, I'd like to explain my idea at first.
Actually I'm not familiar to optimizar details, so it needs any helps from
experts of optimizar.


The problem was ...
 * Using views for row-level access control is leaky http://archives.postgresql.org/pgsql-hackers/2009-10/msg01346.php

Even if a table is unvisible from certain users without views that filter
a part of tuples, it can leak to users as long as they can define their own
functions.

It seems to me the problem can be divided into two major parts.

See the following sample tables, views and functions.
 postgres=# CREATE TABLE t1 (a int primary key, b text); NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"t1_pkey"for table "t1" CREATE TABLE postgres=# CREATE TABLE t2 (x int primary key, y text); NOTICE:  CREATE TABLE /
PRIMARYKEY will create implicit index "t2_pkey" for table "t2" CREATE TABLE postgres=# INSERT INTO t1 VALUES (1,
'aaa'),(2, 'bbb'), (3, 'ccc'); INSERT 0 3 postgres=# INSERT INTO t2 VALUES (1, 'xxx'), (2, 'yyy'), (3, 'zzz'); INSERT 0
3
 -- We assume the security policy function needs the given integer key -- is odd number to be visible for users. --
postgres=#CREATE OR REPLACE FUNCTION f_policy(int) RETURNS bool                AS 'BEGIN RETURN $1 % 2 = 1; END'
LANGUAGEplpgsql; CREATE FUNCTION
 
 -- We assume a malicious user defined function raises a notice with -- given arguments. It may be possible to insert
itother temp tables. -- postgres=# CREATE OR REPLACE FUNCTION f_malicious(text) RETURNS bool COST 0.0001
AS'BEGIN RAISE NOTICE ''f_malicious: %'', $1; RETURN true; END;' LANGUAGE plpgsql; CREATE FUNCTION
 

[1] The order of scan filters to be evaluated
----------------------------------------------
The first problem is an inversion of evaluation of scan filters.
 postgres=# CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 WHERE f_policy(a); CREATE VIEW
 postgres=# EXPLAIN SELECT * FROM v1 WHERE f_malicious(b);                       QUERY PLAN
------------------------------------------------------- Seq Scan on t1  (cost=0.00..329.80 rows=137 width=36)
Filter:(f_malicious(b) AND f_policy(a)) (2 rows)
 
 postgres=# SELECT * FROM v1 WHERE f_malicious(b); NOTICE:  f_malicious: aaa NOTICE:  f_malicious: bbb  <-- leaky
contentsNOTICE:  f_malicious: ccc  a |  b ---+-----  1 | aaa  3 | ccc (2 rows)
 

In this case, owner of the view expects tuples within t1 shall be filtered
by f_policy() functions, so tuples with even-number shall be invisible.
However, the optimizar reorders evaluation of scan filters based on the cost
parameter of functions and others, then f_malicious() was invoked prior to
f_policy(). It is a right approach, if functions are not malicious.
But user may define a malicious purpose function.

The given query is internally rewritten, then subquery will be pulled up
in the optimizar logic.
 SELECT * FROM v1 WHERE f_malicious(b); -> SELECT * FROM (SELECT * FROM t1 WHERE f_policy(a)) v1 WHERE f_malicious(b)
->SELECT * FROM t1 WHERE f_policy(a) AND f_malicious(b)
 

During we create a scan plan, the order_qual_clauses() computes the best
order to evaluate the given WHERE clause based on the cost estimation.
In this case, f_malicious() has very small cost, so order_qual_clauses()
decides the f_malicious() should be invoked earlier than f_policy().
In the result, ExecScan() invokes f_malicious() with contents of scanned
tuples to be invisible.

I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember
where is originally put in the query, and prevent reordering over the nest
level of subqueries.
In above example, f_malicious() has nestlevel=0 because it is put on the top
level.
But f_policy() has nestlevel=1 because it is originally put on the second
level subquery. Then, the order_qual_clauses() will check nestlevel of the
scan filter prior to reorder them based on the cost estimation.
Even if we have multiple nestlevels, solution will be same. A FuncExpr with
larger nestlevel shall be invoked earlier than others.

Please note that we only focus on user defined functions.
For example, it is worth to choose index-scans instead of seq-scans, when
a user provides conditions which can be indexed, as follows:
 SELECT * FROM v1 WHERE a = 100; -> SELECT * FROM (SELECT * FROM t1 WHERE f_policy(a)) v1 WHERE a = 100;

In this case, we should scan the t1 using index with the condition of 'a = 100'
prior to evaluation of f_policy(). Any operators eventually invokes a function
being correctly installed, but an assumption is that we can trust operators,
index access method, type input/output methods, conversions and so on, because
these features have to be installed by DBA (or initdb).


[2] Unexpected distribution of scan filter
-------------------------------------------

Here is one other situation of leaky VIEWs for RLS.
 postgres=# CREATE OR REPLACE VIEW v2 AS                SELECT * FROM t1 JOIN t2 ON a = x WHERE f_policy(a); CREATE
VIEWpostgres=# SELECT * FROM v2;  a |  b  | x |  y ---+-----+---+-----  1 | aaa | 1 | xxx  3 | ccc | 3 | zzz (2 rows)
 

This view intends to provide a joined virtual relation with a restriction using
f_policy(). In fact, it filters out tuples with even-number key.
However, we can leak information of the filtered tuples with different scenarios.
 postgres=# SELECT * FROM v2 WHERE f_malicious(y); NOTICE:  f_malicious: xxx NOTICE:  f_malicious: yyy <-- leaky
contentsNOTICE:  f_malicious: zzz  a |  b  | x |  y ---+-----+---+-----  1 | aaa | 1 | xxx  3 | ccc | 3 | zzz (2 rows)
 
 postgres=# EXPLAIN SELECT * FROM v2 WHERE f_malicious(y);                                QUERY PLAN
------------------------------------------------------------------------- Nested Loop  (cost=0.00..287.63 rows=410
width=72)   ->  Seq Scan on t2  (cost=0.00..22.30 rows=410 width=36)          Filter: f_malicious(y)    ->  Index Scan
usingt1_pkey on t1  (cost=0.00..0.63 rows=1 width=36)          Index Cond: (t1.a = t2.x)          Filter:
f_policy(t1.a)(6 rows)
 

We can see the f_malicious() was distributed to the Seq-Scan on t2, not the
Nested-Loop, because its arguments only depends on the t2, so the optimizar
tries to distribute the scan filter into the least unit of scan.

IIUC, the distribute_qual_to_rels() determines what scan-plan should have
what scan-filters based on dependency of the function call.

For example, the second qualifier depends on both of t1 and t2, it was
distributed to outside of the join. postgres=# EXPLAIN SELECT * FROM v2 WHERE f_malicious(y) and b || y != 'aaaxxx';
                           QUERY PLAN -------------------------------------------------------------------------  Nested
Loop (cost=0.00..289.68 rows=408 width=72)    Join Filter: ((t1.b || t2.y) <> 'aaaxxx'::text)    ->  Seq Scan on t2
(cost=0.00..22.30rows=410 width=36)          Filter: f_malicious(y)    ->  Index Scan using t1_pkey on t1
(cost=0.00..0.63rows=1 width=36)          Index Cond: (t1.a = t2.x)          Filter: f_policy(t1.a) (7 rows)
 

My idea is similar to what I proposed at [1]. It adds a new field into
RelOptInfo (or other structure?) to remember the original nestlevel of
the scan, then it will be compared to nestlevel of the FuncExpr.
If nestlevel of the FuncExpr is smaller than nestlevel of the RelOptInfo,
it prevents to distribute the FuncExpr onto the RelOptInfo, even if the
function depends on only the relation of RelOptInfo.

The way to handle trusted nodes are same as [1]. If user given operators
depend on only one-side of join, this idea does not prevent anything.


[3] Issues of the approach
---------------------------
Can we find out any other scenario that malicious user defined function
allows us to leak invisible tuples to be filtered out.
Of course, we can never prove software being bug-free. What I want to say
is that "Please point out, if I'm missing something significant scenario".

How much performance impact? Is it reasonable, or not?
I'm not sure whether it is really worse in performance, or not, because
it affects only user defined functions, not operators. So, it seems to
me the idea does not prevent plans to boost using index-scan.

It seems to me the idea on [1] does not make significant regressions,
because it does not change scale of the plan. But the idea on [2] may
affect to scale of the plan, if user defined function filters most of
tuples within one-side tables of the join.

IIRC, I was suggested to distinguish VIEWs with/without security purpose.
It also seems to me an option. If a certain VIEW has its priority on row
level security, it seems to me reasonable to disable a part of optimization
which I introduced above.

Sorry for the long description.
Please any comments.
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Heikki Linnakangas
Date:
On 01/06/10 11:39, KaiGai Kohei wrote:
> Any operators eventually invokes a function
> being correctly installed, but an assumption is that we can trust operators,
> index access method, type input/output methods, conversions and so on, because
> these features have to be installed by DBA (or initdb).

Operators can be created by regular users.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
(2010/06/01 18:08), Heikki Linnakangas wrote:
> On 01/06/10 11:39, KaiGai Kohei wrote:
>> Any operators eventually invokes a function
>> being correctly installed, but an assumption is that we can trust operators,
>> index access method, type input/output methods, conversions and so on, because
>> these features have to be installed by DBA (or initdb).
> 
> Operators can be created by regular users.
> 
Oops, I missed it. Indeed, operator function is not limited to C-language
functions, so regular users can create it.

Apart from the topic, does it seem to you reasonable direction to tackle to
the leaky VIEWs problem?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Heikki Linnakangas
Date:
On 01/06/10 13:04, KaiGai Kohei wrote:
> Oops, I missed it. Indeed, operator function is not limited to C-language
> functions, so regular users can create it.
> 
> Apart from the topic, does it seem to you reasonable direction to tackle to
> the leaky VIEWs problem?

Yeah, I guess it is.

The general problem is that it seems like a nightmare to maintain this
throughout the planner. Who knows what optimizations this affects, and
do we need to hide things like row-counts in EXPLAIN output? If we try
to be very strict, we can expect a stream of CVEs and security releases
in the future while we find holes and plug them. On the other hand,
using views to restrict access to underlying tables is a very useful
feature, so I'd hate to just give up. We need to decide what level of
isolation we try to accomplish.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Florian Pflug
Date:
On Jun 1, 2010, at 10:39 , KaiGai Kohei wrote:
> I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember
> where is originally put in the query, and prevent reordering over the nest
> level of subqueries.
> In above example, f_malicious() has nestlevel=0 because it is put on the top
> level.
> But f_policy() has nestlevel=1 because it is originally put on the second
> level subquery. Then, the order_qual_clauses() will check nestlevel of the
> scan filter prior to reorder them based on the cost estimation.
> Even if we have multiple nestlevels, solution will be same. A FuncExpr with
> larger nestlevel shall be invoked earlier than others.

Wouldn't the information leak go away if you stuck "OFFSET 0" at the end of the view? IIRC, that is the semi-offical
wayto create barriers for subquery flattening and such. 

best regards,
Florian Pflug



Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Robert Haas
Date:
2010/6/1 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember
> where is originally put in the query, and prevent reordering over the nest
> level of subqueries.
> In above example, f_malicious() has nestlevel=0 because it is put on the top
> level.
> But f_policy() has nestlevel=1 because it is originally put on the second
> level subquery. Then, the order_qual_clauses() will check nestlevel of the
> scan filter prior to reorder them based on the cost estimation.
> Even if we have multiple nestlevels, solution will be same. A FuncExpr with
> larger nestlevel shall be invoked earlier than others.
[...]
> My idea is similar to what I proposed at [1]. It adds a new field into
> RelOptInfo (or other structure?) to remember the original nestlevel of
> the scan, then it will be compared to nestlevel of the FuncExpr.
> If nestlevel of the FuncExpr is smaller than nestlevel of the RelOptInfo,
> it prevents to distribute the FuncExpr onto the RelOptInfo, even if the
> function depends on only the relation of RelOptInfo.

Keep in mind that users who are NOT using a view as a security barrier
don't expect it to kill performance.  This approach, and particularly
the second part, about preventing quals from being pushed through
joins, has the potential to be a performance disaster.  So I think
it's absolutely critical that we don't do that except when it's
necessary to prevent a security issue.

On the technical side, I am pretty doubtful that the approach of
adding a nestlevel to FuncExpr and RelOptInfo is the right way to go.
I believe we have existing code (to handle left joins) that prevents
quals from being pushed down too far by fudging the set of relations
that are supposedly needed to evaluate the qual.  I suspect a similar
approach would work here.

I think the steps here are:

1. Decide whether the view is a security barrier (perhaps, check
whether the user has sufficient privs to execute the underlying query;
or we could add an explicit setting).  If not, stop.
2. Decide whether each qual executes potentially untrusted code (algorithm?).
3. Prevent any untrusted quals from being pushed down into view that
is a security barrier.

We should have a design for each of these before we start coding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Greg Stark
Date:
2010/6/1 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> On 01/06/10 11:39, KaiGai Kohei wrote:
>> Any operators eventually invokes a function
>> being correctly installed, but an assumption is that we can trust operators,
>> index access method, type input/output methods, conversions and so on, because
>> these features have to be installed by DBA (or initdb).
>
> Operators can be created by regular users.

So I think we don't actually have to worry about operators and
functions which allow us to use an index scan. If they're used in an
index definition then the definer of those functions can see the
entire table anyways.

The only place where this matters, at least to a first degree, is on
the filter operations applied to a scan. If the view isn't owned by
the current user then all the filters of the view have to be enforced
first then the query filters.

Heikki's point is still valid though. Consider if it's not a matter of
filter ordering but rather that a filter is being pushed down inside a
join. If the join is from the view then it would be unsafe to filter
the rows before seeing which rows match the join... unless we can
prove all the rows survive... It would really suck not to do this
optimization too if for example you have a filter which filters all
but a single row and then join against a large table...


-- 
greg


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Robert Haas
Date:
2010/6/1 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> The general problem is that it seems like a nightmare to maintain this
> throughout the planner. Who knows what optimizations this affects, and
> do we need to hide things like row-counts in EXPLAIN output? If we try
> to be very strict, we can expect a stream of CVEs and security releases
> in the future while we find holes and plug them. On the other hand,
> using views to restrict access to underlying tables is a very useful
> feature, so I'd hate to just give up. We need to decide what level of
> isolation we try to accomplish.

I'm entirely uninspired by the idea of trying to prevent all possible
leakage of information via side-channels.  Even if we tried to
obfuscate the explain output, the user can still potentially get some
information by timing how long queries take to execute.  I think if we
have a hook function that can prevent certain users from running
EXPLAIN altogether (and I believe this may already be the case) that's
about the appropriate level of worrying about that case.  I think the
only thing that we can realistically prevent is allowing users to make
off with the actual tuples.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Heikki's point is still valid though. Consider if it's not a matter of
> filter ordering but rather that a filter is being pushed down inside a
> join. If the join is from the view then it would be unsafe to filter
> the rows before seeing which rows match the join... unless we can
> prove all the rows survive... It would really suck not to do this
> optimization too if for example you have a filter which filters all
> but a single row and then join against a large table...

Well, more generally, any restriction whatsoever that is placed on
the current planner behavior in the name of security will result in
catastrophic performance degradation for some queries.  I agree with
Robert's nearby comments that we need to be selective about which
views we do this to and which functions we distrust.

CREATE SECURITY VIEW, anyone?
        regards, tom lane


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Greg Stark
Date:
Also incidentally I'm having trouble imagining a scenario where this
really matters. For it to be an issue you would have to simultaneously
have a user which can't access all the data and must go through views
which limit the data he can access -- and has privileges to issue DDL
to create functions and operators. That seems like an unlikely
combination. I've seen views used before to restrict the role accounts
used by front-end applications but those accounts have no DDL
privileges.


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Stephen Frost
Date:
* Greg Stark (gsstark@mit.edu) wrote:
> Also incidentally I'm having trouble imagining a scenario where this
> really matters. For it to be an issue you would have to simultaneously
> have a user which can't access all the data and must go through views
> which limit the data he can access -- and has privileges to issue DDL
> to create functions and operators. That seems like an unlikely
> combination. I've seen views used before to restrict the role accounts
> used by front-end applications but those accounts have no DDL
> privileges.

Erm, I have to disagree with this in general..  We don't all just build
web apps.  On multi-user databases, this really isn't that uncommon.
I'm not saying it's an everyday kind of thing, but I don't think this
issue is something we can just ignore either.
Thanks,
    Stephen

Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Robert Haas
Date:
On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> CREATE SECURITY VIEW, anyone?

That may be the best approach, but I think it needs more than one line
of exposition.  The approach I proposed was to test whether the user
has privileges to execute the underlying query directly without going
through the view.  If so, we needn't be concerned.  If not, then we
start thinking about which functions/operators we trust.

The only disadvantage to that approach that I see is that it
introduces some extra permission-checking overhead.  If having an
explicit notion of which views are intended to be security views
enables us to reduce or eliminate that overhead, it may be worth
doing.  But I'm not sure that it does.  A blanket rule that says
"don't push untrusted quals into security views" is not going to be
too satisfactory - we probably want to do that only when the view is
queried by an untrusted user - the superuser, or someone else with
adequate permissions, will presumably still want his quals to get
pushed down.

Perhaps there is some value to having a knob that goes the opposite
directions and essentially says "I don't really care whether this view
is leaky from a security perspective".  But presumably we don't want
to deliver that behavior by default and require the user to ask for a
SECURITY VIEW to get something else - if anything, we'd want CREATE
VIEW to create the normal (secure) version and add CREATE LEAKY VIEW
to do the other thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> CREATE SECURITY VIEW, anyone?

> That may be the best approach, but I think it needs more than one line
> of exposition.  The approach I proposed was to test whether the user
> has privileges to execute the underlying query directly without going
> through the view.  If so, we needn't be concerned.  If not, then we
> start thinking about which functions/operators we trust.

Ummm ... that makes semantics dependent on the permissions available at
plan time, whereas what should matter is the permissions that exist at
execution time.  Maybe that's all right for this context but it doesn't
seem tremendously desirable.

> Perhaps there is some value to having a knob that goes the opposite
> directions and essentially says "I don't really care whether this view
> is leaky from a security perspective".  But presumably we don't want
> to deliver that behavior by default and require the user to ask for a
> SECURITY VIEW to get something else - if anything, we'd want CREATE
> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW
> to do the other thing.

-1 on that.  We will get far more pushback from people whose application
performance suddenly went to hell than we will ever get approval from
people who actually need the feature.  Considering that we've survived
this long with leaky views, that should definitely remain the default
behavior.
        regards, tom lane


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Robert Haas
Date:
On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> CREATE SECURITY VIEW, anyone?
>
>> That may be the best approach, but I think it needs more than one line
>> of exposition.  The approach I proposed was to test whether the user
>> has privileges to execute the underlying query directly without going
>> through the view.  If so, we needn't be concerned.  If not, then we
>> start thinking about which functions/operators we trust.
>
> Ummm ... that makes semantics dependent on the permissions available at
> plan time, whereas what should matter is the permissions that exist at
> execution time.  Maybe that's all right for this context but it doesn't
> seem tremendously desirable.

Ugh.  I hope there's a way around that problem because AFAICS the
alternative is a world of hurt.  If we're not allowed to take the
security context into account during planning, then we're going to
have to make worst-case assumptions, which sounds really unpleasant.

>> Perhaps there is some value to having a knob that goes the opposite
>> directions and essentially says "I don't really care whether this view
>> is leaky from a security perspective".  But presumably we don't want
>> to deliver that behavior by default and require the user to ask for a
>> SECURITY VIEW to get something else - if anything, we'd want CREATE
>> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW
>> to do the other thing.
>
> -1 on that.  We will get far more pushback from people whose application
> performance suddenly went to hell than we will ever get approval from
> people who actually need the feature.  Considering that we've survived
> this long with leaky views, that should definitely remain the default
> behavior.

Eh, if that's the consensus, it doesn't bother me that much, but it
doesn't really answer the question, either: supposing we add an
explicit concept of a security view, what should its semantics be?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Merlin Moncure
Date:
On Tue, Jun 1, 2010 at 1:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> CREATE SECURITY VIEW, anyone?
>>
>>> That may be the best approach, but I think it needs more than one line
>>> of exposition.  The approach I proposed was to test whether the user
>>> has privileges to execute the underlying query directly without going
>>> through the view.  If so, we needn't be concerned.  If not, then we
>>> start thinking about which functions/operators we trust.
>>
>> Ummm ... that makes semantics dependent on the permissions available at
>> plan time, whereas what should matter is the permissions that exist at
>> execution time.  Maybe that's all right for this context but it doesn't
>> seem tremendously desirable.
>
> Ugh.  I hope there's a way around that problem because AFAICS the
> alternative is a world of hurt.  If we're not allowed to take the
> security context into account during planning, then we're going to
> have to make worst-case assumptions, which sounds really unpleasant.
>
>>> Perhaps there is some value to having a knob that goes the opposite
>>> directions and essentially says "I don't really care whether this view
>>> is leaky from a security perspective".  But presumably we don't want
>>> to deliver that behavior by default and require the user to ask for a
>>> SECURITY VIEW to get something else - if anything, we'd want CREATE
>>> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW
>>> to do the other thing.
>>
>> -1 on that.  We will get far more pushback from people whose application
>> performance suddenly went to hell than we will ever get approval from
>> people who actually need the feature.  Considering that we've survived
>> this long with leaky views, that should definitely remain the default
>> behavior.
>
> Eh, if that's the consensus, it doesn't bother me that much, but it
> doesn't really answer the question, either: supposing we add an
> explicit concept of a security view, what should its semantics be?

have you ruled out: 'create function'? :-)

merlin


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Robert Haas
Date:
On Tue, Jun 1, 2010 at 4:10 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Jun 1, 2010 at 1:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> CREATE SECURITY VIEW, anyone?
>>>
>>>> That may be the best approach, but I think it needs more than one line
>>>> of exposition.  The approach I proposed was to test whether the user
>>>> has privileges to execute the underlying query directly without going
>>>> through the view.  If so, we needn't be concerned.  If not, then we
>>>> start thinking about which functions/operators we trust.
>>>
>>> Ummm ... that makes semantics dependent on the permissions available at
>>> plan time, whereas what should matter is the permissions that exist at
>>> execution time.  Maybe that's all right for this context but it doesn't
>>> seem tremendously desirable.
>>
>> Ugh.  I hope there's a way around that problem because AFAICS the
>> alternative is a world of hurt.  If we're not allowed to take the
>> security context into account during planning, then we're going to
>> have to make worst-case assumptions, which sounds really unpleasant.
>>
>>>> Perhaps there is some value to having a knob that goes the opposite
>>>> directions and essentially says "I don't really care whether this view
>>>> is leaky from a security perspective".  But presumably we don't want
>>>> to deliver that behavior by default and require the user to ask for a
>>>> SECURITY VIEW to get something else - if anything, we'd want CREATE
>>>> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW
>>>> to do the other thing.
>>>
>>> -1 on that.  We will get far more pushback from people whose application
>>> performance suddenly went to hell than we will ever get approval from
>>> people who actually need the feature.  Considering that we've survived
>>> this long with leaky views, that should definitely remain the default
>>> behavior.
>>
>> Eh, if that's the consensus, it doesn't bother me that much, but it
>> doesn't really answer the question, either: supposing we add an
>> explicit concept of a security view, what should its semantics be?
>
> have you ruled out: 'create function'? :-)

You lost me...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Merlin Moncure
Date:
On Tue, Jun 1, 2010 at 4:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 1, 2010 at 4:10 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> have you ruled out: 'create function'? :-)
>
> You lost me...

Well, as noted by the OP, using views for security in postgres is
simply wishful thinking.  This is part of a family of issues
(generally not evil nor fixable) under the category of 'there is no
real control over when functions in a query fire'.

My point was that in cases where users expect this behavior, why not
encourage them to use functions instead of views?  Is there any formal
expectation that views can be used to hide data in this way?  Does
this really have to be fixed, and if so should it be in light of the
fact that our rule system is basically understood to be broken?

merlin


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
(2010/06/01 22:16), Robert Haas wrote:
> 2010/6/1 Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>:
>> The general problem is that it seems like a nightmare to maintain this
>> throughout the planner. Who knows what optimizations this affects, and
>> do we need to hide things like row-counts in EXPLAIN output? If we try
>> to be very strict, we can expect a stream of CVEs and security releases
>> in the future while we find holes and plug them. On the other hand,
>> using views to restrict access to underlying tables is a very useful
>> feature, so I'd hate to just give up. We need to decide what level of
>> isolation we try to accomplish.
> 
> I'm entirely uninspired by the idea of trying to prevent all possible
> leakage of information via side-channels.  Even if we tried to
> obfuscate the explain output, the user can still potentially get some
> information by timing how long queries take to execute.  I think if we
> have a hook function that can prevent certain users from running
> EXPLAIN altogether (and I believe this may already be the case) that's
> about the appropriate level of worrying about that case.  I think the
> only thing that we can realistically prevent is allowing users to make
> off with the actual tuples.
> 
It is a good point, I think.

Even if we can guess or estimate something from circumstances, it does not
allow unprivileged users to read information directly.
In fact, we cannot find such a functionality to prevent side-channel leaks
from the certification reports of commercial RDBMS with RLS (e.g; Oracle
Label Security).

However, the leaky VIEWs has a different characteristic.
It unintentionally allows to fetch contents of invisible tuples and move
into into other tables. It means here is a data flow channel (not side channel),
but it breaks restriction of security views.

Thanks,
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
(2010/06/01 22:09), Robert Haas wrote:
> 2010/6/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>> I have an idea that we add FuncExpr a new field (e.g nestlevel) to remember
>> where is originally put in the query, and prevent reordering over the nest
>> level of subqueries.
>> In above example, f_malicious() has nestlevel=0 because it is put on the top
>> level.
>> But f_policy() has nestlevel=1 because it is originally put on the second
>> level subquery. Then, the order_qual_clauses() will check nestlevel of the
>> scan filter prior to reorder them based on the cost estimation.
>> Even if we have multiple nestlevels, solution will be same. A FuncExpr with
>> larger nestlevel shall be invoked earlier than others.
> [...]
>> My idea is similar to what I proposed at [1]. It adds a new field into
>> RelOptInfo (or other structure?) to remember the original nestlevel of
>> the scan, then it will be compared to nestlevel of the FuncExpr.
>> If nestlevel of the FuncExpr is smaller than nestlevel of the RelOptInfo,
>> it prevents to distribute the FuncExpr onto the RelOptInfo, even if the
>> function depends on only the relation of RelOptInfo.
> 
> Keep in mind that users who are NOT using a view as a security barrier
> don't expect it to kill performance.  This approach, and particularly
> the second part, about preventing quals from being pushed through
> joins, has the potential to be a performance disaster.  So I think
> it's absolutely critical that we don't do that except when it's
> necessary to prevent a security issue.
> 
Yes, I agree. It is necessary to distinguish security conscious views
and others. In general, only creator of the view knows its intention
correctly, so I think it is reasonable suggestion to provide a hint
whether we should prevent a part of optimization, or not.

> On the technical side, I am pretty doubtful that the approach of
> adding a nestlevel to FuncExpr and RelOptInfo is the right way to go.
> I believe we have existing code (to handle left joins) that prevents
> quals from being pushed down too far by fudging the set of relations
> that are supposedly needed to evaluate the qual.  I suspect a similar
> approach would work here.
> 
> I think the steps here are:
> 
> 1. Decide whether the view is a security barrier (perhaps, check
> whether the user has sufficient privs to execute the underlying query;
> or we could add an explicit setting).  If not, stop.

I'm a fun of an explicit setting.
Queries are optimized prior to execution stage.

> 2. Decide whether each qual executes potentially untrusted code (algorithm?).

A simple idea is to assume all the FuncExpr being potentially untrusted
as a starting up of the fix.
(Can we trust all the built-in functions? It needs to ensure they don't
have any side-effects; in future versions also.)

> 3. Prevent any untrusted quals from being pushed down into view that
> is a security barrier.
> 
> We should have a design for each of these before we start coding.
> 

Thanks,
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
(2010/06/02 2:28), Robert Haas wrote:
> On Tue, Jun 1, 2010 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Robert Haas<robertmhaas@gmail.com>  writes:
>>> On Tue, Jun 1, 2010 at 10:57 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>>> CREATE SECURITY VIEW, anyone?
>>
>>> That may be the best approach, but I think it needs more than one line
>>> of exposition.  The approach I proposed was to test whether the user
>>> has privileges to execute the underlying query directly without going
>>> through the view.  If so, we needn't be concerned.  If not, then we
>>> start thinking about which functions/operators we trust.
>>
>> Ummm ... that makes semantics dependent on the permissions available at
>> plan time, whereas what should matter is the permissions that exist at
>> execution time.  Maybe that's all right for this context but it doesn't
>> seem tremendously desirable.
> 
> Ugh.  I hope there's a way around that problem because AFAICS the
> alternative is a world of hurt.  If we're not allowed to take the
> security context into account during planning, then we're going to
> have to make worst-case assumptions, which sounds really unpleasant.
> 
I was reminded that inline_set_returning_function() tries to extract
a given RangeTblEntry with RTE_FUNCTION into a subquery when a few
conditions are satisfied. The conditions include whether user has
privileges to execute the function.

It seems to me planner checks permissions, and going to have worst
case assumptions, if access privilege violations.

As long as we can resolve the problem that I described at [1] (order
of evaluation of scan filters), it seems to me a reasonable fallback.
(Although I mentioned that queries are optimized prior to execution stage...)

>>> Perhaps there is some value to having a knob that goes the opposite
>>> directions and essentially says "I don't really care whether this view
>>> is leaky from a security perspective".  But presumably we don't want
>>> to deliver that behavior by default and require the user to ask for a
>>> SECURITY VIEW to get something else - if anything, we'd want CREATE
>>> VIEW to create the normal (secure) version and add CREATE LEAKY VIEW
>>> to do the other thing.
>>
>> -1 on that.  We will get far more pushback from people whose application
>> performance suddenly went to hell than we will ever get approval from
>> people who actually need the feature.  Considering that we've survived
>> this long with leaky views, that should definitely remain the default
>> behavior.
> 
> Eh, if that's the consensus, it doesn't bother me that much, but it
> doesn't really answer the question, either: supposing we add an
> explicit concept of a security view, what should its semantics be?
> 

How about a GUC option to provide the default, like default_with_oids?

Thanks,
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
Robert Haas
Date:
2010/6/1 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> Eh, if that's the consensus, it doesn't bother me that much, but it
>> doesn't really answer the question, either: supposing we add an
>> explicit concept of a security view, what should its semantics be?
>
> How about a GUC option to provide the default, like default_with_oids?

Bad idea.  We already have enough problems with GUCs that can create
security problems if they're unexpectedly set to the wrong value.  We
don't need any more.  Anyhow, that's trivia.  The real thing we need
to decide here is to design the security mechanism.  We can change the
syntax to whatever we want very easily.

Here's another thought.  If we're leaning toward explicit syntax to
designate security views (and I do mean IF, since only one person has
signed on to that, even if it is Tom Lane!), then maybe we should
think about ripping out the logic that causes regular views to be
evaluated using the credentials of the view owner rather than the
person selecting from it.  A security view would still use that logic,
plus whatever additional stuff we come up with to prevent leakage.
Perhaps this would be viewed as a nasty backward compatibility break,
but the upside is that we'd then be being absolutely clear that a
non-security view isn't and can never be trusted to be a security
barrier.  Right now we're shipping something that purports to act as a
barrier but really doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
(2010/06/02 10:52), Robert Haas wrote:
> 2010/6/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>>> Eh, if that's the consensus, it doesn't bother me that much, but it
>>> doesn't really answer the question, either: supposing we add an
>>> explicit concept of a security view, what should its semantics be?
>>
>> How about a GUC option to provide the default, like default_with_oids?
> 
> Bad idea.  We already have enough problems with GUCs that can create
> security problems if they're unexpectedly set to the wrong value.  We
> don't need any more.  Anyhow, that's trivia.  The real thing we need
> to decide here is to design the security mechanism.  We can change the
> syntax to whatever we want very easily.
> 
Indeed, syntax will be decided according to the logic.

> Here's another thought.  If we're leaning toward explicit syntax to
> designate security views (and I do mean IF, since only one person has
> signed on to that, even if it is Tom Lane!), then maybe we should
> think about ripping out the logic that causes regular views to be
> evaluated using the credentials of the view owner rather than the
> person selecting from it.  A security view would still use that logic,
> plus whatever additional stuff we come up with to prevent leakage.
> Perhaps this would be viewed as a nasty backward compatibility break,
> but the upside is that we'd then be being absolutely clear that a
> non-security view isn't and can never be trusted to be a security
> barrier.  Right now we're shipping something that purports to act as a
> barrier but really doesn't.
> 

Sorry, should we make clear the purpose of explicit syntax for security
views being issued now?
In my understanding, if security views, the planner tries to checks
privileges of the person selecting it to reference underlaying tables
without any ereport. If violated, the planner prevents user given
quals (perhaps FuncExpr only?) to come into inside of the join scan.
Otherwise, if regular views, the planner works as is. Right?

I don't think we need whatever additional user visible stuff to prevent
leakage except for fully optimized query plan. (Of course, it can make
performance regression.)
It seems to me the issue is just an order to execute user defined
functions and qualifier of security views to restrict visible tuples,
so I don't know whether it breaks any backward compatibility.

Thanks,
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
(2010/06/02 12:02), KaiGai Kohei wrote:
>> Here's another thought.  If we're leaning toward explicit syntax to
>> designate security views (and I do mean IF, since only one person has
>> signed on to that, even if it is Tom Lane!), then maybe we should
>> think about ripping out the logic that causes regular views to be
>> evaluated using the credentials of the view owner rather than the
>> person selecting from it.  A security view would still use that logic,
>> plus whatever additional stuff we come up with to prevent leakage.
>> Perhaps this would be viewed as a nasty backward compatibility break,
>> but the upside is that we'd then be being absolutely clear that a
>> non-security view isn't and can never be trusted to be a security
>> barrier.  Right now we're shipping something that purports to act as a
>> barrier but really doesn't.
>>
> 
> Sorry, should we make clear the purpose of explicit syntax for security
> views being issued now?
> In my understanding, if security views, the planner tries to checks
> privileges of the person selecting it to reference underlaying tables
> without any ereport. If violated, the planner prevents user given
> quals (perhaps FuncExpr only?) to come into inside of the join scan.
> Otherwise, if regular views, the planner works as is. Right?
> 

I'd like to check up the problem in details.

We can sort out a view into three types, as follows:
(1) A view which cannot be pulled up
(2) A view which can be pulled up, but does not contain any joins
(3) A view which can be pulled up, and contains joins.

For the type (1), we don't need to handle anything special, because
no external quals are unavailable to come into.

For the type (2), we also don't need to handle anything special
except for the evaluation order matter in ExecQual(), because it is
impossible to distribute external quals into a part of relations.

So, the type (3) is all we have to consider here. Right?


Then, where should we distinguish a security view and others?
At least, it should be decided at pull_up_subqueries() or earlier,
because it merges subqueries into the upper query.
In subquery_planner(), the pull_up_subqueries() is called just after
inline_set_returning_functions() which makes RTE_FUNCTION entry flatten
if available. It seems to me not a strange logic to check privileges
on underlaying relations in pull_up_subqueries(), because
inline_set_returning_functions() also checks ACL_EXECUTE here on the
functions to be flatten.


Then, once we can identify what is a security view and not, it shall
be marked on somewhere. Maybe, FromExpr of the pulled-up subquery?


In my current idea, when a qual-node that contains FuncExpr tries to
reference a part of relations within a security view, its referencing
relations will be expanded to whole of the security view at
distribute_qual_to_rels().
Then, it will prevent a user defined function to come into inside of
security views.


At least, it seems to me reasonable to implement.
Shall I launch to develop a proof of concept patch?

Thanks,
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [RFC] A tackle to the leaky VIEWs for RLS

From
KaiGai Kohei
Date:
I summarized the series of discussion at: http://wiki.postgresql.org/wiki/RLS#Issue:_A_leaky_VIEWs_for_RLS

Please point out, if I missed or misunderstood something.

Thanks,

(2010/06/03 11:36), KaiGai Kohei wrote:
> (2010/06/02 12:02), KaiGai Kohei wrote:
>>> Here's another thought.  If we're leaning toward explicit syntax to
>>> designate security views (and I do mean IF, since only one person has
>>> signed on to that, even if it is Tom Lane!), then maybe we should
>>> think about ripping out the logic that causes regular views to be
>>> evaluated using the credentials of the view owner rather than the
>>> person selecting from it.  A security view would still use that logic,
>>> plus whatever additional stuff we come up with to prevent leakage.
>>> Perhaps this would be viewed as a nasty backward compatibility break,
>>> but the upside is that we'd then be being absolutely clear that a
>>> non-security view isn't and can never be trusted to be a security
>>> barrier.  Right now we're shipping something that purports to act as a
>>> barrier but really doesn't.
>>>
>>
>> Sorry, should we make clear the purpose of explicit syntax for security
>> views being issued now?
>> In my understanding, if security views, the planner tries to checks
>> privileges of the person selecting it to reference underlaying tables
>> without any ereport. If violated, the planner prevents user given
>> quals (perhaps FuncExpr only?) to come into inside of the join scan.
>> Otherwise, if regular views, the planner works as is. Right?
>>
> 
> I'd like to check up the problem in details.
> 
> We can sort out a view into three types, as follows:
> (1) A view which cannot be pulled up
> (2) A view which can be pulled up, but does not contain any joins
> (3) A view which can be pulled up, and contains joins.
> 
> For the type (1), we don't need to handle anything special, because
> no external quals are unavailable to come into.
> 
> For the type (2), we also don't need to handle anything special
> except for the evaluation order matter in ExecQual(), because it is
> impossible to distribute external quals into a part of relations.
> 
> So, the type (3) is all we have to consider here. Right?
> 
> 
> Then, where should we distinguish a security view and others?
> At least, it should be decided at pull_up_subqueries() or earlier,
> because it merges subqueries into the upper query.
> In subquery_planner(), the pull_up_subqueries() is called just after
> inline_set_returning_functions() which makes RTE_FUNCTION entry flatten
> if available. It seems to me not a strange logic to check privileges
> on underlaying relations in pull_up_subqueries(), because
> inline_set_returning_functions() also checks ACL_EXECUTE here on the
> functions to be flatten.
> 
> 
> Then, once we can identify what is a security view and not, it shall
> be marked on somewhere. Maybe, FromExpr of the pulled-up subquery?
> 
> 
> In my current idea, when a qual-node that contains FuncExpr tries to
> reference a part of relations within a security view, its referencing
> relations will be expanded to whole of the security view at
> distribute_qual_to_rels().
> Then, it will prevent a user defined function to come into inside of
> security views.
> 
> 
> At least, it seems to me reasonable to implement.
> Shall I launch to develop a proof of concept patch?
> 
> Thanks,


-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>