Re: WIP patch (v2) for updatable security barrier views - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: WIP patch (v2) for updatable security barrier views
Date
Msg-id CAEZATCU3VcDKJpnHGFkRMrkz0axKCUH4CE_kQq3Z2HzkNhi5iA@mail.gmail.com
Whole thread Raw
In response to Re: WIP patch (v2) for updatable security barrier views  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: WIP patch (v2) for updatable security barrier views
Re: WIP patch (v2) for updatable security barrier views
List pgsql-hackers
On 13 December 2013 13:52, Craig Ringer <craig@2ndquadrant.com> wrote:
> Hi all
>
> Here's an updated version of the updatable security barrier views patch
> that's proposed as the next stage of progressing row-security toward
> useful and maintainable inclusion in core.
>
> If updatable security barrier views are available then the row-security
> code won't have to play around with remapping vars and attrs during
> query planning when it injects its subquery late. We'll simply stop the
> planner flattening an updatable security barrier view, and for
> row-security we'll dynamically inject one during rewriting when we see a
> relation that has a row-security attribute.

Hi,

I've been thinking about this some more and I don't think you can
avoid the need to remap vars and attrs.

AIUI, your modified version of rewriteTargetView() will turn an UPDATE
query with a rangetable of the form

  rtable:
    1: relation RTE (view) <- resultRelation
  fromList: [1]

into one of the form

  rtable:
    1: relation RTE (view)
    2: relation RTE (base table) <- resultRelation
  fromList: [1]

which will then get transformed later in the rewriter by
fireRIRrules() into

  rtable:
    1: subquery RTE (expanded view)
    2: relation RTE (base table) <- resultRelation
  fromList: [1]

The problem is that the subquery RTE will only expose the columns of
the view, which doesn't necessarily include all the columns from the
base table. These are required for UPDATE, for example:

create table t(a int, b int);
insert into t values(1, 2);
create view v with (security_barrier=true) as select a from t;
update v set a=a+1;
ERROR:  invalid attribute number 2

The columns actually output from the subquery are:

explain (verbose, costs off) update v set a=a+1;
                    QUERY PLAN
--------------------------------------------------
 Update on public.t
   ->  Subquery Scan on v
         Output: (v.a + 1), t.b, v.*, t.ctid, v.*
         ->  Seq Scan on public.t t_1
               Output: t_1.a

which is clearly broken.

So more code will be needed to add the right set of columns to that
subquery RTE, and that's where it will need to mess with the mapping
between columns of the view and columns of the underlying base
relation.

> [snip] Needs some
> changes in the rewriter where expand_target_list is called.

It doesn't look right to be calling expand_target_list() from the
rewriter. It looks like you are calling it before the rangetable
mangling, so all it will do is add targetlist entries for columns of
the view, not for columns of the base relation as the preceding
comment suggests. I think that explains the EXPLAIN output above.

A related problem is that the base table may be the parent of an
inheritance set, in which case attempting to add all the required
columns here in the rewriter is never going to work, because the
inheritance set hasn't been expanded yet, and so the columns of child
tables will be missed.

Normally expand_target_list() is only called from the planner, after
expand_inherited_tables(), at which point it's working with a subplan
with the appropriate parent/child relation, and so it sees the correct
set of columns.


The more I think about this, the more I think that the approach of my
original patch was neater. The idea was to have 2 new pieces of code:

1). In rewriteTargetView() decorate the target RTE with any security
    barrier quals (in the new rte->securityQuals field), instead of
    merging them with the main query's quals. So the output of this
    stage of rewriting would be something like

  rtable:
    1: relation RTE (base table) <- resultRelation
        - securityQuals = [view quals]
  fromList: [1]

2). After all rewriting is complete, scan the query and turn all
    relation RTEs with non-empty securityQuals into subquery RTEs
    (making a copy of the original RTE in the case of the result
    relation).

A future RLS patch would then be able to make use of this simply by
adding its own securityQuals to the relevant RTEs and letting (2)
expand them.

The problem with my earlier patch was that it was calling the subquery
expansion code (2) in the final stage of the rewriter. In light of the
above, that really needs to happen after expand_inherited_tables() so
that it can see the correct parent/child base relation. Another ugly
feature of my earlier patch was the changes it made to
expand_target_list() and the need to track the query's sourceRelation.
Both of those things can be avoided simply by moving the subquery
expansion code (2) to after expand_target_list(), and hence also after
expand_inherited_tables().


Attached is an updated version of my earlier patch with the subquery
expansion code (2) moved into the planner, in a new file
optimizer/prep/prepsecurity.c (it didn't seem to obviously belong in
any of the existing files) and invoked after calling
preprocess_targetlist(). It turns out that this also allows that new
code to be tidied up a bit, and it is easy for it to work out which
attributes are actually required and only include the minimum
necessary set of attributes in the subquery.

Also, since this is now all happening after inheritance expansion, the
subplan's subquery is built with just the relevant parent/child
relation, rather than the complete hierarchy. For example:

create table parent_tbl(a int);
insert into parent_tbl select * from generate_series(1,1000);
create table child_tbl(b int) inherits (parent_tbl);
insert into child_tbl select i,i*10 from generate_series(1001,2000) g(i);
create index child_tbl_idx on child_tbl(a);

create view v with (security_barrier=true)
  as select * from parent_tbl where a > 0;

explain (verbose, costs off) update v set a=a+1 where a=1500;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Update on public.parent_tbl parent_tbl_2
   ->  Subquery Scan on parent_tbl
         Output: (parent_tbl.a + 1), parent_tbl.ctid
         ->  Seq Scan on public.parent_tbl parent_tbl_3
               Output: parent_tbl_3.a, parent_tbl_3.ctid
               Filter: ((parent_tbl_3.a > 0) AND (parent_tbl_3.a = 1500))
   ->  Subquery Scan on parent_tbl_1
         Output: (parent_tbl_1.a + 1), parent_tbl_1.b, parent_tbl_1.ctid
         ->  Bitmap Heap Scan on public.child_tbl
               Output: child_tbl.a, child_tbl.ctid, child_tbl.b
               Recheck Cond: ((child_tbl.a > 0) AND (child_tbl.a = 1500))
               ->  Bitmap Index Scan on child_tbl_idx
                     Index Cond: ((child_tbl.a > 0) AND (child_tbl.a = 1500))

where the second subquery is for updating child_tbl, and has a
different set of columns (and a different access path in this case).
OTOH, your patch generates the following plan for this query:

 Update on public.parent_tbl
   ->  Subquery Scan on v
         Output: (v.a + 1), v.*, parent_tbl.ctid, v.*
         ->  Append
               ->  Seq Scan on public.parent_tbl parent_tbl_1
                     Output: parent_tbl_1.a
                     Filter: ((parent_tbl_1.a > 0) AND (parent_tbl_1.a = 1500))
               ->  Bitmap Heap Scan on public.child_tbl
                     Output: child_tbl.a
                     Recheck Cond: ((child_tbl.a > 0) AND (child_tbl.a = 1500))
                     ->  Bitmap Index Scan on child_tbl_idx
                           Index Cond: ((child_tbl.a > 0) AND
(child_tbl.a = 1500))
   ->  Subquery Scan on v_1
         Output: (v_1.a + 1), b, v_1.*, ctid, v_1.*
         ->  Append
               ->  Seq Scan on public.parent_tbl parent_tbl_2
                     Output: parent_tbl_2.a
                     Filter: ((parent_tbl_2.a > 0) AND (parent_tbl_2.a = 1500))
               ->  Bitmap Heap Scan on public.child_tbl child_tbl_1
                     Output: child_tbl_1.a
                     Recheck Cond: ((child_tbl_1.a > 0) AND
(child_tbl_1.a = 1500))
                     ->  Bitmap Index Scan on child_tbl_idx
                           Index Cond: ((child_tbl_1.a > 0) AND
(child_tbl_1.a = 1500))

which is the wrong set of rows (and columns) in each subquery and it
crashes when it is run.


There is still a lot more testing to be done with my patch, so there
may well be unforeseen problems, but it feels like a cleaner, more
straightforward approach.

Thoughts?

Regards,
Dean

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: ERROR: missing chunk number 0 for toast value
Next
From: Tom Lane
Date:
Subject: Re: dynamic shared memory and locks