Thread: Proof of concept: auto updatable views
Hi, I've been playing around with the idea of supporting automatically updatable views, and I have a working proof of concept. I've taken a different approach than the previous attempts to implement this feature (e.g., http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php), instead doing all the work in the rewriter, substituting the view for its base relation rather than attempting to auto-generate any rules or triggers. Basically what it does is this: in the first stage of query rewriting, just after any non-SELECT rules are applied, the new code kicks in - if the target relation is a view, and there were no unqualified INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the view is simply updatable. If so, the target view is replaced by its base relation and columns are re-mapped. Then the remainder of the rewriting process continues, recursively handling any further non-SELECT rules or additional simply updatable views. This handles the case of views on top of views, with or without rules and/or triggers. Here's a simple example: CREATE TABLE my_table(id int primary key, val text); CREATE VIEW my_view AS SELECT * FROM my_table WHERE id > 0; then any modifications to the view get redirected to underlying table: EXPLAIN ANALYSE INSERT INTO my_view VALUES(1, 'Test row'); QUERY PLAN ------------------------------------------------------------------------------------------------ Insert on my_table (cost=0.00..0.01 rows=1 width=0) (actual time=0.208..0.208 rows=0 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1) Total runtime: 0.327 ms (3 rows) EXPLAIN ANALYSE UPDATE my_view SET val='Updated' WHERE id=1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Update on my_table (cost=0.00..8.27 rows=1 width=10) (actual time=0.039..0.039 rows=0 loops=1) -> Index Scan using my_table_pkey on my_table (cost=0.00..8.27 rows=1 width=10) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: ((id > 0) AND (id = 1)) Total runtime: 0.090 ms (4 rows) EXPLAIN ANALYSE DELETE FROM my_view; QUERY PLAN -------------------------------------------------------------------------------------------------------- Delete on my_table (cost=0.00..1.01 rows=1 width=6) (actual time=0.030..0.030 rows=0 loops=1) -> Seq Scan on my_table (cost=0.00..1.01 rows=1 width=6) (actual time=0.015..0.016 rows=1 loops=1) Filter: (id > 0) Total runtime: 0.063 ms (4 rows) The patch is currently very strict about what kinds of views can be updated (based on SQL-92), and there is no support for WITH CHECK OPTION, because I wanted to keep this patch as simple as possible. The consensus last time seemed to be that backwards compatibility should be offered through a new GUC variable to allow this feature to be disabled globally, which I've not implemented yet. I'm also aware that my new function ChangeVarAttnos() is almost identical to the function map_variable_attnos() that Tom recently added, but I couldn't immediately see a neat way to merge the two. My function handles whole-row references to the view by mapping them to a generic RowExpr based on the view definition. I don't think a ConvertRowtypeExpr can be used in this case, because the column names don't necessarily match. Obviously there's still more work to do but the early signs seem to be encouraging. Thoughts? Regards, Dean
Attachment
My thoughts on this is that it would be a very valuable feature to have, and would make Postgres views behave more like they always were intended to behave, which is indistinguishible to users from tables in behavior where all possible, and that the reverse mapping would be automatic with the DBMS being given only the view-defining SELECT, where possible. -- Darren Duncan Dean Rasheed wrote: > I've been playing around with the idea of supporting automatically > updatable views, and I have a working proof of concept. I've taken a > different approach than the previous attempts to implement this > feature (e.g., http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php), > instead doing all the work in the rewriter, substituting the view for > its base relation rather than attempting to auto-generate any rules or > triggers. > > Basically what it does is this: in the first stage of query rewriting, > just after any non-SELECT rules are applied, the new code kicks in - > if the target relation is a view, and there were no unqualified > INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the > view is simply updatable. If so, the target view is replaced by its > base relation and columns are re-mapped. Then the remainder of the > rewriting process continues, recursively handling any further > non-SELECT rules or additional simply updatable views. This handles > the case of views on top of views, with or without rules and/or > triggers. > <snip> > > Obviously there's still more work to do but the early signs seem to be > encouraging. > > Thoughts?
On Sun, Jul 1, 2012 at 6:35 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I've been playing around with the idea of supporting automatically > updatable views, and I have a working proof of concept. I've taken a > different approach than the previous attempts to implement this > feature (e.g., http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php), > instead doing all the work in the rewriter, substituting the view for > its base relation rather than attempting to auto-generate any rules or > triggers. > > Basically what it does is this: in the first stage of query rewriting, > just after any non-SELECT rules are applied, the new code kicks in - > if the target relation is a view, and there were no unqualified > INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the > view is simply updatable. If so, the target view is replaced by its > base relation and columns are re-mapped. Then the remainder of the > rewriting process continues, recursively handling any further > non-SELECT rules or additional simply updatable views. This handles > the case of views on top of views, with or without rules and/or > triggers. Regrettably, I don't have time to look at this in detail right now, but please add it to the next CommitFest so it gets looked at. It sounds like it could be very cool. > The consensus last time seemed to be that backwards compatibility > should be offered through a new GUC variable to allow this feature to > be disabled globally, which I've not implemented yet. I think the backward-compatibility concerns with this approach would be much less than with the previously-proposed approach, so I'm not 100% sure we need a backward compatibility knob. If we're going to have one, a reloption would probably be a better fit than a GUC, now that views support reloptions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jul 1, 2012 at 6:35 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> Basically what it does is this: in the first stage of query rewriting, >> just after any non-SELECT rules are applied, the new code kicks in - >> if the target relation is a view, and there were no unqualified >> INSTEAD rules, and there are no INSTEAD OF triggers, it tests if ... >> The consensus last time seemed to be that backwards compatibility >> should be offered through a new GUC variable to allow this feature to >> be disabled globally, which I've not implemented yet. > I think the backward-compatibility concerns with this approach would > be much less than with the previously-proposed approach, so I'm not > 100% sure we need a backward compatibility knob. If the above description is correct, the behavior is changed only in cases that previously threw errors, so I tend to agree that no "backwards compatibility knob" is needed. regards, tom lane
On 2 July 2012 21:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jul 1, 2012 at 6:35 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>> Basically what it does is this: in the first stage of query rewriting, >>> just after any non-SELECT rules are applied, the new code kicks in - >>> if the target relation is a view, and there were no unqualified >>> INSTEAD rules, and there are no INSTEAD OF triggers, it tests if ... > >>> The consensus last time seemed to be that backwards compatibility >>> should be offered through a new GUC variable to allow this feature to >>> be disabled globally, which I've not implemented yet. > >> I think the backward-compatibility concerns with this approach would >> be much less than with the previously-proposed approach, so I'm not >> 100% sure we need a backward compatibility knob. > > If the above description is correct, the behavior is changed only in > cases that previously threw errors, so I tend to agree that no > "backwards compatibility knob" is needed. > Yeah, I think you're right - the default ACLs will typically give sensible behaviour. So unless users have been cavalier with the use of GRANT ALL, their existing views are only going to start becoming updatable to the owners of the views (and only then if the view owner already has write permission on the underlying table). Regards, Dean
I've been looking at this further and I have made some improvements, but also found a problem. On 1 July 2012 23:35, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I'm also aware that my new function ChangeVarAttnos() is almost > identical to the function map_variable_attnos() that Tom recently > added, but I couldn't immediately see a neat way to merge the two. My > function handles whole-row references to the view by mapping them to a > generic RowExpr based on the view definition. I don't think a > ConvertRowtypeExpr can be used in this case, because the column names > don't necessarily match. I improved on this by reusing the existing function ResolveNew() which reduced the size of the patch. The problem, however, is that the original patch is not safe for UPDATE/DELETE on security barrier views, because it mixes the user's quals with those on the view. For example a query like UPDATE some_view SET col=... WHERE user_quals will get rewritten as UPDATE base_table SET base_col=... WHERE user_quals AND view_quals which potentially leaks data hidden by the view's quals. So I think that it needs to use a subquery to isolate user_quals from view_quals. The least invasive way I could see to do that was to record the view quals in a new security barrier quals field on the RTE, and then turn that into a subquery at the end of the rewriting process. This approach avoids the extensive changes to the rewriter that I think would otherwise be needed if the RTE were changed into a subquery near the start of the rewriting process. It is also possible that this code might be reusable in the row-level security patch by Kohei KaiGai to handle modifications to tables with RLS quals, although I haven't looked too closely at that patch yet. Another complication is that the executor appears to have no rechecking code for subqueries in nodeSubqueryscan.c, so it looks like I need to lock rows coming from the base relation in the subquery. So for example, given the following setup CREATE TABLE private_t(a int, b text); INSERT INTO private_t VALUES (1, 'Private'); INSERT INTO private_t SELECT i, 'Public '||i FROM generate_series(2,10000) g(i); CREATE INDEX private_t_a_idx ON private_t(a); ANALYSE private_t; CREATE VIEW public_v WITH (security_barrier=true) AS SELECT b AS bb, a AS aa FROM private_t WHERE a != 1; CREATE OR REPLACE FUNCTION snoop(b text) RETURNS boolean AS $$ BEGIN RAISE NOTICE 'b=%', b; RETURN false; END; $$ LANGUAGE plpgsql STRICT IMMUTABLE COST 0.000001; an update on the view will get rewritten as an update on the base table from a subquery scan as follows: EXPLAIN (costs off) UPDATE public_v SET aa=aa WHERE snoop(bb) AND aa<=2; Update on private_t -> Subquery Scan on private_t Filter: snoop(private_t.b) -> LockRows -> Index Scan using private_t_a_idx on private_t Index Cond: (a <= 2) Filter: (a <> 1) The LockRows node appears to give the expected behaviour for concurrently modified rows, but I'm not really sure if that's the right approach. None of this new code kicks in for non-security barrier views, so the kinds of plans I posted upthread remain unchanged in that case. But now a significant fraction of the patch is code added to handle security barrier views. Of course we could simply say that such views aren't updatable, but that seems like an annoying limitation if there is a feasible way round it. Thoughts? Regards, Dean
Attachment
On 12 August 2012 22:14, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I've been looking at this further and I have made some improvements... Here's an updated WIP patch which I'll add to the next commitfest. I've added information schema updates and some new regression tests. I still need to work on some doc updates. Regards, Dean
Attachment
On 27 August 2012 20:26, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > Here's an updated WIP patch which I'll add to the next commitfest. Re-sending gzipped (apparently the mail system corrupted it last time). Regards, Dean
Attachment
On Sun, Aug 12, 2012 at 5:14 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > None of this new code kicks in for non-security barrier views, so the > kinds of plans I posted upthread remain unchanged in that case. But > now a significant fraction of the patch is code added to handle > security barrier views. Of course we could simply say that such views > aren't updatable, but that seems like an annoying limitation if there > is a feasible way round it. Maybe it'd be a good idea to split this into two patches: the first could implement the feature but exclude security_barrier views, and the second could lift that restriction. Just a thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 30 August 2012 20:05, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Aug 12, 2012 at 5:14 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> None of this new code kicks in for non-security barrier views, so the >> kinds of plans I posted upthread remain unchanged in that case. But >> now a significant fraction of the patch is code added to handle >> security barrier views. Of course we could simply say that such views >> aren't updatable, but that seems like an annoying limitation if there >> is a feasible way round it. > > Maybe it'd be a good idea to split this into two patches: the first > could implement the feature but exclude security_barrier views, and > the second could lift that restriction. > Yes, I think that makes sense. I should hopefully get some time to look at it over the weekend. Regards, Dean
On 31 August 2012 07:59, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 30 August 2012 20:05, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sun, Aug 12, 2012 at 5:14 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>> None of this new code kicks in for non-security barrier views, so the >>> kinds of plans I posted upthread remain unchanged in that case. But >>> now a significant fraction of the patch is code added to handle >>> security barrier views. Of course we could simply say that such views >>> aren't updatable, but that seems like an annoying limitation if there >>> is a feasible way round it. >> >> Maybe it'd be a good idea to split this into two patches: the first >> could implement the feature but exclude security_barrier views, and >> the second could lift that restriction. >> > > Yes, I think that makes sense. > I should hopefully get some time to look at it over the weekend. > Here's an updated patch for the base feature (without support for security barrier views) with updated docs and regression tests. Regards, Dean