Re: What about improving the rules system we have, was Re: Rules going away - Mailing list pgsql-general

From Chris Travers
Subject Re: What about improving the rules system we have, was Re: Rules going away
Date
Msg-id CAKt_ZftQ1Ve7_VQaceMaB8j5aDi7Abm+kykW87AxHiqDpayzPg@mail.gmail.com
Whole thread Raw
In response to Re: What about improving the rules system we have, was Re: Rules going away  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
First, thanks for your thoughtful reply.

On Wed, Sep 28, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> You're certainly right that unexpected multiple evaluations of volatile
> expressions is the first thing that bites people.  (I don't believe
> that's restricted to DO ALSO vs INSTEAD though.)

I am having trouble thinking of practical uses where this would be a
problem.  I may simply lack imagination though.

>I think there are
> also some fairly serious performance issues for large tables, stemming
> from the rule system's tendency to create large joins under-the-hood for
> any rewritten UPDATE or DELETE.  Not sure what else.

I have run into amusing order or operations conditions on
delete/update rules, but those were all DO INSTEAD instead of DO ALSO.
>
> Maybe.  The advantage of something-thats-not-a-rule is that it would
> dodge all issues of backwards compatibility, since we could just leave
> the rule system behaving as-is until we were ready to remove it
> altogether.  If we try to tweak the semantics of rules then we're likely
> to break things for people who are using them now.  But having said
> that, it's not unreasonable to think about it and try to scope out
> exactly what we might tweak.

One option for a replacement would be to allow triggers on views,
provided that no actual insert or update occurs to the underlying
pseudotable.  Then rules could be deprecated for writing data.
>
>> [ proposal involving a "constant NEW tuple" ]
>
> Not sure this specific proposal makes any sense at all.  IMO the only
> real advantage that rules have over triggers is that they work on a
> set-operation basis not a tuple-by-tuple basis.  Don't see how to
> preserve that characteristic while redefining NEW as a static tuple.
> (IOW, the example you gave is the least interesting possible case.
> Think about how an ON INSERT rule would rewrite an INSERT ... SELECT,
> or in general how you'd rewrite cases that process many tuples.)

It's more of a question of order of operations.  I guess I was trying
to start with a simple example.  In a more complex example, like
INSERT.... SELECT (or better yet, writable common table expressions)
you'd basically have three stages logically (none of which is
necessarily guaranteed to be there if it is not applicable to simpler
cases)....

1)  Initial selection
2)  Insert tuple calculation, if applicable
3)  Actual tuple insert plan

So, suppose we have a totally unrealistic explanatory example:

CREATE TABLE a (id int);
CREATE TABLE b (id int);
CREATE TABLE c (a_id int, b_id int);
CREATE RULE a1 AS ON INSERT TO a DO ALSO INSERT INTO b values (in.id);
CREATE RULE a2 AS ON INSERT TO b DO ALSO INSERT INTO c values (new.id, in.id);
INSERT INTO a(id) select round(random() * 1000) from generate_series(1, 10);

Now, currently, something kinda funny happens with using NEW in this
example instead of IN:
in every row in c, a_id = b_id, but these values do not match a or b
tables.  In other words, for every insert, random() gets executed
three times and three different values get inserted into four columns.
  In other words we get random values which are local to each record
of each subquery, but not local to each value.

So currently I think this rewrites to something like:
INSERT INTO a (id) select round(random() * 1000) as randnum from
generate_series(1, 10);
INSERT INTO b (id) select round(random() * 1000) as randnum from
generate_series(1, 10);
INSERT INTO c (a_id, b_id) SELECT randnum, randnum from (select
round(random() * 1000) as randnum from generate_series(1, 10));

In Pseudocode (back to differentiating in vs new here), I am
suggesting something like:

WITH new (randnum) AS (select round(random() * 1000) as randnum from
generate_series(1, 10))
DO
(INSERT INTO a(id) select randnum from new;
INSERT INTO b(id) select round(random() * 1000) from new;
INSERT INTO c(a_id, b_id) select  randnum, round(random() * 1000) from new;
);

Does this make sense?

Best Wishes,
Chris Travers

pgsql-general by date:

Previous
From: Venkat Balaji
Date:
Subject: : Looking for a PostgreSQL book
Next
From: Rob Sargent
Date:
Subject: Re: Rules going away