Thread: What about improving the rules system we have, was Re: Rules going away
On Wed, Sep 28, 2011 at 7:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think the true state of affairs is this: rules have a lot of
> surprising behaviors, and if we could think of something that works more
> straightforwardly, we'd love to replace them.  But I think we'd have to
> have the "something" in place before we consider deprecating rules.
> At the moment we don't even have a glimmer of a design, so David's
> statement is many years premature.

I guess my question is what it would take to eliminate some of the very
surprising behaviors.  It seems to me that the cases of surprising behavior
are actually pretty limited (more so than table inheritance, for example).
The only surprises I can think of come with DO ALSO rules involving
expressions which should not be re-executed (nextval being a good example).
DO INSTEAD rules work extremely well, IMO, and I can't think of any
surprises there that are particularly easy to get bitten by.

It seems to me that a lot of the problems with DO ALSO rules are what
happens when lexical ambiguity hits implementation (do you mean DO ALSO
nextval() again, or take the value from the previous nextval() and feed it
into this rule?).

Instead of considering deprecating and replacing rules with something else,
I am wondering whether the issues are fundamental or merely issues of
implementation.  It seems to me we might get more mileage out of pondering
a Rules 2.0 approach, discussing the specifications, etc., than we would in
looking for an alternative.  If we start this discussion now, folks may
have a basis for considering alternatives down the line; if we never
discuss it, then folks who want something more predictable and
straightforward have nothing to consider.

So I guess I'd start by asking the community a couple of questions:

1) Are there surprising behaviors in DO INSTEAD rules?
2) Should we assume that there are cases dependent on existing behaviors?
If it were up to me, I would change the rules spec in a couple of clearly
defined ways and then look at how to tweak what we have to implement those
changes.  In particular I would:

1) Redefine NEW so that it is post-initial-expression evaluation and
therefore remains constant at the input of the rule's query plan.
2) Add a new IN tuple which is pre-initial-expression evaluation and
therefore is not guaranteed to stay constant.

So, for example, suppose I have the following tables, used to stage data
coming into an accounting system:

CREATE TABLE my_transactions1 (
    id int not null unique default nextval('my_transactions1_id_seq'),
    post_on date not null,
    credit_account int not null references credit_accounts(id),
    source_id text not null unique,
    total_amount numeric
);

CREATE TABLE my_transactions2 (
    id int not null unique default nextval('my_transactions2_id_seq'),
    post_on date not null,
    credit_account int not null references credit_accounts(id),
    source_id text not null unique,
    total_amount numeric
);

If I:

CREATE RULE insert_deduction AS ON INSERT TO my_transactions1 DO ALSO
INSERT INTO my_transactions2
       (id, post_on, credit_account, source_id, total_amount)
VALUES (IN.id, NEW.post_on, get_deduction_account(NEW.credit_account),
        get_deduction_source(NEW.source_id), NEW.total_amount * -1);

then I would expect nextval() to be executed twice, while NEW would behave
as it does in triggers.

I don't know how feasible it is to implement such a thing, and certainly it
would break backwards compatibility for at least some users.  But I don't
think it would be any worse than outright replacing the rules system.

Best Wishes,
Chris Travers
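The nextval() surprise discussed above is easy to reproduce.  Below is a
minimal sketch (the table, sequence, and rule names are illustrative, not
taken from the thread):

```sql
CREATE TABLE t1 (id serial PRIMARY KEY, val text);
CREATE TABLE t1_log (id int, val text);

-- A DO ALSO rule that copies each insert into the log table
CREATE RULE t1_logger AS ON INSERT TO t1
    DO ALSO INSERT INTO t1_log VALUES (NEW.id, NEW.val);

INSERT INTO t1 (val) VALUES ('hello');

-- Rules are macro-style query rewrites, so NEW.id expands to the
-- column default, nextval('t1_id_seq'), in BOTH rewritten statements:
-- the sequence fires once for t1 and again for t1_log, leaving
-- t1.id = 1 but t1_log.id = 2.
```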
Chris Travers <chris.travers@gmail.com> writes:
> On Wed, Sep 28, 2011 at 7:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the true state of affairs is this: rules have a lot of
>> surprising behaviors, and if we could think of something that works more
>> straightforwardly, we'd love to replace them.

> I guess my question is what it would take to eliminate some of the
> very surprising behaviors.

You're certainly right that unexpected multiple evaluation of volatile
expressions is the first thing that bites people.  (I don't believe that's
restricted to DO ALSO vs INSTEAD, though.)  I think there are also some
fairly serious performance issues for large tables, stemming from the rule
system's tendency to create large joins under the hood for any rewritten
UPDATE or DELETE.  Not sure what else.

> Instead of considering deprecating and replacing rules with something
> else, I am wondering whether the issues are fundamental or issues of
> implementation.  It seems to me we might get more mileage out of
> pondering a Rules 2.0 approach, discussing the specifications, etc.,
> than we would in looking for an alternative.

Maybe.  The advantage of something-that's-not-a-rule is that it would dodge
all issues of backwards compatibility, since we could just leave the rule
system behaving as-is until we were ready to remove it altogether.  If we
try to tweak the semantics of rules, then we're likely to break things for
people who are using them now.  But having said that, it's not unreasonable
to think about it and try to scope out exactly what we might tweak.

> [ proposal involving a "constant NEW tuple" ]

Not sure this specific proposal makes any sense at all.  IMO the only real
advantage that rules have over triggers is that they work on a
set-operation basis, not a tuple-by-tuple basis.  Don't see how to preserve
that characteristic while redefining NEW as a static tuple.  (IOW, the
example you gave is the least interesting possible case.  Think about how
an ON INSERT rule would rewrite an INSERT ... SELECT, or in general how
you'd rewrite cases that process many tuples.)

regards, tom lane
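Tom's point about set-oriented rewriting can be made concrete with a
hypothetical sketch (names are illustrative):

```sql
CREATE TABLE src (n int);
CREATE TABLE dst (n int);
CREATE TABLE dst_audit (n int);

CREATE RULE dst_auditor AS ON INSERT TO dst
    DO ALSO INSERT INTO dst_audit VALUES (NEW.n);

-- A multi-row insert is rewritten as a whole: the rewriter effectively
-- emits two set-level statements,
--   INSERT INTO dst       SELECT n FROM src;
--   INSERT INTO dst_audit SELECT n FROM src;
-- so the source SELECT runs twice.  With a volatile function in that
-- SELECT the two runs need not agree, and there is no single "static
-- NEW tuple" the rule could refer to.
INSERT INTO dst SELECT n FROM src;
```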
In article <4116.1317226367@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us>
writes:
> Not sure this specific proposal makes any sense at all.  IMO the only
> real advantage that rules have over triggers is that they work on a
> set-operation basis not a tuple-by-tuple basis.

Isn't that what statement-level triggers are for, at least in other DB
systems?  How about telling PostgreSQL's statement-level triggers something
about the set of rows they affect?
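For reference, a PostgreSQL statement-level trigger at the time of this
thread looks like the sketch below; as Merlin notes downthread, the trigger
function sees neither OLD/NEW nor the affected row set (names here are
illustrative):

```sql
CREATE TABLE audit_target (n int);

CREATE FUNCTION note_bulk_change() RETURNS trigger AS $$
BEGIN
    -- Statement-level triggers get no OLD/NEW variables and (as of
    -- this thread) no handle on the rows the statement touched; only
    -- metadata such as TG_OP and TG_TABLE_NAME is available.
    RAISE NOTICE '% on %', TG_OP, TG_TABLE_NAME;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_target_stmt
    AFTER INSERT OR UPDATE OR DELETE ON audit_target
    FOR EACH STATEMENT EXECUTE PROCEDURE note_bulk_change();
```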
Re: What about improving the rules system we have, was Re: Rules going away
From: Merlin Moncure
On Wed, Sep 28, 2011 at 11:46 AM, Harald Fuchs <hari.fuchs@gmail.com> wrote:
> In article <4116.1317226367@sss.pgh.pa.us>,
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
>> Not sure this specific proposal makes any sense at all.  IMO the only
>> real advantage that rules have over triggers is that they work on a
>> set-operation basis not a tuple-by-tuple basis.
>
> Isn't that what statement-level triggers are for, at least in other DB
> systems?  How about telling PostgreSQL's statement-level triggers
> something about the set of rows they affect?

In theory that would be nice, but they just don't work that way; you don't
even have access to the SQL statement firing the trigger, IIRC.  That said,
with some thought you could work an 'after' trigger into a set-level
operation, say, by rigging something around now().

Now that we have view triggers (not that rules ever really worked for
updating views anyway), even notwithstanding the minor limitations of
triggers vs. rules, I personally find the RULE feature to be useless and
dangerous.  I'd vote for immediately deprecating it without hesitation.

merlin
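The view triggers Merlin refers to (INSTEAD OF triggers, new in
PostgreSQL 9.1) look roughly like this sketch; the table, view, and
function names are illustrative:

```sql
CREATE TABLE accounts (
    id serial PRIMARY KEY,
    name text,
    active bool DEFAULT true
);
CREATE VIEW active_accounts AS
    SELECT id, name FROM accounts WHERE active;

CREATE FUNCTION active_accounts_insert() RETURNS trigger AS $$
BEGIN
    -- Runs once per row with an ordinary NEW tuple: no macro
    -- expansion, so volatile defaults are evaluated exactly once.
    INSERT INTO accounts (name) VALUES (NEW.name);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER active_accounts_ins
    INSTEAD OF INSERT ON active_accounts
    FOR EACH ROW EXECUTE PROCEDURE active_accounts_insert();
```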
Re: What about improving the rules system we have, was Re: Rules going away
From: Chris Travers
First, thanks for your thoughtful reply.

On Wed, Sep 28, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You're certainly right that unexpected multiple evaluations of volatile
> expressions is the first thing that bites people.  (I don't believe
> that's restricted to DO ALSO vs INSTEAD though.)

I am having trouble thinking of practical uses where this would be a
problem.  I may simply lack imagination, though.

> I think there are
> also some fairly serious performance issues for large tables, stemming
> from the rule system's tendency to create large joins under-the-hood for
> any rewritten UPDATE or DELETE.  Not sure what else.

I have run into amusing order-of-operations conditions on delete/update
rules, but those were all DO INSTEAD rather than DO ALSO.

> Maybe.  The advantage of something-thats-not-a-rule is that it would
> dodge all issues of backwards compatibility, since we could just leave
> the rule system behaving as-is until we were ready to remove it
> altogether.  If we try to tweak the semantics of rules then we're likely
> to break things for people who are using them now.  But having said
> that, it's not unreasonable to think about it and try to scope out
> exactly what we might tweak.

One option for a replacement would be to allow triggers on views, provided
that no actual insert or update occurs to the underlying pseudotable.  Then
rules could be deprecated for writing data.

>> [ proposal involving a "constant NEW tuple" ]
>
> Not sure this specific proposal makes any sense at all.  IMO the only
> real advantage that rules have over triggers is that they work on a
> set-operation basis not a tuple-by-tuple basis.  Don't see how to
> preserve that characteristic while redefining NEW as a static tuple.
> (IOW, the example you gave is the least interesting possible case.
> Think about how an ON INSERT rule would rewrite an INSERT ... SELECT,
> or in general how you'd rewrite cases that process many tuples.)
It's more a question of order of operations.  I was trying to start with a
simple example.  In a more complex example, like INSERT ... SELECT (or,
better yet, writable common table expressions), you'd basically have three
logical stages (none of which is necessarily guaranteed to be there if it
is not applicable to simpler cases):

1) Initial selection
2) Insert tuple calculation, if applicable
3) Actual tuple insert plan

So, suppose we have a totally unrealistic explanatory example:

CREATE TABLE a (id int);
CREATE TABLE b (id int);
CREATE TABLE c (a_id int, b_id int);
CREATE RULE a1 AS ON INSERT TO a DO ALSO INSERT INTO b VALUES (in.id);
CREATE RULE a2 AS ON INSERT TO b DO ALSO INSERT INTO c VALUES (new.id, in.id);

INSERT INTO a (id)
    SELECT round(random() * 1000) FROM generate_series(1, 10);

Now, currently, something kind of funny happens when using NEW in this
example instead of IN: in every row in c, a_id = b_id, but these values do
not match the a or b tables.  In other words, for every insert, random()
gets executed three times and three different values get inserted into four
columns.  We get random values which are local to each record of each
subquery, but not local to each value.  So currently I think this rewrites
to something like:

INSERT INTO a (id)
    SELECT round(random() * 1000) AS randnum FROM generate_series(1, 10);
INSERT INTO b (id)
    SELECT round(random() * 1000) AS randnum FROM generate_series(1, 10);
INSERT INTO c (a_id, b_id)
    SELECT randnum, randnum
      FROM (SELECT round(random() * 1000) AS randnum
              FROM generate_series(1, 10)) AS sub;

In pseudocode (back to differentiating IN vs NEW here), I am suggesting
something like:

WITH new (randnum) AS
    (SELECT round(random() * 1000) AS randnum FROM generate_series(1, 10))
DO (
    INSERT INTO a (id) SELECT randnum FROM new;
    INSERT INTO b (id) SELECT round(random() * 1000) FROM new;
    INSERT INTO c (a_id, b_id)
        SELECT randnum, round(random() * 1000) FROM new;
);

Does this make sense?

Best Wishes,
Chris Travers