Thread: bug? rules fail to cascade after NOT IN
We have verified this problem under both 7.3.2 and the CVS tip. The attached example is far simpler than the actual code in our application, but may nevertheless benefit from some explanation. We have several tables with two ON INSERT rules: [TABLE policy_accounts] | | ON INSERT V RULE executor_active <- [TABLE accounts_on_hold] | V [TABLE policy_accounts_active] | | ON INSERT V RULE executor_hamlet <- [TABLE policy_hamlet_atoms] | V [TABLE account_instances] The accounts_on_hold lists user accounts for which no changes or updates should currently be made; so the the executor_active rule copies new rows from policy_accounts to policy_accounts_active only for accounts which are not listed in accounts_on_hold. Our system manages computer accounts across a university campus. Since a given account might exist on several machines, we have given the name `account instance' to the idea of an account on a particular system - so if the account `tgl' were given access to both a Linux interactive machine and a web email server, we would say that two `instances' of the account currently existed. The policy_hamlet_atoms table lists, for each policy, which systems the policy grants access to. So when a new row appears in policy_accounts_active, linking an account to a new policy, the executor_hamlet rule examines the policy_hamlet_atoms table and creates any new account_instances that are necessary. (Thus hamlet decides which account instances are to be, or not to be.) Our problem is that if we add a NOT IN clause to the executor_active rule to prevent duplicate inserts into the policy_accounts_active table, then the executor_hamlet rule will not fire for rows inserted by the executor_active rule. While we are beginning to learn how to read parse trees, we are not yet proficient enough to see what is going on here. Two test cases are attached. The `broken' test case has the NOT IN clause, and incorrectly does *not* create an account_instances row when executor_active does an INSERT to policy_accounts_active. The test code then goes on to delete and re-insert the row by hand, showing that executor_hamlet *will* fire if the INSERT to policy_accounts_active is not done through executor_active. The `working' test case omits the AND (account, policy) NOT IN (SELECT account, policy FROM policy_accounts_active) condition from the end of executor_active, which magically makes the executor_hamlet rule start firing as it should. Any help or correction of our techniques will be appreciated! -- Brandon Craig Rhodes http://www.rhodesmill.org/brandon Georgia Tech brandon@oit.gatech.edu
Attachment
Brandon Craig Rhodes <brandon@oit.gatech.edu> writes: > The `working' test case omits the > AND (account, policy) NOT IN > (SELECT account, policy FROM policy_accounts_active) > condition from the end of executor_active, which magically makes the > executor_hamlet rule start firing as it should. I don't think this is a bug. The executor_hamlet rule fires after the executor_active rule does; therefore the (account, policy) pair *has already been inserted into policy_accounts_active*, and will be found when executor_hamlet re-executes the select to look for it. My advice to you is to use triggers, not rules, for pushing data from one table to another; especially when you need logic this complex to decide what to do. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Brandon Craig Rhodes <brandon@oit.gatech.edu> writes: > > The `working' test case omits the > > AND (account, policy) NOT IN > > (SELECT account, policy FROM policy_accounts_active) > > condition from the end of executor_active, which magically makes the > > executor_hamlet rule start firing as it should. > > I don't think this is a bug. The executor_hamlet rule fires after the > executor_active rule does; therefore the (account, policy) pair *has > already been inserted into policy_accounts_active*, and will be found > when executor_hamlet re-executes the select to look for it. This was for me neither intuitive nor clear from the documentation; I had assumed that the NEW relation of a rule always contained the same rows that were inserted, updated, or deleted by the query that wound up invoking the rule. If I understand your assertion, NEW does *not* in fact refer strictly to the rows that (in this case) were INSERTed by the upstream query; rather, NEW refers to a re-invocation - a copy or re-execution - of the query which produced the NEW rows themselves. So if the query which produced the NEW rows has side effects which affect its own outcome when it is re-evaluated, then the rule will not be working on the same rows at all? This is also troubling because it may mean that rules are less efficient than I had imagined. Having assumed that rules following (say) an INSERT used for NEW exactly the same set of rows that were inserted, then PostgreSQL could get away with only executing that query once and using the resulting rows for every rule fired by the INSERT. But if the original relation is entirely re-executed with the possibility of a different outcome then it is not clear that several rules could all share the results of the original query, vastly reducing the efficiency of several rules cascaded from one another. Forgive me for complaining about something that has probably been a fundamental part of the design since the first Postgres, but in my misunderstanding I had been looking forward to a cascade of rules cleanly and efficiently sharing the same set of, say, one thousand INSERTed rows as they propagated its results through our tables. > My advice to you is to use triggers, not rules, for pushing data > from one table to another; especially when you need logic this > complex to decide what to do. We had hoped, by casading about a dozen rules through tables of this sort, to allow PostgreSQL to optimize our entire dataflow as a whole rather than making dozens or hundreds or little trigger invocations when sets of accounts are created or deleted. Thanks so much for spending some time helping us to think straight, :-) -- Brandon Craig Rhodes http://www.rhodesmill.org/brandon Georgia Tech brandon@oit.gatech.edu
Brandon Craig Rhodes <brandon@oit.gatech.edu> writes: > If I understand your assertion, NEW does *not* in fact refer strictly > to the rows that (in this case) were INSERTed by the upstream query; > rather, NEW refers to a re-invocation - a copy or re-execution - of > the query which produced the NEW rows themselves. That's about the size of it. A rule is a macro, and so is NEW (or OLD). While rule-as-macro works beautifully for views, I've never been entirely satisfied with it for updating queries. Hardly anyone is able to wrap their minds around the behavior, and all too often the only workable solution is to use triggers instead --- which, as you say, could be a performance loss when many rows have to be processed. Even if it's not a performance loss, one comes away wondering whether the rule system is really doing the most useful thing. It would take a rather fundamental redesign of the rule system to do differently, though. Are you volunteering? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > While rule-as-macro works beautifully for views, I've never been > entirely satisfied with it for updating queries. ... It would take a > rather fundamental redesign of the rule system to do differently, > though. Are you volunteering? From what I have seen of the planner, its plans look like a funnel - operator results are repeatedly combined until the result narrows to exactly the rows specified by the user. But if NEW and OLD are to truly become the tuples that were inserted, updated, or deleted, then plans might being to look like trees that, after combining into a trunk, start branching out again to form roots - because the resulting rows will not only be, say, inserted into a table, but might be shared between several rules that will then have their own operations to perform upon the result. And yes, I am volunteering. :-) -- Brandon Craig Rhodes http://www.rhodesmill.org/brandon Georgia Tech brandon@oit.gatech.edu