Thread: bug? rules fail to cascade after NOT IN

bug? rules fail to cascade after NOT IN

From
Brandon Craig Rhodes
Date:
We have verified this problem under both 7.3.2 and the CVS tip.

The attached example is far simpler than the actual code in our
application, but may nevertheless benefit from some explanation.  We
have several tables with two ON INSERT rules:

    [TABLE policy_accounts]
              |
              | ON INSERT
              V
      RULE executor_active  <-  [TABLE accounts_on_hold]
              |
              V
 [TABLE policy_accounts_active]
              |
              | ON INSERT
              V
      RULE executor_hamlet  <-  [TABLE policy_hamlet_atoms]
              |
              V
   [TABLE account_instances]

The accounts_on_hold lists user accounts for which no changes or
updates should currently be made; so the the executor_active rule
copies new rows from policy_accounts to policy_accounts_active only
for accounts which are not listed in accounts_on_hold.

Our system manages computer accounts across a university campus.
Since a given account might exist on several machines, we have given
the name `account instance' to the idea of an account on a particular
system - so if the account `tgl' were given access to both a Linux
interactive machine and a web email server, we would say that two
`instances' of the account currently existed.

The policy_hamlet_atoms table lists, for each policy, which systems
the policy grants access to.  So when a new row appears in
policy_accounts_active, linking an account to a new policy, the
executor_hamlet rule examines the policy_hamlet_atoms table and
creates any new account_instances that are necessary.

(Thus hamlet decides which account instances are to be, or not to be.)

Our problem is that if we add a NOT IN clause to the executor_active
rule to prevent duplicate inserts into the policy_accounts_active
table, then the executor_hamlet rule will not fire for rows inserted
by the executor_active rule.  While we are beginning to learn how to
read parse trees, we are not yet proficient enough to see what is
going on here.

Two test cases are attached.

The `broken' test case has the NOT IN clause, and incorrectly does
*not* create an account_instances row when executor_active does an
INSERT to policy_accounts_active.  The test code then goes on to
delete and re-insert the row by hand, showing that executor_hamlet
*will* fire if the INSERT to policy_accounts_active is not done
through executor_active.

The `working' test case omits the

    AND (account, policy) NOT IN
     (SELECT account, policy FROM policy_accounts_active)

condition from the end of executor_active, which magically makes the
executor_hamlet rule start firing as it should.

Any help or correction of our techniques will be appreciated!


--
Brandon Craig Rhodes                         http://www.rhodesmill.org/brandon
Georgia Tech                                            brandon@oit.gatech.edu

Attachment

Re: bug? rules fail to cascade after NOT IN

From
Tom Lane
Date:
Brandon Craig Rhodes <brandon@oit.gatech.edu> writes:
> The `working' test case omits the
>     AND (account, policy) NOT IN
>      (SELECT account, policy FROM policy_accounts_active)
> condition from the end of executor_active, which magically makes the
> executor_hamlet rule start firing as it should.

I don't think this is a bug.  The executor_hamlet rule fires after the
executor_active rule does; therefore the (account, policy) pair *has
already been inserted into policy_accounts_active*, and will be found
when executor_hamlet re-executes the select to look for it.

My advice to you is to use triggers, not rules, for pushing data from
one table to another; especially when you need logic this complex to
decide what to do.
        regards, tom lane


Re: bug? rules fail to cascade after NOT IN

From
Brandon Craig Rhodes
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Brandon Craig Rhodes <brandon@oit.gatech.edu> writes:
> > The `working' test case omits the
> >     AND (account, policy) NOT IN
> >      (SELECT account, policy FROM policy_accounts_active)
> > condition from the end of executor_active, which magically makes the
> > executor_hamlet rule start firing as it should.
> 
> I don't think this is a bug.  The executor_hamlet rule fires after the
> executor_active rule does; therefore the (account, policy) pair *has
> already been inserted into policy_accounts_active*, and will be found
> when executor_hamlet re-executes the select to look for it.

This was for me neither intuitive nor clear from the documentation; I
had assumed that the NEW relation of a rule always contained the same
rows that were inserted, updated, or deleted by the query that wound
up invoking the rule.

If I understand your assertion, NEW does *not* in fact refer strictly
to the rows that (in this case) were INSERTed by the upstream query;
rather, NEW refers to a re-invocation - a copy or re-execution - of
the query which produced the NEW rows themselves.  So if the query
which produced the NEW rows has side effects which affect its own
outcome when it is re-evaluated, then the rule will not be working on
the same rows at all?

This is also troubling because it may mean that rules are less
efficient than I had imagined.  Having assumed that rules following
(say) an INSERT used for NEW exactly the same set of rows that were
inserted, then PostgreSQL could get away with only executing that
query once and using the resulting rows for every rule fired by the
INSERT.  But if the original relation is entirely re-executed with the
possibility of a different outcome then it is not clear that several
rules could all share the results of the original query, vastly
reducing the efficiency of several rules cascaded from one another.

Forgive me for complaining about something that has probably been a
fundamental part of the design since the first Postgres, but in my
misunderstanding I had been looking forward to a cascade of rules
cleanly and efficiently sharing the same set of, say, one thousand
INSERTed rows as they propagated its results through our tables.

> My advice to you is to use triggers, not rules, for pushing data
> from one table to another; especially when you need logic this
> complex to decide what to do.

We had hoped, by casading about a dozen rules through tables of this
sort, to allow PostgreSQL to optimize our entire dataflow as a whole
rather than making dozens or hundreds or little trigger invocations
when sets of accounts are created or deleted.

Thanks so much for spending some time helping us to think straight, :-)
-- 
Brandon Craig Rhodes                         http://www.rhodesmill.org/brandon
Georgia Tech                                            brandon@oit.gatech.edu


Re: bug? rules fail to cascade after NOT IN

From
Tom Lane
Date:
Brandon Craig Rhodes <brandon@oit.gatech.edu> writes:
> If I understand your assertion, NEW does *not* in fact refer strictly
> to the rows that (in this case) were INSERTed by the upstream query;
> rather, NEW refers to a re-invocation - a copy or re-execution - of
> the query which produced the NEW rows themselves.

That's about the size of it.  A rule is a macro, and so is NEW (or OLD).

While rule-as-macro works beautifully for views, I've never been
entirely satisfied with it for updating queries.  Hardly anyone is able
to wrap their minds around the behavior, and all too often the only
workable solution is to use triggers instead --- which, as you say,
could be a performance loss when many rows have to be processed.  Even
if it's not a performance loss, one comes away wondering whether the
rule system is really doing the most useful thing.

It would take a rather fundamental redesign of the rule system to do
differently, though.  Are you volunteering?
        regards, tom lane


Re: bug? rules fail to cascade after NOT IN

From
Brandon Craig Rhodes
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> While rule-as-macro works beautifully for views, I've never been
> entirely satisfied with it for updating queries. ... It would take a
> rather fundamental redesign of the rule system to do differently,
> though.  Are you volunteering?

From what I have seen of the planner, its plans look like a funnel -
operator results are repeatedly combined until the result narrows to
exactly the rows specified by the user.  But if NEW and OLD are to
truly become the tuples that were inserted, updated, or deleted, then
plans might being to look like trees that, after combining into a
trunk, start branching out again to form roots - because the resulting
rows will not only be, say, inserted into a table, but might be shared
between several rules that will then have their own operations to
perform upon the result.

And yes, I am volunteering. :-)

-- 
Brandon Craig Rhodes                         http://www.rhodesmill.org/brandon
Georgia Tech                                            brandon@oit.gatech.edu