Thread: [BUGS] BUG #14808: V10-beta4, backend abort
The following bug has been logged on the website:

Bug reference:      14808
Logged by:          Philippe BEAUDOIN
Email address:      phb07@apra.asso.fr
PostgreSQL version: 10beta4
Operating system:   Linux
Description:

Hi all,

While continuing to play with transition tables in statement level triggers, I have encountered what looks like a backend abort. I have been able to reproduce the case with the following simple script:

#!/bin/sh
export PGHOST=localhost
export PGPORT=5410
dropdb test
createdb test
psql test <<*EOF*
\set ON_ERROR_STOP on
CREATE OR REPLACE FUNCTION create_tbl(grpdef_schema TEXT, grpdef_tblseq TEXT)
RETURNS VOID LANGUAGE plpgsql SECURITY DEFINER AS
\$_create_tbl\$
  DECLARE
    v_fullTableName          TEXT;
    v_logTableName           TEXT;
    v_logFnctName            TEXT;
    v_colList1               TEXT;
    v_colList2               TEXT;
    v_colList3               TEXT;
    v_colList4               TEXT;
  BEGIN
-- build the different names for table, trigger, functions,...
    v_fullTableName = grpdef_schema || '.' || grpdef_tblseq;
    v_logTableName = grpdef_tblseq || '_log';
    v_logFnctName = grpdef_tblseq || '_log_idx';
-- build the table's columns lists
    SELECT string_agg('tbl.' || col_name, ','),
           string_agg('o.' || col_name || ' AS ' || col_name_o || ', n.' || col_name || ' AS ' || col_name_n, ','),
           string_agg('r.' || col_name_o, ','),
           string_agg('r.' || col_name_n, ',')
      INTO v_colList1, v_colList2, v_colList3, v_colList4
      FROM (
        SELECT quote_ident(attname) AS col_name,
               quote_ident('o_' || attname) AS col_name_o,
               quote_ident('n_' || attname) AS col_name_n
          FROM pg_catalog.pg_attribute
          WHERE attrelid = v_fullTableName::regclass
            AND attnum > 0 AND NOT attisdropped
          ORDER BY attnum) AS t;
-- create the log table: it looks like the application table, with some additional technical columns
    EXECUTE 'DROP TABLE IF EXISTS ' || v_logTableName;
    EXECUTE 'CREATE TABLE ' || v_logTableName || ' (LIKE ' || v_fullTableName || ') ';
    EXECUTE 'ALTER TABLE ' || v_logTableName
         || ' ADD COLUMN verb VARCHAR(3),'
         || ' ADD COLUMN tuple VARCHAR(3)';
-- create the log function
    EXECUTE 'CREATE OR REPLACE FUNCTION ' || v_logFnctName || '() RETURNS TRIGGER AS \$logfnct\$'
         || 'DECLARE'
         || '  r RECORD;'
         || 'BEGIN'
         || '  IF (TG_OP = ''DELETE'') THEN'
         || '    INSERT INTO ' || v_logTableName || ' SELECT ' || v_colList1
         ||        ', ''DEL'', ''OLD'' FROM old_table tbl;'
         || '  ELSIF (TG_OP = ''INSERT'') THEN'
         || '    INSERT INTO ' || v_logTableName || ' SELECT ' || v_colList1
         ||        ', ''INS'', ''NEW'' FROM new_table tbl;'
         || '  ELSIF (TG_OP = ''UPDATE'') THEN'
         || '    FOR r IN'
         || '      WITH o AS (SELECT ' || v_colList1 || ', row_number() OVER () AS ord FROM old_table tbl),'
         || '           n AS (SELECT ' || v_colList1 || ', row_number() OVER () AS ord FROM new_table tbl)'
         || '      SELECT ' || v_colList2 || ' FROM o JOIN n USING (ord)'
         || '    LOOP'
         || '      INSERT INTO ' || v_logTableName || ' SELECT ' || v_colList3 || ', ''UPD'', ''OLD'';'
         || '      INSERT INTO ' || v_logTableName || ' SELECT ' || v_colList4 || ', ''UPD'', ''NEW'';'
         || '    END LOOP;'
         || '  END IF;'
         || '  RETURN NULL;'
         || 'END;'
         || '\$logfnct\$ LANGUAGE plpgsql SECURITY DEFINER;';
-- create the log triggers on the application table, using the previously created log function
    EXECUTE 'CREATE TRIGGER insert_log_trg'
         || ' AFTER INSERT ON ' || v_fullTableName
         || ' REFERENCING NEW TABLE AS new_table'
         || ' FOR EACH STATEMENT EXECUTE PROCEDURE ' || v_logFnctName || '()';
    EXECUTE 'CREATE TRIGGER update_log_trg'
         || ' AFTER UPDATE ON ' || v_fullTableName
         || ' REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table'
         || ' FOR EACH STATEMENT EXECUTE PROCEDURE ' || v_logFnctName || '()';
    EXECUTE 'CREATE TRIGGER delete_log_trg'
         || ' AFTER DELETE ON ' || v_fullTableName
         || ' REFERENCING OLD TABLE AS old_table'
         || ' FOR EACH STATEMENT EXECUTE PROCEDURE ' || v_logFnctName || '()';
    RETURN;
  END;
\$_create_tbl\$;

CREATE TABLE myTbl1 (
  col11       INT      NOT NULL,
  col12       TEXT,
  col13       TEXT,
  PRIMARY KEY (col11)
);
CREATE TABLE myTbl3 (
  col41       INT      NOT NULL,
  col44       INT,
  PRIMARY KEY (col41),
  FOREIGN KEY (col44) REFERENCES myTbl1 (col11) ON DELETE CASCADE ON UPDATE SET NULL
);

select create_tbl('public','mytbl1');
select create_tbl('public','mytbl3');

insert into myTbl1 select i, 'ABC', 'abc' from generate_series (1,10100) as i;
update myTbl1 set col13 = E'\\034'::bytea where col11 <= 500;
delete from myTbl1 where col11 > 10000;
*EOF*

As a result, the last DELETE statement fails. I get:

CREATE FUNCTION
CREATE TABLE
CREATE TABLE
NOTICE:  table "mytbl1_log" does not exist, skipping
 create_tbl
------------

(1 row)

NOTICE:  table "mytbl3_log" does not exist, skipping
 create_tbl
------------

(1 row)

INSERT 0 1101
UPDATE 0
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost

The postgresql.conf file has default parameters, except:

listen_addresses = '*'
port = 5410
max_prepared_transactions = 5
logging_collector = on
track_functions = all
track_commit_timestamp = on

Best regards.
Philippe Beaudoin.
On Sat, Sep 9, 2017 at 3:48 PM, <phb07@apra.asso.fr> wrote:
> INSERT 0 1101
> UPDATE 0
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost

Crash is confirmed here so I am adding an open item. I have not dug into the issue seriously, but at short glance we have a code path trying to access something that has already been free'd:

#0  0x0000561bfe0f0959 in tuplestore_tuple_count (state=0x7f7f7f7f7f7f7f7f) at tuplestore.c:548
548             return state->tuples;
(gdb) bt
#0  0x0000561bfe0f0959 in tuplestore_tuple_count (state=0x7f7f7f7f7f7f7f7f) at tuplestore.c:548
#1  0x0000561bfdd8ca22 in SPI_register_trigger_data (tdata=0x7ffc92083860) at spi.c:2764
#2  0x00007f7075da7156 in plpgsql_exec_trigger (func=0x561bfe7bc5a8, trigdata=0x7ffc92083860) at pl_exec.c:692
#3  0x00007f7075da08e7 in plpgsql_call_handler (fcinfo=0x7ffc920833d0) at pl_handler.c:24
(gdb) p state
$1 = (Tuplestorestate *) 0x7f7f7f7f7f7f7f7f
--
Michael
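For reference, the repeating 0x7f byte in that pointer is the fill pattern PostgreSQL writes over freed memory in assert-enabled builds, which is what makes the use-after-free visible here. A simplified sketch of the mechanism (the real code lives in the memory-manager sources; exact location and signature vary by version):

    /*
     * Simplified sketch: with CLOBBER_FREED_MEMORY defined (assert-enabled
     * builds), pfree'd chunks are overwritten with 0x7F bytes.  Any pointer
     * later read from such a chunk therefore comes out as
     * 0x7f7f7f7f7f7f7f7f -- exactly what gdb shows above.
     */
    #include <string.h>

    #ifdef CLOBBER_FREED_MEMORY
    static void
    wipe_mem(void *ptr, size_t size)
    {
        memset(ptr, 0x7F, size);
    }
    #endif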
Michael Paquier <michael.paquier@gmail.com> writes:
> Crash is confirmed here so I am adding an open item. I have not dug
> into the issue seriously, but at short glance we have a code path
> trying to access something that has already been free'd:

I think this can be blamed on commit c46c0e52. What is happening is that the AFTER triggers are queuing more triggers, which have TransitionCaptureStates associated, but ExecEndModifyTable thinks it can DestroyTransitionCaptureState unconditionally. When the subsidiary triggers eventually get executed, their ats_transition_capture pointers are pointing at garbage.

My first instinct is to get rid of DestroyTransitionCaptureState altogether, on the grounds that the TransitionCaptureState will go away at transaction cleanup and we can't really get rid of it any sooner than that.

The other question that seems pretty relevant is why the subsidiary triggers, which are the constraint triggers associated with the tables' foreign keys, are getting queued with TransitionCaptureState pointers in the first place. This seems horridly expensive and unnecessary. It also directly contradicts the claim in MakeTransitionCaptureState that constraint triggers cannot have transition tables.

			regards, tom lane
I wrote:
> What is happening is that the AFTER triggers are queuing more triggers,
> which have TransitionCaptureStates associated, but
> ExecEndModifyTable thinks it can DestroyTransitionCaptureState
> unconditionally.  When the subsidiary triggers eventually get executed,
> their ats_transition_capture pointers are pointing at garbage.

On closer inspection, the issue is specific to TCS-using AFTER triggers that are being fired as a result of foreign key enforcement triggers. In the example, each row deleted from myTbl1 causes firing of the ON DELETE CASCADE enforcement trigger, which will execute a DELETE against myTbl3. That won't delete anything (since myTbl3 is empty), but we nonetheless queue a firing of the TCS-using AFTER STATEMENT trigger for myTbl3. Now, the ExecEndModifyTable instance for the DELETE supposes that all TCS-using triggers must have been fired during ExecutorFinish; but *that doesn't happen if EXEC_FLAG_SKIP_TRIGGERS is set*, which it is because ri_PerformCheck tells SPI not to fire triggers. I do not recall the exact details of why that is, but I believe the intention is to fire RI-triggered triggers at the end of the outer statement rather than exposing the individual RI action as a statement.

I made a quick hack to not delete the TCS if EXEC_FLAG_SKIP_TRIGGERS is set, as attached. The example succeeds given this. However, note that the AFTER STATEMENT trigger will be fired once per myTbl1 row deletion, each time seeing a transition table consisting of only the myTbl3 rows that went away as a result of that one single row deletion. I'm not sure that this is per the letter or spirit of the SQL spec. If it isn't, I don't think there is much we can do about it for v10. In v11 or later, we could think about somehow merging the transition tables for all the RI actions triggered by one statement. This might well need to go along with rewriting the RI framework to use statement triggers and TCS state itself. I think people had already muttered about doing that.

> The other question that seems pretty relevant is why the subsidiary
> triggers, which are the constraint triggers associated with the
> tables' foreign keys, are getting queued with TransitionCaptureState
> pointers in the first place.  This seems horridly expensive and
> unnecessary.  It also directly contradicts the claim in
> MakeTransitionCaptureState that constraint triggers cannot have
> transition tables.

The other part of the attached patch tweaks AfterTriggerSaveEvent to not store an ats_transition_capture pointer for a deferrable trigger event. This doesn't have anything directly to do with the current bug, because although the RI triggers are being stored with such pointers, they aren't actually dereferencing them. However, it seems like a good idea anyway, to (1) ensure that we don't have dangling pointers in the trigger queue, and (2) possibly allow more merging of shared trigger states.

			regards, tom lane

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bd84778..49586a3 100644
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
*************** ExecEndModifyTable(ModifyTableState *nod
*** 2318,2325 ****
  {
  	int			i;
  
! 	/* Free transition tables */
! 	if (node->mt_transition_capture != NULL)
  		DestroyTransitionCaptureState(node->mt_transition_capture);
  
  	/*
--- 2318,2331 ----
  {
  	int			i;
  
! 	/*
! 	 * Free transition tables, unless this query is being run in
! 	 * EXEC_FLAG_SKIP_TRIGGERS mode, which means that it may have queued AFTER
! 	 * triggers that won't be run till later.  In that case we'll just leak
! 	 * the transition tables till end of (sub)transaction.
! 	 */
! 	if (node->mt_transition_capture != NULL &&
! 		!(node->ps.state->es_top_eflags & EXEC_FLAG_SKIP_TRIGGERS))
  		DestroyTransitionCaptureState(node->mt_transition_capture);
  
  	/*
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index da0850b..bbfbc06 100644
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5474,5480 ****
  	new_shared.ats_tgoid = trigger->tgoid;
  	new_shared.ats_relid = RelationGetRelid(rel);
  	new_shared.ats_firing_id = 0;
! 	new_shared.ats_transition_capture = transition_capture;
  
  	afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
  						 &new_event, &new_shared);
--- 5474,5482 ----
  	new_shared.ats_tgoid = trigger->tgoid;
  	new_shared.ats_relid = RelationGetRelid(rel);
  	new_shared.ats_firing_id = 0;
! 	/* deferrable triggers cannot access transition data */
! 	new_shared.ats_transition_capture =
! 		trigger->tgdeferrable ? NULL : transition_capture;
  
  	afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
  						 &new_event, &new_shared);
I wrote:
> Now, the ExecEndModifyTable instance for the DELETE supposes that
> all TCS-using triggers must have been fired during ExecutorFinish;
> but *that doesn't happen if EXEC_FLAG_SKIP_TRIGGERS is set*, which
> it is because ri_PerformCheck tells SPI not to fire triggers.

In view of the fact that 10rc1 wraps tomorrow, there doesn't seem to be time to do anything better about this than my quick hack. So I went ahead and pushed that, with a regression test case.

			regards, tom lane
On Mon, Sep 11, 2017 at 7:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Now, the ExecEndModifyTable instance for the DELETE supposes that
>> all TCS-using triggers must have been fired during ExecutorFinish;
>> but *that doesn't happen if EXEC_FLAG_SKIP_TRIGGERS is set*, which
>> it is because ri_PerformCheck tells SPI not to fire triggers.
>
> In view of the fact that 10rc1 wraps tomorrow, there doesn't seem
> to be time to do anything better about this than my quick hack.
> So I went ahead and pushed that, with a regression test case.

Thank you for the testing and report Philippe, and for the analysis and fix Tom.

@@ -2318,8 +2318,14 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
-	/* Free transition tables */
-	if (node->mt_transition_capture != NULL)
+	/*
+	 * Free transition tables, unless this query is being run in
+	 * EXEC_FLAG_SKIP_TRIGGERS mode, which means that it may have queued AFTER
+	 * triggers that won't be run till later.  In that case we'll just leak
+	 * the transition tables till end of (sub)transaction.
+	 */
+	if (node->mt_transition_capture != NULL &&
+		!(node->ps.state->es_top_eflags & EXEC_FLAG_SKIP_TRIGGERS))
 		DestroyTransitionCaptureState(node->mt_transition_capture);

As an idea for a v11 patch, I wonder if it would be better to count references instead of leaking the TCS as soon as fk-on-cascade triggers enter the picture. The ModifyTable node would release its reference in ExecEndModifyTable(), and the queued events' references would be counted too. I briefly considered a scheme like that before proposing 501ed02c, but concluded, as it turns out incorrectly, that there was no way for a trigger referencing node->mt_transition_capture to fire after that point.

--
Thomas Munro
http://www.enterprisedb.com
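A minimal sketch of the reference-counting idea, with hypothetical names (RefCountedTCS, TCSRetain, TCSRelease); this only illustrates the proposal above and is not what was eventually committed:

    /*
     * Hypothetical sketch: the ModifyTable node holds one reference and
     * each queued AFTER trigger event holds another; the state (and its
     * tuplestores) is destroyed only when the last reference is released.
     */
    #include <stdlib.h>

    typedef struct RefCountedTCS
    {
        int         tcs_refcount;   /* ModifyTable node + queued events */
        /* ... transition tuplestores and flags would live here ... */
    } RefCountedTCS;

    static RefCountedTCS *
    TCSRetain(RefCountedTCS *tcs)
    {
        tcs->tcs_refcount++;
        return tcs;
    }

    static void
    TCSRelease(RefCountedTCS *tcs)
    {
        if (--tcs->tcs_refcount == 0)
            free(tcs);              /* last holder: also free tuplestores */
    }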
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> As an idea for a v11 patch, I wonder if it would be better to count
> references instead of leaking the TCS as soon as fk-on-cascade
> triggers enter the picture.

I thought of reference counting as well, but I think it's not really necessary if we organize things correctly. The key problem here IMO is that the transition tables need to be attached to a trigger execution context, rather than supposing that they can be completely managed by ModifyTable plan nodes.

I dug around in the SQL spec and convinced myself that indeed it does require all RI updates triggered by a single statement to be presented in a single transition table. Concretely, it says:

    A trigger execution context consists of a set of state changes.
    Within a trigger execution context, each state change is uniquely
    identified by a trigger event, a subject table, and a column list.
    The trigger event can be DELETE, INSERT, or UPDATE.

    A state change SC contains a set of transitions, a set of
    statement-level triggers considered as executed for SC, and a set of
    row-level triggers, each paired with the set of rows in SC for which
    it is considered as executed.

Note the "uniquely identified" bit --- you aren't allowed to have multiple SCs for the same table and same kind of event, at least up to the bit about column lists. I've not fully wrapped my head around the column list part of it, but certainly all effects of a particular RI constraint will have the same column list.

Now, the lifespan of a trigger execution context is one SQL-statement, but that's one user-executed SQL-statement --- the queries that we gin up for RI enforcement are an implementation detail that don't get their own context. (The part of the spec that defines RI actions seems pretty clear that the actions insert state changes into the active statement's trigger execution context, not create their own context.)

It's also interesting in this context to re-read commit 9cb840976, which is what created the "skip trigger" business. That exhibits a practical problem that you hit if you don't do it like this.

So ISTM that basically we need trigger.c to own the transition tables. ModifyTable, instead of just creating a TCS, needs to ask trigger.c for a TCS relevant to its target table and command type (and column list, if we decide we need to bother with that). trigger.c would discard TCSes during AfterTriggerEndQuery, where it closes out a given query_depth level.

This actually seems like it might not be that big a change, other than the issue of what the column lists mean exactly. Maybe we should try to get it done for v10, rather than shipping known-non-spec-compliant behavior.

			regards, tom lane
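A rough sketch of the shape of that proposal: trigger.c owns one set of transition tuplestores per (target table, command type) at the current query level, handed out on demand. The dynahash calls are real PostgreSQL API, but the struct and function names here are illustrative assumptions, not the committed design:

    /*
     * Illustrative sketch only: look up (or create) the shared
     * transition-capture state for a target table and command type, so
     * every ModifyTable node and RI enforcement sub-query touching the
     * same table shares the same tuplestores.
     */
    #include "postgres.h"
    #include "nodes/nodes.h"        /* CmdType */
    #include "utils/hsearch.h"      /* HTAB, hash_search */

    typedef struct XtTableKey
    {
        Oid         relid;          /* target table's OID */
        CmdType     cmdType;        /* CMD_INSERT / CMD_UPDATE / CMD_DELETE */
    } XtTableKey;

    typedef struct XtTableEntry
    {
        XtTableKey  key;            /* hash key; must be first field */
        void       *tcs;            /* the shared transition-capture state */
    } XtTableEntry;

    static HTAB *xt_table_hash;     /* set up at AfterTriggerBeginQuery time */

    static void *
    GetSharedTCS(Oid relid, CmdType cmdType)
    {
        XtTableKey  key;
        XtTableEntry *entry;
        bool        found;

        memset(&key, 0, sizeof(key));   /* zero any padding, for hashing */
        key.relid = relid;
        key.cmdType = cmdType;
        entry = (XtTableEntry *) hash_search(xt_table_hash, &key,
                                             HASH_ENTER, &found);
        if (!found)
            entry->tcs = NULL;      /* first caller creates the TCS here */
        return entry->tcs;
    }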
Tom Lane wrote:
> This actually seems like it might not be that big a change, other than
> the issue of what the column lists mean exactly.  Maybe we should try
> to get it done for v10, rather than shipping known-non-spec-compliant
> behavior.

I think this means we need to abort RC1 today and get another beta out.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote:
> Note the "uniquely identified" bit --- you aren't allowed to have multiple
> SCs for the same table and same kind of event, at least up to the bit
> about column lists.  I've not fully wrapped my head around the column list
> part of it, but certainly all effects of a particular RI constraint will
> have the same column list.

After further study of the standard, it seems that the column list business is meant to allow identification of which UPDATE triggers with column lists ("AFTER UPDATE OF columnList" syntax) are supposed to be triggered. The spec is confusing because they describe this in a way that would be impossibly inefficient if implemented verbatim. They say that, given a row UPDATE affecting a certain set of columns, you're supposed to create state changes labeled with every possibly relevant trigger column set:

    — Let OC be the set of column names identifying the columns
      [being updated]

    — Let PSC be the set consisting of the empty set and every subset of
      the set of column names of [the target table] that has at least one
      column that is in OC

    — [ create a state change for each element of PSC ]

Then an update trigger is triggered by a particular state change if its column list exactly matches that state change's list. This seems like a remarkably stupid way to go about it; you'd end up with many state changes that correspond to no existing trigger and are never of any use.

However, if I'm reading it right, there is a property of this that is very significant in the context of RI updates. For an ordinary UPDATE SQL command, all the row updates have the same set of target columns, but that's not necessarily true for all the RI updates that a single SQL command could trigger. If there's more than one RI constraint between two tables, then an update on the referenced table could trigger sets of updates that affect overlapping, but not identical, sets of rows in the referencing tables --- and those sets would have different sets of target columns. So a given column-specific trigger might be interested in some or all of the RI updates. And if it is interested, and is a statement trigger, it is supposed to be fired just once with a transition table containing all the rows it is interested in.

In other words, UPDATE triggers with different column lists potentially need to see different transition tables, and any given row that was updated might need to appear in some subset of those tables.

This seems like rather a mess to implement. I wonder whether I'm reading it right, and whether other DBMSes actually implement it like that.

I think that what might be a good short-term solution is to refuse creation of column-specific triggers with transition tables (ie, if you ask for a transition table then you can only say AFTER UPDATE, not AFTER UPDATE OF columnList). Then, all TT-using triggers are interested in all modified rows and we don't have to distinguish different column lists for the purposes of transition tables. Then the problem reduces to one TCS per target table and event type, which doesn't seem too hard to do.

			regards, tom lane
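To make the PSC expansion concrete, on this reading of the spec: for a three-column table t(a, b, c) and an UPDATE that sets only b, OC = {b}, and PSC is the empty set plus every subset of {a, b, c} containing b, i.e. {}, {b}, {a,b}, {b,c}, {a,b,c}. Taken literally, a single row update therefore gives rise to five state changes, most of which will never match any trigger's column list; in general, updating one column of an n-column table yields 2^(n-1) + 1 state changes.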
On Sat, Sep 9, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My first instinct is to get rid of DestroyTransitionCaptureState
> altogether, on the grounds that the TransitionCaptureState will
> go away at transaction cleanup and we can't really get rid of it
> any sooner than that.

End of transaction, or end of query? I'm not sure what happens when triggers are deferred, but I think there are a lot of cases when we want to throw away the tuplestore immediately, not hold on to it for the rest of the transaction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Sep 9, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My first instinct is to get rid of DestroyTransitionCaptureState
>> altogether, on the grounds that the TransitionCaptureState will
>> go away at transaction cleanup and we can't really get rid of it
>> any sooner than that.

> End of transaction, or end of query?  I'm not sure what happens when
> triggers are deferred, but I think there are a lot of cases when we
> want to throw away the tuplestore immediately, not hold on to it for
> the rest of the transaction.

As things stand, it's end of subtransaction, because the TCSes are allocated in CurTransactionContext. See the argument in MakeTransitionCaptureState. And yes, this is inefficient. The quick-hack patch I committed yesterday only pays the price if you have RI triggers cascading changes into a table that also has triggers-with-transition-tables, but I can certainly believe that it could get expensive in such a case.

The fix proposal discussed downthread should fix the inefficiency as well as the spec compliance problem. But personally I'm far more worried about the latter.

			regards, tom lane
On Tue, Sep 12, 2017 at 5:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Note the "uniquely identified" bit --- you aren't allowed to have multiple
>> SCs for the same table and same kind of event, at least up to the bit
>> about column lists.  I've not fully wrapped my head around the column list
>> part of it, but certainly all effects of a particular RI constraint will
>> have the same column list.

Aside from the RI case, the other user-visible change in behaviour will be for statements that update the same table via multiple ModifyTable nodes (wCTEs). Our regression test has:

  with wcte as (insert into table1 values (42))
    insert into table2 values ('hello world');

... which demonstrates the fix for the original complaint that table1 and table2 earlier tried to use the same transition table (and crashed). A new variant inserting into table1 twice would show the difference. Today we get:

  postgres=# with wcte as (insert into table1 values (42))
               insert into table1 values (43);
  NOTICE:  trigger = table1_trig, new table = (43,)
  NOTICE:  trigger = table1_trig, new table = (42,)
  INSERT 0 1

Presumably with your change there will be a single transition table for inserts into table1, holding both (42,) and (43,). But will we fire the trigger once or twice? There is something fishy about making it fire twice but show the same tuples to both invocations (for example, it might break Kevin's proposed counting algorithm which this feature is intended to support), but firing only once requires some new inter-node co-ordination.

> In other words, UPDATE triggers with different column lists potentially
> need to see different transition tables, and any given row that was
> updated might need to appear in some subset of those tables.
>
> This seems like rather a mess to implement.  I wonder whether I'm
> reading it right, and whether other DBMSes actually implement it
> like that.

I guess the alternative is storing extra per-tuple metadata, transferring more work to the reader. The DB2 documentation has this example[1]:

  CREATE TRIGGER REORDER
    AFTER UPDATE OF ON_HAND, MAX_STOCKED ON PARTS
    REFERENCING NEW_TABLE AS NTABLE
    FOR EACH STATEMENT MODE DB2SQL
    BEGIN ATOMIC
      SELECT ISSUE_SHIP_REQUEST(MAX_STOCKED - ON_HAND, PARTNO)
        FROM NTABLE
        WHERE (ON_HAND < 0.10 * MAX_STOCKED);
    END

I can't find any explicit discussion of whether this trigger could ever see a transition row that results from an update that didn't name ON_HAND or MAX_STOCKED. I don't have DB2 access and I'm not sure how I'd test that... maybe with a self-referencing fk declared ON UPDATE CASCADE?

Thanks to prodding from Peter Geoghegan, we tackled the question of whether the <trigger action time> clause controls just when the trigger fires or also which transition tuples it sees. By looking at some wording relating to MERGE, we concluded it must be both, culminating in commit 8c55244a which separates the UPDATEs and INSERTs resulting from INSERT ... ON CONFLICT DO UPDATE. That had the annoying result that we had to ban the use of (non-standard) "OR" in <trigger action time> when transition tables are in play. This FOR UPDATE OF ... transition business seems sort of similar, but would create arbitrarily many transition tables and require the executor to write into all of them, or perhaps store extra metadata along with captured rows for later filtering during scan.

> I think that what might be a good short-term solution is to refuse
> creation of column-specific triggers with transition tables (ie,
> if you ask for a transition table then you can only say AFTER UPDATE
> not AFTER UPDATE OF columnList).  Then, all TT-using triggers are
> interested in all modified rows and we don't have to distinguish
> different column lists for the purposes of transition tables.
> Then the problem reduces to one TCS per target table and event type,
> which doesn't seem too hard to do.

+1

Presumably would-be authors of triggers-with-transition-tables that would fire only AFTER UPDATE OF foo already have to deal with the possibility that you updated foo to the same value. So I don't think too much is lost, except perhaps some efficiency.

Thinking a bit harder about whether you might have semantic (rather than performance) reasons to use AFTER UPDATE OF ... with subset TTs, it occurs to me that there may be implications for transition tables with inheritance. We decided to disallow transition tables on non-partition inheritance children anyway (see 501ed02c), but DB2 supports the equivalent. It has a system of inheritance ("typed tables", possibly conforming to SQL:1999, though I've never looked into that) but has different rules about when row triggers and statement triggers fire when you run DML statements on a table hierarchy. Don't quote me, but it's something like our rules plus (1) row triggers of all supertables of an affected table also fire (unless created with CREATE TRIGGER ... ONLY), and (2) statement triggers of affected subtables also fire. Implementing that for our transition tables would probably require more tuplestores and/or dynamic tuple conversion and filtering during later scan. Perhaps AFTER UPDATE OF column_that_only_this_child_and_its_children_have would fire for direct and subtable updates but not via-the-supertable updates. That is currently completely irrelevant due to our set of supported features: different firing rules, and prohibition on children with transition tables.

Some related topics might return soon when people get more experience with partitions and start wanting to declare row triggers on partitioned tables (perhaps including foreign key checks) or implement Kevin's clever batch-mode foreign key check concept.

[1] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_createtrigger.html

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Aside from the RI case, the other user-visible change in behaviour
> will be for statements that update the same table via multiple
> ModifyTable nodes (wCTEs).  Our regression test has:

> with wcte as (insert into table1 values (42))
>   insert into table2 values ('hello world');

> ... which demonstrates the fix for the original complaint that table1
> and table2 earlier tried to use the same transition table (and
> crashed).  A new variant inserting into table1 twice would show the
> difference.  Today we get:

> postgres=# with wcte as (insert into table1 values (42))
>              insert into table1 values (43);
> NOTICE:  trigger = table1_trig, new table = (43,)
> NOTICE:  trigger = table1_trig, new table = (42,)
> INSERT 0 1

> Presumably with your change there will be a single transition table
> for inserts into table1, holding both (42,) and (43,).  But will we
> fire the trigger once or twice?

Not necessarily. That would be true only if we allow the WCTE to share trigger context with the outer query, which I think it does not today. I've not checked the code, but presumably if we fire the trigger twice right now, that means there are separate trigger contexts, ie somebody is calling AfterTriggerBeginQuery/AfterTriggerEndQuery around the WCTE. If not, maybe we could make it do so. OTOH, looking at the text of the spec, I think it's darn hard to justify the behavior shown above.

The reason that the RI case would share trigger context with the outer query is that we'd *not* call AfterTriggerBeginQuery/AfterTriggerEndQuery around the RI query, which would be driven by the same skip_trigger logic that exists today.

>> This seems like rather a mess to implement.  I wonder whether I'm
>> reading it right, and whether other DBMSes actually implement it
>> like that.

> I guess the alternative is storing extra per-tuple metadata,
> transferring more work to the reader.

I really don't want any more per-tuple state. Adding the TCS link was costly enough in terms of how big the tuple queue storage is.

			regards, tom lane
On Tue, Sep 12, 2017 at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> OTOH, looking at the text of
> the spec, I think it's darn hard to justify the behavior shown above.

Yeah. I assume we always fired statement triggers for each separate instance of the same table mentioned in a wCTE since they were invented. I just confirmed that that is the case in 9.6. That may not have been in the spirit of the spec, but it's hard to say, because the spec doesn't have wCTEs IIUC, and it mattered less because they didn't receive any data.

Now that they can optionally see data resulting from modifications, it seems pretty hard to use this feature to build anything that consumes the transition data and has to be reliable (matview state, replication-like systems, etc) if we make any choice other than (1) each instance of a given table fires a statement trigger separately and sees only the rows it touched, or (2) the statement trigger is fired once for all instances of a table, and sees all the transition tuples. Based on the SQL spec excerpts you've highlighted, I suppose it seems likely that if the spec had wCTEs it would expect them to work like (2).

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Aside from the RI case, the other user-visible change in behaviour
> will be for statements that update the same table via multiple
> ModifyTable nodes (wCTEs).  Our regression test has:

> with wcte as (insert into table1 values (42))
>   insert into table2 values ('hello world');

> ... which demonstrates the fix for the original complaint that table1
> and table2 earlier tried to use the same transition table (and
> crashed).

BTW, as I'm digging around in trigger.c, I can't help noticing that it provides a single "fdw_tuplestore" per trigger query level (a/k/a trigger execution context). I've not tried to test this, but it sure looks like a wCTE like your example above, directed at two separate foreign tables with triggers, would fail for exactly the same reason. That'd be a bug of pretty long standing.

			regards, tom lane
On Thu, Sep 14, 2017 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> Aside from the RI case, the other user-visible change in behaviour
>> will be for statements that update the same table via multiple
>> ModifyTable nodes (wCTEs).  Our regression test has:
>
>> with wcte as (insert into table1 values (42))
>>   insert into table2 values ('hello world');
>
>> ... which demonstrates the fix for the original complaint that table1
>> and table2 earlier tried to use the same transition table (and
>> crashed).
>
> BTW, as I'm digging around in trigger.c, I can't help noticing that
> it provides a single "fdw_tuplestore" per trigger query level (a/k/a
> trigger execution context).  I've not tried to test this, but it
> sure looks like a wCTE like your example above, directed at two
> separate foreign tables with triggers, would fail for exactly the
> same reason.  That'd be a bug of pretty long standing.

I had the impression that that fdw_tuplestore was doing something a bit sneaky that actually works out OK: tuples get enqueued and later dequeued in exactly the same sequence as the after-row trigger events that need them, so even though it seems to violate at least the POLA, if not the spirit of tuplestores, by storing tuples of potentially different types in one tuplestore, nothing bad should happen. I suppose it was by copying that coding that Kevin finished up with the initial bug where wCTEs mix stuff from different wCTEs and it all blows up, because it has no similar sequencing trick: triggers with transition tables were seeing all of them, and they weren't even guaranteed to be of the right type.

--
Thomas Munro
http://www.enterprisedb.com
On Thu, Sep 14, 2017 at 11:00 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Sep 14, 2017 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, as I'm digging around in trigger.c, I can't help noticing that
>> it provides a single "fdw_tuplestore" per trigger query level (a/k/a
>> trigger execution context).  I've not tried to test this, but it
>> sure looks like a wCTE like your example above, directed at two
>> separate foreign tables with triggers, would fail for exactly the
>> same reason.  That'd be a bug of pretty long standing.
>
> I had the impression that that fdw_tuplestore was doing something a
> bit sneaky that actually works out OK: tuples get enqueued and later
> dequeued in exactly the same sequence as the after-row trigger events
> that need them, so even though it seems to violate at least the POLA
> if not the spirit of tuplestores by storing tuples of potentially
> different types in one tuplestore, nothing bad should happen.

Incidentally, understanding that made me wonder why we don't have a binary chunk-oriented in-memory-up-to-some-size-then-spill-to-disk spooling mechanism that could be used for the trigger queue itself (which currently doesn't know how to spill to disk and therefore can take your server out), including holding these tuple images directly (instead of spilling just the tuples in synchronised order with the in-memory trigger queue).

--
Thomas Munro
http://www.enterprisedb.com
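A generic sketch of that kind of spooling structure, in plain C with stdio rather than PostgreSQL internals, and purely illustrative: records accumulate in an in-memory chunk up to a threshold, after which they spill to a temp file, and can be read back in insertion order (memory chunk first, then the file):

    /*
     * Illustrative sketch only: append records to an in-memory buffer
     * until a size limit is hit, then spill further records to a
     * temporary file.
     */
    #include <stdio.h>
    #include <string.h>

    typedef struct Spool
    {
        char   *mem;        /* in-memory chunk */
        size_t  used;       /* bytes used in mem */
        size_t  limit;      /* threshold before spilling */
        FILE   *spill;      /* temp file, NULL until first spill */
    } Spool;

    static int
    spool_put(Spool *s, const void *rec, size_t len)
    {
        if (s->spill == NULL && s->used + len <= s->limit)
        {
            memcpy(s->mem + s->used, rec, len); /* still fits in memory */
            s->used += len;
            return 0;
        }
        if (s->spill == NULL && (s->spill = tmpfile()) == NULL)
            return -1;                          /* could not create file */
        return fwrite(rec, 1, len, s->spill) == len ? 0 : -1;
    }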
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Thu, Sep 14, 2017 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, as I'm digging around in trigger.c, I can't help noticing that
>> it provides a single "fdw_tuplestore" per trigger query level (a/k/a
>> trigger execution context).  I've not tried to test this, but it
>> sure looks like a wCTE like your example above, directed at two
>> separate foreign tables with triggers, would fail for exactly the
>> same reason.  That'd be a bug of pretty long standing.

> I had the impression that that fdw_tuplestore was doing something a
> bit sneaky that actually works out OK: tuples get enqueued and later
> dequeued in exactly the same sequence as the after-row trigger events
> that need them, so even though it seems to violate at least the POLA
> if not the spirit of tuplestores by storing tuples of potentially
> different types in one tuplestore, nothing bad should happen.

Oh? Now my fear level is up to 11, because it is completely trivial to cause triggers to fire in a different order than they were enqueued. All you need is a mix of deferrable and nondeferrable triggers. In fact, it also seems entirely broken that a per-query-level tuplestore is being used at all, because deferrable triggers might not get fired until some outer query level.

[ Pokes around... ]  Hm, looks like we get around that by forbidding constraint triggers on foreign tables, but I don't see anything in the CREATE TRIGGER man page saying that there's such a prohibition. And there's certainly no comments in the source code explaining this rickety set of requirements :-(

			regards, tom lane
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Incidentally, understanding that made me wonder why we don't have a
> binary chunk-oriented in-memory-up-to-some-size-then-spill-to-disk
> spooling mechanism that could be used for the trigger queue itself
> (which currently doesn't know how to spill to disk and therefore can
> take your server out), including holding these tuple images directly
> (instead of spilling just the tuples in synchronised order with the
> in-memory trigger queue).

The past discussions about spilling the trigger queue have generally concluded that by the time your event list was long enough to cause serious pain, you already had a query that was never gonna complete. That may be getting less true as time goes on, but I'm not sure --- seems like RAM capacity is growing faster than CPU speed. Anyway, that's why it never got done.

Given the addition of transition tables, I suspect there will be even less motivation to fix it: the right thing to do with mass updates will be to use a TT with an after-statement trigger, and that fixes it by putting the bulk data into a spillable tuplestore.

			regards, tom lane
On Tue, Sep 12, 2017 at 1:22 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Sep 12, 2017 at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> OTOH, looking at the text of
>> the spec, I think it's darn hard to justify the behavior shown above.
>
> Yeah.  I assume we always fired statement triggers for each separate
> instance of the same table mentioned in a wCTE since they were
> invented.  I just confirmed that that is the case in 9.6.  That may
> not have been in the spirit of the spec, but it's hard to say because
> the spec doesn't have wCTEs IIUC and it mattered less because they
> didn't receive any data.
>
> Now that they can optionally see data resulting from modifications, it
> seems pretty hard to use this feature to build anything that consumes
> the transition data and has to be reliable (matview state,
> replication-like systems etc) if we make any choice other than (1)
> each instance of a given table fires a statement trigger separately
> and sees only the rows it touched, or (2) the statement trigger is
> fired once for all instances of a table, and sees all the transition
> tuples.  Based on the SQL spec excerpts you've highlighted, I suppose
> it seems likely that if the spec had wCTEs it would expect them to
> work like (2).

So I guess there are about 3 parts to this puzzle:

1. Merging the transition tables when there are multiple wCTEs referencing the same table. Here's one idea: Rename MakeTransitionCaptureState() to GetTransitionCaptureState() and use a hash table keyed by table OID in afterTriggers.transition_capture_states[afterTriggers.query_depth] to find the TCS for the given TriggerDesc, or create it if not found, so that all wCTEs find the same TransitionCaptureState object. Then all existing callers continue to do what they're doing now, but they'll be sharing TCSs appropriately with other things in the plan. Note that TransitionCaptureState already holds tuplestores for each operation (INSERT, UPDATE, DELETE) so the OID of the table alone is a suitable key for the hash table (assuming we are ignoring the column-list part of the spec as you suggested).

2. Hiding the fact that we implement fk CASCADE using another level of queries. Perhaps we could arrange for afterTriggers.transition_capture_states[afterTriggers.query_depth] to point to the same hash table as query_depth - 1, so that the effects of statements at this implementation-internal level appear to the user as part of the level below?

3. Merging the invocation after statement firing so that if you updated the same table directly and also via a wCTE and also indirectly via an fk ON DELETE/UPDATE trigger, then you still only get one invocation of the after statement trigger. Not sure exactly how... perhaps using a flag in the TransitionCaptureState to prevent multiple firings (see the sketch below). As I argued in the above-quoted email, if we've merged the transition tables then we'll need to merge the trigger firing too, or it won't be possible to make coherent integrity, summary, replication etc systems using TT triggers (even though that's a user-visible change in after-statement firing behaviour for wCTEs compared to earlier releases).

Does this make any sense?

--
Thomas Munro
http://www.enterprisedb.com
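A minimal sketch of the per-table bookkeeping that parts 1 and 3 seem to call for: one entry per (table, operation) in the current query level, holding both the shared tuplestores and an "already queued the statement triggers" flag. Type and field names here are only illustrative:

    /*
     * Illustrative sketch only.  Part 1 shares the transition tuplestores
     * through this entry; part 3 uses the flag to suppress duplicate
     * AFTER STATEMENT firings for the same table and operation.
     */
    #include "postgres.h"
    #include "nodes/nodes.h"            /* CmdType */
    #include "utils/tuplestore.h"       /* Tuplestorestate */

    typedef struct XtPerTableState
    {
        Oid         relid;              /* target table */
        CmdType     cmdType;            /* INSERT / UPDATE / DELETE */
        bool        stmt_trig_queued;   /* AFTER STATEMENT already queued? */
        Tuplestorestate *old_tuplestore;    /* shared OLD transition rows */
        Tuplestorestate *new_tuplestore;    /* shared NEW transition rows */
    } XtPerTableState;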
On Fri, Sep 15, 2017 at 8:43 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Note that
> TransitionCaptureState already holds tuplestores for each operation
> (INSERT, UPDATE, DELETE)

Erm, that's not quite true -- it only separates INSERT and UPDATE for now. It would need to be true, so it would need to gain one more to have the full set.

> ... perhaps using a flag in the TransitionCaptureState to prevent multiple
> firings.

Of course that would need to be per-trigger, not just one flag per TCS. Also, the change in firing rules for multiply-referenced tables would apply also when there are no TTs involved, so perhaps TCS is not a good place for that state.

--
Thomas Munro
http://www.enterprisedb.com
Attached is a draft patch for this.

Thomas Munro <thomas.munro@enterprisedb.com> writes:
> 1. Merging the transition tables when there are multiple wCTEs
> referencing the same table.  Here's one idea: Rename
> MakeTransitionCaptureState() to GetTransitionCaptureState() and use a
> hash table keyed by table OID in
> afterTriggers.transition_capture_states[afterTriggers.query_depth] to
> find the TCS for the given TriggerDesc, or create it if not found, so
> that all wCTEs find the same TransitionCaptureState object.  Then all
> existing callers continue to do what they're doing now, but they'll be
> sharing TCSs appropriately with other things in the plan.  Note that
> TransitionCaptureState already holds tuplestores for each operation
> (INSERT, UPDATE, DELETE) so the OID of the table alone is a suitable
> key for the hash table (assuming we are ignoring the column-list part
> of the spec as you suggested).

It seems unsafe to merge the TCS objects themselves, because the callers assume that they can munge the tcs_map and tcs_original_insert_tuple fields freely without regard for any other callers. So as I have it, we still have a TCS for each caller, but the TCSes point at tuplestores that can be shared across multiple callers for the same event type. The tuplestores themselves are managed by the AfterTrigger data structures.

Also, because the TCS structs are just throwaway per-caller data, it's uncool to reference them in the trigger event lists. So I replaced ats_transition_capture with two pointers to the actual tuplestores. That bloats AfterTriggerSharedData a bit, but I think it's okay; we don't expect a lot of those structs in a normal query.

I chose to make the persistent state (AfterTriggersTableData) independent for each operation type. We could have done that differently perhaps, but it seemed more complicated and less likely to match the spec's semantics.

The INSERT ON CONFLICT UPDATE mess is handled by creating two separate TCSes with two different underlying AfterTriggersTableData structs. The insertion tuplestore sees only the inserted tuples, the update tuplestores see only the updated-pre-existing tuples. That adds a little code to nodeModifyTable, but it seems conceptually much cleaner.

> 2. Hiding the fact that we implement fk CASCADE using another level
> of queries.  Perhaps we could arrange for
> afterTriggers.transition_capture_states[afterTriggers.query_depth] to
> point to the same hash table as query_depth - 1, so that the effects
> of statements at this implementation-internal level appear to the user
> as part of the level below?

That already happens, because query_depth doesn't increment for an FK enforcement query --- we never call AfterTriggerBegin/EndQuery for it.

> 3. Merging the invocation after statement firing so that if you
> updated the same table directly and also via a wCTE and also
> indirectly via an fk ON DELETE/UPDATE trigger, then you still only get
> one invocation of the after statement trigger.  Not sure exactly how...

What I did here was to use the AfterTriggersTableData structs to hold a flag saying we'd already queued statement triggers for this rel and cmdType. There's probably more than one way to do that, but this seemed convenient.

One thing I don't like too much about that is that it means there are cases where the single statement trigger firing would occur before some AFTER ROW trigger firings. Not sure if we promise anything about the ordering in the docs. It looks quite expensive/complicated to try to make it always happen afterwards, though, and it might well be totally impossible if triggers cause more table updates to occur.

Because MakeTransitionCaptureState now depends on the trigger query level being active, I had to relocate the AfterTriggerBeginQuery calls to occur before it.

In passing, I refactored the AfterTriggers data structures a bit so that we don't need to do as many palloc calls to manage them. Instead of several independent arrays, there's now one array of structs.

			regards, tom lane

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfa3f05..c6fa445 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** CopyFrom(CopyState cstate)
*** 2432,2443 ****
  	/* Triggers might need a slot as well */
  	estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
  
  	/*
  	 * If there are any triggers with transition tables on the named relation,
  	 * we need to be prepared to capture transition tuples.
  	 */
  	cstate->transition_capture =
! 		MakeTransitionCaptureState(cstate->rel->trigdesc);
  
  	/*
  	 * If the named relation is a partitioned table, initialize state for
--- 2432,2448 ----
  	/* Triggers might need a slot as well */
  	estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
  
+ 	/* Prepare to catch AFTER triggers. */
+ 	AfterTriggerBeginQuery();
+ 
  	/*
  	 * If there are any triggers with transition tables on the named relation,
  	 * we need to be prepared to capture transition tuples.
  	 */
  	cstate->transition_capture =
! 		MakeTransitionCaptureState(cstate->rel->trigdesc,
! 								   RelationGetRelid(cstate->rel),
! 								   CMD_INSERT);
  
  	/*
  	 * If the named relation is a partitioned table, initialize state for
*************** CopyFrom(CopyState cstate)
*** 2513,2521 ****
  		bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
  	}
  
- 	/* Prepare to catch AFTER triggers. */
- 	AfterTriggerBeginQuery();
- 
  	/*
  	 * Check BEFORE STATEMENT insertion triggers.  It's debatable whether we
  	 * should do this for COPY, since it's not really an "INSERT" statement as
--- 2518,2523 ----
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 269c9e1..ee2ff04 100644
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*************** CreateTrigger(CreateTrigStmt *stmt, cons
*** 234,239 ****
--- 234,244 ----
  					 RelationGetRelationName(rel)),
  			 errdetail("Foreign tables cannot have TRUNCATE triggers.")));
  
+ 	/*
+ 	 * We disallow constraint triggers to protect the assumption that
+ 	 * triggers on FKs can't be deferred.  See notes with AfterTriggers
+ 	 * data structures, below.
+ 	 */
  	if (stmt->isconstraint)
  		ereport(ERROR,
  				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
*************** CreateTrigger(CreateTrigStmt *stmt, cons
*** 418,423 ****
--- 423,448 ----
  						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  						 errmsg("transition tables cannot be specified for triggers with more than one event")));
  
+ 			/*
+ 			 * We currently don't allow column-specific triggers with
+ 			 * transition tables.  Per spec, that seems to require
+ 			 * accumulating separate transition tables for each combination of
+ 			 * columns, which is a lot of work for a rather marginal feature.
+ 			 */
+ 			if (stmt->columns != NIL)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 						 errmsg("transition tables cannot be specified for triggers with column lists")));
+ 
+ 			/*
+ 			 * We disallow constraint triggers with transition tables, to
+ 			 * protect the assumption that such triggers can't be deferred.
+ 			 * See notes with AfterTriggers data structures, below.
+ 			 *
+ 			 * Currently this is enforced by the grammar, so just Assert here.
+ 			 */
+ 			Assert(!stmt->isconstraint);
+ 
  			if (tt->isNew)
  			{
  				if (!(TRIGGER_FOR_INSERT(tgtype) ||
*************** FindTriggerIncompatibleWithInheritance(T
*** 2086,2181 ****
  }
  
  /*
- * Make a TransitionCaptureState object from a given TriggerDesc.  The
- * resulting object holds the flags which control whether transition tuples
- * are collected when tables are modified, and the tuplestores themselves.
- * Note that we copy the flags from a parent table into this struct (rather
- * than using each relation's TriggerDesc directly) so that we can use it to
- * control the collection of transition tuples from child tables.
- *
- * If there are no triggers with transition tables configured for 'trigdesc',
- * then return NULL.
- *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
- */
- TransitionCaptureState *
- MakeTransitionCaptureState(TriggerDesc *trigdesc)
- {
- 	TransitionCaptureState *state = NULL;
- 
- 	if (trigdesc != NULL &&
- 		(trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table ||
- 		 trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table))
- 	{
- 		MemoryContext oldcxt;
- 		ResourceOwner saveResourceOwner;
- 
- 		/*
- 		 * Normally DestroyTransitionCaptureState should be called after
- 		 * executing all AFTER triggers for the current statement.
- 		 *
- 		 * To handle error cleanup, TransitionCaptureState and the tuplestores
- 		 * it contains will live in the current [sub]transaction's memory
- 		 * context.  Likewise for the current resource owner, because we also
- 		 * want to clean up temporary files spilled to disk by the tuplestore
- 		 * in that scenario.  This scope is sufficient, because AFTER triggers
- 		 * with transition tables cannot be deferred (only constraint triggers
- 		 * can be deferred, and constraint triggers cannot have transition
- 		 * tables).  The AFTER trigger queue may contain pointers to this
- 		 * TransitionCaptureState, but any such entries will be processed or
- 		 * discarded before the end of the current [sub]transaction.
- 		 *
- 		 * If a future release allows deferred triggers with transition
- 		 * tables, we'll need to reconsider the scope of the
- 		 * TransitionCaptureState object.
- 		 */
- 		oldcxt = MemoryContextSwitchTo(CurTransactionContext);
- 		saveResourceOwner = CurrentResourceOwner;
- 
- 		state = (TransitionCaptureState *)
- 			palloc0(sizeof(TransitionCaptureState));
- 		state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
- 		state->tcs_update_old_table = trigdesc->trig_update_old_table;
- 		state->tcs_update_new_table = trigdesc->trig_update_new_table;
- 		state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
- 		PG_TRY();
- 		{
- 			CurrentResourceOwner = CurTransactionResourceOwner;
- 			if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
- 				state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
- 			if (trigdesc->trig_insert_new_table)
- 				state->tcs_insert_tuplestore = tuplestore_begin_heap(false, false, work_mem);
- 			if (trigdesc->trig_update_new_table)
- 				state->tcs_update_tuplestore = tuplestore_begin_heap(false, false, work_mem);
- 		}
- 		PG_CATCH();
- 		{
- 			CurrentResourceOwner = saveResourceOwner;
- 			PG_RE_THROW();
- 		}
- 		PG_END_TRY();
- 		CurrentResourceOwner = saveResourceOwner;
- 		MemoryContextSwitchTo(oldcxt);
- 	}
- 
- 	return state;
- }
- 
- void
- DestroyTransitionCaptureState(TransitionCaptureState *tcs)
- {
- 	if (tcs->tcs_insert_tuplestore != NULL)
- 		tuplestore_end(tcs->tcs_insert_tuplestore);
- 	if (tcs->tcs_update_tuplestore != NULL)
- 		tuplestore_end(tcs->tcs_update_tuplestore);
- 	if (tcs->tcs_old_tuplestore != NULL)
- 		tuplestore_end(tcs->tcs_old_tuplestore);
- 	pfree(tcs);
- }
- 
- /*
  * Call a trigger function.
  *
  * trigdata: trigger descriptor.
--- 2111,2116 ----
*************** TriggerEnabled(EState *estate, ResultRel
*** 3338,3346 ****
   * during the current transaction tree.  (BEFORE triggers are fired
   * immediately so we don't need any persistent state about them.)  The struct
   * and most of its subsidiary data are kept in TopTransactionContext; however
!  * the individual event records are kept in a separate sub-context.  This is
!  * done mainly so that it's easy to tell from a memory context dump how much
!  * space is being eaten by trigger events.
   *
   * Because the list of pending events can grow large, we go to some
   * considerable effort to minimize per-event memory consumption.  The event
--- 3273,3283 ----
   * during the current transaction tree.  (BEFORE triggers are fired
   * immediately so we don't need any persistent state about them.)  The struct
   * and most of its subsidiary data are kept in TopTransactionContext; however
!  * some data that can be discarded sooner appears in the CurTransactionContext
!  * of the relevant subtransaction.  Also, the individual event records are
!  * kept in a separate sub-context of TopTransactionContext.  This is done
!  * mainly so that it's easy to tell from a memory context dump how much space
!  * is being eaten by trigger events.
   *
   * Because the list of pending events can grow large, we go to some
   * considerable effort to minimize per-event memory consumption.  The event
*************** typedef SetConstraintStateData *SetConst
*** 3400,3405 ****
--- 3337,3349 ----
   * tuple(s).  This permits storing tuples once regardless of the number of
   * row-level triggers on a foreign table.
   *
+  * Note that we need triggers on foreign tables to be fired in exactly the
+  * order they were queued, so that the tuples come out of the tuplestore in
+  * the right order.  To ensure that, we forbid deferrable (constraint)
+  * triggers on foreign tables.  This also ensures that such triggers do not
+  * get deferred into outer trigger query levels, meaning that it's okay to
+  * destroy the tuplestore at the end of the query level.
+  *
   * Statement-level triggers always bear AFTER_TRIGGER_1CTID, though they
   * require no ctid field.  We lack the flag bit space to neatly represent that
   * distinct case, and it seems unlikely to be worth much trouble.
*************** typedef struct AfterTriggerSharedData
*** 3433,3439 ****
  	Oid			ats_tgoid;		/* the trigger's ID */
  	Oid			ats_relid;		/* the relation it's on */
  	CommandId	ats_firing_id;	/* ID for firing cycle */
! 	TransitionCaptureState *ats_transition_capture;
  } AfterTriggerSharedData;
  
  typedef struct AfterTriggerEventData *AfterTriggerEvent;
--- 3377,3384 ----
  	Oid			ats_tgoid;		/* the trigger's ID */
  	Oid			ats_relid;		/* the relation it's on */
  	CommandId	ats_firing_id;	/* ID for firing cycle */
! 	Tuplestorestate *ats_old_tuplestore;	/* possible OLD transition table */
! 	Tuplestorestate *ats_new_tuplestore;	/* possible NEW transition table */
  } AfterTriggerSharedData;
  
  typedef struct AfterTriggerEventData *AfterTriggerEvent;
*************** typedef struct AfterTriggerEventList
*** 3529,3588 ****
   * query_depth is the current depth of nested AfterTriggerBeginQuery calls
   * (-1 when the stack is empty).
   *
!  * query_stack[query_depth] is a list of AFTER trigger events queued by the
!  * current query (and the query_stack entries below it are lists of trigger
!  * events queued by calling queries).  None of these are valid until the
!  * matching AfterTriggerEndQuery call occurs.  At that point we fire
!  * immediate-mode triggers, and append any deferred events to the main events
!  * list.
   *
!  * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples
!  * needed for the current query.
   *
!  * maxquerydepth is just the allocated length of query_stack and the
!  * tuplestores.
   *
!  * state_stack is a stack of pointers to saved copies of the SET CONSTRAINTS
!  * state data; each subtransaction level that modifies that state first
   * saves a copy, which we use to restore the state if we abort.
   *
!  * events_stack is a stack of copies of the events head/tail pointers,
   * which we use to restore those values during subtransaction abort.
   *
!  * depth_stack is a stack of copies of subtransaction-start-time query_depth,
   * which we similarly use to clean up at subtransaction abort.
   *
!  * firing_stack is a stack of copies of subtransaction-start-time
!  * firing_counter.  We use this to recognize which deferred triggers were
!  * fired (or marked for firing) within an aborted subtransaction.
   *
   * We use GetCurrentTransactionNestLevel() to determine the correct array
!  * index in these stacks.  maxtransdepth is the number of allocated entries in
!  * each stack.  (By not keeping our own stack pointer, we can avoid trouble
   * in cases where errors during subxact abort cause multiple invocations
   * of AfterTriggerEndSubXact() at the same nesting depth.)
   */
  typedef struct AfterTriggersData
  {
  	CommandId	firing_counter; /* next firing ID to assign */
  	SetConstraintState state;	/* the active S C state */
  	AfterTriggerEventList events;	/* deferred-event list */
- 	int			query_depth;	/* current query list index */
- 	AfterTriggerEventList *query_stack; /* events pending from each query */
- 	Tuplestorestate **fdw_tuplestores;	/* foreign tuples for one row from
- 										 * each query */
- 	int			maxquerydepth;	/* allocated len of above array */
  	MemoryContext event_cxt;	/* memory context for events, if any */
  
! 	/* these fields are just for resetting at subtrans abort: */
! 	SetConstraintState *state_stack;	/* stacked S C states */
! 	AfterTriggerEventList *events_stack;	/* stacked list pointers */
! 	int		   *depth_stack;	/* stacked query_depths */
! 	CommandId  *firing_stack;	/* stacked firing_counters */
! 	int			maxtransdepth;	/* allocated len of above arrays */
  } AfterTriggersData;
  
  static AfterTriggersData afterTriggers;
  
  static void AfterTriggerExecute(AfterTriggerEvent event,
--- 3474,3579 ----
   * query_depth is the current depth of nested AfterTriggerBeginQuery calls
   * (-1 when the stack is empty).
   *
!  * query_stack[query_depth] is the per-query-level data, including these fields:
   *
!  * events is a list of AFTER trigger events queued by the current query.
!  * None of these are valid until the matching AfterTriggerEndQuery call
!  * occurs.  At that point we fire immediate-mode triggers, and append any
!  * deferred events to the main events list.
   *
!  * fdw_tuplestore is a tuplestore containing the foreign-table tuples
!  * needed by events queued by the current query.  (Note: we use just one
!  * tuplestore even though more than one foreign table might be involved.
!  * This is okay because tuplestores don't really care what's in the tuples
!  * they store; but it's possible that someday it'd break.)
   *
!  * tables is a List of AfterTriggersTableData structs for target tables
!  * of the current query (see below).
!  *
!  * maxquerydepth is just the allocated length of query_stack.
!  *
!  * trans_stack holds per-subtransaction data, including these fields:
!  *
!  * state is NULL or a pointer to a saved copy of the SET CONSTRAINTS
!  * state data.  Each subtransaction level that modifies that state first
   * saves a copy, which we use to restore the state if we abort.
   *
!  * events is a copy of the events head/tail pointers,
   * which we use to restore those values during subtransaction abort.
   *
!  * query_depth is the subtransaction-start-time value of query_depth,
   * which we similarly use to clean up at subtransaction abort.
   *
!  * firing_counter is the subtransaction-start-time value of firing_counter.
!  * We use this to recognize which deferred triggers were fired (or marked
!  * for firing) within an aborted subtransaction.
   *
   * We use GetCurrentTransactionNestLevel() to determine the correct array
!  * index in trans_stack.  maxtransdepth is the number of allocated entries in
!  * trans_stack.  (By not keeping our own stack pointer, we can avoid trouble
   * in cases where errors during subxact abort cause multiple invocations
   * of AfterTriggerEndSubXact() at the same nesting depth.)
+  *
+  * We create an AfterTriggersTableData struct for each target table of the
+  * current query, and each operation mode (INSERT/UPDATE/DELETE), that has
+  * either transition tables or AFTER STATEMENT triggers.  This is used to
+  * hold the relevant transition tables, as well as a flag showing whether
+  * we already queued the AFTER STATEMENT triggers.  We need the flag so
+  * that cases like multiple FK enforcement sub-queries targeting the same
+  * table don't fire such triggers more than once.  These structs, along with
+  * the transition table tuplestores, live in the (sub)transaction's
+  * CurTransactionContext.  That's sufficient lifespan because we don't allow
+  * transition tables to be used by deferrable triggers, so they only need
+  * to survive until AfterTriggerEndQuery.
*/ + typedef struct AfterTriggersQueryData AfterTriggersQueryData; + typedef struct AfterTriggersTransData AfterTriggersTransData; + typedef struct AfterTriggersTableData AfterTriggersTableData; + typedef struct AfterTriggersData { CommandId firing_counter; /* next firing ID to assign */ SetConstraintState state; /* the active S C state */ AfterTriggerEventList events; /* deferred-event list */ MemoryContext event_cxt; /* memory context for events, if any */ ! /* per-query-level data: */ ! AfterTriggersQueryData *query_stack; /* array of structs shown above */ ! int query_depth; /* current index in above array */ ! int maxquerydepth; /* allocated len of above array */ ! /* per-subtransaction-level data: */ ! AfterTriggersTransData *trans_stack; /* array of structs shown above */ ! int maxtransdepth; /* allocated len of above array */ } AfterTriggersData; + struct AfterTriggersQueryData + { + AfterTriggerEventList events; /* events pending from this query */ + Tuplestorestate *fdw_tuplestore; /* foreign tuples for said events */ + List *tables; /* list of AfterTriggersTableData */ + }; + + struct AfterTriggersTransData + { + /* these fields are just for resetting at subtrans abort: */ + SetConstraintState state; /* saved S C state, or NULL if not yet saved */ + AfterTriggerEventList events; /* saved list pointer */ + int query_depth; /* saved query_depth */ + CommandId firing_counter; /* saved firing_counter */ + }; + + struct AfterTriggersTableData + { + /* relid + cmdType form the lookup key for these structs: */ + Oid relid; /* target table's OID */ + CmdType cmdType; /* event type, CMD_INSERT/UPDATE/DELETE */ + bool closed; /* true when no longer OK to add tuples */ + bool stmt_trig_done; /* did we already queue stmt-level triggers? */ + Tuplestorestate *old_tuplestore; /* "old" transition table, if any */ + Tuplestorestate *new_tuplestore; /* "new" transition table, if any */ + }; + static AfterTriggersData afterTriggers; static void AfterTriggerExecute(AfterTriggerEvent event, *************** static void AfterTriggerExecute(AfterTri *** 3591,3598 **** Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2, ! TransitionCaptureState *transition_capture); static SetConstraintState SetConstraintStateCreate(int numalloc); static SetConstraintState SetConstraintStateCopy(SetConstraintState state); static SetConstraintState SetConstraintStateAddItem(SetConstraintState state, --- 3582,3591 ---- Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2); ! static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid, ! CmdType cmdType); ! static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs); static SetConstraintState SetConstraintStateCreate(int numalloc); static SetConstraintState SetConstraintStateCopy(SetConstraintState state); static SetConstraintState SetConstraintStateAddItem(SetConstraintState state, *************** static SetConstraintState SetConstraintS *** 3600,3628 **** /* ! * Gets a current query transition tuplestore and initializes it if necessary. */ static Tuplestorestate * ! GetTriggerTransitionTuplestore(Tuplestorestate **tss) { Tuplestorestate *ret; ! ret = tss[afterTriggers.query_depth]; if (ret == NULL) { MemoryContext oldcxt; ResourceOwner saveResourceOwner; /* ! * Make the tuplestore valid until end of transaction. This is the ! 
* allocation lifespan of the associated events list, but we really * only need it until AfterTriggerEndQuery(). */ ! oldcxt = MemoryContextSwitchTo(TopTransactionContext); saveResourceOwner = CurrentResourceOwner; PG_TRY(); { ! CurrentResourceOwner = TopTransactionResourceOwner; ret = tuplestore_begin_heap(false, false, work_mem); } PG_CATCH(); --- 3593,3621 ---- /* ! * Get the FDW tuplestore for the current trigger query level, creating it ! * if necessary. */ static Tuplestorestate * ! GetCurrentFDWTuplestore(void) { Tuplestorestate *ret; ! ret = afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore; if (ret == NULL) { MemoryContext oldcxt; ResourceOwner saveResourceOwner; /* ! * Make the tuplestore valid until end of subtransaction. We really * only need it until AfterTriggerEndQuery(). */ ! oldcxt = MemoryContextSwitchTo(CurTransactionContext); saveResourceOwner = CurrentResourceOwner; PG_TRY(); { ! CurrentResourceOwner = CurTransactionResourceOwner; ret = tuplestore_begin_heap(false, false, work_mem); } PG_CATCH(); *************** GetTriggerTransitionTuplestore(Tuplestor *** 3634,3640 **** CurrentResourceOwner = saveResourceOwner; MemoryContextSwitchTo(oldcxt); ! tss[afterTriggers.query_depth] = ret; } return ret; --- 3627,3633 ---- CurrentResourceOwner = saveResourceOwner; MemoryContextSwitchTo(oldcxt); ! afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore = ret; } return ret; *************** afterTriggerAddEvent(AfterTriggerEventLi *** 3780,3786 **** if (newshared->ats_tgoid == evtshared->ats_tgoid && newshared->ats_relid == evtshared->ats_relid && newshared->ats_event == evtshared->ats_event && ! newshared->ats_transition_capture == evtshared->ats_transition_capture && newshared->ats_firing_id == 0) break; } --- 3773,3780 ---- if (newshared->ats_tgoid == evtshared->ats_tgoid && newshared->ats_relid == evtshared->ats_relid && newshared->ats_event == evtshared->ats_event && ! newshared->ats_old_tuplestore == evtshared->ats_old_tuplestore && ! newshared->ats_new_tuplestore == evtshared->ats_new_tuplestore && newshared->ats_firing_id == 0) break; } *************** AfterTriggerExecute(AfterTriggerEvent ev *** 3892,3899 **** FmgrInfo *finfo, Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2, ! TransitionCaptureState *transition_capture) { AfterTriggerShared evtshared = GetTriggerSharedData(event); Oid tgoid = evtshared->ats_tgoid; --- 3886,3892 ---- FmgrInfo *finfo, Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2) { AfterTriggerShared evtshared = GetTriggerSharedData(event); Oid tgoid = evtshared->ats_tgoid; *************** AfterTriggerExecute(AfterTriggerEvent ev *** 3934,3942 **** { case AFTER_TRIGGER_FDW_FETCH: { ! Tuplestorestate *fdw_tuplestore = ! GetTriggerTransitionTuplestore ! (afterTriggers.fdw_tuplestores); if (!tuplestore_gettupleslot(fdw_tuplestore, true, false, trig_tuple_slot1)) --- 3927,3933 ---- { case AFTER_TRIGGER_FDW_FETCH: { ! Tuplestorestate *fdw_tuplestore = GetCurrentFDWTuplestore(); if (!tuplestore_gettupleslot(fdw_tuplestore, true, false, trig_tuple_slot1)) *************** AfterTriggerExecute(AfterTriggerEvent ev *** 4008,4043 **** /* * Set up the tuplestore information. */ ! LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL; ! if (transition_capture != NULL) ! { ! if (LocTriggerData.tg_trigger->tgoldtable) ! 
LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore; ! if (LocTriggerData.tg_trigger->tgnewtable) ! { ! /* ! * Currently a trigger with transition tables may only be defined ! * for a single event type (here AFTER INSERT or AFTER UPDATE, but ! * not AFTER INSERT OR ...). ! */ ! Assert((TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype) != 0) ^ ! (TRIGGER_FOR_UPDATE(LocTriggerData.tg_trigger->tgtype) != 0)); ! /* ! * Show either the insert or update new tuple images, depending on ! * which event type the trigger was registered for. A single ! * statement may have produced both in the case of INSERT ... ON ! * CONFLICT ... DO UPDATE, and in that case the event determines ! * which tuplestore the trigger sees as the NEW TABLE. ! */ ! if (TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype)) ! LocTriggerData.tg_newtable = ! transition_capture->tcs_insert_tuplestore; ! else ! LocTriggerData.tg_newtable = ! transition_capture->tcs_update_tuplestore; ! } ! } /* * Setup the remaining trigger information --- 3999,4013 ---- /* * Set up the tuplestore information. */ ! if (LocTriggerData.tg_trigger->tgoldtable) ! LocTriggerData.tg_oldtable = evtshared->ats_old_tuplestore; ! else ! LocTriggerData.tg_oldtable = NULL; ! if (LocTriggerData.tg_trigger->tgnewtable) ! LocTriggerData.tg_newtable = evtshared->ats_new_tuplestore; ! else ! LocTriggerData.tg_newtable = NULL; /* * Setup the remaining trigger information *************** afterTriggerInvokeEvents(AfterTriggerEve *** 4245,4252 **** * won't try to re-fire it. */ AfterTriggerExecute(event, rel, trigdesc, finfo, instr, ! per_tuple_context, slot1, slot2, ! evtshared->ats_transition_capture); /* * Mark the event as done. --- 4215,4221 ---- * won't try to re-fire it. */ AfterTriggerExecute(event, rel, trigdesc, finfo, instr, ! per_tuple_context, slot1, slot2); /* * Mark the event as done. *************** afterTriggerInvokeEvents(AfterTriggerEve *** 4296,4301 **** --- 4265,4433 ---- } + /* + * GetAfterTriggersTableData + * + * Find or create an AfterTriggersTableData struct for the specified + * trigger event (relation + operation type). Ignore existing structs + * marked "closed"; we don't want to put any additional tuples into them, + * nor change their triggers-fired flags. + * + * Note: the AfterTriggersTableData list is allocated in the current + * (sub)transaction's CurTransactionContext. This is OK because + * we don't need it to live past AfterTriggerEndQuery. + */ + static AfterTriggersTableData * + GetAfterTriggersTableData(Oid relid, CmdType cmdType) + { + AfterTriggersTableData *table; + AfterTriggersQueryData *qs; + MemoryContext oldcxt; + ListCell *lc; + + /* Caller should have ensured query_depth is OK. */ + Assert(afterTriggers.query_depth >= 0 && + afterTriggers.query_depth < afterTriggers.maxquerydepth); + qs = &afterTriggers.query_stack[afterTriggers.query_depth]; + + foreach(lc, qs->tables) + { + table = (AfterTriggersTableData *) lfirst(lc); + if (table->relid == relid && table->cmdType == cmdType && + !table->closed) + return table; + } + + oldcxt = MemoryContextSwitchTo(CurTransactionContext); + + table = (AfterTriggersTableData *) palloc0(sizeof(AfterTriggersTableData)); + table->relid = relid; + table->cmdType = cmdType; + qs->tables = lappend(qs->tables, table); + + MemoryContextSwitchTo(oldcxt); + + return table; + } + + + /* + * MakeTransitionCaptureState + * + * Make a TransitionCaptureState object for the given TriggerDesc, target + * relation, and operation type. 
The TCS object holds all the state needed + * to decide whether to capture tuples in transition tables. + * + * If there are no triggers in 'trigdesc' that request relevant transition + * tables, then return NULL. + * + * The resulting object can be passed to the ExecAR* functions. The caller + * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing + * with child tables. + * + * Note that we copy the flags from a parent table into this struct (rather + * than subsequently using the relation's TriggerDesc directly) so that we can + * use it to control collection of transition tuples from child tables. + * + * Per SQL spec, all operations of the same kind (INSERT/UPDATE/DELETE) + * on the same table during one query should share one transition table. + * Therefore, the Tuplestores are owned by an AfterTriggersTableData struct + * looked up using the table OID + CmdType, and are merely referenced by + * the TransitionCaptureState objects we hand out to callers. + */ + TransitionCaptureState * + MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType) + { + TransitionCaptureState *state; + bool need_old, + need_new; + AfterTriggersTableData *table; + MemoryContext oldcxt; + ResourceOwner saveResourceOwner; + + if (trigdesc == NULL) + return NULL; + + /* Detect which table(s) we need. */ + switch (cmdType) + { + case CMD_INSERT: + need_old = false; + need_new = trigdesc->trig_insert_new_table; + break; + case CMD_UPDATE: + need_old = trigdesc->trig_update_old_table; + need_new = trigdesc->trig_update_new_table; + break; + case CMD_DELETE: + need_old = trigdesc->trig_delete_old_table; + need_new = false; + break; + default: + elog(ERROR, "unexpected CmdType: %d", (int) cmdType); + need_old = need_new = false; /* keep compiler quiet */ + break; + } + if (!need_old && !need_new) + return NULL; + + /* Check state, like AfterTriggerSaveEvent. */ + if (afterTriggers.query_depth < 0) + elog(ERROR, "MakeTransitionCaptureState() called outside of query"); + + /* Be sure we have enough space to record events at this query depth. */ + if (afterTriggers.query_depth >= afterTriggers.maxquerydepth) + AfterTriggerEnlargeQueryState(); + + /* + * Find or create an AfterTriggersTableData struct to hold the + * tuplestore(s). If there's a matching struct but it's marked closed, + * ignore it; we need a newer one. + * + * Note: the AfterTriggersTableData list, as well as the tuplestores, are + * allocated in the current (sub)transaction's CurTransactionContext, and + * the tuplestores are managed by the (sub)transaction's resource owner. + * This is sufficient lifespan because we do not allow triggers using + * transition tables to be deferrable; they will be fired during + * AfterTriggerEndQuery, after which it's okay to delete the data. + */ + table = GetAfterTriggersTableData(relid, cmdType); + + /* Now create required tuplestore(s), if we don't have them already. 
*/ + oldcxt = MemoryContextSwitchTo(CurTransactionContext); + saveResourceOwner = CurrentResourceOwner; + PG_TRY(); + { + CurrentResourceOwner = CurTransactionResourceOwner; + if (need_old && table->old_tuplestore == NULL) + table->old_tuplestore = tuplestore_begin_heap(false, false, work_mem); + if (need_new && table->new_tuplestore == NULL) + table->new_tuplestore = tuplestore_begin_heap(false, false, work_mem); + } + PG_CATCH(); + { + CurrentResourceOwner = saveResourceOwner; + PG_RE_THROW(); + } + PG_END_TRY(); + CurrentResourceOwner = saveResourceOwner; + MemoryContextSwitchTo(oldcxt); + + /* Now build the TransitionCaptureState struct, in caller's context */ + state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState)); + state->tcs_delete_old_table = trigdesc->trig_delete_old_table; + state->tcs_update_old_table = trigdesc->trig_update_old_table; + state->tcs_update_new_table = trigdesc->trig_update_new_table; + state->tcs_insert_new_table = trigdesc->trig_insert_new_table; + if (need_old) + state->tcs_old_tuplestore = table->old_tuplestore; + if (need_new) + state->tcs_new_tuplestore = table->new_tuplestore; + + return state; + } + + /* ---------- * AfterTriggerBeginXact() * *************** AfterTriggerBeginXact(void) *** 4319,4332 **** */ Assert(afterTriggers.state == NULL); Assert(afterTriggers.query_stack == NULL); - Assert(afterTriggers.fdw_tuplestores == NULL); Assert(afterTriggers.maxquerydepth == 0); Assert(afterTriggers.event_cxt == NULL); Assert(afterTriggers.events.head == NULL); ! Assert(afterTriggers.state_stack == NULL); ! Assert(afterTriggers.events_stack == NULL); ! Assert(afterTriggers.depth_stack == NULL); ! Assert(afterTriggers.firing_stack == NULL); Assert(afterTriggers.maxtransdepth == 0); } --- 4451,4460 ---- */ Assert(afterTriggers.state == NULL); Assert(afterTriggers.query_stack == NULL); Assert(afterTriggers.maxquerydepth == 0); Assert(afterTriggers.event_cxt == NULL); Assert(afterTriggers.events.head == NULL); ! Assert(afterTriggers.trans_stack == NULL); Assert(afterTriggers.maxtransdepth == 0); } *************** AfterTriggerBeginQuery(void) *** 4362,4370 **** void AfterTriggerEndQuery(EState *estate) { - AfterTriggerEventList *events; - Tuplestorestate *fdw_tuplestore; - /* Must be inside a query, too */ Assert(afterTriggers.query_depth >= 0); --- 4490,4495 ---- *************** AfterTriggerEndQuery(EState *estate) *** 4393,4430 **** * will instead fire any triggers in a dedicated query level. Foreign key * enforcement triggers do add to the current query level, thanks to their * passing fire_triggers = false to SPI_execute_snapshot(). Other ! * C-language triggers might do likewise. Be careful here: firing a ! * trigger could result in query_stack being repalloc'd, so we can't save ! * its address across afterTriggerInvokeEvents calls. * * If we find no firable events, we don't have to increment * firing_counter. */ for (;;) { ! events = &afterTriggers.query_stack[afterTriggers.query_depth]; ! if (afterTriggerMarkEvents(events, &afterTriggers.events, true)) { CommandId firing_id = afterTriggers.firing_counter++; /* OK to delete the immediate events after processing them */ ! if (afterTriggerInvokeEvents(events, firing_id, estate, true)) break; /* all fired */ } else break; } ! /* Release query-local storage for events, including tuplestore if any */ ! fdw_tuplestore = afterTriggers.fdw_tuplestores[afterTriggers.query_depth]; ! if (fdw_tuplestore) { ! tuplestore_end(fdw_tuplestore); ! 
afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL; } - afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]); ! afterTriggers.query_depth--; } --- 4518,4616 ---- * will instead fire any triggers in a dedicated query level. Foreign key * enforcement triggers do add to the current query level, thanks to their * passing fire_triggers = false to SPI_execute_snapshot(). Other ! * C-language triggers might do likewise. * * If we find no firable events, we don't have to increment * firing_counter. */ for (;;) { ! AfterTriggersQueryData *qs; ! ListCell *lc; ! ! /* ! * Firing a trigger could result in query_stack being repalloc'd, so ! * we must recalculate qs after each afterTriggerInvokeEvents call. ! */ ! qs = &afterTriggers.query_stack[afterTriggers.query_depth]; ! ! /* ! * Before each firing cycle, mark all existing transition tables ! * "closed", so that their contents can't change anymore. If someone ! * causes more updates, they'll go into new transition tables. ! */ ! foreach(lc, qs->tables) ! { ! AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc); ! ! table->closed = true; ! } ! ! if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true)) { CommandId firing_id = afterTriggers.firing_counter++; /* OK to delete the immediate events after processing them */ ! if (afterTriggerInvokeEvents(&qs->events, firing_id, estate, true)) break; /* all fired */ } else break; } ! /* Release query-level-local storage, including tuplestores if any */ ! AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]); ! ! afterTriggers.query_depth--; ! } ! ! ! /* ! * AfterTriggerFreeQuery ! * Release subsidiary storage for a trigger query level. ! * This includes closing down tuplestores. ! * Note: it's important for this to be safe if interrupted by an error ! * and then called again for the same query level. ! */ ! static void ! AfterTriggerFreeQuery(AfterTriggersQueryData *qs) ! { ! Tuplestorestate *ts; ! List *tables; ! ListCell *lc; ! ! /* Drop the trigger events */ ! afterTriggerFreeEventList(&qs->events); ! ! /* Drop FDW tuplestore if any */ ! ts = qs->fdw_tuplestore; ! qs->fdw_tuplestore = NULL; ! if (ts) ! tuplestore_end(ts); ! ! /* Release per-table subsidiary storage */ ! tables = qs->tables; ! foreach(lc, tables) { ! AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc); ! ! ts = table->old_tuplestore; ! table->old_tuplestore = NULL; ! if (ts) ! tuplestore_end(ts); ! ts = table->new_tuplestore; ! table->new_tuplestore = NULL; ! if (ts) ! tuplestore_end(ts); } ! /* ! * Now free the AfterTriggersTableData structs and list cells. Reset list ! * pointer first; if list_free_deep somehow gets an error, better to leak ! * that storage than have an infinite loop. ! */ ! qs->tables = NIL; ! list_free_deep(tables); } *************** AfterTriggerEndXact(bool isCommit) *** 4521,4530 **** * large, we let the eventual reset of TopTransactionContext free the * memory instead of doing it here. */ ! afterTriggers.state_stack = NULL; ! afterTriggers.events_stack = NULL; ! afterTriggers.depth_stack = NULL; ! afterTriggers.firing_stack = NULL; afterTriggers.maxtransdepth = 0; --- 4707,4713 ---- * large, we let the eventual reset of TopTransactionContext free the * memory instead of doing it here. */ ! afterTriggers.trans_stack = NULL; afterTriggers.maxtransdepth = 0; *************** AfterTriggerEndXact(bool isCommit) *** 4534,4540 **** * memory here. 
*/ afterTriggers.query_stack = NULL; - afterTriggers.fdw_tuplestores = NULL; afterTriggers.maxquerydepth = 0; afterTriggers.state = NULL; --- 4717,4722 ---- *************** AfterTriggerBeginSubXact(void) *** 4553,4600 **** int my_level = GetCurrentTransactionNestLevel(); /* ! * Allocate more space in the stacks if needed. (Note: because the * minimum nest level of a subtransaction is 2, we waste the first couple ! * entries of each array; not worth the notational effort to avoid it.) */ while (my_level >= afterTriggers.maxtransdepth) { if (afterTriggers.maxtransdepth == 0) { ! MemoryContext old_cxt; ! ! old_cxt = MemoryContextSwitchTo(TopTransactionContext); ! ! #define DEFTRIG_INITALLOC 8 ! afterTriggers.state_stack = (SetConstraintState *) ! palloc(DEFTRIG_INITALLOC * sizeof(SetConstraintState)); ! afterTriggers.events_stack = (AfterTriggerEventList *) ! palloc(DEFTRIG_INITALLOC * sizeof(AfterTriggerEventList)); ! afterTriggers.depth_stack = (int *) ! palloc(DEFTRIG_INITALLOC * sizeof(int)); ! afterTriggers.firing_stack = (CommandId *) ! palloc(DEFTRIG_INITALLOC * sizeof(CommandId)); ! afterTriggers.maxtransdepth = DEFTRIG_INITALLOC; ! ! MemoryContextSwitchTo(old_cxt); } else { ! /* repalloc will keep the stacks in the same context */ int new_alloc = afterTriggers.maxtransdepth * 2; ! afterTriggers.state_stack = (SetConstraintState *) ! repalloc(afterTriggers.state_stack, ! new_alloc * sizeof(SetConstraintState)); ! afterTriggers.events_stack = (AfterTriggerEventList *) ! repalloc(afterTriggers.events_stack, ! new_alloc * sizeof(AfterTriggerEventList)); ! afterTriggers.depth_stack = (int *) ! repalloc(afterTriggers.depth_stack, ! new_alloc * sizeof(int)); ! afterTriggers.firing_stack = (CommandId *) ! repalloc(afterTriggers.firing_stack, ! new_alloc * sizeof(CommandId)); afterTriggers.maxtransdepth = new_alloc; } } --- 4735,4762 ---- int my_level = GetCurrentTransactionNestLevel(); /* ! * Allocate more space in the trans_stack if needed. (Note: because the * minimum nest level of a subtransaction is 2, we waste the first couple ! * entries of the array; not worth the notational effort to avoid it.) */ while (my_level >= afterTriggers.maxtransdepth) { if (afterTriggers.maxtransdepth == 0) { ! /* Arbitrarily initialize for max of 8 subtransaction levels */ ! afterTriggers.trans_stack = (AfterTriggersTransData *) ! MemoryContextAlloc(TopTransactionContext, ! 8 * sizeof(AfterTriggersTransData)); ! afterTriggers.maxtransdepth = 8; } else { ! /* repalloc will keep the stack in the same context */ int new_alloc = afterTriggers.maxtransdepth * 2; ! afterTriggers.trans_stack = (AfterTriggersTransData *) ! repalloc(afterTriggers.trans_stack, ! new_alloc * sizeof(AfterTriggersTransData)); afterTriggers.maxtransdepth = new_alloc; } } *************** AfterTriggerBeginSubXact(void) *** 4604,4613 **** * is not saved until/unless changed. Likewise, we don't make a * per-subtransaction event context until needed. */ ! afterTriggers.state_stack[my_level] = NULL; ! afterTriggers.events_stack[my_level] = afterTriggers.events; ! afterTriggers.depth_stack[my_level] = afterTriggers.query_depth; ! afterTriggers.firing_stack[my_level] = afterTriggers.firing_counter; } /* --- 4766,4775 ---- * is not saved until/unless changed. Likewise, we don't make a * per-subtransaction event context until needed. */ ! afterTriggers.trans_stack[my_level].state = NULL; ! afterTriggers.trans_stack[my_level].events = afterTriggers.events; ! afterTriggers.trans_stack[my_level].query_depth = afterTriggers.query_depth; ! 
afterTriggers.trans_stack[my_level].firing_counter = afterTriggers.firing_counter; } /* *************** AfterTriggerEndSubXact(bool isCommit) *** 4631,4700 **** { Assert(my_level < afterTriggers.maxtransdepth); /* If we saved a prior state, we don't need it anymore */ ! state = afterTriggers.state_stack[my_level]; if (state != NULL) pfree(state); /* this avoids double pfree if error later: */ ! afterTriggers.state_stack[my_level] = NULL; Assert(afterTriggers.query_depth == ! afterTriggers.depth_stack[my_level]); } else { /* * Aborting. It is possible subxact start failed before calling * AfterTriggerBeginSubXact, in which case we mustn't risk touching ! * stack levels that aren't there. */ if (my_level >= afterTriggers.maxtransdepth) return; /* ! * Release any event lists from queries being aborted, and restore * query_depth to its pre-subxact value. This assumes that a * subtransaction will not add events to query levels started in a * earlier transaction state. */ ! while (afterTriggers.query_depth > afterTriggers.depth_stack[my_level]) { if (afterTriggers.query_depth < afterTriggers.maxquerydepth) ! { ! Tuplestorestate *ts; ! ! ts = afterTriggers.fdw_tuplestores[afterTriggers.query_depth]; ! if (ts) ! { ! tuplestore_end(ts); ! afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL; ! } ! ! afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]); ! } ! afterTriggers.query_depth--; } Assert(afterTriggers.query_depth == ! afterTriggers.depth_stack[my_level]); /* * Restore the global deferred-event list to its former length, * discarding any events queued by the subxact. */ afterTriggerRestoreEventList(&afterTriggers.events, ! &afterTriggers.events_stack[my_level]); /* * Restore the trigger state. If the saved state is NULL, then this * subxact didn't save it, so it doesn't need restoring. */ ! state = afterTriggers.state_stack[my_level]; if (state != NULL) { pfree(afterTriggers.state); afterTriggers.state = state; } /* this avoids double pfree if error later: */ ! afterTriggers.state_stack[my_level] = NULL; /* * Scan for any remaining deferred events that were marked DONE or IN --- 4793,4850 ---- { Assert(my_level < afterTriggers.maxtransdepth); /* If we saved a prior state, we don't need it anymore */ ! state = afterTriggers.trans_stack[my_level].state; if (state != NULL) pfree(state); /* this avoids double pfree if error later: */ ! afterTriggers.trans_stack[my_level].state = NULL; Assert(afterTriggers.query_depth == ! afterTriggers.trans_stack[my_level].query_depth); } else { /* * Aborting. It is possible subxact start failed before calling * AfterTriggerBeginSubXact, in which case we mustn't risk touching ! * trans_stack levels that aren't there. */ if (my_level >= afterTriggers.maxtransdepth) return; /* ! * Release query-level storage for queries being aborted, and restore * query_depth to its pre-subxact value. This assumes that a * subtransaction will not add events to query levels started in a * earlier transaction state. */ ! while (afterTriggers.query_depth > afterTriggers.trans_stack[my_level].query_depth) { if (afterTriggers.query_depth < afterTriggers.maxquerydepth) ! AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]); afterTriggers.query_depth--; } Assert(afterTriggers.query_depth == ! afterTriggers.trans_stack[my_level].query_depth); /* * Restore the global deferred-event list to its former length, * discarding any events queued by the subxact. */ afterTriggerRestoreEventList(&afterTriggers.events, ! 
&afterTriggers.trans_stack[my_level].events); /* * Restore the trigger state. If the saved state is NULL, then this * subxact didn't save it, so it doesn't need restoring. */ ! state = afterTriggers.trans_stack[my_level].state; if (state != NULL) { pfree(afterTriggers.state); afterTriggers.state = state; } /* this avoids double pfree if error later: */ ! afterTriggers.trans_stack[my_level].state = NULL; /* * Scan for any remaining deferred events that were marked DONE or IN *************** AfterTriggerEndSubXact(bool isCommit) *** 4704,4710 **** * (This essentially assumes that the current subxact includes all * subxacts started after it.) */ ! subxact_firing_id = afterTriggers.firing_stack[my_level]; for_each_event_chunk(event, chunk, afterTriggers.events) { AfterTriggerShared evtshared = GetTriggerSharedData(event); --- 4854,4860 ---- * (This essentially assumes that the current subxact includes all * subxacts started after it.) */ ! subxact_firing_id = afterTriggers.trans_stack[my_level].firing_counter; for_each_event_chunk(event, chunk, afterTriggers.events) { AfterTriggerShared evtshared = GetTriggerSharedData(event); *************** AfterTriggerEnlargeQueryState(void) *** 4740,4751 **** { int new_alloc = Max(afterTriggers.query_depth + 1, 8); ! afterTriggers.query_stack = (AfterTriggerEventList *) MemoryContextAlloc(TopTransactionContext, ! new_alloc * sizeof(AfterTriggerEventList)); ! afterTriggers.fdw_tuplestores = (Tuplestorestate **) ! MemoryContextAllocZero(TopTransactionContext, ! new_alloc * sizeof(Tuplestorestate *)); afterTriggers.maxquerydepth = new_alloc; } else --- 4890,4898 ---- { int new_alloc = Max(afterTriggers.query_depth + 1, 8); ! afterTriggers.query_stack = (AfterTriggersQueryData *) MemoryContextAlloc(TopTransactionContext, ! new_alloc * sizeof(AfterTriggersQueryData)); afterTriggers.maxquerydepth = new_alloc; } else *************** AfterTriggerEnlargeQueryState(void) *** 4755,4781 **** int new_alloc = Max(afterTriggers.query_depth + 1, old_alloc * 2); ! afterTriggers.query_stack = (AfterTriggerEventList *) repalloc(afterTriggers.query_stack, ! new_alloc * sizeof(AfterTriggerEventList)); ! afterTriggers.fdw_tuplestores = (Tuplestorestate **) ! repalloc(afterTriggers.fdw_tuplestores, ! new_alloc * sizeof(Tuplestorestate *)); ! /* Clear newly-allocated slots for subsequent lazy initialization. */ ! memset(afterTriggers.fdw_tuplestores + old_alloc, ! 0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *)); afterTriggers.maxquerydepth = new_alloc; } ! /* Initialize new query lists to empty */ while (init_depth < afterTriggers.maxquerydepth) { ! AfterTriggerEventList *events; ! events = &afterTriggers.query_stack[init_depth]; ! events->head = NULL; ! events->tail = NULL; ! events->tailfree = NULL; ++init_depth; } --- 4902,4923 ---- int new_alloc = Max(afterTriggers.query_depth + 1, old_alloc * 2); ! afterTriggers.query_stack = (AfterTriggersQueryData *) repalloc(afterTriggers.query_stack, ! new_alloc * sizeof(AfterTriggersQueryData)); afterTriggers.maxquerydepth = new_alloc; } ! /* Initialize new array entries to empty */ while (init_depth < afterTriggers.maxquerydepth) { ! AfterTriggersQueryData *qs = &afterTriggers.query_stack[init_depth]; ! qs->events.head = NULL; ! qs->events.tail = NULL; ! qs->events.tailfree = NULL; ! qs->fdw_tuplestore = NULL; ! qs->tables = NIL; ++init_depth; } *************** AfterTriggerSetState(ConstraintsSetStmt *** 4873,4881 **** * save it so it can be restored if the subtransaction aborts. */ if (my_level > 1 && ! 
afterTriggers.state_stack[my_level] == NULL) { ! afterTriggers.state_stack[my_level] = SetConstraintStateCopy(afterTriggers.state); } --- 5015,5023 ---- * save it so it can be restored if the subtransaction aborts. */ if (my_level > 1 && ! afterTriggers.trans_stack[my_level].state == NULL) { ! afterTriggers.trans_stack[my_level].state = SetConstraintStateCopy(afterTriggers.state); } *************** AfterTriggerPendingOnRel(Oid relid) *** 5184,5190 **** */ for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++) { ! for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth]) { AfterTriggerShared evtshared = GetTriggerSharedData(event); --- 5326,5332 ---- */ for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++) { ! for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth].events) { AfterTriggerShared evtshared = GetTriggerSharedData(event); *************** AfterTriggerSaveEvent(EState *estate, Re *** 5229,5235 **** TriggerDesc *trigdesc = relinfo->ri_TrigDesc; AfterTriggerEventData new_event; AfterTriggerSharedData new_shared; ! char relkind = relinfo->ri_RelationDesc->rd_rel->relkind; int tgtype_event; int tgtype_level; int i; --- 5371,5378 ---- TriggerDesc *trigdesc = relinfo->ri_TrigDesc; AfterTriggerEventData new_event; AfterTriggerSharedData new_shared; ! char relkind = rel->rd_rel->relkind; ! AfterTriggersTableData *table; int tgtype_event; int tgtype_level; int i; *************** AfterTriggerSaveEvent(EState *estate, Re *** 5284,5293 **** Tuplestorestate *new_tuplestore; Assert(newtup != NULL); ! if (event == TRIGGER_EVENT_INSERT) ! new_tuplestore = transition_capture->tcs_insert_tuplestore; ! else ! new_tuplestore = transition_capture->tcs_update_tuplestore; if (original_insert_tuple != NULL) tuplestore_puttuple(new_tuplestore, original_insert_tuple); --- 5427,5433 ---- Tuplestorestate *new_tuplestore; Assert(newtup != NULL); ! new_tuplestore = transition_capture->tcs_new_tuplestore; if (original_insert_tuple != NULL) tuplestore_puttuple(new_tuplestore, original_insert_tuple); *************** AfterTriggerSaveEvent(EState *estate, Re *** 5316,5321 **** --- 5456,5464 ---- * The event code will be used both as a bitmask and an array offset, so * validation is important to make sure we don't walk off the edge of our * arrays. + * + * Also, if we're considering statement-level triggers, check whether we + * already queued them for this event set, and return if so. 
*/ switch (event) { *************** AfterTriggerSaveEvent(EState *estate, Re *** 5334,5339 **** --- 5477,5487 ---- Assert(newtup == NULL); ItemPointerSetInvalid(&(new_event.ate_ctid1)); ItemPointerSetInvalid(&(new_event.ate_ctid2)); + table = GetAfterTriggersTableData(RelationGetRelid(rel), + CMD_INSERT); + if (table->stmt_trig_done) + return; + table->stmt_trig_done = true; } break; case TRIGGER_EVENT_DELETE: *************** AfterTriggerSaveEvent(EState *estate, Re *** 5351,5356 **** --- 5499,5509 ---- Assert(newtup == NULL); ItemPointerSetInvalid(&(new_event.ate_ctid1)); ItemPointerSetInvalid(&(new_event.ate_ctid2)); + table = GetAfterTriggersTableData(RelationGetRelid(rel), + CMD_DELETE); + if (table->stmt_trig_done) + return; + table->stmt_trig_done = true; } break; case TRIGGER_EVENT_UPDATE: *************** AfterTriggerSaveEvent(EState *estate, Re *** 5368,5373 **** --- 5521,5531 ---- Assert(newtup == NULL); ItemPointerSetInvalid(&(new_event.ate_ctid1)); ItemPointerSetInvalid(&(new_event.ate_ctid2)); + table = GetAfterTriggersTableData(RelationGetRelid(rel), + CMD_UPDATE); + if (table->stmt_trig_done) + return; + table->stmt_trig_done = true; } break; case TRIGGER_EVENT_TRUNCATE: *************** AfterTriggerSaveEvent(EState *estate, Re *** 5407,5415 **** { if (fdw_tuplestore == NULL) { ! fdw_tuplestore = ! GetTriggerTransitionTuplestore ! (afterTriggers.fdw_tuplestores); new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH; } else --- 5565,5571 ---- { if (fdw_tuplestore == NULL) { ! fdw_tuplestore = GetCurrentFDWTuplestore(); new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH; } else *************** AfterTriggerSaveEvent(EState *estate, Re *** 5474,5484 **** new_shared.ats_tgoid = trigger->tgoid; new_shared.ats_relid = RelationGetRelid(rel); new_shared.ats_firing_id = 0; ! /* deferrable triggers cannot access transition data */ ! new_shared.ats_transition_capture = ! trigger->tgdeferrable ? NULL : transition_capture; ! afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth], &new_event, &new_shared); } --- 5630,5648 ---- new_shared.ats_tgoid = trigger->tgoid; new_shared.ats_relid = RelationGetRelid(rel); new_shared.ats_firing_id = 0; ! /* deferrable triggers cannot access transition tables */ ! if (trigger->tgdeferrable || transition_capture == NULL) ! { ! new_shared.ats_old_tuplestore = NULL; ! new_shared.ats_new_tuplestore = NULL; ! } ! else ! { ! new_shared.ats_old_tuplestore = transition_capture->tcs_old_tuplestore; ! new_shared.ats_new_tuplestore = transition_capture->tcs_new_tuplestore; ! } ! 
afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth].events, &new_event, &new_shared); } diff --git a/src/backend/executor/README b/src/backend/executor/README index a004506..b3e74aa 100644 *** a/src/backend/executor/README --- b/src/backend/executor/README *************** This is a sketch of control flow for ful *** 241,251 **** CreateExecutorState creates per-query context switch to per-query context to run ExecInitNode ExecInitNode --- recursively scans plan tree CreateExprContext creates per-tuple context ExecInitExpr - AfterTriggerBeginQuery ExecutorRun ExecProcNode --- recursively called in per-query context --- 241,251 ---- CreateExecutorState creates per-query context switch to per-query context to run ExecInitNode + AfterTriggerBeginQuery ExecInitNode --- recursively scans plan tree CreateExprContext creates per-tuple context ExecInitExpr ExecutorRun ExecProcNode --- recursively called in per-query context diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 4b594d4..6255cc6 100644 *** a/src/backend/executor/execMain.c --- b/src/backend/executor/execMain.c *************** standard_ExecutorStart(QueryDesc *queryD *** 252,268 **** estate->es_instrument = queryDesc->instrument_options; /* - * Initialize the plan state tree - */ - InitPlan(queryDesc, eflags); - - /* * Set up an AFTER-trigger statement context, unless told not to, or * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called). */ if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY))) AfterTriggerBeginQuery(); MemoryContextSwitchTo(oldcontext); } --- 252,268 ---- estate->es_instrument = queryDesc->instrument_options; /* * Set up an AFTER-trigger statement context, unless told not to, or * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called). */ if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY))) AfterTriggerBeginQuery(); + /* + * Initialize the plan state tree + */ + InitPlan(queryDesc, eflags); + MemoryContextSwitchTo(oldcontext); } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 49586a3..845c409 100644 *** a/src/backend/executor/nodeModifyTable.c --- b/src/backend/executor/nodeModifyTable.c *************** ExecInsert(ModifyTableState *mtstate, *** 343,348 **** --- 343,351 ---- mtstate->mt_transition_capture->tcs_map = NULL; } } + if (mtstate->mt_oc_transition_capture != NULL) + mtstate->mt_oc_transition_capture->tcs_map = + mtstate->mt_transition_tupconv_maps[leaf_part_index]; /* * We might need to convert from the parent rowtype to the partition *************** lreplace:; *** 1158,1163 **** --- 1161,1168 ---- /* AFTER ROW UPDATE Triggers */ ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, tuple, recheckIndexes, + mtstate->operation == CMD_INSERT ? + mtstate->mt_oc_transition_capture : mtstate->mt_transition_capture); list_free(recheckIndexes); *************** fireASTriggers(ModifyTableState *node) *** 1444,1450 **** if (node->mt_onconflict == ONCONFLICT_UPDATE) ExecASUpdateTriggers(node->ps.state, resultRelInfo, ! node->mt_transition_capture); ExecASInsertTriggers(node->ps.state, resultRelInfo, node->mt_transition_capture); break; --- 1449,1455 ---- if (node->mt_onconflict == ONCONFLICT_UPDATE) ExecASUpdateTriggers(node->ps.state, resultRelInfo, ! 
node->mt_oc_transition_capture); ExecASInsertTriggers(node->ps.state, resultRelInfo, node->mt_transition_capture); break; *************** ExecSetupTransitionCaptureState(ModifyTa *** 1474,1487 **** /* Check for transition tables on the directly targeted relation. */ mtstate->mt_transition_capture = ! MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc); /* * If we found that we need to collect transition tuples then we may also * need tuple conversion maps for any children that have TupleDescs that ! * aren't compatible with the tuplestores. */ ! if (mtstate->mt_transition_capture != NULL) { ResultRelInfo *resultRelInfos; int numResultRelInfos; --- 1479,1502 ---- /* Check for transition tables on the directly targeted relation. */ mtstate->mt_transition_capture = ! MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc, ! RelationGetRelid(targetRelInfo->ri_RelationDesc), ! mtstate->operation); ! if (mtstate->operation == CMD_INSERT && ! mtstate->mt_onconflict == ONCONFLICT_UPDATE) ! mtstate->mt_oc_transition_capture = ! MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc, ! RelationGetRelid(targetRelInfo->ri_RelationDesc), ! CMD_UPDATE); /* * If we found that we need to collect transition tuples then we may also * need tuple conversion maps for any children that have TupleDescs that ! * aren't compatible with the tuplestores. (We can share these maps ! * between the regular and ON CONFLICT cases.) */ ! if (mtstate->mt_transition_capture != NULL || ! mtstate->mt_oc_transition_capture != NULL) { ResultRelInfo *resultRelInfos; int numResultRelInfos; *************** ExecSetupTransitionCaptureState(ModifyTa *** 1522,1531 **** /* * Install the conversion map for the first plan for UPDATE and DELETE * operations. It will be advanced each time we switch to the next ! * plan. (INSERT operations set it every time.) */ ! mtstate->mt_transition_capture->tcs_map = ! mtstate->mt_transition_tupconv_maps[0]; } } --- 1537,1548 ---- /* * Install the conversion map for the first plan for UPDATE and DELETE * operations. It will be advanced each time we switch to the next ! * plan. (INSERT operations set it every time, so we need not update ! * mtstate->mt_oc_transition_capture here.) */ ! if (mtstate->mt_transition_capture) ! mtstate->mt_transition_capture->tcs_map = ! mtstate->mt_transition_tupconv_maps[0]; } } *************** ExecModifyTable(PlanState *pstate) *** 1629,1641 **** estate->es_result_relation_info = resultRelInfo; EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan, node->mt_arowmarks[node->mt_whichplan]); if (node->mt_transition_capture != NULL) { - /* Prepare to convert transition tuples from this child. */ Assert(node->mt_transition_tupconv_maps != NULL); node->mt_transition_capture->tcs_map = node->mt_transition_tupconv_maps[node->mt_whichplan]; } continue; } else --- 1646,1664 ---- estate->es_result_relation_info = resultRelInfo; EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan, node->mt_arowmarks[node->mt_whichplan]); + /* Prepare to convert transition tuples from this child. 
*/ if (node->mt_transition_capture != NULL) { Assert(node->mt_transition_tupconv_maps != NULL); node->mt_transition_capture->tcs_map = node->mt_transition_tupconv_maps[node->mt_whichplan]; } + if (node->mt_oc_transition_capture != NULL) + { + Assert(node->mt_transition_tupconv_maps != NULL); + node->mt_oc_transition_capture->tcs_map = + node->mt_transition_tupconv_maps[node->mt_whichplan]; + } continue; } else *************** ExecInitModifyTable(ModifyTable *node, E *** 1934,1941 **** mtstate->mt_partition_tuple_slot = partition_tuple_slot; } ! /* Build state for collecting transition tuples */ ! ExecSetupTransitionCaptureState(mtstate, estate); /* * Initialize any WITH CHECK OPTION constraints if needed. --- 1957,1968 ---- mtstate->mt_partition_tuple_slot = partition_tuple_slot; } ! /* ! * Build state for collecting transition tuples. This requires having a ! * valid trigger query context, so skip it in explain-only mode. ! */ ! if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY)) ! ExecSetupTransitionCaptureState(mtstate, estate); /* * Initialize any WITH CHECK OPTION constraints if needed. *************** ExecEndModifyTable(ModifyTableState *nod *** 2319,2334 **** int i; /* - * Free transition tables, unless this query is being run in - * EXEC_FLAG_SKIP_TRIGGERS mode, which means that it may have queued AFTER - * triggers that won't be run till later. In that case we'll just leak - * the transition tables till end of (sub)transaction. - */ - if (node->mt_transition_capture != NULL && - !(node->ps.state->es_top_eflags & EXEC_FLAG_SKIP_TRIGGERS)) - DestroyTransitionCaptureState(node->mt_transition_capture); - - /* * Allow any FDWs to shut down */ for (i = 0; i < node->mt_nplans; i++) --- 2346,2351 ---- diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h index aeb363f..c6a5684 100644 *** a/src/include/commands/trigger.h --- b/src/include/commands/trigger.h *************** typedef struct TriggerData *** 43,49 **** /* * The state for capturing old and new tuples into transition tables for a ! * single ModifyTable node. */ typedef struct TransitionCaptureState { --- 43,54 ---- /* * The state for capturing old and new tuples into transition tables for a ! * single ModifyTable node (or other operation source, e.g. copy.c). ! * ! * This is per-caller to avoid conflicts in setting tcs_map or ! * tcs_original_insert_tuple. Note, however, that the pointed-to ! * tuplestores may be shared across multiple callers; they are managed ! * by trigger.c. */ typedef struct TransitionCaptureState { *************** typedef struct TransitionCaptureState *** 60,66 **** * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the * new and old tuples from a child table's format to the format of the * relation named in a query so that it is compatible with the transition ! * tuplestores. */ TupleConversionMap *tcs_map; --- 65,71 ---- * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the * new and old tuples from a child table's format to the format of the * relation named in a query so that it is compatible with the transition ! * tuplestores. The caller must store the conversion map here if so. */ TupleConversionMap *tcs_map; *************** typedef struct TransitionCaptureState *** 74,90 **** HeapTuple tcs_original_insert_tuple; /* ! * The tuplestores backing the transition tables. We use separate ! * tuplestores for INSERT and UPDATE, because INSERT ... ON CONFLICT ... ! * DO UPDATE causes INSERT and UPDATE triggers to fire and needs a way to ! 
* keep track of the new tuple images resulting from the two cases ! * separately. We only need a single old image tuplestore, because there ! * is no statement that can both update and delete at the same time. */ ! Tuplestorestate *tcs_old_tuplestore; /* for DELETE and UPDATE old ! * images */ ! Tuplestorestate *tcs_insert_tuplestore; /* for INSERT new images */ ! Tuplestorestate *tcs_update_tuplestore; /* for UPDATE new images */ } TransitionCaptureState; /* --- 79,91 ---- HeapTuple tcs_original_insert_tuple; /* ! * The tuplestore(s) into which to insert tuples. Either may be NULL if ! * not needed for the operation type. */ ! Tuplestorestate *tcs_old_tuplestore; /* for DELETE and UPDATE-old ! * tuples */ ! Tuplestorestate *tcs_new_tuplestore; /* for INSERT and UPDATE-new ! * tuples */ } TransitionCaptureState; /* *************** extern void RelationBuildTriggers(Relati *** 174,181 **** extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc); extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc); ! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc); ! extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs); extern void FreeTriggerDesc(TriggerDesc *trigdesc); --- 175,183 ---- extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc); extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc); ! ! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc, ! Oid relid, CmdType cmdType); extern void FreeTriggerDesc(TriggerDesc *trigdesc); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 90a60ab..c6d3021 100644 *** a/src/include/nodes/execnodes.h --- b/src/include/nodes/execnodes.h *************** typedef struct ModifyTableState *** 983,989 **** /* Per partition tuple conversion map */ TupleTableSlot *mt_partition_tuple_slot; struct TransitionCaptureState *mt_transition_capture; ! /* controls transition table population */ TupleConversionMap **mt_transition_tupconv_maps; /* Per plan/partition tuple conversion */ } ModifyTableState; --- 983,991 ---- /* Per partition tuple conversion map */ TupleTableSlot *mt_partition_tuple_slot; struct TransitionCaptureState *mt_transition_capture; ! /* controls transition table population for specified operation */ ! struct TransitionCaptureState *mt_oc_transition_capture; ! 
/* controls transition table population for INSERT...ON CONFLICT UPDATE */ TupleConversionMap **mt_transition_tupconv_maps; /* Per plan/partition tuple conversion */ } ModifyTableState; diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out index 620fac1..0987be5 100644 *** a/src/test/regress/expected/triggers.out --- b/src/test/regress/expected/triggers.out *************** with wcte as (insert into table1 values *** 2217,2222 **** --- 2217,2239 ---- insert into table2 values ('hello world'); NOTICE: trigger = table2_trig, new table = ("hello world") NOTICE: trigger = table1_trig, new table = (42) + with wcte as (insert into table1 values (43)) + insert into table1 values (44); + NOTICE: trigger = table1_trig, new table = (43), (44) + select * from table1; + a + ---- + 42 + 44 + 43 + (3 rows) + + select * from table2; + a + ------------- + hello world + (1 row) + drop table table1; drop table table2; -- *************** create trigger my_table_multievent_trig *** 2256,2261 **** --- 2273,2286 ---- after insert or update on my_table referencing new table as new_table for each statement execute procedure dump_insert(); ERROR: transition tables cannot be specified for triggers with more than one event + -- + -- Verify that you can't create a trigger with transition tables with + -- a column list. + -- + create trigger my_table_col_update_trig + after update of b on my_table referencing new table as new_table + for each statement execute procedure dump_insert(); + ERROR: transition tables cannot be specified for triggers with column lists drop table my_table; -- -- Test firing of triggers with transition tables by foreign key cascades *************** select * from trig_table; *** 2299,2306 **** (6 rows) delete from refd_table where length(b) = 3; ! NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b") ! NOTICE: trigger = trig_table_delete_trig, old table = (11,"one a"), (11,"one b") select * from trig_table; a | b ---+--------- --- 2324,2330 ---- (6 rows) delete from refd_table where length(b) = 3; ! NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b"), (11,"one a"), (11,"one b") select * from trig_table; a | b ---+--------- diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql index c6deb56..10eee76 100644 *** a/src/test/regress/sql/triggers.sql --- b/src/test/regress/sql/triggers.sql *************** create trigger table2_trig *** 1729,1734 **** --- 1729,1740 ---- with wcte as (insert into table1 values (42)) insert into table2 values ('hello world'); + with wcte as (insert into table1 values (43)) + insert into table1 values (44); + + select * from table1; + select * from table2; + drop table table1; drop table table2; *************** create trigger my_table_multievent_trig *** 1769,1774 **** --- 1775,1789 ---- after insert or update on my_table referencing new table as new_table for each statement execute procedure dump_insert(); + -- + -- Verify that you can't create a trigger with transition tables with + -- a column list. + -- + + create trigger my_table_col_update_trig + after update of b on my_table referencing new table as new_table + for each statement execute procedure dump_insert(); + drop table my_table; -- -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
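For context on the refd_table/trig_table expected-output change near the end of the patch above: that test's setup is not part of the diff, since it already exists in triggers.sql. A minimal sketch consistent with the expected output, with table, trigger, and function names taken from that output (the exact upstream definitions may differ):

create function dump_delete() returns trigger language plpgsql as
$$
begin
  raise notice 'trigger = %, old table = %', TG_NAME,
    (select string_agg(old_table::text, ', ' order by a) from old_table);
  return null;
end;
$$;

create table refd_table (a int primary key, b text);
create table trig_table (
  a int, b text,
  foreign key (a) references refd_table on update cascade on delete cascade
);

create trigger trig_table_delete_trig
  after delete on trig_table referencing old table as old_table
  for each statement execute procedure dump_delete();

insert into refd_table values (1, 'one'), (2, 'two'), (3, 'three');
insert into trig_table values
  (1, 'one a'), (1, 'one b'),
  (2, 'two a'), (2, 'two b'),
  (3, 'three a'), (3, 'three b');
update refd_table set a = 11 where b = 'one';
delete from refd_table where length(b) = 3;

Before the patch, each row deleted from refd_table produced its own cascaded statement-trigger firing with a two-row OLD table; with the patch, the cascaded deletes are merged into a single firing whose OLD table holds all four rows.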
On Fri, Sep 15, 2017 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Attached is a draft patch for this.

Some initial feedback:

Compiles cleanly and make check-world passes here.

 with wcte as (insert into table1 values (42))
   insert into table2 values ('hello world');
 NOTICE: trigger = table2_trig, new table = ("hello world")
 NOTICE: trigger = table1_trig, new table = (42)
+with wcte as (insert into table1 values (43))
+  insert into table1 values (44);
+NOTICE: trigger = table1_trig, new table = (43), (44)

The effects of multiple ModifyTable nodes on the same table are merged.  Good.  I doubt anyone will notice, but it might warrant a note somewhere that there is a user-visible change here: previously, if you did this (without TTs), your trigger would have fired twice.

+create trigger my_table_col_update_trig
+  after update of b on my_table referencing new table as new_table
+  for each statement execute procedure dump_insert();
+ERROR: transition tables cannot be specified for triggers with column lists

Potential SQL-standard non-compliance avoided.  Good.

 delete from refd_table where length(b) = 3;
-NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b")
-NOTICE: trigger = trig_table_delete_trig, old table = (11,"one a"), (11,"one b")
+NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b"), (11,"one a"), (11,"one b")

The effects of the FK cascade machinery are merged as discussed.  Good, I think, but I'm a bit confused about how this works when the cascading operation also fires triggers.

All the other tests show no change in behaviour.  Good.

What is going on here?

=== setup ===
create or replace function dump_delete() returns trigger language plpgsql as
$$
  begin
    raise notice 'trigger = %, old table = %, depth = %',
      TG_NAME,
      (select string_agg(old_table::text, ', ' order by a) from old_table),
      pg_trigger_depth();
    return null;
  end;
$$;

create table foo (a int primary key,
                  b int references foo(a) on delete cascade);

create trigger foo_s_trig
  after delete on foo referencing old table as old_table
  for each statement execute procedure dump_delete();

create trigger foo_r_trig
  after delete on foo referencing old table as old_table
  for each row execute procedure dump_delete();

insert into foo values (1, null), (2, 1);
===8<===

postgres=# delete from foo where a = 1;
NOTICE: trigger = foo_r_trig, old table = (1,), depth = 1
NOTICE: trigger = foo_s_trig, old table = (1,), depth = 1
NOTICE: trigger = foo_r_trig, old table = (2,1), depth = 1
NOTICE: trigger = foo_s_trig, old table = (2,1), depth = 1
NOTICE: trigger = foo_s_trig, old table = <NULL>, depth = 1
DELETE 1

> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> 1. Merging the transition tables when there are multiple wCTEs
>> referencing the same table.  Here's one idea: Rename
>> MakeTransitionCaptureState() to GetTransitionCaptureState() and use a
>> hash table keyed by table OID in
>> afterTriggers.transition_capture_states[afterTriggers.query_depth] to
>> find the TCS for the given TriggerDesc or create it if not found, so
>> that all wCTEs find the same TransitionCaptureState object.  All the
>> existing callers continue to do what they're doing now, but they'll be
>> sharing TCSs appropriately with other things in the plan.  Note that
>> TransitionCaptureState already holds tuplestores for each operation
>> (INSERT, UPDATE, DELETE) so the OID of the table alone is a suitable
>> key for the hash table (assuming we are ignoring the column-list part
>> of the spec as you suggested).
> It seems unsafe to merge the TCS objects themselves, because the callers
> assume that they can munge the tcs_map and tcs_original_insert_tuple
> fields freely without regard for any other callers.  So as I have it,
> we still have a TCS for each caller, but the TCSes point at tuplestores
> that can be shared across multiple callers for the same event type.
> The tuplestores themselves are managed by the AfterTrigger data
> structures.  Also, because the TCS structs are just throwaway per-caller
> data, it's uncool to reference them in the trigger event lists.
> So I replaced ats_transition_capture with two pointers to the actual
> tuplestores.

OK, that works, and yeah, it may be conceptually better. Also, maybe tcs_original_insert_tuple as a member of TCS is a bit clunky and could be reconsidered later: I was trying to avoid widening a bunch of function calls, but that might have been optimising for the wrong thing...

> That bloats AfterTriggerSharedData a bit but I think it's
> okay; we don't expect a lot of those structs in a normal query.

Yeah, it was bloat avoidance that led me to find a way to use a single pointer. But at least it's in the shared part, so I don't think it matters too much.

> I chose to make the persistent state (AfterTriggersTableData) independent
> for each operation type.  We could have done that differently perhaps, but
> it seemed more complicated and less likely to match the spec's semantics.

OK. Your GetAfterTriggersTableData(Oid relid, CmdType cmdType) is mirroring the spec's way of describing what happens (without the column list).

+ foreach(lc, qs->tables)
+ {
+     table = (AfterTriggersTableData *) lfirst(lc);
+     if (table->relid == relid && table->cmdType == cmdType &&
+         !table->closed)
+         return table;
+ }

Yeah, my suggestion of a hash table was overkill. (Maybe in future, if we change our rules around inheritance, this could finish up being searched for a lot of child tables; we can cross that bridge when we come to it.)

I'm a little confused about the "closed" flag. This seems to imply that it's possible for there to be more than one separate tuplestore for a given (table, operation) in a given trigger execution context. Why is that OK?

> The INSERT ON CONFLICT UPDATE mess is handled by creating two separate
> TCSes with two different underlying AfterTriggersTableData structs.
> The insertion tuplestore sees only the inserted tuples, the update
> tuplestores see only the updated-pre-existing tuples.  That adds a little
> code to nodeModifyTable but it seems conceptually much cleaner.

OK. The resulting behaviour is unchanged.

>> 3. Merging the invocation after statement firing so that if you
>> updated the same table directly and also via a wCTE and also
>> indirectly via fk ON DELETE/UPDATE trigger then you still only get one
>> invocation of the after statement trigger.  Not sure exactly how...

> What I did here was to use the AfterTriggersTableData structs to hold
> a flag saying we'd already queued statement triggers for this rel and
> cmdType.  There's probably more than one way to do that, but this seemed
> convenient.

Seems good. That implements the following (from whatever random draft spec I have): "A statement-level trigger that is considered as executed for a state change SC (in a given trigger execution context) is not subsequently executed for SC."

> One thing I don't like too much about that is that it means there are
> cases where the single statement trigger firing would occur before some
> AFTER ROW trigger firings.  Not sure if we promise anything about the
> ordering in the docs.  It looks quite expensive/complicated to try to
> make it always happen afterwards, though, and it might well be totally
> impossible if triggers cause more table updates to occur.

I suppose that it might be possible for AfterTriggersTableData to record the location of a previously queued event so that you can later disable it and queue a replacement, with the effect of suppressing earlier firings rather than later ones. That might make sense if you think that after-statement triggers should fire after all row triggers. I can't figure out from the spec whether that's expected, and I'm not sure if it's useful.

> Because MakeTransitionCaptureState now depends on the trigger query
> level being active, I had to relocate the AfterTriggerBeginQuery calls
> to occur before it.

Right.

> In passing, I refactored the AfterTriggers data structures a bit so
> that we don't need to do as many palloc calls to manage them.  Instead
> of several independent arrays there's now one array of structs.

Good improvement.

-- 
Thomas Munro
http://www.enterprisedb.com
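To make the ON CONFLICT split concrete, here is a minimal sketch (untested against the patch; the names tt_demo and dump_tt are illustrative, not taken from the patch or its tests) of a statement where the insert trigger's NEW TABLE and the update trigger's OLD/NEW TABLEs have to be kept apart:

create table tt_demo (k int primary key, v text);

create or replace function dump_tt() returns trigger language plpgsql as
$$
begin
  if TG_OP = 'INSERT' then
    raise notice 'trigger = %, new table = %', TG_NAME,
      (select string_agg(new_table::text, ', ' order by k) from new_table);
  else
    raise notice 'trigger = %, old table = %, new table = %', TG_NAME,
      (select string_agg(old_table::text, ', ' order by k) from old_table),
      (select string_agg(new_table::text, ', ' order by k) from new_table);
  end if;
  return null;
end;
$$;

create trigger tt_demo_ins after insert on tt_demo
  referencing new table as new_table
  for each statement execute procedure dump_tt();

create trigger tt_demo_upd after update on tt_demo
  referencing old table as old_table new table as new_table
  for each statement execute procedure dump_tt();

insert into tt_demo values (1, 'old');
insert into tt_demo values (1, 'replacement'), (2, 'brand new')
  on conflict (k) do update set v = excluded.v;

If the patch behaves as described above, the final statement should fire tt_demo_ins with a NEW TABLE containing only (2,"brand new"), and tt_demo_upd with OLD/NEW TABLEs containing only the (1,old) / (1,replacement) pair, since the two triggers draw from two different underlying AfterTriggersTableData structs.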
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Fri, Sep 15, 2017 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Attached is a draft patch for this.

> Some initial feedback:
> The effects of multiple ModifyTable nodes on the same table are
> merged.  Good.  I doubt anyone will notice but it might warrant a note
> somewhere that there is a user-visible change here: previously if you
> did this (without TTs) your trigger would have fired twice.

Check. I've not yet looked at the documentation angle here.

> What is going on here?

Hmm ... that looks odd, but I'm too tired to debug it right now.

> Ok, that works and yeah it may be conceptually better.  Also, maybe
> the tcs_original_insert_tuple as member of TCS is a bit clunky and
> could be reconsidered later: I was trying to avoid widening a bunch of
> function calls, but that might have been optimising for the wrong
> thing...

Yeah, both tcs_map and tcs_original_insert_tuple feel like they ought to be function parameters, not persistent state. But we can leave that sort of refactoring for later.

> Yeah, my suggestion of a hash table was overkill.  (Maybe in future if
> we change our rules around inheritance this could finish up being
> searched for a lot of child tables; we can cross that bridge when we
> come to it.)

Exactly; I doubt it matters now, and if it ever does we can improve the lookup infrastructure later.

> I'm a little confused about the "closed" flag.  This seems to imply
> that it's possible for there to be more than one separate tuplestore
> for a given (table, operation) in a given trigger execution context.
> Why is that OK?

The thought I had was that once we've fired any triggers that look at a transition table, we don't want that table to change anymore; it'd be bad if different triggers on the same event saw different tables. So the "closed" flag is intended to shut off additional tuple additions. If any happen (which I'm not sure is possible without a rogue C-coded trigger), we'll establish a new set of transition tables for them, and if relevant, fire AFTER STATEMENT triggers again to process those tables. So this sort of breaks the "one transition table and one AFTER STATEMENT trigger firing per statement" concept, but the alternative seems worse. I hope that the case isn't reachable without weird trigger behavior, though.

> I suppose that it might be possible for AfterTriggersTableData to
> record the location of a previously queued event so that you can later
> disable it and queue a replacement, with the effect of suppressing
> earlier firings rather than later ones.  That might make sense if you
> think that after statement triggers should fire after all row
> triggers.  I can't figure out from the spec whether that's expected
> and I'm not sure if it's useful.

Hm. There could be more than one such event, but maybe we can make something of that idea. We could expect that the events in question are consecutive in the event list, I think ...

			regards, tom lane
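The invariant behind the "closed" flag can be illustrated with a small sketch (hypothetical names closed_demo and dump_new; expected behaviour inferred from the discussion above, not verified against the patch): two statement-level triggers firing for the same INSERT should see identical NEW TABLE contents, even though one fires before the other.

create table closed_demo (a int);

create or replace function dump_new() returns trigger language plpgsql as
$$
begin
  raise notice 'trigger = %, new table = %', TG_NAME,
    (select string_agg(new_table::text, ', ' order by a) from new_table);
  return null;
end;
$$;

create trigger closed_demo_trig_a after insert on closed_demo
  referencing new table as new_table
  for each statement execute procedure dump_new();

create trigger closed_demo_trig_b after insert on closed_demo
  referencing new table as new_table
  for each statement execute procedure dump_new();

insert into closed_demo values (1), (2);

Both notices should show (1), (2). Marking the tuplestore closed before the first trigger runs is what ensures that, even if closed_demo_trig_a itself caused more rows to be captured, those rows would go into a fresh transition table rather than mutating the one closed_demo_trig_b is about to read.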
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> What is going on here?
> ...
> insert into foo values (1, null), (2, 1);
>
> postgres=# delete from foo where a = 1;
> NOTICE: trigger = foo_r_trig, old table = (1,), depth = 1
> NOTICE: trigger = foo_s_trig, old table = (1,), depth = 1
> NOTICE: trigger = foo_r_trig, old table = (2,1), depth = 1
> NOTICE: trigger = foo_s_trig, old table = (2,1), depth = 1
> NOTICE: trigger = foo_s_trig, old table = <NULL>, depth = 1
> DELETE 1

OK, now that I'm a little more awake, I looked at this, and I think it's actually an instance of the "closed transition table" behavior you were asking about.

Initially, we perform the delete of the (1,null) row, and that queues AFTER ROW triggers -- both the RI enforcement one and the explicit foo_r_trig one -- and puts the row into the transition table. When the executor finishes that scan it queues the foo_s_trig AFTER STATEMENT trigger. Then we start to execute the triggers, and as I have the patch now, it first marks all the transition tables closed.

I think that the RI enforcement trigger fires before foo_r_trig on the basis of name order, but it doesn't actually matter if it fires first or second. Either way, it causes a cascaded delete of the (2,1) row, and again we queue AFTER ROW triggers and put the row into the transition table. But now, since the first transition table was already marked closed, we create a new transition table and put (2,1) into it. And then we queue foo_s_trig, and since we're looking at a new (not closed) AfterTriggersTableData struct, that's allowed to happen.

Then we fire foo_r_trig and foo_s_trig referencing the original transition table, which produce the first two NOTICE lines. Then we repeat this entire process with the newly queued triggers. The second invocation of the RI enforcement trigger doesn't find any rows to delete, but it nonetheless causes queuing of a third AFTER STATEMENT trigger, which eventually gets to run with an empty transition table.

So this is a bit annoying because we're marking the transition table closed before the RI triggers can have the desired effects. I wonder if we can rejigger things so that the tables are only closed when actually used.

			regards, tom lane
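(Context for the name-order remark above: AFTER triggers on the same event fire in alphabetical name order, regardless of creation order. A tiny example, with illustrative names order_demo and say_hi:)

create table order_demo (a int);

create or replace function say_hi() returns trigger language plpgsql as
$$
begin
  raise notice 'firing %', TG_NAME;
  return null;
end;
$$;

-- Created in the "wrong" order; they still fire alphabetically.
create trigger b_trig after insert on order_demo
  for each row execute procedure say_hi();
create trigger a_trig after insert on order_demo
  for each row execute procedure say_hi();

insert into order_demo values (1);
-- NOTICE: firing a_trig
-- NOTICE: firing b_trig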
I wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> What is going on here?
>> ...
>> insert into foo values (1, null), (2, 1);
>>
>> postgres=# delete from foo where a = 1;
>> NOTICE: trigger = foo_r_trig, old table = (1,), depth = 1
>> NOTICE: trigger = foo_s_trig, old table = (1,), depth = 1
>> NOTICE: trigger = foo_r_trig, old table = (2,1), depth = 1
>> NOTICE: trigger = foo_s_trig, old table = (2,1), depth = 1
>> NOTICE: trigger = foo_s_trig, old table = <NULL>, depth = 1
>> DELETE 1

> OK, now that I'm a little more awake, I looked at this, and I think it's
> actually an instance of the "closed transition table" behavior you were
> asking about.

Attached is an updated patch that incorporates the ideas you suggested. I added an extended version of this example, which has an additional level of FK cascade happening:

insert into self_ref values (1, null), (2, 1), (3, 2);
delete from self_ref where a = 1;
NOTICE: trigger = self_ref_r_trig, old table = (1,), (2,1)
NOTICE: trigger = self_ref_r_trig, old table = (1,), (2,1)
NOTICE: trigger = self_ref_s_trig, old table = (1,), (2,1)
NOTICE: trigger = self_ref_r_trig, old table = (3,2)
NOTICE: trigger = self_ref_s_trig, old table = (3,2)

What happens here is that the outer delete queues AR triggers for RI enforcement and self_ref_r_trig, plus an AS trigger for self_ref_s_trig. Then the RI enforcement trigger deletes (2,1) and queues AR+AS triggers for that. At this point the initial transition table is still open, so (2,1) goes into that table, and we look back and cancel the previous queuing of self_ref_s_trig.

Now it's time for the first firing of self_ref_r_trig, and so now we mark the transition table closed. Then we skip the cancelled self_ref_s_trig call, and then it's time for the second RI enforcement trigger to fire, which deletes (3,2) but has to put it into a new transition table. Again we queue AR+AS triggers, but this time we can't cancel the preceding AS call. Then we fire self_ref_r_trig again (for the (2,1) row), and then fire self_ref_s_trig; both of them see the same transition table the first self_ref_r_trig call did.

Now it's time for the third RI enforcement trigger; it finds nothing to delete, so it adds nothing to the second transition table, but it does queue an AS trigger call (canceling the one added by the second RI trigger). Finally we have the AR call queued by the second RI trigger, and then the AS call queued by the third RI trigger, both looking at the second transition table.

This is pretty messy but I think it's the best we can do as long as RI actions are intermixed with other AFTER ROW triggers. Maybe with Kevin's ideas about converting RI actions to be statement-level, we could arrange for all three deletions to show up in one transition table ... but I don't know how we cause that to happen before the user's AFTER ROW triggers run. In any case, nothing will look odd unless you have AR triggers using transition tables, which seems like a niche usage case in the first place.

I also realized that we could undo the bloat I added to AfterTriggerSharedData by storing a pointer to the AfterTriggersTableData rather than the tuplestores themselves.

I feel that this is probably committable, except that I still need to look for documentation changes.
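(The self_ref setup isn't shown in the message above; presumably it parallels the earlier foo example, along these lines, reusing Thomas's dump_delete function. This is a guess at the definitions, not the patch's actual test:)

create table self_ref (a int primary key,
                       b int references self_ref(a) on delete cascade);

create trigger self_ref_r_trig
  after delete on self_ref referencing old table as old_table
  for each row execute procedure dump_delete();

create trigger self_ref_s_trig
  after delete on self_ref referencing old table as old_table
  for each statement execute procedure dump_delete();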
regards, tom lane diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfa3f05..c6fa445 100644 *** a/src/backend/commands/copy.c --- b/src/backend/commands/copy.c *************** CopyFrom(CopyState cstate) *** 2432,2443 **** /* Triggers might need a slot as well */ estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); /* * If there are any triggers with transition tables on the named relation, * we need to be prepared to capture transition tuples. */ cstate->transition_capture = ! MakeTransitionCaptureState(cstate->rel->trigdesc); /* * If the named relation is a partitioned table, initialize state for --- 2432,2448 ---- /* Triggers might need a slot as well */ estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); + /* Prepare to catch AFTER triggers. */ + AfterTriggerBeginQuery(); + /* * If there are any triggers with transition tables on the named relation, * we need to be prepared to capture transition tuples. */ cstate->transition_capture = ! MakeTransitionCaptureState(cstate->rel->trigdesc, ! RelationGetRelid(cstate->rel), ! CMD_INSERT); /* * If the named relation is a partitioned table, initialize state for *************** CopyFrom(CopyState cstate) *** 2513,2521 **** bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple)); } - /* Prepare to catch AFTER triggers. */ - AfterTriggerBeginQuery(); - /* * Check BEFORE STATEMENT insertion triggers. It's debatable whether we * should do this for COPY, since it's not really an "INSERT" statement as --- 2518,2523 ---- diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 269c9e1..dfe1348 100644 *** a/src/backend/commands/trigger.c --- b/src/backend/commands/trigger.c *************** CreateTrigger(CreateTrigStmt *stmt, cons *** 234,239 **** --- 234,244 ---- RelationGetRelationName(rel)), errdetail("Foreign tables cannot have TRUNCATE triggers."))); + /* + * We disallow constraint triggers to protect the assumption that + * triggers on FKs can't be deferred. See notes with AfterTriggers + * data structures, below. + */ if (stmt->isconstraint) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), *************** CreateTrigger(CreateTrigStmt *stmt, cons *** 418,423 **** --- 423,448 ---- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("transition tables cannot be specified for triggers with more than one event"))); + /* + * We currently don't allow column-specific triggers with + * transition tables. Per spec, that seems to require + * accumulating separate transition tables for each combination of + * columns, which is a lot of work for a rather marginal feature. + */ + if (stmt->columns != NIL) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("transition tables cannot be specified for triggers with column lists"))); + + /* + * We disallow constraint triggers with transition tables, to + * protect the assumption that such triggers can't be deferred. + * See notes with AfterTriggers data structures, below. + * + * Currently this is enforced by the grammar, so just Assert here. + */ + Assert(!stmt->isconstraint); + if (tt->isNew) { if (!(TRIGGER_FOR_INSERT(tgtype) || *************** FindTriggerIncompatibleWithInheritance(T *** 2086,2181 **** } /* - * Make a TransitionCaptureState object from a given TriggerDesc. The - * resulting object holds the flags which control whether transition tuples - * are collected when tables are modified, and the tuplestores themselves. 
- * Note that we copy the flags from a parent table into this struct (rather - * than using each relation's TriggerDesc directly) so that we can use it to - * control the collection of transition tuples from child tables. - * - * If there are no triggers with transition tables configured for 'trigdesc', - * then return NULL. - * - * The resulting object can be passed to the ExecAR* functions. The caller - * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing - * with child tables. - */ - TransitionCaptureState * - MakeTransitionCaptureState(TriggerDesc *trigdesc) - { - TransitionCaptureState *state = NULL; - - if (trigdesc != NULL && - (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table || - trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table)) - { - MemoryContext oldcxt; - ResourceOwner saveResourceOwner; - - /* - * Normally DestroyTransitionCaptureState should be called after - * executing all AFTER triggers for the current statement. - * - * To handle error cleanup, TransitionCaptureState and the tuplestores - * it contains will live in the current [sub]transaction's memory - * context. Likewise for the current resource owner, because we also - * want to clean up temporary files spilled to disk by the tuplestore - * in that scenario. This scope is sufficient, because AFTER triggers - * with transition tables cannot be deferred (only constraint triggers - * can be deferred, and constraint triggers cannot have transition - * tables). The AFTER trigger queue may contain pointers to this - * TransitionCaptureState, but any such entries will be processed or - * discarded before the end of the current [sub]transaction. - * - * If a future release allows deferred triggers with transition - * tables, we'll need to reconsider the scope of the - * TransitionCaptureState object. - */ - oldcxt = MemoryContextSwitchTo(CurTransactionContext); - saveResourceOwner = CurrentResourceOwner; - - state = (TransitionCaptureState *) - palloc0(sizeof(TransitionCaptureState)); - state->tcs_delete_old_table = trigdesc->trig_delete_old_table; - state->tcs_update_old_table = trigdesc->trig_update_old_table; - state->tcs_update_new_table = trigdesc->trig_update_new_table; - state->tcs_insert_new_table = trigdesc->trig_insert_new_table; - PG_TRY(); - { - CurrentResourceOwner = CurTransactionResourceOwner; - if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table) - state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem); - if (trigdesc->trig_insert_new_table) - state->tcs_insert_tuplestore = tuplestore_begin_heap(false, false, work_mem); - if (trigdesc->trig_update_new_table) - state->tcs_update_tuplestore = tuplestore_begin_heap(false, false, work_mem); - } - PG_CATCH(); - { - CurrentResourceOwner = saveResourceOwner; - PG_RE_THROW(); - } - PG_END_TRY(); - CurrentResourceOwner = saveResourceOwner; - MemoryContextSwitchTo(oldcxt); - } - - return state; - } - - void - DestroyTransitionCaptureState(TransitionCaptureState *tcs) - { - if (tcs->tcs_insert_tuplestore != NULL) - tuplestore_end(tcs->tcs_insert_tuplestore); - if (tcs->tcs_update_tuplestore != NULL) - tuplestore_end(tcs->tcs_update_tuplestore); - if (tcs->tcs_old_tuplestore != NULL) - tuplestore_end(tcs->tcs_old_tuplestore); - pfree(tcs); - } - - /* * Call a trigger function. * * trigdata: trigger descriptor. --- 2111,2116 ---- *************** TriggerEnabled(EState *estate, ResultRel *** 3338,3346 **** * during the current transaction tree. 
(BEFORE triggers are fired * immediately so we don't need any persistent state about them.) The struct * and most of its subsidiary data are kept in TopTransactionContext; however ! * the individual event records are kept in a separate sub-context. This is ! * done mainly so that it's easy to tell from a memory context dump how much ! * space is being eaten by trigger events. * * Because the list of pending events can grow large, we go to some * considerable effort to minimize per-event memory consumption. The event --- 3273,3283 ---- * during the current transaction tree. (BEFORE triggers are fired * immediately so we don't need any persistent state about them.) The struct * and most of its subsidiary data are kept in TopTransactionContext; however ! * some data that can be discarded sooner appears in the CurTransactionContext ! * of the relevant subtransaction. Also, the individual event records are ! * kept in a separate sub-context of TopTransactionContext. This is done ! * mainly so that it's easy to tell from a memory context dump how much space ! * is being eaten by trigger events. * * Because the list of pending events can grow large, we go to some * considerable effort to minimize per-event memory consumption. The event *************** typedef SetConstraintStateData *SetConst *** 3400,3405 **** --- 3337,3349 ---- * tuple(s). This permits storing tuples once regardless of the number of * row-level triggers on a foreign table. * + * Note that we need triggers on foreign tables to be fired in exactly the + * order they were queued, so that the tuples come out of the tuplestore in + * the right order. To ensure that, we forbid deferrable (constraint) + * triggers on foreign tables. This also ensures that such triggers do not + * get deferred into outer trigger query levels, meaning that it's okay to + * destroy the tuplestore at the end of the query level. + * * Statement-level triggers always bear AFTER_TRIGGER_1CTID, though they * require no ctid field. We lack the flag bit space to neatly represent that * distinct case, and it seems unlikely to be worth much trouble. *************** typedef struct AfterTriggerSharedData *** 3433,3439 **** Oid ats_tgoid; /* the trigger's ID */ Oid ats_relid; /* the relation it's on */ CommandId ats_firing_id; /* ID for firing cycle */ ! TransitionCaptureState *ats_transition_capture; } AfterTriggerSharedData; typedef struct AfterTriggerEventData *AfterTriggerEvent; --- 3377,3383 ---- Oid ats_tgoid; /* the trigger's ID */ Oid ats_relid; /* the relation it's on */ CommandId ats_firing_id; /* ID for firing cycle */ ! struct AfterTriggersTableData *ats_table; /* transition table access */ } AfterTriggerSharedData; typedef struct AfterTriggerEventData *AfterTriggerEvent; *************** typedef struct AfterTriggerEventList *** 3505,3510 **** --- 3449,3462 ---- #define for_each_event_chunk(eptr, cptr, evtlist) \ for_each_chunk(cptr, evtlist) for_each_event(eptr, cptr) + /* Macros for iterating from a start point that might not be list start */ + #define for_each_chunk_from(cptr) \ + for (; cptr != NULL; cptr = cptr->next) + #define for_each_event_from(eptr, cptr) \ + for (; \ + (char *) eptr < (cptr)->freeptr; \ + eptr = (AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr))) + /* * All per-transaction data for the AFTER TRIGGERS module. *************** typedef struct AfterTriggerEventList *** 3529,3588 **** * query_depth is the current depth of nested AfterTriggerBeginQuery calls * (-1 when the stack is empty). * ! 
* query_stack[query_depth] is a list of AFTER trigger events queued by the ! * current query (and the query_stack entries below it are lists of trigger ! * events queued by calling queries). None of these are valid until the ! * matching AfterTriggerEndQuery call occurs. At that point we fire ! * immediate-mode triggers, and append any deferred events to the main events ! * list. * ! * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples ! * needed for the current query. * ! * maxquerydepth is just the allocated length of query_stack and the ! * tuplestores. * ! * state_stack is a stack of pointers to saved copies of the SET CONSTRAINTS ! * state data; each subtransaction level that modifies that state first * saves a copy, which we use to restore the state if we abort. * ! * events_stack is a stack of copies of the events head/tail pointers, * which we use to restore those values during subtransaction abort. * ! * depth_stack is a stack of copies of subtransaction-start-time query_depth, * which we similarly use to clean up at subtransaction abort. * ! * firing_stack is a stack of copies of subtransaction-start-time ! * firing_counter. We use this to recognize which deferred triggers were ! * fired (or marked for firing) within an aborted subtransaction. * * We use GetCurrentTransactionNestLevel() to determine the correct array ! * index in these stacks. maxtransdepth is the number of allocated entries in ! * each stack. (By not keeping our own stack pointer, we can avoid trouble * in cases where errors during subxact abort cause multiple invocations * of AfterTriggerEndSubXact() at the same nesting depth.) */ typedef struct AfterTriggersData { CommandId firing_counter; /* next firing ID to assign */ SetConstraintState state; /* the active S C state */ AfterTriggerEventList events; /* deferred-event list */ - int query_depth; /* current query list index */ - AfterTriggerEventList *query_stack; /* events pending from each query */ - Tuplestorestate **fdw_tuplestores; /* foreign tuples for one row from - * each query */ - int maxquerydepth; /* allocated len of above array */ MemoryContext event_cxt; /* memory context for events, if any */ ! /* these fields are just for resetting at subtrans abort: */ ! SetConstraintState *state_stack; /* stacked S C states */ ! AfterTriggerEventList *events_stack; /* stacked list pointers */ ! int *depth_stack; /* stacked query_depths */ ! CommandId *firing_stack; /* stacked firing_counters */ ! int maxtransdepth; /* allocated len of above arrays */ } AfterTriggersData; static AfterTriggersData afterTriggers; static void AfterTriggerExecute(AfterTriggerEvent event, --- 3481,3587 ---- * query_depth is the current depth of nested AfterTriggerBeginQuery calls * (-1 when the stack is empty). * ! * query_stack[query_depth] is the per-query-level data, including these fields: * ! * events is a list of AFTER trigger events queued by the current query. ! * None of these are valid until the matching AfterTriggerEndQuery call ! * occurs. At that point we fire immediate-mode triggers, and append any ! * deferred events to the main events list. * ! * fdw_tuplestore is a tuplestore containing the foreign-table tuples ! * needed by events queued by the current query. (Note: we use just one ! * tuplestore even though more than one foreign table might be involved. ! * This is okay because tuplestores don't really care what's in the tuples ! * they store; but it's possible that someday it'd break.) * ! 
* tables is a List of AfterTriggersTableData structs for target tables ! * of the current query (see below). ! * ! * maxquerydepth is just the allocated length of query_stack. ! * ! * trans_stack holds per-subtransaction data, including these fields: ! * ! * state is NULL or a pointer to a saved copy of the SET CONSTRAINTS ! * state data. Each subtransaction level that modifies that state first * saves a copy, which we use to restore the state if we abort. * ! * events is a copy of the events head/tail pointers, * which we use to restore those values during subtransaction abort. * ! * query_depth is the subtransaction-start-time value of query_depth, * which we similarly use to clean up at subtransaction abort. * ! * firing_counter is the subtransaction-start-time value of firing_counter. ! * We use this to recognize which deferred triggers were fired (or marked ! * for firing) within an aborted subtransaction. * * We use GetCurrentTransactionNestLevel() to determine the correct array ! * index in trans_stack. maxtransdepth is the number of allocated entries in ! * trans_stack. (By not keeping our own stack pointer, we can avoid trouble * in cases where errors during subxact abort cause multiple invocations * of AfterTriggerEndSubXact() at the same nesting depth.) + * + * We create an AfterTriggersTableData struct for each target table of the + * current query, and each operation mode (INSERT/UPDATE/DELETE), that has + * either transition tables or AFTER STATEMENT triggers. This is used to + * hold the relevant transition tables, as well as info tracking whether + * we already queued the AFTER STATEMENT triggers. (We use that info to + * prevent, as much as possible, firing the same AFTER STATEMENT trigger + * more than once per statement.) These structs, along with the transition + * table tuplestores, live in the (sub)transaction's CurTransactionContext. + * That's sufficient lifespan because we don't allow transition tables to be + * used by deferrable triggers, so they only need to survive until + * AfterTriggerEndQuery. */ + typedef struct AfterTriggersQueryData AfterTriggersQueryData; + typedef struct AfterTriggersTransData AfterTriggersTransData; + typedef struct AfterTriggersTableData AfterTriggersTableData; + typedef struct AfterTriggersData { CommandId firing_counter; /* next firing ID to assign */ SetConstraintState state; /* the active S C state */ AfterTriggerEventList events; /* deferred-event list */ MemoryContext event_cxt; /* memory context for events, if any */ ! /* per-query-level data: */ ! AfterTriggersQueryData *query_stack; /* array of structs shown above */ ! int query_depth; /* current index in above array */ ! int maxquerydepth; /* allocated len of above array */ ! /* per-subtransaction-level data: */ ! AfterTriggersTransData *trans_stack; /* array of structs shown above */ ! 
int maxtransdepth; /* allocated len of above array */ } AfterTriggersData; + struct AfterTriggersQueryData + { + AfterTriggerEventList events; /* events pending from this query */ + Tuplestorestate *fdw_tuplestore; /* foreign tuples for said events */ + List *tables; /* list of AfterTriggersTableData */ + }; + + struct AfterTriggersTransData + { + /* these fields are just for resetting at subtrans abort: */ + SetConstraintState state; /* saved S C state, or NULL if not yet saved */ + AfterTriggerEventList events; /* saved list pointer */ + int query_depth; /* saved query_depth */ + CommandId firing_counter; /* saved firing_counter */ + }; + + struct AfterTriggersTableData + { + /* relid + cmdType form the lookup key for these structs: */ + Oid relid; /* target table's OID */ + CmdType cmdType; /* event type, CMD_INSERT/UPDATE/DELETE */ + bool closed; /* true when no longer OK to add tuples */ + bool stmt_trig_done; /* did we already queue stmt-level triggers? */ + AfterTriggerEventList stmt_trig_events; /* if so, saved list pointer */ + Tuplestorestate *old_tuplestore; /* "old" transition table, if any */ + Tuplestorestate *new_tuplestore; /* "new" transition table, if any */ + }; + static AfterTriggersData afterTriggers; static void AfterTriggerExecute(AfterTriggerEvent event, *************** static void AfterTriggerExecute(AfterTri *** 3591,3628 **** Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2, ! TransitionCaptureState *transition_capture); static SetConstraintState SetConstraintStateCreate(int numalloc); static SetConstraintState SetConstraintStateCopy(SetConstraintState state); static SetConstraintState SetConstraintStateAddItem(SetConstraintState state, Oid tgoid, bool tgisdeferred); /* ! * Gets a current query transition tuplestore and initializes it if necessary. */ static Tuplestorestate * ! GetTriggerTransitionTuplestore(Tuplestorestate **tss) { Tuplestorestate *ret; ! ret = tss[afterTriggers.query_depth]; if (ret == NULL) { MemoryContext oldcxt; ResourceOwner saveResourceOwner; /* ! * Make the tuplestore valid until end of transaction. This is the ! * allocation lifespan of the associated events list, but we really * only need it until AfterTriggerEndQuery(). */ ! oldcxt = MemoryContextSwitchTo(TopTransactionContext); saveResourceOwner = CurrentResourceOwner; PG_TRY(); { ! CurrentResourceOwner = TopTransactionResourceOwner; ret = tuplestore_begin_heap(false, false, work_mem); } PG_CATCH(); --- 3590,3630 ---- Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2); ! static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid, ! CmdType cmdType); ! static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs); static SetConstraintState SetConstraintStateCreate(int numalloc); static SetConstraintState SetConstraintStateCopy(SetConstraintState state); static SetConstraintState SetConstraintStateAddItem(SetConstraintState state, Oid tgoid, bool tgisdeferred); + static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent); /* ! * Get the FDW tuplestore for the current trigger query level, creating it ! * if necessary. */ static Tuplestorestate * ! GetCurrentFDWTuplestore(void) { Tuplestorestate *ret; ! ret = afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore; if (ret == NULL) { MemoryContext oldcxt; ResourceOwner saveResourceOwner; /* ! * Make the tuplestore valid until end of subtransaction. 
We really * only need it until AfterTriggerEndQuery(). */ ! oldcxt = MemoryContextSwitchTo(CurTransactionContext); saveResourceOwner = CurrentResourceOwner; PG_TRY(); { ! CurrentResourceOwner = CurTransactionResourceOwner; ret = tuplestore_begin_heap(false, false, work_mem); } PG_CATCH(); *************** GetTriggerTransitionTuplestore(Tuplestor *** 3634,3640 **** CurrentResourceOwner = saveResourceOwner; MemoryContextSwitchTo(oldcxt); ! tss[afterTriggers.query_depth] = ret; } return ret; --- 3636,3642 ---- CurrentResourceOwner = saveResourceOwner; MemoryContextSwitchTo(oldcxt); ! afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore = ret; } return ret; *************** afterTriggerAddEvent(AfterTriggerEventLi *** 3780,3786 **** if (newshared->ats_tgoid == evtshared->ats_tgoid && newshared->ats_relid == evtshared->ats_relid && newshared->ats_event == evtshared->ats_event && ! newshared->ats_transition_capture == evtshared->ats_transition_capture && newshared->ats_firing_id == 0) break; } --- 3782,3788 ---- if (newshared->ats_tgoid == evtshared->ats_tgoid && newshared->ats_relid == evtshared->ats_relid && newshared->ats_event == evtshared->ats_event && ! newshared->ats_table == evtshared->ats_table && newshared->ats_firing_id == 0) break; } *************** AfterTriggerExecute(AfterTriggerEvent ev *** 3892,3899 **** FmgrInfo *finfo, Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2, ! TransitionCaptureState *transition_capture) { AfterTriggerShared evtshared = GetTriggerSharedData(event); Oid tgoid = evtshared->ats_tgoid; --- 3894,3900 ---- FmgrInfo *finfo, Instrumentation *instr, MemoryContext per_tuple_context, TupleTableSlot *trig_tuple_slot1, ! TupleTableSlot *trig_tuple_slot2) { AfterTriggerShared evtshared = GetTriggerSharedData(event); Oid tgoid = evtshared->ats_tgoid; *************** AfterTriggerExecute(AfterTriggerEvent ev *** 3934,3942 **** { case AFTER_TRIGGER_FDW_FETCH: { ! Tuplestorestate *fdw_tuplestore = ! GetTriggerTransitionTuplestore ! (afterTriggers.fdw_tuplestores); if (!tuplestore_gettupleslot(fdw_tuplestore, true, false, trig_tuple_slot1)) --- 3935,3941 ---- { case AFTER_TRIGGER_FDW_FETCH: { ! Tuplestorestate *fdw_tuplestore = GetCurrentFDWTuplestore(); if (!tuplestore_gettupleslot(fdw_tuplestore, true, false, trig_tuple_slot1)) *************** AfterTriggerExecute(AfterTriggerEvent ev *** 4006,4041 **** } /* ! * Set up the tuplestore information. */ LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL; ! if (transition_capture != NULL) { if (LocTriggerData.tg_trigger->tgoldtable) - LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore; - if (LocTriggerData.tg_trigger->tgnewtable) { ! /* ! * Currently a trigger with transition tables may only be defined ! * for a single event type (here AFTER INSERT or AFTER UPDATE, but ! * not AFTER INSERT OR ...). ! */ ! Assert((TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype) != 0) ^ ! (TRIGGER_FOR_UPDATE(LocTriggerData.tg_trigger->tgtype) != 0)); ! /* ! * Show either the insert or update new tuple images, depending on ! * which event type the trigger was registered for. A single ! * statement may have produced both in the case of INSERT ... ON ! * CONFLICT ... DO UPDATE, and in that case the event determines ! * which tuplestore the trigger sees as the NEW TABLE. ! */ ! if (TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype)) ! LocTriggerData.tg_newtable = ! transition_capture->tcs_insert_tuplestore; ! else ! 
LocTriggerData.tg_newtable = ! transition_capture->tcs_update_tuplestore; } } --- 4005,4029 ---- } /* ! * Set up the tuplestore information to let the trigger have access to ! * transition tables. When we first make a transition table available to ! * a trigger, mark it "closed" so that it cannot change anymore. If any ! * additional events of the same type get queued in the current trigger ! * query level, they'll go into new transition tables. */ LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL; ! if (evtshared->ats_table) { if (LocTriggerData.tg_trigger->tgoldtable) { ! LocTriggerData.tg_oldtable = evtshared->ats_table->old_tuplestore; ! evtshared->ats_table->closed = true; ! } ! if (LocTriggerData.tg_trigger->tgnewtable) ! { ! LocTriggerData.tg_newtable = evtshared->ats_table->new_tuplestore; ! evtshared->ats_table->closed = true; } } *************** afterTriggerInvokeEvents(AfterTriggerEve *** 4245,4252 **** * won't try to re-fire it. */ AfterTriggerExecute(event, rel, trigdesc, finfo, instr, ! per_tuple_context, slot1, slot2, ! evtshared->ats_transition_capture); /* * Mark the event as done. --- 4233,4239 ---- * won't try to re-fire it. */ AfterTriggerExecute(event, rel, trigdesc, finfo, instr, ! per_tuple_context, slot1, slot2); /* * Mark the event as done. *************** afterTriggerInvokeEvents(AfterTriggerEve *** 4296,4301 **** --- 4283,4448 ---- } + /* + * GetAfterTriggersTableData + * + * Find or create an AfterTriggersTableData struct for the specified + * trigger event (relation + operation type). Ignore existing structs + * marked "closed"; we don't want to put any additional tuples into them, + * nor change their stmt-triggers-fired state. + * + * Note: the AfterTriggersTableData list is allocated in the current + * (sub)transaction's CurTransactionContext. This is OK because + * we don't need it to live past AfterTriggerEndQuery. + */ + static AfterTriggersTableData * + GetAfterTriggersTableData(Oid relid, CmdType cmdType) + { + AfterTriggersTableData *table; + AfterTriggersQueryData *qs; + MemoryContext oldcxt; + ListCell *lc; + + /* Caller should have ensured query_depth is OK. */ + Assert(afterTriggers.query_depth >= 0 && + afterTriggers.query_depth < afterTriggers.maxquerydepth); + qs = &afterTriggers.query_stack[afterTriggers.query_depth]; + + foreach(lc, qs->tables) + { + table = (AfterTriggersTableData *) lfirst(lc); + if (table->relid == relid && table->cmdType == cmdType && + !table->closed) + return table; + } + + oldcxt = MemoryContextSwitchTo(CurTransactionContext); + + table = (AfterTriggersTableData *) palloc0(sizeof(AfterTriggersTableData)); + table->relid = relid; + table->cmdType = cmdType; + qs->tables = lappend(qs->tables, table); + + MemoryContextSwitchTo(oldcxt); + + return table; + } + + + /* + * MakeTransitionCaptureState + * + * Make a TransitionCaptureState object for the given TriggerDesc, target + * relation, and operation type. The TCS object holds all the state needed + * to decide whether to capture tuples in transition tables. + * + * If there are no triggers in 'trigdesc' that request relevant transition + * tables, then return NULL. + * + * The resulting object can be passed to the ExecAR* functions. The caller + * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing + * with child tables. 
+ * + * Note that we copy the flags from a parent table into this struct (rather + * than subsequently using the relation's TriggerDesc directly) so that we can + * use it to control collection of transition tuples from child tables. + * + * Per SQL spec, all operations of the same kind (INSERT/UPDATE/DELETE) + * on the same table during one query should share one transition table. + * Therefore, the Tuplestores are owned by an AfterTriggersTableData struct + * looked up using the table OID + CmdType, and are merely referenced by + * the TransitionCaptureState objects we hand out to callers. + */ + TransitionCaptureState * + MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType) + { + TransitionCaptureState *state; + bool need_old, + need_new; + AfterTriggersTableData *table; + MemoryContext oldcxt; + ResourceOwner saveResourceOwner; + + if (trigdesc == NULL) + return NULL; + + /* Detect which table(s) we need. */ + switch (cmdType) + { + case CMD_INSERT: + need_old = false; + need_new = trigdesc->trig_insert_new_table; + break; + case CMD_UPDATE: + need_old = trigdesc->trig_update_old_table; + need_new = trigdesc->trig_update_new_table; + break; + case CMD_DELETE: + need_old = trigdesc->trig_delete_old_table; + need_new = false; + break; + default: + elog(ERROR, "unexpected CmdType: %d", (int) cmdType); + need_old = need_new = false; /* keep compiler quiet */ + break; + } + if (!need_old && !need_new) + return NULL; + + /* Check state, like AfterTriggerSaveEvent. */ + if (afterTriggers.query_depth < 0) + elog(ERROR, "MakeTransitionCaptureState() called outside of query"); + + /* Be sure we have enough space to record events at this query depth. */ + if (afterTriggers.query_depth >= afterTriggers.maxquerydepth) + AfterTriggerEnlargeQueryState(); + + /* + * Find or create an AfterTriggersTableData struct to hold the + * tuplestore(s). If there's a matching struct but it's marked closed, + * ignore it; we need a newer one. + * + * Note: the AfterTriggersTableData list, as well as the tuplestores, are + * allocated in the current (sub)transaction's CurTransactionContext, and + * the tuplestores are managed by the (sub)transaction's resource owner. + * This is sufficient lifespan because we do not allow triggers using + * transition tables to be deferrable; they will be fired during + * AfterTriggerEndQuery, after which it's okay to delete the data. + */ + table = GetAfterTriggersTableData(relid, cmdType); + + /* Now create required tuplestore(s), if we don't have them already. 
*/ + oldcxt = MemoryContextSwitchTo(CurTransactionContext); + saveResourceOwner = CurrentResourceOwner; + PG_TRY(); + { + CurrentResourceOwner = CurTransactionResourceOwner; + if (need_old && table->old_tuplestore == NULL) + table->old_tuplestore = tuplestore_begin_heap(false, false, work_mem); + if (need_new && table->new_tuplestore == NULL) + table->new_tuplestore = tuplestore_begin_heap(false, false, work_mem); + } + PG_CATCH(); + { + CurrentResourceOwner = saveResourceOwner; + PG_RE_THROW(); + } + PG_END_TRY(); + CurrentResourceOwner = saveResourceOwner; + MemoryContextSwitchTo(oldcxt); + + /* Now build the TransitionCaptureState struct, in caller's context */ + state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState)); + state->tcs_delete_old_table = trigdesc->trig_delete_old_table; + state->tcs_update_old_table = trigdesc->trig_update_old_table; + state->tcs_update_new_table = trigdesc->trig_update_new_table; + state->tcs_insert_new_table = trigdesc->trig_insert_new_table; + state->tcs_private = table; + + return state; + } + + /* ---------- * AfterTriggerBeginXact() * *************** AfterTriggerBeginXact(void) *** 4319,4332 **** */ Assert(afterTriggers.state == NULL); Assert(afterTriggers.query_stack == NULL); - Assert(afterTriggers.fdw_tuplestores == NULL); Assert(afterTriggers.maxquerydepth == 0); Assert(afterTriggers.event_cxt == NULL); Assert(afterTriggers.events.head == NULL); ! Assert(afterTriggers.state_stack == NULL); ! Assert(afterTriggers.events_stack == NULL); ! Assert(afterTriggers.depth_stack == NULL); ! Assert(afterTriggers.firing_stack == NULL); Assert(afterTriggers.maxtransdepth == 0); } --- 4466,4475 ---- */ Assert(afterTriggers.state == NULL); Assert(afterTriggers.query_stack == NULL); Assert(afterTriggers.maxquerydepth == 0); Assert(afterTriggers.event_cxt == NULL); Assert(afterTriggers.events.head == NULL); ! Assert(afterTriggers.trans_stack == NULL); Assert(afterTriggers.maxtransdepth == 0); } *************** AfterTriggerBeginQuery(void) *** 4362,4370 **** void AfterTriggerEndQuery(EState *estate) { - AfterTriggerEventList *events; - Tuplestorestate *fdw_tuplestore; - /* Must be inside a query, too */ Assert(afterTriggers.query_depth >= 0); --- 4505,4510 ---- *************** AfterTriggerEndQuery(EState *estate) *** 4393,4430 **** * will instead fire any triggers in a dedicated query level. Foreign key * enforcement triggers do add to the current query level, thanks to their * passing fire_triggers = false to SPI_execute_snapshot(). Other ! * C-language triggers might do likewise. Be careful here: firing a ! * trigger could result in query_stack being repalloc'd, so we can't save ! * its address across afterTriggerInvokeEvents calls. * * If we find no firable events, we don't have to increment * firing_counter. */ for (;;) { ! events = &afterTriggers.query_stack[afterTriggers.query_depth]; ! if (afterTriggerMarkEvents(events, &afterTriggers.events, true)) { CommandId firing_id = afterTriggers.firing_counter++; /* OK to delete the immediate events after processing them */ ! if (afterTriggerInvokeEvents(events, firing_id, estate, true)) break; /* all fired */ } else break; } ! /* Release query-local storage for events, including tuplestore if any */ ! fdw_tuplestore = afterTriggers.fdw_tuplestores[afterTriggers.query_depth]; ! if (fdw_tuplestore) { ! tuplestore_end(fdw_tuplestore); ! afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL; } - afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]); ! 
afterTriggers.query_depth--; } --- 4533,4618 ---- * will instead fire any triggers in a dedicated query level. Foreign key * enforcement triggers do add to the current query level, thanks to their * passing fire_triggers = false to SPI_execute_snapshot(). Other ! * C-language triggers might do likewise. * * If we find no firable events, we don't have to increment * firing_counter. */ for (;;) { ! AfterTriggersQueryData *qs; ! ! /* ! * Firing a trigger could result in query_stack being repalloc'd, so ! * we must recalculate qs after each afterTriggerInvokeEvents call. ! */ ! qs = &afterTriggers.query_stack[afterTriggers.query_depth]; ! ! if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true)) { CommandId firing_id = afterTriggers.firing_counter++; /* OK to delete the immediate events after processing them */ ! if (afterTriggerInvokeEvents(&qs->events, firing_id, estate, true)) break; /* all fired */ } else break; } ! /* Release query-level-local storage, including tuplestores if any */ ! AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]); ! ! afterTriggers.query_depth--; ! } ! ! ! /* ! * AfterTriggerFreeQuery ! * Release subsidiary storage for a trigger query level. ! * This includes closing down tuplestores. ! * Note: it's important for this to be safe if interrupted by an error ! * and then called again for the same query level. ! */ ! static void ! AfterTriggerFreeQuery(AfterTriggersQueryData *qs) ! { ! Tuplestorestate *ts; ! List *tables; ! ListCell *lc; ! ! /* Drop the trigger events */ ! afterTriggerFreeEventList(&qs->events); ! ! /* Drop FDW tuplestore if any */ ! ts = qs->fdw_tuplestore; ! qs->fdw_tuplestore = NULL; ! if (ts) ! tuplestore_end(ts); ! ! /* Release per-table subsidiary storage */ ! tables = qs->tables; ! foreach(lc, tables) { ! AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc); ! ! ts = table->old_tuplestore; ! table->old_tuplestore = NULL; ! if (ts) ! tuplestore_end(ts); ! ts = table->new_tuplestore; ! table->new_tuplestore = NULL; ! if (ts) ! tuplestore_end(ts); } ! /* ! * Now free the AfterTriggersTableData structs and list cells. Reset list ! * pointer first; if list_free_deep somehow gets an error, better to leak ! * that storage than have an infinite loop. ! */ ! qs->tables = NIL; ! list_free_deep(tables); } *************** AfterTriggerEndXact(bool isCommit) *** 4521,4530 **** * large, we let the eventual reset of TopTransactionContext free the * memory instead of doing it here. */ ! afterTriggers.state_stack = NULL; ! afterTriggers.events_stack = NULL; ! afterTriggers.depth_stack = NULL; ! afterTriggers.firing_stack = NULL; afterTriggers.maxtransdepth = 0; --- 4709,4715 ---- * large, we let the eventual reset of TopTransactionContext free the * memory instead of doing it here. */ ! afterTriggers.trans_stack = NULL; afterTriggers.maxtransdepth = 0; *************** AfterTriggerEndXact(bool isCommit) *** 4534,4540 **** * memory here. */ afterTriggers.query_stack = NULL; - afterTriggers.fdw_tuplestores = NULL; afterTriggers.maxquerydepth = 0; afterTriggers.state = NULL; --- 4719,4724 ---- *************** AfterTriggerBeginSubXact(void) *** 4553,4600 **** int my_level = GetCurrentTransactionNestLevel(); /* ! * Allocate more space in the stacks if needed. (Note: because the * minimum nest level of a subtransaction is 2, we waste the first couple ! * entries of each array; not worth the notational effort to avoid it.) */ while (my_level >= afterTriggers.maxtransdepth) { if (afterTriggers.maxtransdepth == 0) { ! 
MemoryContext old_cxt; ! ! old_cxt = MemoryContextSwitchTo(TopTransactionContext); ! ! #define DEFTRIG_INITALLOC 8 ! afterTriggers.state_stack = (SetConstraintState *) ! palloc(DEFTRIG_INITALLOC * sizeof(SetConstraintState)); ! afterTriggers.events_stack = (AfterTriggerEventList *) ! palloc(DEFTRIG_INITALLOC * sizeof(AfterTriggerEventList)); ! afterTriggers.depth_stack = (int *) ! palloc(DEFTRIG_INITALLOC * sizeof(int)); ! afterTriggers.firing_stack = (CommandId *) ! palloc(DEFTRIG_INITALLOC * sizeof(CommandId)); ! afterTriggers.maxtransdepth = DEFTRIG_INITALLOC; ! ! MemoryContextSwitchTo(old_cxt); } else { ! /* repalloc will keep the stacks in the same context */ int new_alloc = afterTriggers.maxtransdepth * 2; ! afterTriggers.state_stack = (SetConstraintState *) ! repalloc(afterTriggers.state_stack, ! new_alloc * sizeof(SetConstraintState)); ! afterTriggers.events_stack = (AfterTriggerEventList *) ! repalloc(afterTriggers.events_stack, ! new_alloc * sizeof(AfterTriggerEventList)); ! afterTriggers.depth_stack = (int *) ! repalloc(afterTriggers.depth_stack, ! new_alloc * sizeof(int)); ! afterTriggers.firing_stack = (CommandId *) ! repalloc(afterTriggers.firing_stack, ! new_alloc * sizeof(CommandId)); afterTriggers.maxtransdepth = new_alloc; } } --- 4737,4764 ---- int my_level = GetCurrentTransactionNestLevel(); /* ! * Allocate more space in the trans_stack if needed. (Note: because the * minimum nest level of a subtransaction is 2, we waste the first couple ! * entries of the array; not worth the notational effort to avoid it.) */ while (my_level >= afterTriggers.maxtransdepth) { if (afterTriggers.maxtransdepth == 0) { ! /* Arbitrarily initialize for max of 8 subtransaction levels */ ! afterTriggers.trans_stack = (AfterTriggersTransData *) ! MemoryContextAlloc(TopTransactionContext, ! 8 * sizeof(AfterTriggersTransData)); ! afterTriggers.maxtransdepth = 8; } else { ! /* repalloc will keep the stack in the same context */ int new_alloc = afterTriggers.maxtransdepth * 2; ! afterTriggers.trans_stack = (AfterTriggersTransData *) ! repalloc(afterTriggers.trans_stack, ! new_alloc * sizeof(AfterTriggersTransData)); afterTriggers.maxtransdepth = new_alloc; } } *************** AfterTriggerBeginSubXact(void) *** 4604,4613 **** * is not saved until/unless changed. Likewise, we don't make a * per-subtransaction event context until needed. */ ! afterTriggers.state_stack[my_level] = NULL; ! afterTriggers.events_stack[my_level] = afterTriggers.events; ! afterTriggers.depth_stack[my_level] = afterTriggers.query_depth; ! afterTriggers.firing_stack[my_level] = afterTriggers.firing_counter; } /* --- 4768,4777 ---- * is not saved until/unless changed. Likewise, we don't make a * per-subtransaction event context until needed. */ ! afterTriggers.trans_stack[my_level].state = NULL; ! afterTriggers.trans_stack[my_level].events = afterTriggers.events; ! afterTriggers.trans_stack[my_level].query_depth = afterTriggers.query_depth; ! afterTriggers.trans_stack[my_level].firing_counter = afterTriggers.firing_counter; } /* *************** AfterTriggerEndSubXact(bool isCommit) *** 4631,4700 **** { Assert(my_level < afterTriggers.maxtransdepth); /* If we saved a prior state, we don't need it anymore */ ! state = afterTriggers.state_stack[my_level]; if (state != NULL) pfree(state); /* this avoids double pfree if error later: */ ! afterTriggers.state_stack[my_level] = NULL; Assert(afterTriggers.query_depth == ! afterTriggers.depth_stack[my_level]); } else { /* * Aborting. 
It is possible subxact start failed before calling * AfterTriggerBeginSubXact, in which case we mustn't risk touching ! * stack levels that aren't there. */ if (my_level >= afterTriggers.maxtransdepth) return; /* ! * Release any event lists from queries being aborted, and restore * query_depth to its pre-subxact value. This assumes that a * subtransaction will not add events to query levels started in a * earlier transaction state. */ ! while (afterTriggers.query_depth > afterTriggers.depth_stack[my_level]) { if (afterTriggers.query_depth < afterTriggers.maxquerydepth) ! { ! Tuplestorestate *ts; ! ! ts = afterTriggers.fdw_tuplestores[afterTriggers.query_depth]; ! if (ts) ! { ! tuplestore_end(ts); ! afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL; ! } ! ! afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]); ! } ! afterTriggers.query_depth--; } Assert(afterTriggers.query_depth == ! afterTriggers.depth_stack[my_level]); /* * Restore the global deferred-event list to its former length, * discarding any events queued by the subxact. */ afterTriggerRestoreEventList(&afterTriggers.events, ! &afterTriggers.events_stack[my_level]); /* * Restore the trigger state. If the saved state is NULL, then this * subxact didn't save it, so it doesn't need restoring. */ ! state = afterTriggers.state_stack[my_level]; if (state != NULL) { pfree(afterTriggers.state); afterTriggers.state = state; } /* this avoids double pfree if error later: */ ! afterTriggers.state_stack[my_level] = NULL; /* * Scan for any remaining deferred events that were marked DONE or IN --- 4795,4852 ---- { Assert(my_level < afterTriggers.maxtransdepth); /* If we saved a prior state, we don't need it anymore */ ! state = afterTriggers.trans_stack[my_level].state; if (state != NULL) pfree(state); /* this avoids double pfree if error later: */ ! afterTriggers.trans_stack[my_level].state = NULL; Assert(afterTriggers.query_depth == ! afterTriggers.trans_stack[my_level].query_depth); } else { /* * Aborting. It is possible subxact start failed before calling * AfterTriggerBeginSubXact, in which case we mustn't risk touching ! * trans_stack levels that aren't there. */ if (my_level >= afterTriggers.maxtransdepth) return; /* ! * Release query-level storage for queries being aborted, and restore * query_depth to its pre-subxact value. This assumes that a * subtransaction will not add events to query levels started in a * earlier transaction state. */ ! while (afterTriggers.query_depth > afterTriggers.trans_stack[my_level].query_depth) { if (afterTriggers.query_depth < afterTriggers.maxquerydepth) ! AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]); afterTriggers.query_depth--; } Assert(afterTriggers.query_depth == ! afterTriggers.trans_stack[my_level].query_depth); /* * Restore the global deferred-event list to its former length, * discarding any events queued by the subxact. */ afterTriggerRestoreEventList(&afterTriggers.events, ! &afterTriggers.trans_stack[my_level].events); /* * Restore the trigger state. If the saved state is NULL, then this * subxact didn't save it, so it doesn't need restoring. */ ! state = afterTriggers.trans_stack[my_level].state; if (state != NULL) { pfree(afterTriggers.state); afterTriggers.state = state; } /* this avoids double pfree if error later: */ ! 
afterTriggers.trans_stack[my_level].state = NULL; /* * Scan for any remaining deferred events that were marked DONE or IN *************** AfterTriggerEndSubXact(bool isCommit) *** 4704,4710 **** * (This essentially assumes that the current subxact includes all * subxacts started after it.) */ ! subxact_firing_id = afterTriggers.firing_stack[my_level]; for_each_event_chunk(event, chunk, afterTriggers.events) { AfterTriggerShared evtshared = GetTriggerSharedData(event); --- 4856,4862 ---- * (This essentially assumes that the current subxact includes all * subxacts started after it.) */ ! subxact_firing_id = afterTriggers.trans_stack[my_level].firing_counter; for_each_event_chunk(event, chunk, afterTriggers.events) { AfterTriggerShared evtshared = GetTriggerSharedData(event); *************** AfterTriggerEnlargeQueryState(void) *** 4740,4751 **** { int new_alloc = Max(afterTriggers.query_depth + 1, 8); ! afterTriggers.query_stack = (AfterTriggerEventList *) MemoryContextAlloc(TopTransactionContext, ! new_alloc * sizeof(AfterTriggerEventList)); ! afterTriggers.fdw_tuplestores = (Tuplestorestate **) ! MemoryContextAllocZero(TopTransactionContext, ! new_alloc * sizeof(Tuplestorestate *)); afterTriggers.maxquerydepth = new_alloc; } else --- 4892,4900 ---- { int new_alloc = Max(afterTriggers.query_depth + 1, 8); ! afterTriggers.query_stack = (AfterTriggersQueryData *) MemoryContextAlloc(TopTransactionContext, ! new_alloc * sizeof(AfterTriggersQueryData)); afterTriggers.maxquerydepth = new_alloc; } else *************** AfterTriggerEnlargeQueryState(void) *** 4755,4781 **** int new_alloc = Max(afterTriggers.query_depth + 1, old_alloc * 2); ! afterTriggers.query_stack = (AfterTriggerEventList *) repalloc(afterTriggers.query_stack, ! new_alloc * sizeof(AfterTriggerEventList)); ! afterTriggers.fdw_tuplestores = (Tuplestorestate **) ! repalloc(afterTriggers.fdw_tuplestores, ! new_alloc * sizeof(Tuplestorestate *)); ! /* Clear newly-allocated slots for subsequent lazy initialization. */ ! memset(afterTriggers.fdw_tuplestores + old_alloc, ! 0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *)); afterTriggers.maxquerydepth = new_alloc; } ! /* Initialize new query lists to empty */ while (init_depth < afterTriggers.maxquerydepth) { ! AfterTriggerEventList *events; ! events = &afterTriggers.query_stack[init_depth]; ! events->head = NULL; ! events->tail = NULL; ! events->tailfree = NULL; ++init_depth; } --- 4904,4925 ---- int new_alloc = Max(afterTriggers.query_depth + 1, old_alloc * 2); ! afterTriggers.query_stack = (AfterTriggersQueryData *) repalloc(afterTriggers.query_stack, ! new_alloc * sizeof(AfterTriggersQueryData)); afterTriggers.maxquerydepth = new_alloc; } ! /* Initialize new array entries to empty */ while (init_depth < afterTriggers.maxquerydepth) { ! AfterTriggersQueryData *qs = &afterTriggers.query_stack[init_depth]; ! qs->events.head = NULL; ! qs->events.tail = NULL; ! qs->events.tailfree = NULL; ! qs->fdw_tuplestore = NULL; ! qs->tables = NIL; ++init_depth; } *************** AfterTriggerSetState(ConstraintsSetStmt *** 4873,4881 **** * save it so it can be restored if the subtransaction aborts. */ if (my_level > 1 && ! afterTriggers.state_stack[my_level] == NULL) { ! afterTriggers.state_stack[my_level] = SetConstraintStateCopy(afterTriggers.state); } --- 5017,5025 ---- * save it so it can be restored if the subtransaction aborts. */ if (my_level > 1 && ! afterTriggers.trans_stack[my_level].state == NULL) { ! 
afterTriggers.trans_stack[my_level].state = SetConstraintStateCopy(afterTriggers.state); } *************** AfterTriggerPendingOnRel(Oid relid) *** 5184,5190 **** */ for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++) { ! for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth]) { AfterTriggerShared evtshared = GetTriggerSharedData(event); --- 5328,5334 ---- */ for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++) { ! for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth].events) { AfterTriggerShared evtshared = GetTriggerSharedData(event); *************** AfterTriggerSaveEvent(EState *estate, Re *** 5229,5235 **** TriggerDesc *trigdesc = relinfo->ri_TrigDesc; AfterTriggerEventData new_event; AfterTriggerSharedData new_shared; ! char relkind = relinfo->ri_RelationDesc->rd_rel->relkind; int tgtype_event; int tgtype_level; int i; --- 5373,5379 ---- TriggerDesc *trigdesc = relinfo->ri_TrigDesc; AfterTriggerEventData new_event; AfterTriggerSharedData new_shared; ! char relkind = rel->rd_rel->relkind; int tgtype_event; int tgtype_level; int i; *************** AfterTriggerSaveEvent(EState *estate, Re *** 5266,5272 **** Tuplestorestate *old_tuplestore; Assert(oldtup != NULL); ! old_tuplestore = transition_capture->tcs_old_tuplestore; if (map != NULL) { --- 5410,5416 ---- Tuplestorestate *old_tuplestore; Assert(oldtup != NULL); ! old_tuplestore = transition_capture->tcs_private->old_tuplestore; if (map != NULL) { *************** AfterTriggerSaveEvent(EState *estate, Re *** 5284,5293 **** Tuplestorestate *new_tuplestore; Assert(newtup != NULL); ! if (event == TRIGGER_EVENT_INSERT) ! new_tuplestore = transition_capture->tcs_insert_tuplestore; ! else ! new_tuplestore = transition_capture->tcs_update_tuplestore; if (original_insert_tuple != NULL) tuplestore_puttuple(new_tuplestore, original_insert_tuple); --- 5428,5434 ---- Tuplestorestate *new_tuplestore; Assert(newtup != NULL); ! new_tuplestore = transition_capture->tcs_private->new_tuplestore; if (original_insert_tuple != NULL) tuplestore_puttuple(new_tuplestore, original_insert_tuple); *************** AfterTriggerSaveEvent(EState *estate, Re *** 5316,5321 **** --- 5457,5467 ---- * The event code will be used both as a bitmask and an array offset, so * validation is important to make sure we don't walk off the edge of our * arrays. + * + * Also, if we're considering statement-level triggers, check whether we + * already queued a set of them for this event, and cancel the prior set + * if so. This preserves the behavior that statement-level triggers fire + * just once per statement and fire after row-level triggers. 
*/ switch (event) { *************** AfterTriggerSaveEvent(EState *estate, Re *** 5334,5339 **** --- 5480,5487 ---- Assert(newtup == NULL); ItemPointerSetInvalid(&(new_event.ate_ctid1)); ItemPointerSetInvalid(&(new_event.ate_ctid2)); + cancel_prior_stmt_triggers(RelationGetRelid(rel), + CMD_INSERT, event); } break; case TRIGGER_EVENT_DELETE: *************** AfterTriggerSaveEvent(EState *estate, Re *** 5351,5356 **** --- 5499,5506 ---- Assert(newtup == NULL); ItemPointerSetInvalid(&(new_event.ate_ctid1)); ItemPointerSetInvalid(&(new_event.ate_ctid2)); + cancel_prior_stmt_triggers(RelationGetRelid(rel), + CMD_DELETE, event); } break; case TRIGGER_EVENT_UPDATE: *************** AfterTriggerSaveEvent(EState *estate, Re *** 5368,5373 **** --- 5518,5525 ---- Assert(newtup == NULL); ItemPointerSetInvalid(&(new_event.ate_ctid1)); ItemPointerSetInvalid(&(new_event.ate_ctid2)); + cancel_prior_stmt_triggers(RelationGetRelid(rel), + CMD_UPDATE, event); } break; case TRIGGER_EVENT_TRUNCATE: *************** AfterTriggerSaveEvent(EState *estate, Re *** 5407,5415 **** { if (fdw_tuplestore == NULL) { ! fdw_tuplestore = ! GetTriggerTransitionTuplestore ! (afterTriggers.fdw_tuplestores); new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH; } else --- 5559,5565 ---- { if (fdw_tuplestore == NULL) { ! fdw_tuplestore = GetCurrentFDWTuplestore(); new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH; } else *************** AfterTriggerSaveEvent(EState *estate, Re *** 5465,5470 **** --- 5615,5622 ---- /* * Fill in event structure and add it to the current query's queue. + * Note we set ats_table to NULL whenever this trigger doesn't use + * transition tables, to improve sharability of the shared event data. */ new_shared.ats_event = (event & TRIGGER_EVENT_OPMASK) | *************** AfterTriggerSaveEvent(EState *estate, Re *** 5474,5484 **** new_shared.ats_tgoid = trigger->tgoid; new_shared.ats_relid = RelationGetRelid(rel); new_shared.ats_firing_id = 0; ! /* deferrable triggers cannot access transition data */ ! new_shared.ats_transition_capture = ! trigger->tgdeferrable ? NULL : transition_capture; ! afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth], &new_event, &new_shared); } --- 5626,5638 ---- new_shared.ats_tgoid = trigger->tgoid; new_shared.ats_relid = RelationGetRelid(rel); new_shared.ats_firing_id = 0; ! if ((trigger->tgoldtable || trigger->tgnewtable) && ! transition_capture != NULL) ! new_shared.ats_table = transition_capture->tcs_private; ! else ! new_shared.ats_table = NULL; ! afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth].events, &new_event, &new_shared); } *************** AfterTriggerSaveEvent(EState *estate, Re *** 5496,5501 **** --- 5650,5749 ---- } } + /* + * If we previously queued a set of AFTER STATEMENT triggers for the given + * relation + operation, and they've not been fired yet, cancel them. The + * caller will queue a fresh set that's after any row-level triggers that may + * have been queued by the current sub-statement, preserving (as much as + * possible) the property that AFTER ROW triggers fire before AFTER STATEMENT + * triggers, and that the latter only fire once. This deals with the + * situation where several FK enforcement triggers sequentially queue triggers + * for the same table into the same trigger query level. We can't fully + * prevent odd behavior though: if there are AFTER ROW triggers taking + * transition tables, we don't want to change the transition tables once the + * first such trigger has seen them. 
In such a case, any additional events + * will result in creating new transition tables and allowing new firings of + * statement triggers. + * + * This also saves the current event list location so that a later invocation + * of this function can cheaply find the triggers we're about to queue and + * cancel them. + */ + static void + cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent) + { + AfterTriggersTableData *table; + AfterTriggersQueryData *qs = &afterTriggers.query_stack[afterTriggers.query_depth]; + + /* + * We keep this state in the AfterTriggersTableData that also holds + * transition tables for the relation + operation. In this way, if we are + * forced to make a new set of transition tables because more tuples get + * entered after we've already fired triggers, we will allow a new set of + * statement triggers to get queued without canceling the old ones. + */ + table = GetAfterTriggersTableData(relid, cmdType); + + if (table->stmt_trig_done) + { + /* + * We want to start scanning from the tail location that existed just + * before we inserted any statement triggers. But the events list + * might've been entirely empty then, in which case scan from the + * current head. + */ + AfterTriggerEvent event; + AfterTriggerEventChunk *chunk; + + if (table->stmt_trig_events.tail) + { + chunk = table->stmt_trig_events.tail; + event = (AfterTriggerEvent) table->stmt_trig_events.tailfree; + } + else + { + chunk = qs->events.head; + event = NULL; + } + + for_each_chunk_from(chunk) + { + if (event == NULL) + event = (AfterTriggerEvent) CHUNK_DATA_START(chunk); + for_each_event_from(event, chunk) + { + AfterTriggerShared evtshared = GetTriggerSharedData(event); + + /* + * Exit loop when we reach events that aren't AS triggers for + * the target relation. 
+ */ + if (evtshared->ats_relid != relid) + goto done; + if ((evtshared->ats_event & TRIGGER_EVENT_OPMASK) != tgevent) + goto done; + if (!TRIGGER_FIRED_FOR_STATEMENT(evtshared->ats_event)) + goto done; + if (!TRIGGER_FIRED_AFTER(evtshared->ats_event)) + goto done; + /* OK, mark it DONE */ + event->ate_flags &= ~AFTER_TRIGGER_IN_PROGRESS; + event->ate_flags |= AFTER_TRIGGER_DONE; + } + /* signal we must reinitialize event ptr for next chunk */ + event = NULL; + } + } + done: + + /* In any case, save current insertion point for next time */ + table->stmt_trig_done = true; + table->stmt_trig_events = qs->events; + } + + /* + * SQL function pg_trigger_depth() + */ Datum pg_trigger_depth(PG_FUNCTION_ARGS) { diff --git a/src/backend/executor/README b/src/backend/executor/README index a004506..b3e74aa 100644 *** a/src/backend/executor/README --- b/src/backend/executor/README *************** This is a sketch of control flow for ful *** 241,251 **** CreateExecutorState creates per-query context switch to per-query context to run ExecInitNode ExecInitNode --- recursively scans plan tree CreateExprContext creates per-tuple context ExecInitExpr - AfterTriggerBeginQuery ExecutorRun ExecProcNode --- recursively called in per-query context --- 241,251 ---- CreateExecutorState creates per-query context switch to per-query context to run ExecInitNode + AfterTriggerBeginQuery ExecInitNode --- recursively scans plan tree CreateExprContext creates per-tuple context ExecInitExpr ExecutorRun ExecProcNode --- recursively called in per-query context diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 4b594d4..6255cc6 100644 *** a/src/backend/executor/execMain.c --- b/src/backend/executor/execMain.c *************** standard_ExecutorStart(QueryDesc *queryD *** 252,268 **** estate->es_instrument = queryDesc->instrument_options; /* - * Initialize the plan state tree - */ - InitPlan(queryDesc, eflags); - - /* * Set up an AFTER-trigger statement context, unless told not to, or * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called). */ if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY))) AfterTriggerBeginQuery(); MemoryContextSwitchTo(oldcontext); } --- 252,268 ---- estate->es_instrument = queryDesc->instrument_options; /* * Set up an AFTER-trigger statement context, unless told not to, or * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called). */ if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY))) AfterTriggerBeginQuery(); + /* + * Initialize the plan state tree + */ + InitPlan(queryDesc, eflags); + MemoryContextSwitchTo(oldcontext); } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 49586a3..845c409 100644 *** a/src/backend/executor/nodeModifyTable.c --- b/src/backend/executor/nodeModifyTable.c *************** ExecInsert(ModifyTableState *mtstate, *** 343,348 **** --- 343,351 ---- mtstate->mt_transition_capture->tcs_map = NULL; } } + if (mtstate->mt_oc_transition_capture != NULL) + mtstate->mt_oc_transition_capture->tcs_map = + mtstate->mt_transition_tupconv_maps[leaf_part_index]; /* * We might need to convert from the parent rowtype to the partition *************** lreplace:; *** 1158,1163 **** --- 1161,1168 ---- /* AFTER ROW UPDATE Triggers */ ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, tuple, recheckIndexes, + mtstate->operation == CMD_INSERT ? 
+ mtstate->mt_oc_transition_capture : mtstate->mt_transition_capture); list_free(recheckIndexes); *************** fireASTriggers(ModifyTableState *node) *** 1444,1450 **** if (node->mt_onconflict == ONCONFLICT_UPDATE) ExecASUpdateTriggers(node->ps.state, resultRelInfo, ! node->mt_transition_capture); ExecASInsertTriggers(node->ps.state, resultRelInfo, node->mt_transition_capture); break; --- 1449,1455 ---- if (node->mt_onconflict == ONCONFLICT_UPDATE) ExecASUpdateTriggers(node->ps.state, resultRelInfo, ! node->mt_oc_transition_capture); ExecASInsertTriggers(node->ps.state, resultRelInfo, node->mt_transition_capture); break; *************** ExecSetupTransitionCaptureState(ModifyTa *** 1474,1487 **** /* Check for transition tables on the directly targeted relation. */ mtstate->mt_transition_capture = ! MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc); /* * If we found that we need to collect transition tuples then we may also * need tuple conversion maps for any children that have TupleDescs that ! * aren't compatible with the tuplestores. */ ! if (mtstate->mt_transition_capture != NULL) { ResultRelInfo *resultRelInfos; int numResultRelInfos; --- 1479,1502 ---- /* Check for transition tables on the directly targeted relation. */ mtstate->mt_transition_capture = ! MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc, ! RelationGetRelid(targetRelInfo->ri_RelationDesc), ! mtstate->operation); ! if (mtstate->operation == CMD_INSERT && ! mtstate->mt_onconflict == ONCONFLICT_UPDATE) ! mtstate->mt_oc_transition_capture = ! MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc, ! RelationGetRelid(targetRelInfo->ri_RelationDesc), ! CMD_UPDATE); /* * If we found that we need to collect transition tuples then we may also * need tuple conversion maps for any children that have TupleDescs that ! * aren't compatible with the tuplestores. (We can share these maps ! * between the regular and ON CONFLICT cases.) */ ! if (mtstate->mt_transition_capture != NULL || ! mtstate->mt_oc_transition_capture != NULL) { ResultRelInfo *resultRelInfos; int numResultRelInfos; *************** ExecSetupTransitionCaptureState(ModifyTa *** 1522,1531 **** /* * Install the conversion map for the first plan for UPDATE and DELETE * operations. It will be advanced each time we switch to the next ! * plan. (INSERT operations set it every time.) */ ! mtstate->mt_transition_capture->tcs_map = ! mtstate->mt_transition_tupconv_maps[0]; } } --- 1537,1548 ---- /* * Install the conversion map for the first plan for UPDATE and DELETE * operations. It will be advanced each time we switch to the next ! * plan. (INSERT operations set it every time, so we need not update ! * mtstate->mt_oc_transition_capture here.) */ ! if (mtstate->mt_transition_capture) ! mtstate->mt_transition_capture->tcs_map = ! mtstate->mt_transition_tupconv_maps[0]; } } *************** ExecModifyTable(PlanState *pstate) *** 1629,1641 **** estate->es_result_relation_info = resultRelInfo; EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan, node->mt_arowmarks[node->mt_whichplan]); if (node->mt_transition_capture != NULL) { - /* Prepare to convert transition tuples from this child. 
*/ Assert(node->mt_transition_tupconv_maps != NULL); node->mt_transition_capture->tcs_map = node->mt_transition_tupconv_maps[node->mt_whichplan]; } continue; } else --- 1646,1664 ---- estate->es_result_relation_info = resultRelInfo; EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan, node->mt_arowmarks[node->mt_whichplan]); + /* Prepare to convert transition tuples from this child. */ if (node->mt_transition_capture != NULL) { Assert(node->mt_transition_tupconv_maps != NULL); node->mt_transition_capture->tcs_map = node->mt_transition_tupconv_maps[node->mt_whichplan]; } + if (node->mt_oc_transition_capture != NULL) + { + Assert(node->mt_transition_tupconv_maps != NULL); + node->mt_oc_transition_capture->tcs_map = + node->mt_transition_tupconv_maps[node->mt_whichplan]; + } continue; } else *************** ExecInitModifyTable(ModifyTable *node, E *** 1934,1941 **** mtstate->mt_partition_tuple_slot = partition_tuple_slot; } ! /* Build state for collecting transition tuples */ ! ExecSetupTransitionCaptureState(mtstate, estate); /* * Initialize any WITH CHECK OPTION constraints if needed. --- 1957,1968 ---- mtstate->mt_partition_tuple_slot = partition_tuple_slot; } ! /* ! * Build state for collecting transition tuples. This requires having a ! * valid trigger query context, so skip it in explain-only mode. ! */ ! if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY)) ! ExecSetupTransitionCaptureState(mtstate, estate); /* * Initialize any WITH CHECK OPTION constraints if needed. *************** ExecEndModifyTable(ModifyTableState *nod *** 2319,2334 **** int i; /* - * Free transition tables, unless this query is being run in - * EXEC_FLAG_SKIP_TRIGGERS mode, which means that it may have queued AFTER - * triggers that won't be run till later. In that case we'll just leak - * the transition tables till end of (sub)transaction. - */ - if (node->mt_transition_capture != NULL && - !(node->ps.state->es_top_eflags & EXEC_FLAG_SKIP_TRIGGERS)) - DestroyTransitionCaptureState(node->mt_transition_capture); - - /* * Allow any FDWs to shut down */ for (i = 0; i < node->mt_nplans; i++) --- 2346,2351 ---- diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h index aeb363f..adbcfa1 100644 *** a/src/include/commands/trigger.h --- b/src/include/commands/trigger.h *************** typedef struct TriggerData *** 43,55 **** /* * The state for capturing old and new tuples into transition tables for a ! * single ModifyTable node. */ typedef struct TransitionCaptureState { /* * Is there at least one trigger specifying each transition relation on * the relation explicitly named in the DML statement or COPY command? */ bool tcs_delete_old_table; bool tcs_update_old_table; --- 43,63 ---- /* * The state for capturing old and new tuples into transition tables for a ! * single ModifyTable node (or other operation source, e.g. copy.c). ! * ! * This is per-caller to avoid conflicts in setting tcs_map or ! * tcs_original_insert_tuple. Note, however, that the pointed-to ! * private data may be shared across multiple callers. */ + struct AfterTriggersTableData; /* private in trigger.c */ + typedef struct TransitionCaptureState { /* * Is there at least one trigger specifying each transition relation on * the relation explicitly named in the DML statement or COPY command? + * Note: in current usage, these flags could be part of the private state, + * but it seems possibly useful to let callers see them. 
*/ bool tcs_delete_old_table; bool tcs_update_old_table; *************** typedef struct TransitionCaptureState *** 60,66 **** * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the * new and old tuples from a child table's format to the format of the * relation named in a query so that it is compatible with the transition ! * tuplestores. */ TupleConversionMap *tcs_map; --- 68,74 ---- * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the * new and old tuples from a child table's format to the format of the * relation named in a query so that it is compatible with the transition ! * tuplestores. The caller must store the conversion map here if so. */ TupleConversionMap *tcs_map; *************** typedef struct TransitionCaptureState *** 74,90 **** HeapTuple tcs_original_insert_tuple; /* ! * The tuplestores backing the transition tables. We use separate ! * tuplestores for INSERT and UPDATE, because INSERT ... ON CONFLICT ... ! * DO UPDATE causes INSERT and UPDATE triggers to fire and needs a way to ! * keep track of the new tuple images resulting from the two cases ! * separately. We only need a single old image tuplestore, because there ! * is no statement that can both update and delete at the same time. */ ! Tuplestorestate *tcs_old_tuplestore; /* for DELETE and UPDATE old ! * images */ ! Tuplestorestate *tcs_insert_tuplestore; /* for INSERT new images */ ! Tuplestorestate *tcs_update_tuplestore; /* for UPDATE new images */ } TransitionCaptureState; /* --- 82,90 ---- HeapTuple tcs_original_insert_tuple; /* ! * Private data including the tuplestore(s) into which to insert tuples. */ ! struct AfterTriggersTableData *tcs_private; } TransitionCaptureState; /* *************** extern void RelationBuildTriggers(Relati *** 174,181 **** extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc); extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc); ! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc); ! extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs); extern void FreeTriggerDesc(TriggerDesc *trigdesc); --- 174,182 ---- extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc); extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc); ! ! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc, ! Oid relid, CmdType cmdType); extern void FreeTriggerDesc(TriggerDesc *trigdesc); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 90a60ab..c6d3021 100644 *** a/src/include/nodes/execnodes.h --- b/src/include/nodes/execnodes.h *************** typedef struct ModifyTableState *** 983,989 **** /* Per partition tuple conversion map */ TupleTableSlot *mt_partition_tuple_slot; struct TransitionCaptureState *mt_transition_capture; ! /* controls transition table population */ TupleConversionMap **mt_transition_tupconv_maps; /* Per plan/partition tuple conversion */ } ModifyTableState; --- 983,991 ---- /* Per partition tuple conversion map */ TupleTableSlot *mt_partition_tuple_slot; struct TransitionCaptureState *mt_transition_capture; ! /* controls transition table population for specified operation */ ! struct TransitionCaptureState *mt_oc_transition_capture; ! 
/* controls transition table population for INSERT...ON CONFLICT UPDATE */ TupleConversionMap **mt_transition_tupconv_maps; /* Per plan/partition tuple conversion */ } ModifyTableState; diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out index 620fac1..5b6007a 100644 *** a/src/test/regress/expected/triggers.out --- b/src/test/regress/expected/triggers.out *************** with wcte as (insert into table1 values *** 2217,2222 **** --- 2217,2239 ---- insert into table2 values ('hello world'); NOTICE: trigger = table2_trig, new table = ("hello world") NOTICE: trigger = table1_trig, new table = (42) + with wcte as (insert into table1 values (43)) + insert into table1 values (44); + NOTICE: trigger = table1_trig, new table = (43), (44) + select * from table1; + a + ---- + 42 + 44 + 43 + (3 rows) + + select * from table2; + a + ------------- + hello world + (1 row) + drop table table1; drop table table2; -- *************** create trigger my_table_multievent_trig *** 2256,2261 **** --- 2273,2286 ---- after insert or update on my_table referencing new table as new_table for each statement execute procedure dump_insert(); ERROR: transition tables cannot be specified for triggers with more than one event + -- + -- Verify that you can't create a trigger with transition tables with + -- a column list. + -- + create trigger my_table_col_update_trig + after update of b on my_table referencing new table as new_table + for each statement execute procedure dump_insert(); + ERROR: transition tables cannot be specified for triggers with column lists drop table my_table; -- -- Test firing of triggers with transition tables by foreign key cascades *************** select * from trig_table; *** 2299,2306 **** (6 rows) delete from refd_table where length(b) = 3; ! NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b") ! NOTICE: trigger = trig_table_delete_trig, old table = (11,"one a"), (11,"one b") select * from trig_table; a | b ---+--------- --- 2324,2330 ---- (6 rows) delete from refd_table where length(b) = 3; ! 
NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b"), (11,"one a"), (11,"one b") select * from trig_table; a | b ---+--------- --- 2324,2330 ---- (2 rows) drop table refd_table, trig_table; + -- + -- self-referential FKs are even more fun + -- + create table self_ref (a int primary key, + b int references self_ref(a) on delete cascade); + create trigger self_ref_r_trig + after delete on self_ref referencing old table as old_table + for each row execute procedure dump_delete(); + create trigger self_ref_s_trig + after delete on self_ref referencing old table as old_table + for each statement execute procedure dump_delete(); + insert into self_ref values (1, null), (2, 1), (3, 2); + delete from self_ref where a = 1; + NOTICE: trigger = self_ref_r_trig, old table = (1,), (2,1) + NOTICE: trigger = self_ref_r_trig, old table = (1,), (2,1) + NOTICE: trigger = self_ref_s_trig, old table = (1,), (2,1) + NOTICE: trigger = self_ref_r_trig, old table = (3,2) + NOTICE: trigger = self_ref_s_trig, old table = (3,2) + drop table self_ref; -- cleanup drop function dump_insert(); drop function dump_update(); diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql index c6deb56..6c7e1c0 100644 *** a/src/test/regress/sql/triggers.sql --- b/src/test/regress/sql/triggers.sql *************** create trigger table2_trig *** 1729,1734 **** --- 1729,1740 ---- with wcte as (insert into table1 values (42)) insert into table2 values ('hello world'); + with wcte as (insert into table1 values (43)) + insert into table1 values (44); + + select * from table1; + select * from table2; + drop table table1; drop table table2; *************** create trigger my_table_multievent_trig *** 1769,1774 **** --- 1775,1789 ---- after insert or update on my_table referencing new table as new_table for each statement execute procedure dump_insert(); + -- + -- Verify that you can't create a trigger with transition tables with + -- a column list. + -- + + create trigger my_table_col_update_trig + after update of b on my_table referencing new table as new_table + for each statement execute procedure dump_insert(); + drop table my_table; -- *************** select * from trig_table; *** 1812,1817 **** --- 1827,1852 ---- drop table refd_table, trig_table; + -- + -- self-referential FKs are even more fun + -- + + create table self_ref (a int primary key, + b int references self_ref(a) on delete cascade); + + create trigger self_ref_r_trig + after delete on self_ref referencing old table as old_table + for each row execute procedure dump_delete(); + create trigger self_ref_s_trig + after delete on self_ref referencing old table as old_table + for each statement execute procedure dump_delete(); + + insert into self_ref values (1, null), (2, 1), (3, 2); + + delete from self_ref where a = 1; + + drop table self_ref; + -- cleanup drop function dump_insert(); drop function dump_update();
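The declarations of the new bookkeeping structs these hunks manipulate fall outside the quoted diff context. Reconstructed from their usage above (field names as they appear in the hunks; comments and layout are guesses, not the patch's verbatim text), they look roughly like:

typedef struct AfterTriggersQueryData
{
	AfterTriggerEventList events;		/* events pending from this query */
	Tuplestorestate *fdw_tuplestore;	/* foreign tuples for those events */
	List	   *tables;			/* list of AfterTriggersTableData */
} AfterTriggersQueryData;

typedef struct AfterTriggersTransData
{
	/* state saved by AfterTriggerBeginSubXact, restored on subxact abort */
	SetConstraintState state;	/* saved SET CONSTRAINTS state, if any */
	AfterTriggerEventList events;	/* saved main event list pointer */
	int			query_depth;	/* saved query_depth */
	CommandId	firing_counter; /* saved firing_counter */
} AfterTriggersTransData;

typedef struct AfterTriggersTableData
{
	/* relid + cmdType form the lookup key used by GetAfterTriggersTableData */
	Oid			relid;			/* target table's OID */
	CmdType		cmdType;		/* CMD_INSERT, CMD_UPDATE, or CMD_DELETE */
	bool		stmt_trig_done; /* did we already queue AS triggers? */
	AfterTriggerEventList stmt_trig_events; /* if so, saved insertion point */
	Tuplestorestate *old_tuplestore;	/* "old" transition table, if any */
	Tuplestorestate *new_tuplestore;	/* "new" transition table, if any */
} AfterTriggersTableData;

Note how one AfterTriggersTableData can be shared by several TransitionCaptureStates through tcs_private, which is what lets the single ats_table pointer replace the three per-event tuplestore pointers that previously bloated AfterTriggerSharedData.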
On Sat, Sep 16, 2017 at 7:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Attached is an updated patch that incorporates the ideas you suggested.

I was imagining that you would just need to keep a back pointer to the
last queued event for the same (relation, command), since that's the
only one you'd ever need to consider cancelling, and then no scanning
would be needed.  I am probably missing something.

> I added an extended version of this example, which has an additional
> level of FK cascade happening:
>
> insert into self_ref values (1, null), (2, 1), (3, 2);
> delete from self_ref where a = 1;
> NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
> NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
> NOTICE:  trigger = self_ref_s_trig, old table = (1,), (2,1)
> NOTICE:  trigger = self_ref_r_trig, old table = (3,2)
> NOTICE:  trigger = self_ref_s_trig, old table = (3,2)
>
> What happens here is that the outer delete queues AR triggers for
> RI enforcement and self_ref_r_trig, plus an AS trigger for
> self_ref_s_trig.  Then the RI enforcement trigger deletes (2,1)
> and queues AR+AS triggers for that.  At this point the initial
> transition table is still open, so (2,1) goes into that table,
> and we look back and cancel the previous queuing of self_ref_s_trig.
> Now it's time for the first firing of self_ref_r_trig, and so now
> we mark the transition table closed.  Then we skip the cancelled
> self_ref_s_trig call, and then it's time for the second RI enforcement
> trigger to fire, which deletes (3,2) but has to put it into a new
> transition table.  Again we queue AR+AS triggers, but this time we
> can't cancel the preceding AS call.  Then we fire self_ref_r_trig
> again (for the (2,1) row), and then fire self_ref_s_trig; both of
> them see the same transition table the first self_ref_r_trig call
> did.  Now it's time for the third RI enforcement trigger; it finds
> nothing to delete, so it adds nothing to the second transition table,
> but it does queue an AS trigger call (canceling the one added by the
> second RI trigger).  Finally we have the AR call queued by the second
> RI trigger, and then the AS call queued by the third RI trigger,
> both looking at the second transition table.
>
> This is pretty messy but I think it's the best we can do as long as
> RI actions are intermixed with other AFTER ROW triggers.  Maybe with
> Kevin's ideas about converting RI actions to be statement-level,
> we could arrange for all three deletions to show up in one transition
> table ... but I don't know how we cause that to happen before the
> user's AFTER ROW triggers run.  In any case, nothing will look odd
> unless you have AR triggers using transition tables, which seems like
> a niche usage case in the first place.

It does seem like an inconsistency that it would be good to fix, but I
don't immediately see how to make that happen with the current design.
It would be interesting to know what DB2 does here in terms of trigger
execution contexts and transition tables when you have a chain of 2, 3
and 4 foreign key referential actions.

Is it worth adding a test with an extra level of chaining in the
self_ref case?

Is it worth adding tests for SET NULL and SET DEFAULT, to exercise the
complete set of referential actions?

> I also realized that we could undo the bloat I added to
> AfterTriggerSharedData by storing a pointer to the AfterTriggersTableData
> rather than the tuplestores themselves.

Makes sense.
> I feel that this is probably committable, except that I still need
> to look for documentation changes.

+1

--
Thomas Munro
http://www.enterprisedb.com
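To make the back-pointer alternative raised above concrete, it would amount to something like the following sketch (last_stmt_trig_event is an invented field and the function is hypothetical; the posted patch instead records a queue position in stmt_trig_events and scans forward from it):

/*
 * Hypothetical: remember only the most recently queued AFTER STATEMENT
 * event per (relation, command), and mark just that one canceled with
 * no scanning at all.
 */
static void
cancel_prior_stmt_trigger(AfterTriggersTableData *table)
{
	AfterTriggerEvent event = table->last_stmt_trig_event;	/* invented field */

	if (event != NULL)
	{
		/* same DONE marking that the patch's scan performs */
		event->ate_flags &= ~AFTER_TRIGGER_IN_PROGRESS;
		event->ate_flags |= AFTER_TRIGGER_DONE;
		table->last_stmt_trig_event = NULL;
	}
}

The catch, as the reply below notes, is that one statement can queue a whole set of AFTER STATEMENT triggers for the same (relation, command), so a single back pointer cannot reach them all.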
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Sat, Sep 16, 2017 at 7:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Attached is an updated patch that incorporates the ideas you suggested.

> I was imagining that you would just need to keep a back pointer to the
> last queued event for the same (relation, command), since that's the
> only one you'd ever need to consider cancelling, and then no scanning
> would be needed.  I am probably missing something.

There could be more than one after-statement trigger, no?

>> This is pretty messy but I think it's the best we can do as long as
>> RI actions are intermixed with other AFTER ROW triggers.

> It does seem like an inconsistency that it would be good to fix, but I
> don't immediately see how to make that happen with the current design.
> It would be interesting to know what DB2 does here in terms of trigger
> execution contexts and transition tables when you have a chain of 2, 3
> and 4 foreign key referential actions.

> Is it worth adding a test with an extra level of chaining in the self_ref case?

Would it show anything not shown by the three-level case?

> Is it worth adding tests for SET NULL and SET DEFAULT, to exercise the
> complete set of referential actions?

I think they're all about the same as far as this is concerned.

			regards, tom lane