Thread: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
[HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Marko Tiikkaja
Date:
Since the subject of transition tables came up, I thought I'd test how this case works:
=# create table qwr(a int);
CREATE TABLE
=# create function qq() returns trigger as $$ begin raise notice '%', (select count(*) from oo); return null; end $$ language plpgsql;
CREATE FUNCTION
=# create trigger xx after insert on qwr referencing new table as oo for each statement execute procedure qq();
CREATE TRIGGER
=# with t as (insert into qwr values (1)) insert into qwr values (2), (3);
NOTICE: 3
NOTICE: 3
INSERT 0 2
to me, this means that it doesn't work. Surely one of the trigger invocations should say 1, and the other 2. Or was this intentional?=# create table qwr(a int);
CREATE TABLE
=# create function qq() returns trigger as $$ begin raise notice '%', (select count(*) from oo); return null; end $$ language plpgsql;
CREATE FUNCTION
=# create trigger xx after insert on qwr referencing new table as oo for each statement execute procedure qq();
CREATE TRIGGER
=# with t as (insert into qwr values (1)) insert into qwr values (2), (3);
NOTICE: 3
NOTICE: 3
INSERT 0 2
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Thomas Munro
Date:
On Fri, Jun 2, 2017 at 10:48 PM, Marko Tiikkaja <marko@joh.to> wrote: > Since the subject of transition tables came up, I thought I'd test how this > case works: > > =# create table qwr(a int); > CREATE TABLE > > =# create function qq() returns trigger as $$ begin raise notice '%', > (select count(*) from oo); return null; end $$ language plpgsql; > CREATE FUNCTION > > =# create trigger xx after insert on qwr referencing new table as oo for > each statement execute procedure qq(); > CREATE TRIGGER > > =# with t as (insert into qwr values (1)) insert into qwr values (2), (3); > NOTICE: 3 > NOTICE: 3 > INSERT 0 2 > > to me, this means that it doesn't work. Surely one of the trigger > invocations should say 1, and the other 2. Or was this intentional? I would have expected that to be handled by separate transition tables at different query levels, but evidently it isn't. The following crashes: create table table1(a int); create table table2(a text); create function trigfunc() returns trigger as $$ begin raise notice 'got: %', (select oo from oo); return null; end $$ language plpgsql; create trigger trig1 after insert on table1 referencing new table as oo for each statement execute procedure trigfunc(); create trigger trig2 after insert on table2 referencing new table as oo for each statement execute procedure trigfunc(); with t as (insert into table1 values (1)) insert into table2 values ('hello'); I've run out of day today but will investigate. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Thomas Munro
Date:
On Sat, Jun 3, 2017 at 12:10 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I would have expected that to be handled by separate transition tables > at different query levels, but evidently it isn't. I am wondering if we need to wrap the execution of a modifying CTE in AfterTriggerBeginQuery() and AfterTriggerEndQuery(). But I'm not sure where, and it may be a couple of days before I can dig some more. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Thomas Munro
Date:
On Sat, Jun 3, 2017 at 1:20 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Sat, Jun 3, 2017 at 12:10 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> I would have expected that to be handled by separate transition tables >> at different query levels, but evidently it isn't. > > I am wondering if we need to wrap the execution of a modifying CTE in > AfterTriggerBeginQuery() and AfterTriggerEndQuery(). But I'm not sure > where, and it may be a couple of days before I can dig some more. So, afterTriggers.query_stack is used to handle the reentrancy that results from triggers running further statements that might fire triggers. It isn't used for dealing with extra ModifyTable nodes that can appear in a plan because of wCTEs. Could it also be used for that purpose? I think that would only work if it is the case that each ModifyTable node begin and then runs to completion (ie no interleaving of wCTE execution) and then its AS trigger fires, which I'm not sure about. If that is the case, perhaps AfterTriggerBeginQuery and AfterTriggerEndQuery could become the responsibility of nodeModifyTable.c rather than execMain.c. I haven't tried this yet and it may well be too ambitious at this stage. Other ideas: (1) ban wCTEs that target relations with transition tables in PG10, because we're out of time; (2) find an entirely new way to keep track of the current active transition table(s) to replace the current stack approach, such as including them in the TransitionCaptureState object in the patch that I proposed to fix the nearby inheritance bugs[1]. Stepping back and squinting a bit, both this and the inheritance bug stem from a failure to handle multiple ModifyTable nodes in a plan, but the solutions are approximately opposite: in the inheritance case, the solution I propose is to teach them all to coordinate their tuple capture so that it's in a common format, and in the wCTE case the solution must surely be to ensure that their state is kept separate. The question I'd like to figure out is whether the existing AfterTriggerBeginQuery/AfterTriggerEndQuery stack is the right data structure for that task, considering the control flow when CTEs are executed, which I need to learn more about. Thoughts? [1] https://www.postgresql.org/message-id/flat/CA%2BTgmoZzTBBAsEUh4MazAN7ga%3D8SsMC-Knp-6cetts9yNZUCcg%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Kevin Grittner
Date:
On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > So, afterTriggers.query_stack is used to handle the reentrancy that > results from triggers running further statements that might fire > triggers. It isn't used for dealing with extra ModifyTable nodes that > can appear in a plan because of wCTEs. Could it also be used for that > purpose? I think that would only work if it is the case that each > ModifyTable node begin and then runs to completion (ie no interleaving > of wCTE execution) and then its AS trigger fires, which I'm not sure > about. I don't think we want to commit to anything that depends on a CTE creating an optimization fence, although *maybe* that would be OK in the case of DML as a CTE. That's a pretty special case; I'm not sure whether the standard discusses it. > If that is the case, perhaps AfterTriggerBeginQuery and > AfterTriggerEndQuery could become the responsibility of > nodeModifyTable.c rather than execMain.c. I haven't tried this yet > and it may well be too ambitious at this stage. In a world where one statement can contain multiple DML statements within CTEs, that may make sense regardless of transition table issues; but I agree we're past the time to be considering making such a change for v10. > Other ideas: (1) ban wCTEs that target relations with transition > tables in PG10, because we're out of time; Given that we haven't even discussed what to do about an UPDATE statement with a CTE that updates the same table when there are BEFORE EACH ROW UPDATE triggers on that table (which perhaps return NULL for some but not all rows?), I'm not sure we've yet plumbed the depths of this morass. For example, for this: drop table if exists t1 cascade; create table t1 (c1 int not null, c2 text not null default ''); insert into t1 (c1) values (1), (2), (3), (4), (5); drop function if exists t1_upd_func() cascade; create function t1_upd_func() returns trigger language plpgsql as $$ begin if old.c1 between 2 and 4 then new.c2 = old.c2 || ';' || old.c1::text || ',' || new.c1::text; end if; if old.c1 >3 then return null; end if; return new; end; $$; create trigger t1_upd_update before update on t1 for each row execute procedure t1_upd_func(); with x as (update t1 set c1 = c1 - 1 returning c1) update t1 set c1 = t1.c1 + 1 from x where x.c1 = t1.c1; select * from t1; ... what is the correct result? I get this: c1 | c2 ----+------ 4 | 5 | 0 | 1 | ;2,1 2 | ;3,2 (5 rows) It is definitely not what I expected, and seems wrong in multiple ways from a logical perspective, looking at those as set-based operations. (Tautologies about the procedural mechanism that creates the result are not defenses of the result itself.) Can we fix things exclusive of transition tables in this release? If not, the only reasonable thing to do is to try to avoid further complicating things with CTE/transition table usage until we sort out the basics of triggers firing from CTE DML in the first place. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Thomas Munro
Date:
On Sun, Jun 4, 2017 at 10:41 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > >> So, afterTriggers.query_stack is used to handle the reentrancy that >> results from triggers running further statements that might fire >> triggers. It isn't used for dealing with extra ModifyTable nodes that >> can appear in a plan because of wCTEs. Could it also be used for that >> purpose? I think that would only work if it is the case that each >> ModifyTable node begin and then runs to completion (ie no interleaving >> of wCTE execution) and then its AS trigger fires, which I'm not sure >> about. > > I don't think we want to commit to anything that depends on a CTE > creating an optimization fence, although *maybe* that would be OK in > the case of DML as a CTE. That's a pretty special case; I'm not > sure whether the standard discusses it. According to our manual the standard doesn't allow DML in CTEs. I suppose it wouldn't make much sense without RETURNING, which is also non-standard. >> If that is the case, perhaps AfterTriggerBeginQuery and >> AfterTriggerEndQuery could become the responsibility of >> nodeModifyTable.c rather than execMain.c. I haven't tried this yet >> and it may well be too ambitious at this stage. > > In a world where one statement can contain multiple DML statements > within CTEs, that may make sense regardless of transition table > issues; but I agree we're past the time to be considering making > such a change for v10. Ok. >> Other ideas: (1) ban wCTEs that target relations with transition >> tables in PG10, because we're out of time; > > Given that we haven't even discussed what to do about an UPDATE > statement with a CTE that updates the same table when there are > BEFORE EACH ROW UPDATE triggers on that table (which perhaps return > NULL for some but not all rows?), I'm not sure we've yet plumbed the > depths of this morass. > > [...] > > Can we fix things exclusive of transition tables in this release? > If not, the only reasonable thing to do is to try to avoid further > complicating things with CTE/transition table usage until we sort > out the basics of triggers firing from CTE DML in the first place. At least it's well documented that the execution order is undefined, but yeah triggers do make things a lot more complicated and I'm not sure I understand to what extent that is unavoidable once you decide to allow DML in CTEs. In the meantime, it seems like you agree that rejecting wCTEs that affect tables with triggers with transition tables is the best response to this bug report? Do you think that parse analysis is the right time to do the check? Here's a first attempt at that. Later, for the next cycle, I think we should investigate storing the transition tables in ModifyTableState rather than in query_stack, and then passing them to the Exec*Triggers() functions as arguments (possibly inside TransitionCaptureState, if you accept my proposal for the inheritance bug). -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > In the meantime, it seems like you agree that rejecting wCTEs that > affect tables with triggers with transition tables is the best > response to this bug report? Do you think that parse analysis is the > right time to do the check? Here's a first attempt at that. I'm starting to like the approach of reverting the entire transition tables patch. Failing to consider the possibility of a plan with multiple ModifyTable nodes seems like a pretty fundamental design mistake, and I'm not eager either to ship this with that broken or try to fix it at this stage of the release cycle. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> In the meantime, it seems like you agree that rejecting wCTEs that >> affect tables with triggers with transition tables is the best >> response to this bug report? Do you think that parse analysis is the >> right time to do the check? Here's a first attempt at that. FWIW, parse analysis is surely NOT the time for such a check. Triggers might get added to a table between analysis and execution. I think you might have to do it during executor startup. > I'm starting to like the approach of reverting the entire transition > tables patch. Failing to consider the possibility of a plan with > multiple ModifyTable nodes seems like a pretty fundamental design > mistake, and I'm not eager either to ship this with that broken or try > to fix it at this stage of the release cycle. Postponing the feature to v11 might be a viable solution. We don't have any other major work that depends on it do we? regards, tom lane
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Mon, Jun 5, 2017 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm starting to like the approach of reverting the entire transition >> tables patch. Failing to consider the possibility of a plan with >> multiple ModifyTable nodes seems like a pretty fundamental design >> mistake, and I'm not eager either to ship this with that broken or try >> to fix it at this stage of the release cycle. > > Postponing the feature to v11 might be a viable solution. We don't > have any other major work that depends on it do we? I don't think anything that depends on it was committed to v10. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> In the meantime, it seems like you agree that rejecting wCTEs that >>> affect tables with triggers with transition tables is the best >>> response to this bug report? Do you think that parse analysis is the >>> right time to do the check? Here's a first attempt at that. > > FWIW, parse analysis is surely NOT the time for such a check. Triggers > might get added to a table between analysis and execution. I think you > might have to do it during executor startup. Hmm. My understanding: In analyzeCTE(), where the check is done in that patch, we hold a lock on the relation because we have a RangeTblEntry. That means that no triggers can be added concurrently in the case where you go on to execute the plan. If you save the plan for later execution, triggers created between then and lock reacquisition (because creating triggers touches pg_class) cause reanalysis. This is not fundamentally different from altering the table. For example, with that patch: postgres=# prepare ps as with t as (insert into table1 values (1)) insert into table2 values ('hello'); PREPARE postgres=# create trigger trig1 after insert on table1 referencing new table as oo for each row execute procedure trigfunc(); CREATE TRIGGER postgres=# execute ps; ERROR: WITH clause cannot modify a table that has triggers with transition tables What am I missing? >> I'm starting to like the approach of reverting the entire transition >> tables patch. Failing to consider the possibility of a plan with >> multiple ModifyTable nodes seems like a pretty fundamental design >> mistake, and I'm not eager either to ship this with that broken or try >> to fix it at this stage of the release cycle. Not handling wCTEs or inheritance definitely counts as a howler, but we seem tantalisingly close and it would be a shame to push the whole feature back IMHO. I really hope we can move on to talking about incremental matviews in the next cycle, which is why I've jumped on every problem report so far. I'm still waiting to hear from Kevin whether he thinks what I proposed for the inheritance bug has legs. If so, my vote from the peanut gallery would be to keep transition tables in the tree but block wCTEs with transition tables, and then add support in the next release (which I'd be happy to work on). Support will certainly be needed ASAP, because it'd suck if incremental matview maintenance didn't work just because you use wCTEs, and the 'action at a distance' factor isn't a great user experience (adding a new trigger can in general break existing queries, but breaking them just because they use wCTEs is arguably surprising). On the other hand, if he thinks that the inheritance bug is not adequately fixed by my proposal (with minor tweaks) and needs a completely different approach, then I'm out (but will be happy to help work on this stuff for PG11). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes: > On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> FWIW, parse analysis is surely NOT the time for such a check. Triggers >> might get added to a table between analysis and execution. I think you >> might have to do it during executor startup. > Hmm. My understanding: In analyzeCTE(), where the check is done in > that patch, we hold a lock on the relation because we have a > RangeTblEntry. That means that no triggers can be added concurrently > in the case where you go on to execute the plan. If you save the plan > for later execution, triggers created between then and lock > reacquisition (because creating triggers touches pg_class) cause > reanalysis. No, they don't necessarily. One case that will certainly not be covered by such a test is if the wCTE is an UPDATE or DELETE on an inheritance hierarchy: parse analysis has no idea whether the target table has children at all, let alone whether any of the children have triggers that use transition tables. You really want the test to happen after the planner has expanded inheritance. Another problem is that the result of parse analysis is frozen for much longer than you're thinking about in the case of stored rules or views. We currently disallow views that contain writable CTEs, but I do not think there's such a prohibition for rules. More generally, the problem I've got with this proposal is that it causes the presence of triggers to be significant much earlier in query execution than they used to be. I'm concerned that this may break optimizations that are already in place somewhere, or that we might want to add later. regards, tom lane
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Tue, Jun 6, 2017 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> FWIW, parse analysis is surely NOT the time for such a check. Triggers >>> might get added to a table between analysis and execution. I think you >>> might have to do it during executor startup. > >> Hmm. My understanding: In analyzeCTE(), where the check is done in >> that patch, we hold a lock on the relation because we have a >> RangeTblEntry. That means that no triggers can be added concurrently >> in the case where you go on to execute the plan. If you save the plan >> for later execution, triggers created between then and lock >> reacquisition (because creating triggers touches pg_class) cause >> reanalysis. > > No, they don't necessarily. One case that will certainly not be > covered by such a test is if the wCTE is an UPDATE or DELETE on > an inheritance hierarchy: parse analysis has no idea whether the > target table has children at all, let alone whether any of the > children have triggers that use transition tables. You really want > the test to happen after the planner has expanded inheritance. If Kevin accepts my other patch or at least the approach, we won't allow any child table triggers with transition tables, so we wouldn't need to consider them here (triggers on the named result table see tuples collected from children, but ROW triggers attached to children can't have transition tables and STATEMENT triggers attached to children don't fire). Though those decisions probably wouldn't be future-proof, we'd only want this block in place until we had support for wCTEs. > Another problem is that the result of parse analysis is frozen for much > longer than you're thinking about in the case of stored rules or views. > We currently disallow views that contain writable CTEs, but I do not > think there's such a prohibition for rules. Ugh. Right. I can indeed circumvent the above check with a wCTE hiding in a rule, so parse analysis is not the right time for the check. The whole rewrite area is clearly something I need to go and learn about. Thanks for the explanation. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Craig Ringer
Date:
On 4 June 2017 at 06:41, Kevin Grittner <kgrittn@gmail.com> wrote: > On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > >> So, afterTriggers.query_stack is used to handle the reentrancy that >> results from triggers running further statements that might fire >> triggers. It isn't used for dealing with extra ModifyTable nodes that >> can appear in a plan because of wCTEs. Could it also be used for that >> purpose? I think that would only work if it is the case that each >> ModifyTable node begin and then runs to completion (ie no interleaving >> of wCTE execution) and then its AS trigger fires, which I'm not sure >> about. > > I don't think we want to commit to anything that depends on a CTE > creating an optimization fence, although *maybe* that would be OK in > the case of DML as a CTE. That's a pretty special case; I'm not > sure whether the standard discusses it. It's definitely fine to require a fence for wCTEs. They're an extension to the standard, and it's pretty much necessary that we not pull up / push down across them since we don't want to affect the side-effects (row changes). If there are any cases where it's safe, they'll take some careful thought. It's only standard CTEs (SELECT-based) that I think matter for the optimisation fence behaviour. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Tue, Jun 6, 2017 at 12:58 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 4 June 2017 at 06:41, Kevin Grittner <kgrittn@gmail.com> wrote: >> On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >> >>> So, afterTriggers.query_stack is used to handle the reentrancy that >>> results from triggers running further statements that might fire >>> triggers. It isn't used for dealing with extra ModifyTable nodes that >>> can appear in a plan because of wCTEs. Could it also be used for that >>> purpose? I think that would only work if it is the case that each >>> ModifyTable node begin and then runs to completion (ie no interleaving >>> of wCTE execution) and then its AS trigger fires, which I'm not sure >>> about. >> >> I don't think we want to commit to anything that depends on a CTE >> creating an optimization fence, although *maybe* that would be OK in >> the case of DML as a CTE. That's a pretty special case; I'm not >> sure whether the standard discusses it. > > It's definitely fine to require a fence for wCTEs. They're an > extension to the standard, and it's pretty much necessary that we not > pull up / push down across them since we don't want to affect the > side-effects (row changes). If there are any cases where it's safe, > they'll take some careful thought. > > It's only standard CTEs (SELECT-based) that I think matter for the > optimisation fence behaviour. After sleeping on it, I don't think we need to make that decision here though. I think it's better to just move the tuplestores into ModifyTableState so that each embedded DML statement has its own, and have ModifyTable pass them to the trigger code explicitly. I think I'd like to do that via the TransitionCaptureState object that I proposed elsewhere, but I'll hold off on doing anything until I hear from interested committers on which way we're going here, time being short. Call me an anti-globalisation (of variables) protestor. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Kevin Grittner
Date:
Nice as it would be to add a SQL standard feature and advance the effort to get to incremental maintenance of materialized views, and much as I really appreciate the efforts Thomas has put into trying to solve these problems, I agree that it is best to revert the feature. It took years to get an in-depth review, then I was asked not to commit it because others were working on patches that would conflict. That just doesn't leave enough time to address these issues before release. Fundamentally, I'm not sure that there is a level of interest sufficient to support the effort. I'll give it a few days for objections before reverting. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Marko Tiikkaja
Date:
On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
Nice as it would be to add a SQL standard feature and advance the
effort to get to incremental maintenance of materialized views, and
much as I really appreciate the efforts Thomas has put into trying
to solve these problems, I agree that it is best to revert the
feature. It took years to get an in-depth review, then I was asked
not to commit it because others were working on patches that would
conflict. That just doesn't leave enough time to address these
issues before release. Fundamentally, I'm not sure that there is a
level of interest sufficient to support the effort.
I'll give it a few days for objections before reverting.
I can only say that the lack of this feature comes up on a weekly basis on IRC, and a lot of people would be disappointed to see it reverted.
.m
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Kevin Grittner
Date:
On Tue, Jun 6, 2017 at 3:41 PM, Marko Tiikkaja <marko@joh.to> wrote: > On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> It took years to get an in-depth review, then I was asked >> not to commit it because others were working on patches that would >> conflict. That just doesn't leave enough time to address these >> issues before release. Fundamentally, I'm not sure that there is a >> level of interest sufficient to support the effort. > I can only say that the lack of this feature comes up on a weekly basis on > IRC, and a lot of people would be disappointed to see it reverted. Well, at PGCon I talked with someone who worked on the implementation in Oracle 20-some years ago. He said they had a team of 20 people full time working on the feature for years to get it working. Now, I think the PostgreSQL community is a little lighter on its feet, but without more active participation from others than there has been so far, I don't intend to take another run at it. I'll be happy to participate at such point that it's not treated as a second-class feature set. Barring that, anyone else who wants to take the patch and run with it is welcome to do so. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Tue, Jun 6, 2017 at 4:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > Nice as it would be to add a SQL standard feature and advance the > effort to get to incremental maintenance of materialized views, and > much as I really appreciate the efforts Thomas has put into trying > to solve these problems, I agree that it is best to revert the > feature. It took years to get an in-depth review, then I was asked > not to commit it because others were working on patches that would > conflict. That just doesn't leave enough time to address these > issues before release. Fundamentally, I'm not sure that there is a > level of interest sufficient to support the effort. I find it a little unfair that you are blaming the community for not taking enough interest in this feature when other people are posting bug fixes for the feature which you can't find time to review (or even respond to with an i'm-too-busy email) for lengthy periods of time. Previously, I took the time to review and commit fixes for several reported fixes which Thomas developed, while you ignored all of the relevant threads. Now, he's posted another patch which you said you would review by last Friday or at the latest by Monday and which you have not reviewed. He's also suggested that the fix for this issue probably relies on that one, so getting that patch reviewed is relevant to this thread also. I think I'd like to walk back my earlier statements about reverting this patch just a little bit. Although putting the tuplestore at the wrong level does seem like a fairly significant design mistake, Thomas more or less convinced me yesterday on a Skype call that relocating it to the ModifyTable node might not be that much work. If it's a 150-line patch, it'd probably be less disruptive to install a fix than to revert the whole thing (and maybe put it back in again, in some future release). On the other hand, if you aren't willing to write, review, or commit any further patches to fix bugs in this feature, then it can really only stay in the tree if somebody else is willing to assume completely responsibility for it going forward. So, are you willing and able to put any effort into this, like say reviewing the patch Thomas posted, and if so when and how much? If you're just done and you aren't going to put any more work into maintaining it (for whatever reasons), then I think you should say so straight out. People shouldn't have to guess whether you're going to maintain your patch or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Mon, Jun 5, 2017 at 6:40 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > After sleeping on it, I don't think we need to make that decision here > though. I think it's better to just move the tuplestores into > ModifyTableState so that each embedded DML statement has its own, and > have ModifyTable pass them to the trigger code explicitly. I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE case -- one for updated tuples, and the other for inserted tuples. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Tue, Jun 6, 2017 at 3:47 PM, Peter Geoghegan <pg@bowt.ie> wrote: > I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE > case -- one for updated tuples, and the other for inserted tuples. Also, ISTM that the code within ENRMetadataGetTupDesc() probably requires more explanation, resource management wise. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote: > Also, ISTM that the code within ENRMetadataGetTupDesc() probably > requires more explanation, resource management wise. Also, it's not clear why it should be okay that the new type of ephemeral RTEs introduced don't have permissions checks. There are currently cases where the user cannot see data that they inserted themselves (e.g., through RETURNING), on the theory that a before row trigger might have modified the final contents of the tuple in a way that the original inserter isn't supposed to know details about. As the INSERT docs say, "Use of the RETURNING clause requires SELECT privilege on all columns mentioned in RETURNING". Similarly, the excluded.* pseudo-relation requires select privilege (on the corresponding target relation columns) in order to be usable by ON CONFLICT DO UPDATE. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Adam Brusselback
Date:
I'll give it a few days for objections before reverting.I can only say that the lack of this feature comes up on a weekly basis on IRC, and a lot of people would be disappointed to see it reverted.
Not that my opinion matters, but I was very much looking forward to this feature in Postgres 10, but I understand if it just won't be stable enough to stay in.
I have my fingers crossed these issues can be resolved.
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Wed, Jun 7, 2017 at 9:42 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I think I'd like to walk back my earlier statements about reverting > this patch just a little bit. Although putting the tuplestore at the > wrong level does seem like a fairly significant design mistake, Thomas > more or less convinced me yesterday on a Skype call that relocating it > to the ModifyTable node might not be that much work. If it's a > 150-line patch, it'd probably be less disruptive to install a fix than > to revert the whole thing (and maybe put it back in again, in some > future release). I spent a couple of hours drafting a proof-of-concept to see if my hunch was right. It seems to work correctly so far and isn't huge (but certainly needs more testing and work): 6 files changed, 156 insertions(+), 109 deletions(-) It applies on top of the other patch[1]. It extends TransitionCaptureState to hold the the new and old tuplestores for each ModifyTable node, instead of using global variables. The result is that tuplestores don't get mixed up, and there aren't any weird assumptions about the order of execution as discussed earlier. Example: with wcte as (insert into table1 values (42)) insert into table2 values ('hello world'); NOTICE: trigger = table2_trig, old table = <NULL>, new table = ("hello world") NOTICE: trigger = table1_trig, old table = <NULL>, new table = (42) Summary of how these patches relate: 1. In the inheritance patch[1], TransitionCaptureState is introduced. It holds flags that control whether we capture tuples. There is one of these per ModifyTable node. In master we use the flags in TriggerDesc to control transition tuple capture directly, but we needed a way for ModifyTable's result rel's TriggerDesc to affect all child tables that are touched. My proposal is to do that by inventing this new object to activate transition tuple capture while modifying child tables too. It is passed into the ExecAR*Trigger() functions of all relations touched by the ModifyTable node. 2. In the attached patch, that struct is extended to hold the actual tuplestores. They are used for two purposes: ExecAR*Trigger() captures tuples into them (instead of using global variables to find the tuplestores to capture tuples into), and ExecA[RS]*Trigger() keeps hold of the TransitionCaptureState in the after trigger queue so that when the queued event is eventually executed AfterTriggerExecute() can expose the correct tuplestores to triggers. There are a couple of things that definitely need work and I'd welcome any comments: 1. I added a pointer to TransitionCaptureState to AfterTriggerShared, in order to record which tuplestores a queued after trigger event should see. I suspected that enqueuing pointers like that wouldn't be popular, and when I ran the idea past Andres on IRC he just said "yuck" :-) Perhaps there needs to be a way to convert this into an index into some array in EState, ... or something else. The basic requirement is that the AfterTriggerExecute() needs to know *which* tuplestores should be visible to the trigger when it runs. I believe the object lifetime is sound (the TransitionCaptureState lasts until ExecutorEnd(), and triggers are fired before that during ExecutorFinish()). 2. I didn't think about what execReplication.c needs. Although that code apparently doesn't know how to fire AS triggers, it does know how to fire AR triggers (so that RI works?), and in theory those might have transition tables, so I guess that needs to use MakeTransitionCaptureState() -- but it seems to lack a place to keep that around, and I ran out of time thinking about that today. Thoughts? [1] https://www.postgresql.org/message-id/CAEepm%3D1dGNzh98Gt21fn_Ed6k20sVB-NuAARE1EF693itK6%3DLg%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Wed, Jun 7, 2017 at 12:19 PM, Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote: >> Also, ISTM that the code within ENRMetadataGetTupDesc() probably >> requires more explanation, resource management wise. > > Also, it's not clear why it should be okay that the new type of > ephemeral RTEs introduced don't have permissions checks. There are > currently cases where the user cannot see data that they inserted > themselves (e.g., through RETURNING), on the theory that a before row > trigger might have modified the final contents of the tuple in a way > that the original inserter isn't supposed to know details about. > > As the INSERT docs say, "Use of the RETURNING clause requires SELECT > privilege on all columns mentioned in RETURNING". Similarly, the > excluded.* pseudo-relation requires select privilege (on the > corresponding target relation columns) in order to be usable by ON > CONFLICT DO UPDATE. This is an interesting question and one that was mentioned here (near the bottom): https://www.postgresql.org/message-id/CACjxUsO%3DsquN2XbYBM6qLfS9co%3DbfGqQFxMfY%2BpjyAGKP_jpwQ%40mail.gmail.com I'm pretty sure that whatever applies to the NEW and OLD variables should also apply to the new table and old table, because the former are conceptually range variables over the latter. Without having checked either the standard or our behaviour for NEW and OLD, I'll bet you one internet point that they say we should apply the same access restrictions as the underlying table, and we don't. Am I wrong? -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote: > On Mon, Jun 5, 2017 at 6:40 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> After sleeping on it, I don't think we need to make that decision here >> though. I think it's better to just move the tuplestores into >> ModifyTableState so that each embedded DML statement has its own, and >> have ModifyTable pass them to the trigger code explicitly. > > I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE > case -- one for updated tuples, and the other for inserted tuples. Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER INSERT and AFTER UPDATE statement-level triggers to be fired, but then both kinds see all the tuples -- those that were inserted and those that were updated. That's not right. For example: postgres=# insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1) postgres-# on conflict (a) do postgres-# update set counter = my_table.counter + excluded.counter; NOTICE: trigger = my_update_trigger, old table = (ID1,1), (ID2,1), new table = (ID1,2), (ID2,2), (ID3,1) NOTICE: trigger = my_insert_trigger, new table = (ID1,2), (ID2,2), (ID3,1) INSERT 0 3 That's the result of the following: create or replace function dump_insert() returns trigger language plpgsql as $$ begin raise notice 'trigger = %, new table = %', TG_NAME, (select string_agg(new_table::text, ', ' order bya) from new_table); return null; end; $$; create or replace function dump_update() returns trigger language plpgsql as $$ begin raise notice 'trigger = %, old table = %, new table = %', TG_NAME, (select string_agg(old_table::text,', ' order by a) from old_table), (select string_agg(new_table::text, ', ' order by a) fromnew_table); return null; end; $$; create table my_table (a text primary key, counter int); insert into my_table values ('ID1', 1), ('ID2', 1); create trigger my_insert_trigger after insert on my_table referencing new table as new_table for each statement execute proceduredump_insert(); create trigger my_update_trigger after update on my_table referencing old table as old_table new table as new_table for eachstatement execute procedure dump_update(); insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1) on conflict (a) do update set counter = my_table.counter +excluded.counter; -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote: >> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE >> case -- one for updated tuples, and the other for inserted tuples. > > Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER > INSERT and AFTER UPDATE statement-level triggers to be fired, but then > both kinds see all the tuples -- those that were inserted and those > that were updated. That's not right. Or maybe it is right. We're out of spec here, so we'd basically have to decide (though MERGE's behaviour would be interesting to compare with). What we have seems reasonable for an AFTER INSERT OR UPDATE statement-level trigger, I think. It's just a bit questionable if you asked for just one of those things. The question is whether the trigger's 'when' clause is supposed to refer to the type of statement you ran or the way individual rows turned out to be processed in our out-of-standard ON CONFLICT case. If you take the former view then we could simply decree that such a statement is both an INSERT and an UPDATE for trigger selection purposes. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Kevin Grittner
Date:
On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote: >>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE >>> case -- one for updated tuples, and the other for inserted tuples. >> >> Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER >> INSERT and AFTER UPDATE statement-level triggers to be fired, but then >> both kinds see all the tuples -- those that were inserted and those >> that were updated. That's not right. > > Or maybe it is right. The idea of transition tables is that you see all changes to the target of a single statement in the form of delta relations -- with and "old" table for any rows affected by a delete or update and a "new" table for any rows affected by an update or delete. If we have a single statement that combines INSERT and UPDATE behaviors, it might make sense to have an "old" table for updates and a single "new" table for both. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Kevin Grittner
Date:
On Tue, Jun 6, 2017 at 4:42 PM, Robert Haas <robertmhaas@gmail.com> wrote: > So, are you willing and able to put any effort into this, like say > reviewing the patch Thomas posted, and if so when and how much? If > you're just done and you aren't going to put any more work into > maintaining it (for whatever reasons), then I think you should say so > straight out. People shouldn't have to guess whether you're going to > maintain your patch or not. It has become clear that the scope of problems being found now exceed what I can be sure of being able to fix in time to make for a stable release, in spite of the heroic efforts Thomas has been putting in. I had hoped to get this into the first or second CF of this release, same with the release before, and same with the release before that. At least landing it in the final CF drew the level of review and testing needed to polish it, but it's far from ideal timing (or procedure). I can revert from v10 and deal with all of this for the first CF of some future release, but if someone feels they can deal with it in v10 I'll stand back and offer what help I can. You mentioned blame earlier. That seems pointless to me. I'm looking to see what works and what doesn't. When I went to develop SSI it was because my employer needed it, and they asked what it would take to get that done; I said one year of my time without being drawn off for other tasks to get it to a point where they would be better off using it than not, and then two years half time work to address community concerns and get it committed, and follow up on problems. They gave me that and it worked, and worked well. In this case, resources to carry it through were not reserved when I started, and when I became full-up with other things, then problems surfaced. That doesn't work. I don't want to start something big again until I have resources set up and reserved, as a priority, to see it through. It's a matter of what works for both me and the community and what doesn't. In the meantime I'll try to keep to small enough issues that the resources required to support what I do is not beyond what I can deliver. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Thu, Jun 8, 2017 at 9:19 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote: >>>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE >>>> case -- one for updated tuples, and the other for inserted tuples. >>> >>> Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER >>> INSERT and AFTER UPDATE statement-level triggers to be fired, but then >>> both kinds see all the tuples -- those that were inserted and those >>> that were updated. That's not right. >> >> Or maybe it is right. > > The idea of transition tables is that you see all changes to the > target of a single statement in the form of delta relations -- with > and "old" table for any rows affected by a delete or update and a > "new" table for any rows affected by an update or delete. If we > have a single statement that combines INSERT and UPDATE behaviors, > it might make sense to have an "old" table for updates and a single > "new" table for both. That would be cool because that's the behaviour we have. Is there anything about that semantics that is incompatible with the incremental matview use case? For example, are the "counting" and "DRed" algorithms you've proposed (for non-recursive and recursive views) driven entirely by deletions and insertions, so that updates look like deletes followed by inserts anyway? If so, I think our current treatment of ON CONFLICT DO UPDATE should be fine: you can't tell whether the tuples in the new table result from insert or update without also consulting the old table, but if the algorithm treats all tuples in the old table as deletes (even though in this case they result from updates) and all tuples in the new table as inserts (even though in this case *some* of them result from updates) anyway then I don't think it matters. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Wed, Jun 7, 2017 at 2:19 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > The idea of transition tables is that you see all changes to the > target of a single statement in the form of delta relations -- with > and "old" table for any rows affected by a delete or update and a > "new" table for any rows affected by an update or delete. If we > have a single statement that combines INSERT and UPDATE behaviors, > it might make sense to have an "old" table for updates and a single > "new" table for both. My assumption would be that since you have as many as two statement-level triggers firing that could reference transition tables when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level trigger, and another AFTER INSERT statement level trigger), there'd be separation at that level. You'd see updated tuples with one, and inserted within the other. While INSERT ON CONFLICT DO UPDATE is essentially one statement, it behaves as two statements in certain limited ways. IIRC MERGE as implemented in other systems has severe restrictions on things like row level triggers work (i.e. they simply don't work), and so they don't provide much in the way of guidance. My assumption about how transition tables ought to behave here is based on the simple fact that we already fire both AFTER statement-level triggers, plus my sense of aesthetics, or bias. I admit that I might be missing the point, but if I am it would be useful to know how. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Kevin Grittner
Date:
On Wed, Jun 7, 2017 at 4:48 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Is there anything about that semantics that is incompatible with the > incremental matview use case? Nothing incompatible at all. If we had separate "new" tables for UPDATE and DELETE we would logically need to do a "counting"-style UNION of them just like we will want to do with the OLD (with a count of -1) and NEW (with a count of 1) to get to a delta now. So keeping them separate would be marginally more work for incremental matview, but not a big deal. > For example, are the "counting" and > "DRed" algorithms you've proposed (for non-recursive and recursive > views) driven entirely by deletions and insertions, so that updates > look like deletes followed by inserts anyway? Counting is best done from a "delta" relation which has old and new combined with opposite counts. I'm sure that will be fine with DRed, too. > If so, I think our > current treatment of ON CONFLICT DO UPDATE should be fine: you can't > tell whether the tuples in the new table result from insert or update > without also consulting the old table, but if the algorithm treats all > tuples in the old table as deletes (even though in this case they > result from updates) and all tuples in the new table as inserts (even > though in this case *some* of them result from updates) anyway then I > don't think it matters. I don't think so, either. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Kevin Grittner
Date:
On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan <pg@bowt.ie> wrote: > My assumption about how transition tables ought to behave here is > based on the simple fact that we already fire both AFTER > statement-level triggers, plus my sense of aesthetics, or bias. I > admit that I might be missing the point, but if I am it would be > useful to know how. Well, either will work. My inclination is that a single statement should cause one execution of the FOR EACH STATEMENT trigger, but if a good case can be made that we should have a FOR EACH STATEMENT trigger fire for each clause within a statement -- well, it won't be a problem for matview maintenance. The biggest hurt there would be to *my* sense of aesthetics. ;-) -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Wed, Jun 7, 2017 at 3:00 PM, Peter Geoghegan <pg@bowt.ie> wrote: > My assumption would be that since you have as many as two > statement-level triggers firing that could reference transition tables > when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level > trigger, and another AFTER INSERT statement level trigger), there'd be > separation at that level. You'd see updated tuples with one, and > inserted within the other. While INSERT ON CONFLICT DO UPDATE is > essentially one statement, it behaves as two statements in certain > limited ways. BTW, as you probably already know INSERT ON CONFLICT DO UPDATE is unable to affect the same row version a second time -- if we go on to update a row that the same upsert statement itself actually originally inserted, then a cardinality violation error is raised (recall that upsert does not actually use MVCC semantics to detect conflicting tuples). This differs from the UPDATE situation (I should really say the UPDATE ... FROM situation), where the implementation has always silently not updated the row a second time (to avoid the halloween problem). This cardinality violation restriction is a useful one for us to have imposed, because there is now no question about the same row appearing more than once. We know that a given row (any tuple from a single update chain) cannot appear once for an INSERT statement level trigger firing (appearing in the transition table there), while also making a second appearance for the UPDATE statement level trigger. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Thu, Jun 8, 2017 at 10:21 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan <pg@bowt.ie> wrote: > >> My assumption about how transition tables ought to behave here is >> based on the simple fact that we already fire both AFTER >> statement-level triggers, plus my sense of aesthetics, or bias. I >> admit that I might be missing the point, but if I am it would be >> useful to know how. > > Well, either will work. My inclination is that a single statement > should cause one execution of the FOR EACH STATEMENT trigger, but if > a good case can be made that we should have a FOR EACH STATEMENT > trigger fire for each clause within a statement -- well, it won't be > a problem for matview maintenance. The biggest hurt there would be > to *my* sense of aesthetics. ;-) Ok, I think we have two options here: 1. Keep the current behaviour. ON CONFLICT statements cause inserted and updated tuples to be mixed together in the same 'new table' tuplestore. Trigger authors will need to be aware that AFTER INSERT triggers might see some rows that were actually updated, and vice versa, which seems moderately surprising. If you defined one AFTER INSERT trigger and one AFTER UPDATE trigger, you'll see the same rows in both if someone runs such a statement (as I showed upthread), which also seems to violate the PoLA. 2. Make a code change that would split the 'new table' tuplestore in two: an insert tuplestore and an update tuplestore (for new images; old images could remain in the old tuplestore that is also used for deletes) as Peter suggests. That raises two questions for me: (1) Does it really make sense for the 'when' clause, which sounds like it only controls when we fire a trigger, also to affect which transition tuples it sees? There is something slightly fishy about that. (2) Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union of the the two tuplestores? Trigger authors would need to be aware a single statement can produce a mixture of updates and inserts, but only if they explicitly invite such problems into their lives with that incantation. I think the code change for 2 is not enormous. Split the tuplestore in two, route tuples appropriately, and teach NamedTuplestoreScan to scan both if a union is needed. It does sound like the less user-hostile of the options. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > 1. Keep the current behaviour. [...] > > 2. Make a code change that would split the 'new table' tuplestore in > two: an insert tuplestore and an update tuplestore (for new images; > old images could remain in the old tuplestore that is also used for > deletes) as Peter suggests. That raises two questions for me: (1) > Does it really make sense for the 'when' clause, which sounds like it > only controls when we fire a trigger, also to affect which transition > tuples it sees? There is something slightly fishy about that. (2) > Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union > of the the two tuplestores? Trigger authors would need to be aware a > single statement can produce a mixture of updates and inserts, but > only if they explicitly invite such problems into their lives with > that incantation. A third option would be for an AFTER INSERT OR UPDATE trigger to be invoked twice, once for the inserts and once again for the updates. No union required, but also surprising. Any other ideas? -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Wed, Jun 7, 2017 at 5:47 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > It has become clear that the scope of problems being found now > exceed what I can be sure of being able to fix in time to make for a > stable release, in spite of the heroic efforts Thomas has been > putting in. I had hoped to get this into the first or second CF of > this release, same with the release before, and same with the > release before that. At least landing it in the final CF drew the > level of review and testing needed to polish it, but it's far from > ideal timing (or procedure). I can revert from v10 and deal with > all of this for the first CF of some future release, but if someone > feels they can deal with it in v10 I'll stand back and offer what > help I can. I really don't know what to make of this. I can't ever remember a committer taking this attitude before. Aside from committing one patch that Thomas submitted shortly after the original commit, you haven't been willing to write, review, or commit a single line of code. The problem doesn't seem to be that the amount of effort is any more than it would be for any other feature this size; it's in better shape than some. The problem appears to be, rather, that the scope of problems you were willing to try to fix in a timely fashion was basically zero, which isn't very realistic for a patch of this size no matter who has reviewed it or in what level of detail. We don't have a lot of formal documentation around the responsibilities of PostgreSQL committers, but I think one that is pretty clearly acknowledged is "you are responsible for what you commit". If you're not willing to be responsible for your own patches, turn in your commit bit. Then, when you submit a patch, it'll either not get committed, or it'll be the responsibility of whoever does. I freely admit I encouraged you to commit this. I did not imagine that would be followed immediately by abdicating all responsibility for it. My mistake, I guess. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Heikki Linnakangas
Date:
On 06/08/2017 08:36 PM, Robert Haas wrote: > I freely admit I encouraged you to commit this. I did not imagine > that would be followed immediately by abdicating all responsibility > for it. My mistake, I guess. Robert, chill out. Kevin offered to revert. It's perhaps not the best way forward - I'm not familiar with the details here - but it's certainly one way to take responsibility. - Heikki
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Thu, Jun 8, 2017 at 1:48 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 06/08/2017 08:36 PM, Robert Haas wrote: >> >> I freely admit I encouraged you to commit this. I did not imagine >> that would be followed immediately by abdicating all responsibility >> for it. My mistake, I guess. > > Robert, chill out. That's probably good advice, but ... > Kevin offered to revert. It's perhaps not the best way > forward - I'm not familiar with the details here - but it's certainly one > way to take responsibility. ... he also proposed to then commit it again for some future release cycle, and what then? Revert it again if it turns out to have any bugs, and commit it a third time in some release cycle after that? It's a big, invasive patch. I don't think we want it going in and out of the tree repeatedly. More generally, I don't think there's ever a time when it's OK to commit a patch that you're not willing to put at least some effort into fixing up afterwards. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: > More generally, I don't think there's ever a > time when it's OK to commit a patch that you're not willing to put at > least some effort into fixing up afterwards. Kevin said "It has become clear that the scope of problems being found now exceed what I can be sure of being able to fix in time to make for a stable release, in spite of the heroic efforts Thomas has been putting in". I think it's clear that Kevin is willing to put in some work. The issue is that he is unable to *guarantee* that he'll be able to put in *sufficient* time, and in light of that concedes that it might be best to revert and revisit for Postgres 11. He is being cautious, and does not want to *risk* unduly holding up the release. That was my understanding, at least. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Bruce Momjian
Date:
On Thu, Jun 8, 2017 at 11:05:43AM -0700, Peter Geoghegan wrote: > On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > More generally, I don't think there's ever a > > time when it's OK to commit a patch that you're not willing to put at > > least some effort into fixing up afterwards. > > Kevin said "It has become clear that the scope of problems being found > now exceed what I can be sure of being able to fix in time to make for > a stable release, in spite of the heroic efforts Thomas has been > putting in". I think it's clear that Kevin is willing to put in some > work. The issue is that he is unable to *guarantee* that he'll be able > to put in *sufficient* time, and in light of that concedes that it > might be best to revert and revisit for Postgres 11. He is being > cautious, and does not want to *risk* unduly holding up the release. > > That was my understanding, at least. I think we can all agree that Kevin should have communicated this earlier, rather than requiring Robert to push him on the issue. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Thu, Jun 8, 2017 at 2:05 PM, Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> More generally, I don't think there's ever a >> time when it's OK to commit a patch that you're not willing to put at >> least some effort into fixing up afterwards. > > Kevin said "It has become clear that the scope of problems being found > now exceed what I can be sure of being able to fix in time to make for > a stable release, in spite of the heroic efforts Thomas has been > putting in". I think it's clear that Kevin is willing to put in some > work. The issue is that he is unable to *guarantee* that he'll be able > to put in *sufficient* time, and in light of that concedes that it > might be best to revert and revisit for Postgres 11. He is being > cautious, and does not want to *risk* unduly holding up the release. > > That was my understanding, at least. I understood the same. However, he didn't review or commit any of the bug fix patches that Thomas posted in May, or even respond to the mailing list threads. I eventually reviewed and committed them to avoid having the feature reverted; it took me only a few hours. Later, he said he would review the TransitionCaptureState patch Thomas posted at PGCon, later said again on-list that he would do so by Friday or anyway Monday, and it's now Thursday. In other words, I am not going just by this particular email, but by the fact that he hasn't committed so much as a one line bug fix or posted any reviews in a long time. The last non-reverted commit he made related to this feature was on April 6th, two days after the initial commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Thu, Jun 8, 2017 at 11:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I understood the same. However, he didn't review or commit any of the > bug fix patches that Thomas posted in May, or even respond to the > mailing list threads. I eventually reviewed and committed them to > avoid having the feature reverted; it took me only a few hours. > Later, he said he would review the TransitionCaptureState patch Thomas > posted at PGCon, later said again on-list that he would do so by > Friday or anyway Monday, and it's now Thursday. In other words, I am > not going just by this particular email, but by the fact that he > hasn't committed so much as a one line bug fix or posted any reviews > in a long time. The last non-reverted commit he made related to this > feature was on April 6th, two days after the initial commit. It does seem like Kevin could have done better there. However, I think that Kevin was taking responsibility for the situation, and that it wasn't accurate to suggest he blamed others. I thought his account of the constraints that he operated under was a simple factual account. I also don't think it's useful for you to discourage revert on the grounds that it's a slippery slope. Admitting fault doesn't need to be made any harder. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Thu, Jun 8, 2017 at 2:55 PM, Peter Geoghegan <pg@bowt.ie> wrote: > It does seem like Kevin could have done better there. However, I think > that Kevin was taking responsibility for the situation, and that it > wasn't accurate to suggest he blamed others. I thought his account of > the constraints that he operated under was a simple factual account. > > I also don't think it's useful for you to discourage revert on the > grounds that it's a slippery slope. Admitting fault doesn't need to be > made any harder. I am not sure that I entirely agree with you on these points, but I don't want to debate it further and especially not on a public mailing list. I'm going to spend some time over the next ~24 hours studying this thread and the patches and determining whether or not I'm willing to take responsibility for this patch. If I decide that I am, I will then work on trying to fix the technical problems. If I decide that I am not, then I think it will be necessary to take Kevin up on his offer to revert, unless some other committer volunteers. (Of course, anyone could step in to do the work, as Thomas already has to a considerable degree, but without a committer involved it doesn't fix the problem.) I don't really like either option, because, on the one hand, this is a pretty invasive thing to go rip out -- maybe we don't have to rip out the entire patch series, but just some of the later patches? -- and on the other hand, keeping it in the tree will require work, and I'm busy. But there don't seem to be any other options. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Thu, Jun 8, 2017 at 1:20 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I am not sure that I entirely agree with you on these points, but I > don't want to debate it further and especially not on a public mailing > list. Fair enough. > I'm going to spend some time over the next ~24 hours studying > this thread and the patches and determining whether or not I'm willing > to take responsibility for this patch. Thank you. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Andrew Gierth
Date:
>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes: Thomas> So, afterTriggers.query_stack is used to handle the reentrancyThomas> that results from triggers running furtherstatements thatThomas> might fire triggers. It isn't used for dealing with extraThomas> ModifyTable nodes that canappear in a plan because of wCTEs.Thomas> Could it also be used for that purpose? I think that wouldThomas> only workif it is the case that each ModifyTable node beginThomas> and then runs to completion (ie no interleaving of wCTEThomas>execution) and then its AS trigger fires, which I'm not sureThomas> about. There's a problem with this which I didn't see anyone mention (though I've not read the whole history); existing users of wCTEs rely on the fact that no AFTER triggers are run until _all_ modifications are completed. If you change that, then you can't use wCTEs to insert into tables with FK checks without knowing the order in which the individual wCTEs are executed, which is currently a bit vague and non-obvious (which doesn't cause issues at the moment only because nothing actually depends on the order). for example: create table foo (id integer primary key); create table foo_bar (foo_id integer references foo, bar_id integer); with i1 as (insert into foo values (1)) insert into foo_bar values (1,2),(1,3); This would fail the FK check if each insert did its own trigger firing, since the foo_bar insert is actually run _first_. Firing triggers earlier than they currently are would thus break existing working queries. -- Andrew (irc:RhodiumToad)
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Andrew Gierth
Date:
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: Robert> unless some other committer volunteers. (Of course, anyoneRobert> could step in to do the work, as Thomas alreadyhas to aRobert> considerable degree, but without a committer involved itRobert> doesn't fix the problem.) I can probably take this on if needed. -- Andrew (irc:RhodiumToad)
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Fri, Jun 9, 2017 at 8:41 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes: > > Thomas> So, afterTriggers.query_stack is used to handle the reentrancy > Thomas> that results from triggers running further statements that > Thomas> might fire triggers. It isn't used for dealing with extra > Thomas> ModifyTable nodes that can appear in a plan because of wCTEs. > Thomas> Could it also be used for that purpose? I think that would > Thomas> only work if it is the case that each ModifyTable node begin > Thomas> and then runs to completion (ie no interleaving of wCTE > Thomas> execution) and then its AS trigger fires, which I'm not sure > Thomas> about. > > There's a problem with this which I didn't see anyone mention (though > I've not read the whole history); existing users of wCTEs rely on the > fact that no AFTER triggers are run until _all_ modifications are > completed. If you change that, then you can't use wCTEs to insert into > tables with FK checks without knowing the order in which the individual > wCTEs are executed, which is currently a bit vague and non-obvious > (which doesn't cause issues at the moment only because nothing actually > depends on the order). > > for example: > create table foo (id integer primary key); > create table foo_bar (foo_id integer references foo, bar_id integer); > with i1 as (insert into foo values (1)) > insert into foo_bar values (1,2),(1,3); > > This would fail the FK check if each insert did its own trigger firing, > since the foo_bar insert is actually run _first_. > > Firing triggers earlier than they currently are would thus break > existing working queries. Thanks, and I agree. Later I came to the conclusion that we should neither change nor depend on the order of execution, but in order to avoid doing those things we simply couldn't keep using query_stack as working space for the tuplestores -- the access pattern does not fit stack semantics. That's why I proposed having the ModifyTable nodes own their own transition tuplestores, and passing them explicitly to the sites where they're needed (various trigger.c routines), instead of accessing them via that global stack. That's the approach of the WIP patch I posted here: https://www.postgresql.org/message-id/CAEepm%3D1K7F08QPu%2BkGcNQ-bCcYBa3QX%3D9AeE%3Dj0doCmgqVs4Tg%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > I don't really like either option, because, on the one hand, this is a > pretty invasive thing to go rip out -- maybe we don't have to rip out > the entire patch series, but just some of the later patches? -- and on > the other hand, keeping it in the tree will require work, and I'm > busy. But there don't seem to be any other options. I was wondering if we could simply remove the syntax (and documentation), and leave the infrastructure in place until things are fixed. I think everybody wants this feature, there's just concern about stabilizing it in time for v10. regards, tom lane
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Thu, Jun 8, 2017 at 5:01 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: > Robert> unless some other committer volunteers. (Of course, anyone > Robert> could step in to do the work, as Thomas already has to a > Robert> considerable degree, but without a committer involved it > Robert> doesn't fix the problem.) > > I can probably take this on if needed. I would certainly not complain if you are willing to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Tue, Jun 6, 2017 at 8:19 PM, Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote: >> Also, ISTM that the code within ENRMetadataGetTupDesc() probably >> requires more explanation, resource management wise. Why so? I'm not saying you're wrong, but I don't see what's confusing about the status quo. I may be missing something important. > Also, it's not clear why it should be okay that the new type of > ephemeral RTEs introduced don't have permissions checks. There are > currently cases where the user cannot see data that they inserted > themselves (e.g., through RETURNING), on the theory that a before row > trigger might have modified the final contents of the tuple in a way > that the original inserter isn't supposed to know details about. Right, but the trigger has to be have been created by someone who has TRIGGER permission on the table, and such an individual can trivially steal the entire contents of every table row subsequently touched by any DML statement by installing a FOR-EACH-ROW trigger that records the contents of the tuple someplace. So the fact that they can now also do that by installing a FOR-EACH-STATEMENT trigger doesn't seem to be expanding the vulnerability surface. If anything, the new thing gives trigger-creators less power than they have already, since the tuplestore only lets you see what tuples were modified, whereas a before-row trigger could block or alter the modifications. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Thu, Jun 8, 2017 at 11:50 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> 1. Keep the current behaviour. [...] >> >> 2. Make a code change that would split the 'new table' tuplestore in >> two: an insert tuplestore and an update tuplestore (for new images; >> old images could remain in the old tuplestore that is also used for >> deletes) as Peter suggests. That raises two questions for me: (1) >> Does it really make sense for the 'when' clause, which sounds like it >> only controls when we fire a trigger, also to affect which transition >> tuples it sees? There is something slightly fishy about that. (2) >> Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union >> of the the two tuplestores? Trigger authors would need to be aware a >> single statement can produce a mixture of updates and inserts, but >> only if they explicitly invite such problems into their lives with >> that incantation. > > A third option would be for an AFTER INSERT OR UPDATE trigger to be > invoked twice, once for the inserts and once again for the updates. > No union required, but also surprising. > > Any other ideas? I discussed this off-list with Andrew Gierth and we came up with a fourth way: Use separate insert and update tuplestores (as originally suggested by Peter) and use the <trigger event> (INSERT, UPDATE) to decide which one a trigger should see, as described in option 2 above, but disallow INSERT OR UPDATE triggers with transition tables so that we don't have to choose any of the surprising behaviours described above. Triggers with multiple <trigger event>s are a PostgreSQL extension, so by not allowing them with transition tables we don't reduce our compliance. If you want to be invoked twice when you run ON CONFLICT statements (like option 3 above) then you'll need to create two triggers, one for INSERT and the other for UPDATE, and each will see only the transition tuples resulting from inserts or updates respectively. The door is still open for us to allow INSERT OR UPDATE with transition tables in future releases if someone can figure out what that should do. I will draft a patch for this. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Wed, Jun 7, 2017 at 5:36 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I spent a couple of hours drafting a proof-of-concept to see if my > hunch was right. It seems to work correctly so far and isn't huge > (but certainly needs more testing and work): > > 6 files changed, 156 insertions(+), 109 deletions(-) [Adding Andrew Gierth] Here is a new version of the patch to fix transition tables with wCTEs, rebased on top of transition-tuples-from-child-tables-v10.patch[1] which must be applied first. This is patch 2 of a stack of 3 patches addressing currently known problems with transition tables. [1] https://www.postgresql.org/message-id/CAEepm%3D1Ei_0yN%2BvKTHHsTYdajaY59LBMUunxmpfhBU-eQQzqxA%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Robert Haas
Date:
On Fri, Jun 9, 2017 at 12:00 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Jun 7, 2017 at 5:36 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> I spent a couple of hours drafting a proof-of-concept to see if my >> hunch was right. It seems to work correctly so far and isn't huge >> (but certainly needs more testing and work): >> >> 6 files changed, 156 insertions(+), 109 deletions(-) > > [Adding Andrew Gierth] > > Here is a new version of the patch to fix transition tables with > wCTEs, rebased on top of > transition-tuples-from-child-tables-v10.patch[1] which must be applied > first. > > This is patch 2 of a stack of 3 patches addressing currently known > problems with transition tables. I don't see a reason why MakeTransitionCaptureState needs to force the tuplestores into TopTransactionContext or make them owned by TopTransactionResourceOwner. I mean, it was like that before, but afterTriggers is a global variable and, potentially, there could still be a pointer accessible through it to a tuplestore linked from it even after the corresponding subtransaction aborted, possibly causing some cleanup code to trip and fall over. But that can't be used to justify this case, because the TransitionCaptureState is only reached through the PlanState tree; if that goes away, how is anybody going to accidentally follow a pointer to the now-absent tuplestore? The structure of AfterTriggerSaveEvent with this patch applied looks pretty weird. IIUC, whenever we enter the block guarded by if (row_trigger), then either (a) transition_capture != NULL or (b) each of the four "if" statements inside that block will separately decide to do nothing. I think now that you're requiring a transition_capture for all instances where transition tuplestores are used, you could simplify this code. Maybe just change the outer "if" statement to if (row_trigger) and then Assert(transition_capture != NULL)? Not this patch's problem directly, but while scrutinizing this it crossed my mind that we would need to prohibit constraint triggers with transition tables. It turns out that we do, in the parser: create constraint trigger table2_trig after insert on table2 referencing new table as new_table for each statement executeprocedure dump_insert(); ERROR: syntax error at or near "referencing" Possibly it would be better to allow the syntax and error out in CreateTrigger(), so that we can give a better error message. It's certainly not obvious from the syntax summary produced by \h CREATE TRIGGER why this doesn't work. This might be more future-proof, too, if somebody someday adds support for REFERENCING { OLD | NEW } ROW to constraint triggers and fails to realize that there's not a check anywhere outside the parser for the table case. I don't see anything else wrong with this, and it seems like it solves the reported problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Peter Geoghegan
Date:
On Thu, Jun 8, 2017 at 3:13 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 6, 2017 at 8:19 PM, Peter Geoghegan <pg@bowt.ie> wrote: >> On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote: >>> Also, ISTM that the code within ENRMetadataGetTupDesc() probably >>> requires more explanation, resource management wise. > > Why so? I'm not saying you're wrong, but I don't see what's confusing > about the status quo. I may be missing something important. Perhaps nothing. I just thought it looked a bit odd to rely on the target relation's TupleDesc like that. It's not quite the same as the foreign table tuplestore stuff, because that has its own relkind, and has a relation RTE in the parser. -- Peter Geoghegan
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Alvaro Herrera
Date:
Robert Haas wrote: > Not this patch's problem directly, but while scrutinizing this it > crossed my mind that we would need to prohibit constraint triggers > with transition tables. It turns out that we do, in the parser: > > create constraint trigger table2_trig > after insert on table2 referencing new table as new_table > for each statement execute procedure dump_insert(); > ERROR: syntax error at or near "referencing" > > Possibly it would be better to allow the syntax and error out in > CreateTrigger(), so that we can give a better error message. It's > certainly not obvious from the syntax summary produced by \h CREATE > TRIGGER why this doesn't work. Yeah, I agree. This doesn't look like actual protection, but just a "happy accident", and the syntax error message is not helpful either. We could keep it very simple by just throwing the error right there in gram.y's production, adding TriggerReferencing in the right place in the CREATE CONSTRAINT TRIGGER production -- at this stage there doesn't seem to be a need to expand this any further. > This might be more future-proof, too, > if somebody someday adds support for REFERENCING { OLD | NEW } ROW to > constraint triggers and fails to realize that there's not a check > anywhere outside the parser for the table case. I suppose in the future it would make sense to allow for-each-statement constraint triggers with transition tables. It'd probably a useful way to optimize FK checks for bulk operations. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Andrew Gierth
Date:
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: Robert> I don't see a reason why MakeTransitionCaptureState needs toRobert> force the tuplestores into TopTransactionContextor make themRobert> owned by TopTransactionResourceOwner. Nor do I, and I'm pretty sure it's leaking memory wholesale within a transaction if you have aborted subxacts. e.g. a few iterations of this, on a table with an appropriate trigger: savepoint s1; insert into foo select i, case when i < 100000 then repeat('a',100) else repeat('a',1/(i-100000)) end from generate_series(1,100000)i; rollback to s1; This is a bit contrived of course but it shows that there's missing cleanup somewhere, either in the form of poor choice of context or missing code in the subxact cleanup. Robert> I mean, it was like that before, but afterTriggers is a globalRobert> variable and, potentially, there could stillbe a pointerRobert> accessible through it to a tuplestore linked from it even afterRobert> the corresponding subtransactionaborted, possibly causing someRobert> cleanup code to trip and fall over. But that can't be used toRobert>justify this case, because the TransitionCaptureState is onlyRobert> reached through the PlanState tree; if thatgoes away, how isRobert> anybody going to accidentally follow a pointer to theRobert> now-absent tuplestore? For per-row triggers with transition tables, a pointer to the transition capture state ends up in the shared-data record in the event queue? -- Andrew.
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Mon, Jun 12, 2017 at 9:29 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: > > Robert> I don't see a reason why MakeTransitionCaptureState needs to > Robert> force the tuplestores into TopTransactionContext or make them > Robert> owned by TopTransactionResourceOwner. > > Nor do I, and I'm pretty sure it's leaking memory wholesale within a > transaction if you have aborted subxacts. > > e.g. a few iterations of this, on a table with an appropriate trigger: > > savepoint s1; > insert into foo > select i, case when i < 100000 then repeat('a',100) > else repeat('a',1/(i-100000)) end > from generate_series(1,100000) i; > rollback to s1; > > This is a bit contrived of course but it shows that there's missing > cleanup somewhere, either in the form of poor choice of context or > missing code in the subxact cleanup. Right, thanks. I think the fix for this is to use CurTransactionContext (for memory cleanup) and CurTransactionResourceOwner (for temporary file cleanup, in case the tuplestores spill to disk). The attached does it that way. With the previous version of the patch, I could see temporary files accumulating under base/pgsql_tmp from each aborted subtransaction when I set work_mem = '64kB' to provoke spillage. With the attached version, the happy path cleans up (because ExecEndModifyTable calls DestroyTransitionCaptureState), and the error path does too (because the subxact's memory and temporary files are automatically cleaned up). > Robert> I mean, it was like that before, but afterTriggers is a global > Robert> variable and, potentially, there could still be a pointer > Robert> accessible through it to a tuplestore linked from it even after > Robert> the corresponding subtransaction aborted, possibly causing some > Robert> cleanup code to trip and fall over. But that can't be used to > Robert> justify this case, because the TransitionCaptureState is only > Robert> reached through the PlanState tree; if that goes away, how is > Robert> anybody going to accidentally follow a pointer to the > Robert> now-absent tuplestore? > > For per-row triggers with transition tables, a pointer to the transition > capture state ends up in the shared-data record in the event queue? Yes. But the only way for event queue data to last longer than the execution of the current statement is if it's a deferred trigger. Only constraint triggers can be deferred, and constraint triggers are not allowed to have transition tables. If we wanted to support that (for example, to handle set-orient FK checks as has been discussed), we'd need to change the lifetime of the TransitionCaptureState object (it would need to outlast ModifyTableState). I've added a note to that effect to MakeTransitionCaptureState. So far there seem to be no objections to the inclusion of the pointer in the event queue that records which transition tables each trigger invocation should see...? The extra memory consumption per chunk is limited by the number of ModifyTable nodes, so it's not going to be a memory hog, but I thought someone might not like it on other grounds. On Sat, Jun 10, 2017 at 7:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: > The structure of AfterTriggerSaveEvent with this patch applied looks > pretty weird. IIUC, whenever we enter the block guarded by if > (row_trigger), then either (a) transition_capture != NULL or (b) each > of the four "if" statements inside that block will separately decide > to do nothing. I think now that you're requiring a transition_capture > for all instances where transition tuplestores are used, you could > simplify this code. Maybe just change the outer "if" statement to if > (row_trigger) and then Assert(transition_capture != NULL)? Yeah, that was silly, and also the comment was misleading. It is possible for row_trigger to be true and transition_capture to be NULL, so I did it slightly differently. Fixed in the attached. Thanks both for the review. New version of patch #2 attached. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
From
Andrew Gierth
Date:
>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes: Thomas> Thanks both for the review. New version of patch #2 attached. I'm looking to commit this soon; if anyone has any further comment now would be a good time to speak up. -- Andrew (irc:RhodiumToad)
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table
From
Thomas Munro
Date:
On Mon, Jun 19, 2017 at 11:06 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes: > > Thomas> Thanks both for the review. New version of patch #2 attached. > > I'm looking to commit this soon; if anyone has any further comment now > would be a good time to speak up. Here's patch #2 rebased for the recent reindent. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers