Thread: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

Since the subject of transition tables came up, I thought I'd test how this case works:

=# create table qwr(a int);
CREATE TABLE

=# create function qq() returns trigger as $$ begin raise notice '%', (select count(*) from oo); return null; end $$ language plpgsql;
CREATE FUNCTION

=# create trigger xx after insert on qwr referencing new table as oo for each statement execute procedure qq();
CREATE TRIGGER

=# with t as (insert into qwr values (1)) insert into qwr values (2), (3);
NOTICE:  3
NOTICE:  3
INSERT 0 2

To me, this means that it doesn't work.  Surely one of the trigger invocations should say 1, and the other 2.  Or was this intentional?
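As a toy model of what seems to be going wrong, assuming the cause is a single shared capture area (the function names here are invented for illustration; this is not the actual backend code):

```python
def run_wcte_shared(statements):
    """Buggy shape: one transition table shared by every DML statement."""
    shared = []
    for rows in statements:
        shared.extend(rows)          # every statement captures into it
    # each statement-level AFTER trigger then sees everything
    return [len(shared) for _ in statements]

def run_wcte_separate(statements):
    """Expected shape: each DML statement captures into its own table."""
    return [len(rows) for rows in statements]

# with t as (insert into qwr values (1)) insert into qwr values (2), (3);
stmts = [[(1,)], [(2,), (3,)]]
print(run_wcte_shared(stmts))    # [3, 3] -- matches the NOTICEs above
print(run_wcte_separate(stmts))  # [1, 2] -- one per statement
```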


.m
On Fri, Jun 2, 2017 at 10:48 PM, Marko Tiikkaja <marko@joh.to> wrote:
> Since the subject of transition tables came up, I thought I'd test how this
> case works:
>
> =# create table qwr(a int);
> CREATE TABLE
>
> =# create function qq() returns trigger as $$ begin raise notice '%',
> (select count(*) from oo); return null; end $$ language plpgsql;
> CREATE FUNCTION
>
> =# create trigger xx after insert on qwr referencing new table as oo for
> each statement execute procedure qq();
> CREATE TRIGGER
>
> =# with t as (insert into qwr values (1)) insert into qwr values (2), (3);
> NOTICE:  3
> NOTICE:  3
> INSERT 0 2
>
> to me, this means that it doesn't work.  Surely one of the trigger
> invocations should say 1, and the other 2.  Or was this intentional?

I would have expected that to be handled by separate transition tables
at different query levels, but evidently it isn't.  The following
crashes:

create table table1(a int);
create table table2(a text);

create function trigfunc() returns trigger as $$ begin raise notice
'got: %', (select oo from oo); return null; end $$ language plpgsql;

create trigger trig1 after insert on table1 referencing new table as
oo for each statement execute procedure trigfunc();

create trigger trig2 after insert on table2 referencing new table as
oo for each statement execute procedure trigfunc();

with t as (insert into table1 values (1)) insert into table2 values ('hello');

I've run out of day today but will investigate.

-- 
Thomas Munro
http://www.enterprisedb.com



On Sat, Jun 3, 2017 at 12:10 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I would have expected that to be handled by separate transition tables
> at different query levels, but evidently it isn't.

I am wondering if we need to wrap the execution of a modifying CTE in
AfterTriggerBeginQuery() and AfterTriggerEndQuery().  But I'm not sure
where, and it may be a couple of days before I can dig some more.

-- 
Thomas Munro
http://www.enterprisedb.com



On Sat, Jun 3, 2017 at 1:20 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Jun 3, 2017 at 12:10 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> I would have expected that to be handled by separate transition tables
>> at different query levels, but evidently it isn't.
>
> I am wondering if we need to wrap the execution of a modifying CTE in
> AfterTriggerBeginQuery() and AfterTriggerEndQuery().  But I'm not sure
> where, and it may be a couple of days before I can dig some more.

So, afterTriggers.query_stack is used to handle the reentrancy that
results from triggers running further statements that might fire
triggers.  It isn't used for dealing with extra ModifyTable nodes that
can appear in a plan because of wCTEs.  Could it also be used for that
purpose?  I think that would only work if it is the case that each
ModifyTable node begins and then runs to completion (i.e. no interleaving
of wCTE execution) and then its AS trigger fires, which I'm not sure
about.  If that is the case, perhaps AfterTriggerBeginQuery and
AfterTriggerEndQuery could become the responsibility of
nodeModifyTable.c rather than execMain.c.  I haven't tried this yet
and it may well be too ambitious at this stage.
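To make the question concrete, here's a rough Python sketch of the shape of that stack (toy code, not the actual trigger.c data structures): each AfterTriggerBeginQuery pushes a fresh capture level, so nested statements fired from triggers are kept apart, but two ModifyTable nodes inside one query share a single level.

```python
class AfterTriggerState:
    """Toy stand-in for afterTriggers: one capture list per query level."""
    def __init__(self):
        self.query_stack = []

    def begin_query(self):           # cf. AfterTriggerBeginQuery()
        self.query_stack.append([])

    def capture(self, row):          # tuple capture during DML execution
        self.query_stack[-1].append(row)

    def end_query(self):             # cf. AfterTriggerEndQuery()
        return self.query_stack.pop()

s = AfterTriggerState()
s.begin_query()                      # one level for the whole wCTE query
s.capture(('table1', 1))             # ModifyTable node for the CTE
s.capture(('table2', 'hello'))       # ModifyTable node for the outer INSERT
print(s.end_query())                 # both nodes' tuples land in one level
```

Reentrancy works because a nested statement does its own begin/end pair, pushing and popping a separate level; sibling ModifyTable nodes in one query never do.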

Other ideas: (1) ban wCTEs that target relations with transition
tables in PG10, because we're out of time; (2) find an entirely new
way to keep track of the current active transition table(s) to replace
the current stack approach, such as including them in the
TransitionCaptureState object in the patch that I proposed to fix the
nearby inheritance bugs[1].

Stepping back and squinting a bit, both this and the inheritance bug
stem from a failure to handle multiple ModifyTable nodes in a plan,
but the solutions are approximately opposite: in the inheritance case,
the solution I propose is to teach them all to coordinate their tuple
capture so that it's in a common format, and in the wCTE case the
solution must surely be to ensure that their state is kept separate.
The question I'd like to figure out is whether the existing
AfterTriggerBeginQuery/AfterTriggerEndQuery stack is the right data
structure for that task, considering the control flow when CTEs are
executed, which I need to learn more about.

Thoughts?

[1]
https://www.postgresql.org/message-id/flat/CA%2BTgmoZzTBBAsEUh4MazAN7ga%3D8SsMC-Knp-6cetts9yNZUCcg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> So, afterTriggers.query_stack is used to handle the reentrancy that
> results from triggers running further statements that might fire
> triggers.  It isn't used for dealing with extra ModifyTable nodes that
> can appear in a plan because of wCTEs.  Could it also be used for that
> purpose?  I think that would only work if it is the case that each
> ModifyTable node begin and then runs to completion (ie no interleaving
> of wCTE execution) and then its AS trigger fires, which I'm not sure
> about.

I don't think we want to commit to anything that depends on a CTE
creating an optimization fence, although *maybe* that would be OK in
the case of DML as a CTE.  That's a pretty special case; I'm not
sure whether the standard discusses it.

> If that is the case, perhaps AfterTriggerBeginQuery and
> AfterTriggerEndQuery could become the responsibility of
> nodeModifyTable.c rather than execMain.c.  I haven't tried this yet
> and it may well be too ambitious at this stage.

In a world where one statement can contain multiple DML statements
within CTEs, that may make sense regardless of transition table
issues; but I agree we're past the time to be considering making
such a change for v10.

> Other ideas: (1) ban wCTEs that target relations with transition
> tables in PG10, because we're out of time;

Given that we haven't even discussed what to do about an UPDATE
statement with a CTE that updates the same table when there are
BEFORE EACH ROW UPDATE triggers on that table (which perhaps return
NULL for some but not all rows?), I'm not sure we've yet plumbed the
depths of this morass.

For example, for this:

drop table if exists t1 cascade;

create table t1 (c1 int not null, c2 text not null default '');
insert into t1 (c1) values (1), (2), (3), (4), (5);

drop function if exists t1_upd_func() cascade;
create function t1_upd_func() returns trigger language plpgsql
as $$
begin
  if old.c1 between 2 and 4 then
    new.c2 = old.c2 || ';' || old.c1::text || ',' || new.c1::text;
  end if;
  if old.c1 > 3 then
    return null;
  end if;
  return new;
end;
$$;
create trigger t1_upd_update before update on t1 for each row execute procedure t1_upd_func();
with x as (update t1 set c1 = c1 - 1 returning c1) update t1 set c1 = t1.c1 + 1 from x where x.c1 = t1.c1;
select * from t1;

... what is the correct result?

I get this:
 c1 |  c2
----+------
  4 |
  5 |
  0 |
  1 | ;2,1
  2 | ;3,2
(5 rows)

It is definitely not what I expected, and seems wrong in multiple
ways from a logical perspective, looking at those as set-based
operations.  (Tautologies about the procedural mechanism that
creates the result are not defenses of the result itself.)

Can we fix things exclusive of transition tables in this release?
If not, the only reasonable thing to do is to try to avoid further
complicating things with CTE/transition table usage until we sort
out the basics of triggers firing from CTE DML in the first place.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Sun, Jun 4, 2017 at 10:41 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>
>> So, afterTriggers.query_stack is used to handle the reentrancy that
>> results from triggers running further statements that might fire
>> triggers.  It isn't used for dealing with extra ModifyTable nodes that
>> can appear in a plan because of wCTEs.  Could it also be used for that
>> purpose?  I think that would only work if it is the case that each
>> ModifyTable node begin and then runs to completion (ie no interleaving
>> of wCTE execution) and then its AS trigger fires, which I'm not sure
>> about.
>
> I don't think we want to commit to anything that depends on a CTE
> creating an optimization fence, although *maybe* that would be OK in
> the case of DML as a CTE.  That's a pretty special case; I'm not
> sure whether the standard discusses it.

According to our manual the standard doesn't allow DML in CTEs.  I
suppose it wouldn't make much sense without RETURNING, which is also
non-standard.

>> If that is the case, perhaps AfterTriggerBeginQuery and
>> AfterTriggerEndQuery could become the responsibility of
>> nodeModifyTable.c rather than execMain.c.  I haven't tried this yet
>> and it may well be too ambitious at this stage.
>
> In a world where one statement can contain multiple DML statements
> within CTEs, that may make sense regardless of transition table
> issues; but I agree we're past the time to be considering making
> such a change for v10.

Ok.

>> Other ideas: (1) ban wCTEs that target relations with transition
>> tables in PG10, because we're out of time;
>
> Given that we haven't even discussed what to do about an UPDATE
> statement with a CTE that updates the same table when there are
> BEFORE EACH ROW UPDATE triggers on that table (which perhaps return
> NULL for some but not all rows?), I'm not sure we've yet plumbed the
> depths of this morass.
>
> [...]
>
> Can we fix things exclusive of transition tables in this release?
> If not, the only reasonable thing to do is to try to avoid further
> complicating things with CTE/transition table usage until we sort
> out the basics of triggers firing from CTE DML in the first place.

At least it's well documented that the execution order is undefined,
but yeah, triggers do make things a lot more complicated and I'm not
sure I understand to what extent that is unavoidable once you decide
to allow DML in CTEs.

In the meantime, it seems like you agree that rejecting wCTEs that
affect tables with triggers with transition tables is the best
response to this bug report?  Do you think that parse analysis is the
right time to do the check?  Here's a first attempt at that.

Later, for the next cycle, I think we should investigate storing the
transition tables in ModifyTableState rather than in query_stack, and
then passing them to the Exec*Triggers() functions as arguments
(possibly inside TransitionCaptureState, if you accept my proposal for
the inheritance bug).

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> In the meantime, it seems like you agree that rejecting wCTEs that
> affect tables with triggers with transition tables is the best
> response to this bug report?  Do you think that parse analysis is the
> right time to do the check?  Here's a first attempt at that.

I'm starting to like the approach of reverting the entire transition
tables patch.  Failing to consider the possibility of a plan with
multiple ModifyTable nodes seems like a pretty fundamental design
mistake, and I'm not eager either to ship this with that broken or try
to fix it at this stage of the release cycle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> In the meantime, it seems like you agree that rejecting wCTEs that
>> affect tables with triggers with transition tables is the best
>> response to this bug report?  Do you think that parse analysis is the
>> right time to do the check?  Here's a first attempt at that.

FWIW, parse analysis is surely NOT the time for such a check.  Triggers
might get added to a table between analysis and execution.  I think you
might have to do it during executor startup.

> I'm starting to like the approach of reverting the entire transition
> tables patch.  Failing to consider the possibility of a plan with
> multiple ModifyTable nodes seems like a pretty fundamental design
> mistake, and I'm not eager either to ship this with that broken or try
> to fix it at this stage of the release cycle.

Postponing the feature to v11 might be a viable solution.  We don't
have any other major work that depends on it do we?
        regards, tom lane



On Mon, Jun 5, 2017 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm starting to like the approach of reverting the entire transition
>> tables patch.  Failing to consider the possibility of a plan with
>> multiple ModifyTable nodes seems like a pretty fundamental design
>> mistake, and I'm not eager either to ship this with that broken or try
>> to fix it at this stage of the release cycle.
>
> Postponing the feature to v11 might be a viable solution.  We don't
> have any other major work that depends on it do we?

I don't think anything that depends on it was committed to v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> In the meantime, it seems like you agree that rejecting wCTEs that
>>> affect tables with triggers with transition tables is the best
>>> response to this bug report?  Do you think that parse analysis is the
>>> right time to do the check?  Here's a first attempt at that.
>
> FWIW, parse analysis is surely NOT the time for such a check.  Triggers
> might get added to a table between analysis and execution.  I think you
> might have to do it during executor startup.

Hmm.  My understanding:  In analyzeCTE(), where the check is done in
that patch, we hold a lock on the relation because we have a
RangeTblEntry.  That means that no triggers can be added concurrently
in the case where you go on to execute the plan.  If you save the plan
for later execution, triggers created between then and lock
reacquisition (because creating triggers touches pg_class) cause
reanalysis.  This is not fundamentally different from altering the
table.  For example, with that patch:

postgres=# prepare ps as with t as (insert into table1 values (1))
insert into table2 values ('hello');
PREPARE
postgres=# create trigger trig1 after insert on table1 referencing new
table as oo for each row execute procedure trigfunc();
CREATE TRIGGER
postgres=# execute ps;
ERROR:  WITH clause cannot modify a table that has triggers with
transition tables

What am I missing?

>> I'm starting to like the approach of reverting the entire transition
>> tables patch.  Failing to consider the possibility of a plan with
>> multiple ModifyTable nodes seems like a pretty fundamental design
>> mistake, and I'm not eager either to ship this with that broken or try
>> to fix it at this stage of the release cycle.

Not handling wCTEs or inheritance definitely counts as a howler, but
we seem tantalisingly close and it would be a shame to push the whole
feature back IMHO.  I really hope we can move on to talking about
incremental matviews in the next cycle, which is why I've jumped on
every problem report so far.

I'm still waiting to hear from Kevin whether he thinks what I proposed
for the inheritance bug has legs.  If so, my vote from the peanut
gallery would be to keep transition tables in the tree but block wCTEs
with transition tables, and then add support in the next release
(which I'd be happy to work on).  Support will certainly be needed
ASAP, because it'd suck if incremental matview maintenance didn't work
just because you use wCTEs, and the 'action at a distance' factor
isn't a great user experience (adding a new trigger can in general
break existing queries, but breaking them just because they use wCTEs
is arguably surprising).  On the other hand, if he thinks that the
inheritance bug is not adequately fixed by my proposal (with minor
tweaks) and needs a completely different approach, then I'm out (but
will be happy to help work on this stuff for PG11).

-- 
Thomas Munro
http://www.enterprisedb.com



Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, parse analysis is surely NOT the time for such a check.  Triggers
>> might get added to a table between analysis and execution.  I think you
>> might have to do it during executor startup.

> Hmm.  My understanding:  In analyzeCTE(), where the check is done in
> that patch, we hold a lock on the relation because we have a
> RangeTblEntry.  That means that no triggers can be added concurrently
> in the case where you go on to execute the plan.  If you save the plan
> for later execution, triggers created between then and lock
> reacquisition (because creating triggers touches pg_class) cause
> reanalysis.

No, they don't necessarily.  One case that will certainly not be
covered by such a test is if the wCTE is an UPDATE or DELETE on
an inheritance hierarchy: parse analysis has no idea whether the
target table has children at all, let alone whether any of the
children have triggers that use transition tables.  You really want
the test to happen after the planner has expanded inheritance.

Another problem is that the result of parse analysis is frozen for much
longer than you're thinking about in the case of stored rules or views.
We currently disallow views that contain writable CTEs, but I do not
think there's such a prohibition for rules.

More generally, the problem I've got with this proposal is that it causes
the presence of triggers to be significant much earlier in query execution
than they used to be.  I'm concerned that this may break optimizations
that are already in place somewhere, or that we might want to add later.
        regards, tom lane



On Tue, Jun 6, 2017 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> FWIW, parse analysis is surely NOT the time for such a check.  Triggers
>>> might get added to a table between analysis and execution.  I think you
>>> might have to do it during executor startup.
>
>> Hmm.  My understanding:  In analyzeCTE(), where the check is done in
>> that patch, we hold a lock on the relation because we have a
>> RangeTblEntry.  That means that no triggers can be added concurrently
>> in the case where you go on to execute the plan.  If you save the plan
>> for later execution, triggers created between then and lock
>> reacquisition (because creating triggers touches pg_class) cause
>> reanalysis.
>
> No, they don't necessarily.  One case that will certainly not be
> covered by such a test is if the wCTE is an UPDATE or DELETE on
> an inheritance hierarchy: parse analysis has no idea whether the
> target table has children at all, let alone whether any of the
> children have triggers that use transition tables.  You really want
> the test to happen after the planner has expanded inheritance.

If Kevin accepts my other patch or at least the approach, we won't
allow any child table triggers with transition tables, so we wouldn't
need to consider them here (triggers on the named result table see
tuples collected from children, but ROW triggers attached to children
can't have transition tables and STATEMENT triggers attached to
children don't fire).  Though those decisions probably wouldn't be
future-proof, we'd only want this block in place until we had support
for wCTEs.

> Another problem is that the result of parse analysis is frozen for much
> longer than you're thinking about in the case of stored rules or views.
> We currently disallow views that contain writable CTEs, but I do not
> think there's such a prohibition for rules.

Ugh.  Right.  I can indeed circumvent the above check with a wCTE
hiding in a rule, so parse analysis is not the right time for the
check.  The whole rewrite area is clearly something I need to go and
learn about.  Thanks for the explanation.

-- 
Thomas Munro
http://www.enterprisedb.com



On 4 June 2017 at 06:41, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>
>> So, afterTriggers.query_stack is used to handle the reentrancy that
>> results from triggers running further statements that might fire
>> triggers.  It isn't used for dealing with extra ModifyTable nodes that
>> can appear in a plan because of wCTEs.  Could it also be used for that
>> purpose?  I think that would only work if it is the case that each
>> ModifyTable node begin and then runs to completion (ie no interleaving
>> of wCTE execution) and then its AS trigger fires, which I'm not sure
>> about.
>
> I don't think we want to commit to anything that depends on a CTE
> creating an optimization fence, although *maybe* that would be OK in
> the case of DML as a CTE.  That's a pretty special case; I'm not
> sure whether the standard discusses it.

It's definitely fine to require a fence for wCTEs. They're an
extension to the standard, and it's pretty much necessary that we not
pull up / push down across them since we don't want to affect the
side-effects (row changes). If there are any cases where it's safe,
they'll take some careful thought.

It's only standard CTEs (SELECT-based) that I think matter for the
optimisation fence behaviour.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



On Tue, Jun 6, 2017 at 12:58 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 4 June 2017 at 06:41, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>
>>> So, afterTriggers.query_stack is used to handle the reentrancy that
>>> results from triggers running further statements that might fire
>>> triggers.  It isn't used for dealing with extra ModifyTable nodes that
>>> can appear in a plan because of wCTEs.  Could it also be used for that
>>> purpose?  I think that would only work if it is the case that each
>>> ModifyTable node begin and then runs to completion (ie no interleaving
>>> of wCTE execution) and then its AS trigger fires, which I'm not sure
>>> about.
>>
>> I don't think we want to commit to anything that depends on a CTE
>> creating an optimization fence, although *maybe* that would be OK in
>> the case of DML as a CTE.  That's a pretty special case; I'm not
>> sure whether the standard discusses it.
>
> It's definitely fine to require a fence for wCTEs. They're an
> extension to the standard, and it's pretty much necessary that we not
> pull up / push down across them since we don't want to affect the
> side-effects (row changes). If there are any cases where it's safe,
> they'll take some careful thought.
>
> It's only standard CTEs (SELECT-based) that I think matter for the
> optimisation fence behaviour.

After sleeping on it, I don't think we need to make that decision here
though.  I think it's better to just move the tuplestores into
ModifyTableState so that each embedded DML statement has its own, and
have ModifyTable pass them to the trigger code explicitly.  I think
I'd like to do that via the TransitionCaptureState object that I
proposed elsewhere, but I'll hold off on doing anything until I hear
from interested committers on which way we're going here, time being
short.

Call me an anti-globalisation (of variables) protestor.
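For concreteness, a minimal Python sketch of that shape (invented names; the real change would live in nodeModifyTable.c and trigger.c): the capture state is owned by each ModifyTable node and handed to the trigger code as an argument rather than read from globals.

```python
class TransitionCaptureState:
    """Per-ModifyTable capture state instead of global tuplestores."""
    def __init__(self):
        self.old_table = []
        self.new_table = []

def exec_modify_table(inserted_rows, fire_after_statement_trigger):
    tcs = TransitionCaptureState()          # owned by this node alone
    for row in inserted_rows:
        tcs.new_table.append(row)           # capture this node's tuples
    fire_after_statement_trigger(tcs)       # passed explicitly, no globals

notices = []
trig = lambda tcs: notices.append(len(tcs.new_table))
exec_modify_table([(1,)], trig)             # the wCTE's INSERT
exec_modify_table([(2,), (3,)], trig)       # the outer INSERT
print(notices)  # prints [1, 2]: each statement trigger sees only its rows
```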

-- 
Thomas Munro
http://www.enterprisedb.com



Nice as it would be to add a SQL standard feature and advance the
effort to get to incremental maintenance of materialized views, and
much as I really appreciate the efforts Thomas has put into trying
to solve these problems, I agree that it is best to revert the
feature.  It took years to get an in-depth review, then I was asked
not to commit it because others were working on patches that would
conflict.  That just doesn't leave enough time to address these
issues before release.  Fundamentally, I'm not sure that there is a
level of interest sufficient to support the effort.

I'll give it a few days for objections before reverting.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> Nice as it would be to add a SQL standard feature and advance the
> effort to get to incremental maintenance of materialized views, and
> much as I really appreciate the efforts Thomas has put into trying
> to solve these problems, I agree that it is best to revert the
> feature.  It took years to get an in-depth review, then I was asked
> not to commit it because others were working on patches that would
> conflict.  That just doesn't leave enough time to address these
> issues before release.  Fundamentally, I'm not sure that there is a
> level of interest sufficient to support the effort.
>
> I'll give it a few days for objections before reverting.

I can only say that the lack of this feature comes up on a weekly basis on IRC, and a lot of people would be disappointed to see it reverted.


.m
On Tue, Jun 6, 2017 at 3:41 PM, Marko Tiikkaja <marko@joh.to> wrote:
> On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

>> It took years to get an in-depth review, then I was asked
>> not to commit it because others were working on patches that would
>> conflict.  That just doesn't leave enough time to address these
>> issues before release.  Fundamentally, I'm not sure that there is a
>> level of interest sufficient to support the effort.

> I can only say that the lack of this feature comes up on a weekly basis on
> IRC, and a lot of people would be disappointed to see it reverted.

Well, at PGCon I talked with someone who worked on the
implementation in Oracle 20-some years ago.  He said they had a team
of 20 people full time working on the feature for years to get it
working.  Now, I think the PostgreSQL community is a little lighter
on its feet, but without more active participation from others than
there has been so far, I don't intend to take another run at it.
I'll be happy to participate at such point that it's not treated as
a second-class feature set.  Barring that, anyone else who wants to
take the patch and run with it is welcome to do so.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Tue, Jun 6, 2017 at 4:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> Nice as it would be to add a SQL standard feature and advance the
> effort to get to incremental maintenance of materialized views, and
> much as I really appreciate the efforts Thomas has put into trying
> to solve these problems, I agree that it is best to revert the
> feature.  It took years to get an in-depth review, then I was asked
> not to commit it because others were working on patches that would
> conflict.  That just doesn't leave enough time to address these
> issues before release.  Fundamentally, I'm not sure that there is a
> level of interest sufficient to support the effort.

I find it a little unfair that you are blaming the community for not
taking enough interest in this feature when other people are posting
bug fixes for the feature which you can't find time to review (or even
respond to with an i'm-too-busy email) for lengthy periods of time.
Previously, I took the time to review and commit several fixes which
Thomas developed for reported bugs, while you ignored all of the
relevant threads.  Now, he's posted another patch which you said you
would review by last Friday or at the latest by Monday and which you
have not reviewed.  He's also suggested that the fix for this issue
probably relies on that one, so getting that patch reviewed is
relevant to this thread also.

I think I'd like to walk back my earlier statements about reverting
this patch just a little bit.  Although putting the tuplestore at the
wrong level does seem like a fairly significant design mistake, Thomas
more or less convinced me yesterday on a Skype call that relocating it
to the ModifyTable node might not be that much work.  If it's a
150-line patch, it'd probably be less disruptive to install a fix than
to revert the whole thing (and maybe put it back in again, in some
future release).  On the other hand, if you aren't willing to write,
review, or commit any further patches to fix bugs in this feature,
then it can really only stay in the tree if somebody else is willing
to assume complete responsibility for it going forward.

So, are you willing and able to put any effort into this, like say
reviewing the patch Thomas posted, and if so when and how much?  If
you're just done and you aren't going to put any more work into
maintaining it (for whatever reasons), then I think you should say so
straight out.  People shouldn't have to guess whether you're going to
maintain your patch or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, Jun 5, 2017 at 6:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> After sleeping on it, I don't think we need to make that decision here
> though.  I think it's better to just move the tuplestores into
> ModifyTableState so that each embedded DML statement has its own, and
> have ModifyTable pass them to the trigger code explicitly.

I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
case -- one for updated tuples, and the other for inserted tuples.


-- 
Peter Geoghegan



On Tue, Jun 6, 2017 at 3:47 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
> case -- one for updated tuples, and the other for inserted tuples.

Also, ISTM that the code within ENRMetadataGetTupDesc() probably
requires more explanation, resource management wise.


-- 
Peter Geoghegan



On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> Also, ISTM that the code within ENRMetadataGetTupDesc() probably
> requires more explanation, resource management wise.

Also, it's not clear why it should be okay that the new type of
ephemeral RTEs introduced don't have permissions checks. There are
currently cases where the user cannot see data that they inserted
themselves (e.g., through RETURNING), on the theory that a before row
trigger might have modified the final contents of the tuple in a way
that the original inserter isn't supposed to know details about.

As the INSERT docs say, "Use of the RETURNING clause requires SELECT
privilege on all columns mentioned in RETURNING". Similarly, the
excluded.* pseudo-relation requires select privilege (on the
corresponding target relation columns) in order to be usable by ON
CONFLICT DO UPDATE.

-- 
Peter Geoghegan



>> I'll give it a few days for objections before reverting.
>
> I can only say that the lack of this feature comes up on a weekly basis on IRC, and a lot of people would be disappointed to see it reverted.

Not that my opinion matters, but I was very much looking forward to this feature in Postgres 10, but I understand if it just won't be stable enough to stay in.

I have my fingers crossed these issues can be resolved.
On Wed, Jun 7, 2017 at 9:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think I'd like to walk back my earlier statements about reverting
> this patch just a little bit.  Although putting the tuplestore at the
> wrong level does seem like a fairly significant design mistake, Thomas
> more or less convinced me yesterday on a Skype call that relocating it
> to the ModifyTable node might not be that much work.  If it's a
> 150-line patch, it'd probably be less disruptive to install a fix than
> to revert the whole thing (and maybe put it back in again, in some
> future release).

I spent a couple of hours drafting a proof-of-concept to see if my
hunch was right.  It seems to work correctly so far and isn't huge
(but certainly needs more testing and work):

 6 files changed, 156 insertions(+), 109 deletions(-)

It applies on top of the other patch[1].  It extends
TransitionCaptureState to hold the new and old tuplestores for
each ModifyTable node, instead of using global variables.  The result
is that tuplestores don't get mixed up, and there aren't any weird
assumptions about the order of execution as discussed earlier.
Example:

  with wcte as (insert into table1 values (42))
    insert into table2 values ('hello world');
  NOTICE:  trigger = table2_trig, old table = <NULL>, new table = ("hello world")
  NOTICE:  trigger = table1_trig, old table = <NULL>, new table = (42)

Summary of how these patches relate:

1.  In the inheritance patch[1], TransitionCaptureState is introduced.
It holds flags that control whether we capture tuples.  There is one
of these per ModifyTable node.  In master we use the flags in
TriggerDesc to control transition tuple capture directly, but we
needed a way for ModifyTable's result rel's TriggerDesc to affect all
child tables that are touched.  My proposal is to do that by inventing
this new object to activate transition tuple capture while modifying
child tables too.  It is passed into the ExecAR*Trigger() functions of
all relations touched by the ModifyTable node.

2.  In the attached patch, that struct is extended to hold the actual
tuplestores.  They are used for two purposes: ExecAR*Trigger()
captures tuples into them (instead of using global variables to find
the tuplestores to capture tuples into), and ExecA[RS]*Trigger() keeps
hold of the TransitionCaptureState in the after trigger queue so that
when the queued event is eventually executed AfterTriggerExecute() can
expose the correct tuplestores to triggers.
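
To make the intended behaviour concrete, here is a rough Python model of
the design described in points 1 and 2. The names are loose analogues of
the C structures, not the actual API; it is a sketch of the idea, not the
implementation:

```python
# Hypothetical model: each ModifyTable node (each DML statement in a query,
# including each wCTE) gets its own TransitionCaptureState with its own
# tuplestores, instead of sharing global variables.

class TransitionCaptureState:
    """Per-ModifyTable capture state (lists stand in for tuplestores)."""
    def __init__(self, capture_new, capture_old):
        self.capture_new = capture_new
        self.capture_old = capture_old
        self.new_tuplestore = []
        self.old_tuplestore = []

def exec_ar_insert_trigger(tcs, tuple_, event_queue, trigger_name):
    # Capture the tuple into this node's own store, and queue the event
    # together with a reference to the capture state, so the eventual
    # AfterTriggerExecute() knows which tuplestores to expose.
    if tcs.capture_new:
        tcs.new_tuplestore.append(tuple_)
    event_queue.append((trigger_name, tcs))

# Simulate: with wcte as (insert into table1 values (42))
#             insert into table2 values ('hello world');
queue = []
tcs1 = TransitionCaptureState(capture_new=True, capture_old=False)  # table1's node
tcs2 = TransitionCaptureState(capture_new=True, capture_old=False)  # table2's node
exec_ar_insert_trigger(tcs1, (42,), queue, 'table1_trig')
exec_ar_insert_trigger(tcs2, ('hello world',), queue, 'table2_trig')

# Firing the queued events exposes each statement's own rows; the two
# statements' tuples never mix, matching the NOTICE output above.
for name, tcs in queue:
    print(name, 'new table =', tcs.new_tuplestore)
```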

There are a couple of things that definitely need work and I'd welcome
any comments:

1.  I added a pointer to TransitionCaptureState to AfterTriggerShared,
in order to record which tuplestores a queued after trigger event
should see.  I suspected that enqueuing pointers like that wouldn't be
popular, and when I ran the idea past Andres on IRC he just said
"yuck" :-)  Perhaps there needs to be a way to convert this into an
index into some array in EState, ... or something else.  The basic
requirement is that AfterTriggerExecute() needs to know *which*
tuplestores should be visible to the trigger when it runs.  I believe
the object lifetime is sound (the TransitionCaptureState lasts until
ExecutorEnd(), and triggers are fired before that during
ExecutorFinish()).

2.  I didn't think about what execReplication.c needs.  Although that
code apparently doesn't know how to fire AS triggers, it does know how
to fire AR triggers (so that RI works?), and in theory those might
have transition tables, so I guess that needs to use
MakeTransitionCaptureState() -- but it seems to lack a place to keep
that around, and I ran out of time thinking about that today.

Thoughts?

[1] https://www.postgresql.org/message-id/CAEepm%3D1dGNzh98Gt21fn_Ed6k20sVB-NuAARE1EF693itK6%3DLg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

On Wed, Jun 7, 2017 at 12:19 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> Also, ISTM that the code within ENRMetadataGetTupDesc() probably
>> requires more explanation, resource management wise.
>
> Also, it's not clear why it should be okay that the new type of
> ephemeral RTEs introduced don't have permissions checks. There are
> currently cases where the user cannot see data that they inserted
> themselves (e.g., through RETURNING), on the theory that a before row
> trigger might have modified the final contents of the tuple in a way
> that the original inserter isn't supposed to know details about.
>
> As the INSERT docs say, "Use of the RETURNING clause requires SELECT
> privilege on all columns mentioned in RETURNING". Similarly, the
> excluded.* pseudo-relation requires select privilege (on the
> corresponding target relation columns) in order to be usable by ON
> CONFLICT DO UPDATE.

This is an interesting question and one that was mentioned here (near
the bottom):

https://www.postgresql.org/message-id/CACjxUsO%3DsquN2XbYBM6qLfS9co%3DbfGqQFxMfY%2BpjyAGKP_jpwQ%40mail.gmail.com

I'm pretty sure that whatever applies to the NEW and OLD variables
should also apply to the new table and old table, because the former
are conceptually range variables over the latter.  Without having
checked either the standard or our behaviour for NEW and OLD, I'll bet
you one internet point that they say we should apply the same access
restrictions as the underlying table, and we don't.  Am I wrong?

-- 
Thomas Munro
http://www.enterprisedb.com



On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Jun 5, 2017 at 6:40 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> After sleeping on it, I don't think we need to make that decision here
>> though.  I think it's better to just move the tuplestores into
>> ModifyTableState so that each embedded DML statement has its own, and
>> have ModifyTable pass them to the trigger code explicitly.
>
> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
> case -- one for updated tuples, and the other for inserted tuples.

Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
INSERT and AFTER UPDATE statement-level triggers to be fired, but then
both kinds see all the tuples -- those that were inserted and those
that were updated.  That's not right.

For example:

postgres=# insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1)
postgres-#   on conflict (a) do
postgres-#   update set counter = my_table.counter + excluded.counter;
NOTICE:  trigger = my_update_trigger, old table = (ID1,1), (ID2,1), new table = (ID1,2), (ID2,2), (ID3,1)
NOTICE:  trigger = my_insert_trigger, new table = (ID1,2), (ID2,2), (ID3,1)
INSERT 0 3

That's the result of the following:

create or replace function dump_insert() returns trigger language plpgsql as
$$
begin
  raise notice 'trigger = %, new table = %',
    TG_NAME,
    (select string_agg(new_table::text, ', ' order by a) from new_table);
  return null;
end;
$$;

create or replace function dump_update() returns trigger language plpgsql as
$$
begin
  raise notice 'trigger = %, old table = %, new table = %',
    TG_NAME,
    (select string_agg(old_table::text, ', ' order by a) from old_table),
    (select string_agg(new_table::text, ', ' order by a) from new_table);
  return null;
end;
$$;

create table my_table (a text primary key, counter int);

insert into my_table values ('ID1', 1), ('ID2', 1);

create trigger my_insert_trigger after insert on my_table
  referencing new table as new_table
  for each statement execute procedure dump_insert();

create trigger my_update_trigger after update on my_table
  referencing old table as old_table new table as new_table
  for each statement execute procedure dump_update();

insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1)
  on conflict (a) do update set counter = my_table.counter + excluded.counter;
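
The behaviour shown above can be modeled in a few lines of Python (a
hypothetical sketch, not the real C code): with one shared 'new table'
tuplestore, both statement-level triggers see every affected row, whether
it was inserted or updated.

```python
# Model of the current behaviour: ON CONFLICT DO UPDATE routes both
# inserted and updated new images into the same 'new table' tuplestore.

existing = {'ID1': 1, 'ID2': 1}                  # key -> counter
incoming = [('ID1', 1), ('ID2', 1), ('ID3', 1)]  # rows being upserted

old_table, new_table = [], []
for key, val in incoming:
    if key in existing:                  # conflict: DO UPDATE path
        old_table.append((key, existing[key]))
        existing[key] += val
        new_table.append((key, existing[key]))
    else:                                # plain insert path
        existing[key] = val
        new_table.append((key, val))

# Both statement-level triggers fire, and both are shown the same mixed
# 'new table', matching the NOTICE output in the example.
print('update trigger: old =', old_table, 'new =', new_table)
print('insert trigger: new =', new_table)
```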

-- 
Thomas Munro
http://www.enterprisedb.com



On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
>> case -- one for updated tuples, and the other for inserted tuples.
>
> Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
> both kinds see all the tuples -- those that were inserted and those
> that were updated.  That's not right.

Or maybe it is right.  We're out of spec here, so we'd basically have
to decide (though MERGE's behaviour would be interesting to compare
with).  What we have seems reasonable for an AFTER INSERT OR UPDATE
statement-level trigger, I think.  It's just a bit questionable if you
asked for just one of those things.  The question is whether the
trigger's 'when' clause is supposed to refer to the type of statement
you ran or the way individual rows turned out to be processed in our
out-of-standard ON CONFLICT case.  If you take the former view then we
could simply decree that such a statement is both an INSERT and an
UPDATE for trigger selection purposes.

-- 
Thomas Munro
http://www.enterprisedb.com



On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
>>> case -- one for updated tuples, and the other for inserted tuples.
>>
>> Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
>> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
>> both kinds see all the tuples -- those that were inserted and those
>> that were updated.  That's not right.
>
> Or maybe it is right.

The idea of transition tables is that you see all changes to the
target of a single statement in the form of delta relations -- with
an "old" table for any rows affected by a delete or update and a
"new" table for any rows affected by an insert or update.  If we
have a single statement that combines INSERT and UPDATE behaviors,
it might make sense to have an "old" table for updates and a single
"new" table for both.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Tue, Jun 6, 2017 at 4:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> So, are you willing and able to put any effort into this, like say
> reviewing the patch Thomas posted, and if so when and how much?  If
> you're just done and you aren't going to put any more work into
> maintaining it (for whatever reasons), then I think you should say so
> straight out.  People shouldn't have to guess whether you're going to
> maintain your patch or not.

It has become clear that the scope of problems being found now
exceeds what I can be sure of being able to fix in time to make for a
stable release, in spite of the heroic efforts Thomas has been
putting in.  I had hoped to get this into the first or second CF of
this release, same with the release before, and same with the
release before that.  At least landing it in the final CF drew the
level of review and testing needed to polish it, but it's far from
ideal timing (or procedure).  I can revert from v10 and deal with
all of this for the first CF of some future release, but if someone
feels they can deal with it in v10 I'll stand back and offer what
help I can.

You mentioned blame earlier.  That seems pointless to me.  I'm
looking to see what works and what doesn't.  When I went to develop
SSI it was because my employer needed it, and they asked what it
would take to get that done; I said one year of my time without
being drawn off for other tasks to get it to a point where they
would be better off using it than not, and then two years half time
work to address community concerns and get it committed, and follow
up on problems.  They gave me that and it worked, and worked well.
In this case, resources to carry it through were not reserved when I
started, and when I became full-up with other things, then problems
surfaced.  That doesn't work.  I don't want to start something big
again until I have resources set up and reserved, as a priority, to
see it through.  It's a matter of what works for both me and the
community and what doesn't.

In the meantime I'll try to keep to small enough issues that the
resources required to support what I do are not beyond what I can
deliver.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Thu, Jun 8, 2017 at 9:19 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>>>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
>>>> case -- one for updated tuples, and the other for inserted tuples.
>>>
>>> Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
>>> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
>>> both kinds see all the tuples -- those that were inserted and those
>>> that were updated.  That's not right.
>>
>> Or maybe it is right.
>
> The idea of transition tables is that you see all changes to the
> target of a single statement in the form of delta relations -- with
> an "old" table for any rows affected by a delete or update and a
> "new" table for any rows affected by an insert or update.  If we
> have a single statement that combines INSERT and UPDATE behaviors,
> it might make sense to have an "old" table for updates and a single
> "new" table for both.

That would be cool because that's the behaviour we have.

Is there anything about that semantics that is incompatible with the
incremental matview use case?  For example, are the "counting" and
"DRed" algorithms you've proposed (for non-recursive and recursive
views) driven entirely by deletions and insertions, so that updates
look like deletes followed by inserts anyway?  If so, I think our
current treatment of ON CONFLICT DO UPDATE should be fine: you can't
tell whether the tuples in the new table result from insert or update
without also consulting the old table, but if the algorithm treats all
tuples in the old table as deletes (even though in this case they
result from updates) and all tuples in the new table as inserts (even
though in this case *some* of them result from updates) anyway then I
don't think it matters.

-- 
Thomas Munro
http://www.enterprisedb.com



On Wed, Jun 7, 2017 at 2:19 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> The idea of transition tables is that you see all changes to the
> target of a single statement in the form of delta relations -- with
> an "old" table for any rows affected by a delete or update and a
> "new" table for any rows affected by an insert or update.  If we
> have a single statement that combines INSERT and UPDATE behaviors,
> it might make sense to have an "old" table for updates and a single
> "new" table for both.

My assumption would be that since you have as many as two
statement-level triggers firing that could reference transition tables
when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level
trigger, and another AFTER INSERT statement level trigger), there'd be
separation at that level. You'd see updated tuples with one, and
inserted within the other. While INSERT ON CONFLICT DO UPDATE is
essentially one statement, it behaves as two statements in certain
limited ways. IIRC MERGE as implemented in other systems has severe
restrictions on how things like row level triggers work (i.e. they simply
don't work), and so it doesn't provide much in the way of guidance.

My assumption about how transition tables ought to behave here is
based on the simple fact that we already fire both AFTER
statement-level triggers, plus my sense of aesthetics, or bias. I
admit that I might be missing the point, but if I am it would be
useful to know how.

-- 
Peter Geoghegan



On Wed, Jun 7, 2017 at 4:48 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> Is there anything about that semantics that is incompatible with the
> incremental matview use case?

Nothing incompatible at all.  If we had separate "new" tables for
UPDATE and DELETE we would logically need to do a "counting"-style
UNION of them just like we will want to do with the OLD (with a
count of -1) and NEW (with a count of 1) to get to a delta now.  So
keeping them separate would be marginally more work for incremental
matview, but not a big deal.
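
The "counting"-style delta described here can be sketched as follows (a
hypothetical Python illustration of the idea, not code from the matview
work): union the old table with count -1 and the new table with count +1;
rows that appear unchanged on both sides cancel out, and what remains is
the net change to apply.

```python
from collections import Counter

def delta(old_table, new_table):
    """Combine old and new transition tables into a single counted delta."""
    counts = Counter()
    for row in old_table:
        counts[row] -= 1   # rows leaving the relation
    for row in new_table:
        counts[row] += 1   # rows entering the relation
    # Drop rows whose counts cancel; the rest is the net change.
    return {row: n for row, n in counts.items() if n != 0}

# ON CONFLICT example: ('ID1', 1) was updated to ('ID1', 2) and
# ('ID3', 1) was freshly inserted.
old = [('ID1', 1)]
new = [('ID1', 2), ('ID3', 1)]
print(delta(old, new))
# {('ID1', 1): -1, ('ID1', 2): 1, ('ID3', 1): 1}
```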

> For example, are the "counting" and
> "DRed" algorithms you've proposed (for non-recursive and recursive
> views) driven entirely by deletions and insertions, so that updates
> look like deletes followed by inserts anyway?

Counting is best done from a "delta" relation which has old and new
combined with opposite counts.  I'm sure that will be fine with
DRed, too.

> If so, I think our
> current treatment of ON CONFLICT DO UPDATE should be fine: you can't
> tell whether the tuples in the new table result from insert or update
> without also consulting the old table, but if the algorithm treats all
> tuples in the old table as deletes (even though in this case they
> result from updates) and all tuples in the new table as inserts (even
> though in this case *some* of them result from updates) anyway then I
> don't think it matters.

I don't think so, either.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan <pg@bowt.ie> wrote:

> My assumption about how transition tables ought to behave here is
> based on the simple fact that we already fire both AFTER
> statement-level triggers, plus my sense of aesthetics, or bias. I
> admit that I might be missing the point, but if I am it would be
> useful to know how.

Well, either will work.  My inclination is that a single statement
should cause one execution of the FOR EACH STATEMENT trigger, but if
a good case can be made that we should have a FOR EACH STATEMENT
trigger fire for each clause within a statement -- well, it won't be
a problem for matview maintenance.  The biggest hurt there would be
to *my* sense of aesthetics.  ;-)

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Wed, Jun 7, 2017 at 3:00 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> My assumption would be that since you have as many as two
> statement-level triggers firing that could reference transition tables
> when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level
> trigger, and another AFTER INSERT statement level trigger), there'd be
> separation at that level. You'd see updated tuples with one, and
> inserted within the other. While INSERT ON CONFLICT DO UPDATE is
> essentially one statement, it behaves as two statements in certain
> limited ways.

BTW, as you probably already know INSERT ON CONFLICT DO UPDATE is
unable to affect the same row version a second time -- if we go on to
update a row that the same upsert statement itself actually originally
inserted, then a cardinality violation error is raised (recall that
upsert does not actually use MVCC semantics to detect conflicting
tuples). This differs from the UPDATE situation (I should really say
the UPDATE ... FROM situation), where the implementation has always
silently not updated the row a second time (to avoid the halloween
problem).

This cardinality violation restriction is a useful one for us to have
imposed, because there is now no question about the same row appearing
more than once. We know that a given row (any tuple from a single
update chain) cannot appear once for an INSERT statement level trigger
firing (appearing in the transition table there), while also making a
second appearance for the UPDATE statement level trigger.
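
A toy model of that rule (hypothetical Python, not the server's actual
conflict-detection code) shows why a single upsert cannot put one row's
update chain into both transition sets:

```python
# Each key may be affected at most once per statement; a second touch
# raises a cardinality violation, so a row can land in the inserted set
# or the updated set, never both.

def upsert(existing, rows):
    touched = set()
    inserted, updated = [], []
    for row in rows:
        k = row[0]   # first column is the conflict key
        if k in touched:
            raise ValueError(
                'cardinality violation: row %r affected a second time' % (k,))
        touched.add(k)
        if k in existing:
            existing[k] = row
            updated.append(row)
        else:
            existing[k] = row
            inserted.append(row)
    return inserted, updated

table = {'ID1': ('ID1', 1)}
print(upsert(table, [('ID1', 2), ('ID2', 1)]))   # ok: one update, one insert
try:
    upsert(table, [('ID3', 1), ('ID3', 2)])      # same key twice -> error
except ValueError as e:
    print(e)
```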

-- 
Peter Geoghegan



On Thu, Jun 8, 2017 at 10:21 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
>> My assumption about how transition tables ought to behave here is
>> based on the simple fact that we already fire both AFTER
>> statement-level triggers, plus my sense of aesthetics, or bias. I
>> admit that I might be missing the point, but if I am it would be
>> useful to know how.
>
> Well, either will work.  My inclination is that a single statement
> should cause one execution of the FOR EACH STATEMENT trigger, but if
> a good case can be made that we should have a FOR EACH STATEMENT
> trigger fire for each clause within a statement -- well, it won't be
> a problem for matview maintenance.  The biggest hurt there would be
> to *my* sense of aesthetics.  ;-)

Ok, I think we have two options here:

1.  Keep the current behaviour.  ON CONFLICT statements cause inserted
and updated tuples to be mixed together in the same 'new table'
tuplestore.  Trigger authors will need to be aware that AFTER INSERT
triggers might see some rows that were actually updated, and vice
versa, which seems moderately surprising.  If you defined one AFTER
INSERT trigger and one AFTER UPDATE trigger, you'll see the same rows
in both if someone runs such a statement (as I showed upthread), which
also seems to violate the PoLA.

2.  Make a code change that would split the 'new table' tuplestore in
two: an insert tuplestore and an update tuplestore (for new images;
old images could remain in the old tuplestore that is also used for
deletes) as Peter suggests.  That raises two questions for me: (1)
Does it really make sense for the 'when' clause, which sounds like it
only controls when we fire a trigger, also to affect which transition
tuples it sees?  There is something slightly fishy about that.  (2)
Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
of the two tuplestores?  Trigger authors would need to be aware a
single statement can produce a mixture of updates and inserts, but
only if they explicitly invite such problems into their lives with
that incantation.

I think the code change for 2 is not enormous.  Split the tuplestore
in two, route tuples appropriately, and teach NamedTuplestoreScan to
scan both if a union is needed.  It does sound like the less
user-hostile of the options.
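
Option 2 might look something like this in outline (a speculative Python
sketch of the routing and union-scan idea; the class and method names are
invented, not the patch's C symbols):

```python
# Route new images into separate insert/update stores; a trigger declared
# AFTER INSERT OR UPDATE scans the union, NamedTuplestoreScan-style.

class CaptureState:
    def __init__(self):
        self.insert_new = []   # new images of inserted rows
        self.update_new = []   # new images of updated rows
        self.update_old = []   # old images (shared with deletes in option 2)

    def route(self, kind, new_row, old_row=None):
        if kind == 'insert':
            self.insert_new.append(new_row)
        elif kind == 'update':
            self.update_new.append(new_row)
            self.update_old.append(old_row)

    def new_table_for(self, trigger_events):
        # Scan one store, or both when the trigger is INSERT OR UPDATE.
        out = []
        if 'insert' in trigger_events:
            out += self.insert_new
        if 'update' in trigger_events:
            out += self.update_new
        return out

cs = CaptureState()
cs.route('update', ('ID1', 2), old_row=('ID1', 1))
cs.route('insert', ('ID3', 1))

print(cs.new_table_for({'insert'}))            # only truly inserted rows
print(cs.new_table_for({'update'}))            # only updated rows
print(cs.new_table_for({'insert', 'update'}))  # union for INSERT OR UPDATE
```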

-- 
Thomas Munro
http://www.enterprisedb.com



On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> 1.  Keep the current behaviour. [...]
>
> 2.  Make a code change that would split the 'new table' tuplestore in
> two: an insert tuplestore and an update tuplestore (for new images;
> old images could remain in the old tuplestore that is also used for
> deletes) as Peter suggests.  That raises two questions for me: (1)
> Does it really make sense for the 'when' clause, which sounds like it
> only controls when we fire a trigger, also to affect which transition
> tuples it sees?  There is something slightly fishy about that.  (2)
> Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
> of the two tuplestores?  Trigger authors would need to be aware a
> single statement can produce a mixture of updates and inserts, but
> only if they explicitly invite such problems into their lives with
> that incantation.

A third option would be for an AFTER INSERT OR UPDATE trigger to be
invoked twice, once for the inserts and once again for the updates.
No union required, but also surprising.

Any other ideas?

-- 
Thomas Munro
http://www.enterprisedb.com



On Wed, Jun 7, 2017 at 5:47 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> It has become clear that the scope of problems being found now
> exceeds what I can be sure of being able to fix in time to make for a
> stable release, in spite of the heroic efforts Thomas has been
> putting in.  I had hoped to get this into the first or second CF of
> this release, same with the release before, and same with the
> release before that.  At least landing it in the final CF drew the
> level of review and testing needed to polish it, but it's far from
> ideal timing (or procedure).  I can revert from v10 and deal with
> all of this for the first CF of some future release, but if someone
> feels they can deal with it in v10 I'll stand back and offer what
> help I can.

I really don't know what to make of this.  I can't ever remember a
committer taking this attitude before.  Aside from committing one
patch that Thomas submitted shortly after the original commit, you
haven't been willing to write, review, or commit a single line of
code.  The problem doesn't seem to be that the amount of effort is any
more than it would be for any other feature this size; it's in better
shape than some.  The problem appears to be, rather, that the scope of
problems you were willing to try to fix in a timely fashion was
basically zero, which isn't very realistic for a patch of this size no
matter who has reviewed it or in what level of detail.

We don't have a lot of formal documentation around the
responsibilities of PostgreSQL committers, but I think one that is
pretty clearly acknowledged is "you are responsible for what you
commit".  If you're not willing to be responsible for your own
patches, turn in your commit bit.  Then, when you submit a patch,
it'll either not get committed, or it'll be the responsibility of
whoever does.

I freely admit I encouraged you to commit this.  I did not imagine
that would be followed immediately by abdicating all responsibility
for it.  My mistake, I guess.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

From: Heikki Linnakangas
On 06/08/2017 08:36 PM, Robert Haas wrote:
> I freely admit I encouraged you to commit this.  I did not imagine
> that would be followed immediately by abdicating all responsibility
> for it.  My mistake, I guess.

Robert, chill out. Kevin offered to revert. It's perhaps not the best 
way forward - I'm not familiar with the details here - but it's 
certainly one way to take responsibility.

- Heikki




On Thu, Jun 8, 2017 at 1:48 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 06/08/2017 08:36 PM, Robert Haas wrote:
>>
>> I freely admit I encouraged you to commit this.  I did not imagine
>> that would be followed immediately by abdicating all responsibility
>> for it.  My mistake, I guess.
>
> Robert, chill out.

That's probably good advice, but ...

> Kevin offered to revert. It's perhaps not the best way
> forward - I'm not familiar with the details here - but it's certainly one
> way to take responsibility.

... he also proposed to then commit it again for some future release
cycle, and what then?  Revert it again if it turns out to have any
bugs, and commit it a third time in some release cycle after that?
It's a big, invasive patch.  I don't think we want it going in and out
of the tree repeatedly.  More generally, I don't think there's ever a
time when it's OK to commit a patch that you're not willing to put at
least some effort into fixing up afterwards.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> More generally, I don't think there's ever a
> time when it's OK to commit a patch that you're not willing to put at
> least some effort into fixing up afterwards.

Kevin said "It has become clear that the scope of problems being found
now exceed what I can be sure of being able to fix in time to make for
a stable release, in spite of the heroic efforts Thomas has been
putting in". I think it's clear that Kevin is willing to put in some
work. The issue is that he is unable to *guarantee* that he'll be able
to put in *sufficient* time, and in light of that concedes that it
might be best to revert and revisit for Postgres 11. He is being
cautious, and does not want to *risk* unduly holding up the release.

That was my understanding, at least.

-- 
Peter Geoghegan



On Thu, Jun  8, 2017 at 11:05:43AM -0700, Peter Geoghegan wrote:
> On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > More generally, I don't think there's ever a
> > time when it's OK to commit a patch that you're not willing to put at
> > least some effort into fixing up afterwards.
> 
> Kevin said "It has become clear that the scope of problems being found
> now exceed what I can be sure of being able to fix in time to make for
> a stable release, in spite of the heroic efforts Thomas has been
> putting in". I think it's clear that Kevin is willing to put in some
> work. The issue is that he is unable to *guarantee* that he'll be able
> to put in *sufficient* time, and in light of that concedes that it
> might be best to revert and revisit for Postgres 11. He is being
> cautious, and does not want to *risk* unduly holding up the release.
> 
> That was my understanding, at least.

I think we can all agree that Kevin should have communicated this
earlier, rather than requiring Robert to push him on the issue.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Thu, Jun 8, 2017 at 2:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> More generally, I don't think there's ever a
>> time when it's OK to commit a patch that you're not willing to put at
>> least some effort into fixing up afterwards.
>
> Kevin said "It has become clear that the scope of problems being found
> now exceed what I can be sure of being able to fix in time to make for
> a stable release, in spite of the heroic efforts Thomas has been
> putting in". I think it's clear that Kevin is willing to put in some
> work. The issue is that he is unable to *guarantee* that he'll be able
> to put in *sufficient* time, and in light of that concedes that it
> might be best to revert and revisit for Postgres 11. He is being
> cautious, and does not want to *risk* unduly holding up the release.
>
> That was my understanding, at least.

I understood the same.  However, he didn't review or commit any of the
bug fix patches that Thomas posted in May, or even respond to the
mailing list threads.  I eventually reviewed and committed them to
avoid having the feature reverted; it took me only a few hours.
Later, he said he would review the TransitionCaptureState patch Thomas
posted at PGCon, later said again on-list that he would do so by
Friday or anyway Monday, and it's now Thursday.  In other words, I am
not going just by this particular email, but by the fact that he
hasn't committed so much as a one line bug fix or posted any reviews
in a long time.  The last non-reverted commit he made related to this
feature was on April 6th, two days after the initial commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 8, 2017 at 11:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I understood the same.  However, he didn't review or commit any of the
> bug fix patches that Thomas posted in May, or even respond to the
> mailing list threads.  I eventually reviewed and committed them to
> avoid having the feature reverted; it took me only a few hours.
> Later, he said he would review the TransitionCaptureState patch Thomas
> posted at PGCon, later said again on-list that he would do so by
> Friday or anyway Monday, and it's now Thursday.  In other words, I am
> not going just by this particular email, but by the fact that he
> hasn't committed so much as a one line bug fix or posted any reviews
> in a long time.  The last non-reverted commit he made related to this
> feature was on April 6th, two days after the initial commit.

It does seem like Kevin could have done better there. However, I think
that Kevin was taking responsibility for the situation, and that it
wasn't accurate to suggest he blamed others. I thought his account of
the constraints that he operated under was a simple factual account.

I also don't think it's useful for you to discourage revert on the
grounds that it's a slippery slope. Admitting fault doesn't need to be
made any harder.

-- 
Peter Geoghegan



On Thu, Jun 8, 2017 at 2:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> It does seem like Kevin could have done better there. However, I think
> that Kevin was taking responsibility for the situation, and that it
> wasn't accurate to suggest he blamed others. I thought his account of
> the constraints that he operated under was a simple factual account.
>
> I also don't think it's useful for you to discourage revert on the
> grounds that it's a slippery slope. Admitting fault doesn't need to be
> made any harder.

I am not sure that I entirely agree with you on these points, but I
don't want to debate it further and especially not on a public mailing
list.  I'm going to spend some time over the next ~24 hours studying
this thread and the patches and determining whether or not I'm willing
to take responsibility for this patch.  If I decide that I am, I will
then work on trying to fix the technical problems.  If I decide that I
am not, then I think it will be necessary to take Kevin up on his
offer to revert, unless some other committer volunteers.   (Of course,
anyone could step in to do the work, as Thomas already has to a
considerable degree, but without a committer involved it doesn't fix
the problem.)

I don't really like either option, because, on the one hand, this is a
pretty invasive thing to go rip out -- maybe we don't have to rip out
the entire patch series, but just some of the later patches? -- and on
the other hand, keeping it in the tree will require work, and I'm
busy.  But there don't seem to be any other options.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 8, 2017 at 1:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I am not sure that I entirely agree with you on these points, but I
> don't want to debate it further and especially not on a public mailing
> list.

Fair enough.

> I'm going to spend some time over the next ~24 hours studying
> this thread and the patches and determining whether or not I'm willing
> to take responsibility for this patch.

Thank you.

-- 
Peter Geoghegan



>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:
Thomas> So, afterTriggers.query_stack is used to handle the reentrancy
Thomas> that results from triggers running further statements that
Thomas> might fire triggers.  It isn't used for dealing with extra
Thomas> ModifyTable nodes that can appear in a plan because of wCTEs.
Thomas> Could it also be used for that purpose?  I think that would
Thomas> only work if it is the case that each ModifyTable node begin
Thomas> and then runs to completion (ie no interleaving of wCTE
Thomas> execution) and then its AS trigger fires, which I'm not sure
Thomas> about.

There's a problem with this which I didn't see anyone mention (though
I've not read the whole history); existing users of wCTEs rely on the
fact that no AFTER triggers are run until _all_ modifications are
completed.  If you change that, then you can't use wCTEs to insert into
tables with FK checks without knowing the order in which the individual
wCTEs are executed, which is currently a bit vague and non-obvious
(which doesn't cause issues at the moment only because nothing actually
depends on the order).

for example:
create table foo (id integer primary key);
create table foo_bar (foo_id integer references foo, bar_id integer);
with i1 as (insert into foo values (1)) insert into foo_bar values (1,2),(1,3);

This would fail the FK check if each insert did its own trigger firing,
since the foo_bar insert is actually run _first_.

Firing triggers earlier than they currently are would thus break
existing working queries.

-- 
Andrew (irc:RhodiumToad)



>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
Robert> unless some other committer volunteers.  (Of course, anyone
Robert> could step in to do the work, as Thomas already has to a
Robert> considerable degree, but without a committer involved it
Robert> doesn't fix the problem.)

I can probably take this on if needed.

-- 
Andrew (irc:RhodiumToad)



On Fri, Jun 9, 2017 at 8:41 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:
>
>  Thomas> So, afterTriggers.query_stack is used to handle the reentrancy
>  Thomas> that results from triggers running further statements that
>  Thomas> might fire triggers.  It isn't used for dealing with extra
>  Thomas> ModifyTable nodes that can appear in a plan because of wCTEs.
>  Thomas> Could it also be used for that purpose?  I think that would
>  Thomas> only work if it is the case that each ModifyTable node begin
>  Thomas> and then runs to completion (ie no interleaving of wCTE
>  Thomas> execution) and then its AS trigger fires, which I'm not sure
>  Thomas> about.
>
> There's a problem with this which I didn't see anyone mention (though
> I've not read the whole history); existing users of wCTEs rely on the
> fact that no AFTER triggers are run until _all_ modifications are
> completed.  If you change that, then you can't use wCTEs to insert into
> tables with FK checks without knowing the order in which the individual
> wCTEs are executed, which is currently a bit vague and non-obvious
> (which doesn't cause issues at the moment only because nothing actually
> depends on the order).
>
> for example:
> create table foo (id integer primary key);
> create table foo_bar (foo_id integer references foo, bar_id integer);
> with i1 as (insert into foo values (1))
>   insert into foo_bar values (1,2),(1,3);
>
> This would fail the FK check if each insert did its own trigger firing,
> since the foo_bar insert is actually run _first_.
>
> Firing triggers earlier than they currently are would thus break
> existing working queries.

Thanks, and I agree.  Later I came to the conclusion that we should
neither change nor depend on the order of execution, but in order to
avoid doing those things we simply couldn't keep using query_stack as
working space for the tuplestores -- the access pattern does not fit
stack semantics.  That's why I proposed having the ModifyTable nodes
own their own transition tuplestores, and passing them explicitly to
the sites where they're needed (various trigger.c routines), instead
of accessing them via that global stack.  That's the approach of the
WIP patch I posted here:

https://www.postgresql.org/message-id/CAEepm%3D1K7F08QPu%2BkGcNQ-bCcYBa3QX%3D9AeE%3Dj0doCmgqVs4Tg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> I don't really like either option, because, on the one hand, this is a
> pretty invasive thing to go rip out -- maybe we don't have to rip out
> the entire patch series, but just some of the later patches? -- and on
> the other hand, keeping it in the tree will require work, and I'm
> busy.  But there don't seem to be any other options.

I was wondering if we could simply remove the syntax (and documentation),
and leave the infrastructure in place until things are fixed.  I think
everybody wants this feature, there's just concern about stabilizing it
in time for v10.
        regards, tom lane



On Thu, Jun 8, 2017 at 5:01 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
>  Robert> unless some other committer volunteers.  (Of course, anyone
>  Robert> could step in to do the work, as Thomas already has to a
>  Robert> considerable degree, but without a committer involved it
>  Robert> doesn't fix the problem.)
>
> I can probably take this on if needed.

I would certainly not complain if you are willing to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, Jun 6, 2017 at 8:19 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> Also, ISTM that the code within ENRMetadataGetTupDesc() probably
>> requires more explanation, resource management wise.

Why so?  I'm not saying you're wrong, but I don't see what's confusing
about the status quo.  I may be missing something important.

> Also, it's not clear why it should be okay that the new type of
> ephemeral RTEs introduced don't have permissions checks. There are
> currently cases where the user cannot see data that they inserted
> themselves (e.g., through RETURNING), on the theory that a before row
> trigger might have modified the final contents of the tuple in a way
> that the original inserter isn't supposed to know details about.

Right, but the trigger has to have been created by someone who has
TRIGGER permission on the table, and such an individual can trivially
steal the entire contents of every table row subsequently touched by
any DML statement by installing a FOR-EACH-ROW trigger that records
the contents of the tuple someplace.  So the fact that they can now
also do that by installing a FOR-EACH-STATEMENT trigger doesn't seem
to be expanding the vulnerability surface.  If anything, the new thing
gives trigger-creators less power than they have already, since the
tuplestore only lets you see what tuples were modified, whereas a
before-row trigger could block or alter the modifications.
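(For illustration only -- this SQL is not from the original mail, and all
table/function names are invented -- the existing capability Robert
describes looks roughly like this:)

```sql
-- Anyone granted TRIGGER on a table can already capture every row
-- touched by DML, using a plain row-level trigger:
create table spy_log(row_text text);

create function spy() returns trigger as $$
begin
  insert into spy_log values (new::text);  -- record the tuple somewhere
  return new;                              -- could also alter or block it
end $$ language plpgsql;

create trigger spy_trig before insert or update on some_table
    for each row execute procedure spy();
```

So a statement-level trigger reading a transition table exposes no data
that a row-level trigger couldn't already see.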

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 8, 2017 at 11:50 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> 1.  Keep the current behaviour. [...]
>>
>> 2.  Make a code change that would split the 'new table' tuplestore in
>> two: an insert tuplestore and an update tuplestore (for new images;
>> old images could remain in the old tuplestore that is also used for
>> deletes) as Peter suggests.  That raises two questions for me: (1)
>> Does it really make sense for the 'when' clause, which sounds like it
>> only controls when we fire a trigger, also to affect which transition
>> tuples it sees?  There is something slightly fishy about that.  (2)
>> Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
> of the two tuplestores?  Trigger authors would need to be aware a
>> single statement can produce a mixture of updates and inserts, but
>> only if they explicitly invite such problems into their lives with
>> that incantation.
>
> A third option would be for an AFTER INSERT OR UPDATE trigger to be
> invoked twice, once for the inserts and once again for the updates.
> No union required, but also surprising.
>
> Any other ideas?

I discussed this off-list with Andrew Gierth and we came up with a
fourth way:  Use separate insert and update tuplestores (as originally
suggested by Peter) and use the <trigger event> (INSERT, UPDATE) to
decide which one a trigger should see, as described in option 2 above,
but disallow INSERT OR UPDATE triggers with transition tables so that
we don't have to choose any of the surprising behaviours described
above.  Triggers with multiple <trigger event>s are a PostgreSQL
extension, so by not allowing them with transition tables we don't
reduce our compliance.  If you want to be invoked twice when you run
ON CONFLICT statements (like option 3 above) then you'll need to
create two triggers, one for INSERT and the other for UPDATE, and each
will see only the transition tuples resulting from inserts or updates
respectively.

The door is still open for us to allow INSERT OR UPDATE with
transition tables in future releases if someone can figure out what
that should do.

I will draft a patch for this.
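(Illustrative SQL, not from the original mail; trigger and table names are
invented, and the exact error wording would be whatever the patch chooses.)
Under the proposed rule:

```sql
-- A single multi-event trigger with a transition table is rejected:
create trigger t_changes
    after insert or update on t referencing new table as new_rows
    for each statement execute procedure report_changes();
-- ERROR (wording is the patch's choice): transition tables cannot be
-- specified for triggers with more than one event

-- Instead, create one trigger per event; each invocation sees only the
-- transition tuples produced by its own event:
create trigger t_inserts
    after insert on t referencing new table as new_rows
    for each statement execute procedure report_changes();

create trigger t_updates
    after update on t referencing new table as new_rows
    for each statement execute procedure report_changes();
```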

-- 
Thomas Munro
http://www.enterprisedb.com



On Wed, Jun 7, 2017 at 5:36 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I spent a couple of hours drafting a proof-of-concept to see if my
> hunch was right.  It seems to work correctly so far and isn't huge
> (but certainly needs more testing and work):
>
>  6 files changed, 156 insertions(+), 109 deletions(-)

[Adding Andrew Gierth]

Here is a new version of the patch to fix transition tables with
wCTEs, rebased on top of
transition-tuples-from-child-tables-v10.patch[1] which must be applied
first.

This is patch 2 of a stack of 3 patches addressing currently known
problems with transition tables.

[1] https://www.postgresql.org/message-id/CAEepm%3D1Ei_0yN%2BvKTHHsTYdajaY59LBMUunxmpfhBU-eQQzqxA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Fri, Jun 9, 2017 at 12:00 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jun 7, 2017 at 5:36 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> I spent a couple of hours drafting a proof-of-concept to see if my
>> hunch was right.  It seems to work correctly so far and isn't huge
>> (but certainly needs more testing and work):
>>
>>  6 files changed, 156 insertions(+), 109 deletions(-)
>
> [Adding Andrew Gierth]
>
> Here is a new version of the patch to fix transition tables with
> wCTEs, rebased on top of
> transition-tuples-from-child-tables-v10.patch[1] which must be applied
> first.
>
> This is patch 2 of a stack of 3 patches addressing currently known
> problems with transition tables.

I don't see a reason why MakeTransitionCaptureState needs to force the
tuplestores into TopTransactionContext or make them owned by
TopTransactionResourceOwner.  I mean, it was like that before, but
afterTriggers is a global variable and, potentially, there could still
be a pointer accessible through it to a tuplestore linked from it even
after the corresponding subtransaction aborted, possibly causing some
cleanup code to trip and fall over.  But that can't be used to justify
this case, because the TransitionCaptureState is only reached through
the PlanState tree; if that goes away, how is anybody going to
accidentally follow a pointer to the now-absent tuplestore?

The structure of AfterTriggerSaveEvent with this patch applied looks
pretty weird.  IIUC, whenever we enter the block guarded by if
(row_trigger), then either (a) transition_capture != NULL or (b) each
of the four "if" statements inside that block will separately decide
to do nothing.  I think now that you're requiring a transition_capture
for all instances where transition tuplestores are used, you could
simplify this code.  Maybe just change the outer "if" statement to if
(row_trigger) and then Assert(transition_capture != NULL)?

Not this patch's problem directly, but while scrutinizing this it
crossed my mind that we would need to prohibit constraint triggers
with transition tables.  It turns out that we do, in the parser:

create constraint trigger table2_trig
    after insert on table2 referencing new table as new_table
    for each statement execute procedure dump_insert();
ERROR:  syntax error at or near "referencing"

Possibly it would be better to allow the syntax and error out in
CreateTrigger(), so that we can give a better error message.  It's
certainly not obvious from the syntax summary produced by \h CREATE
TRIGGER why this doesn't work. This might be more future-proof, too,
if somebody someday adds support for REFERENCING { OLD | NEW } ROW to
constraint triggers and fails to realize that there's not a check
anywhere outside the parser for the table case.

I don't see anything else wrong with this, and it seems like it solves
the reported problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 8, 2017 at 3:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 6, 2017 at 8:19 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>>> Also, ISTM that the code within ENRMetadataGetTupDesc() probably
>>> requires more explanation, resource management wise.
>
> Why so?  I'm not saying you're wrong, but I don't see what's confusing
> about the status quo.  I may be missing something important.

Perhaps nothing. I just thought it looked a bit odd to rely on the
target relation's TupleDesc like that. It's not quite the same as the
foreign table tuplestore stuff, because that has its own relkind, and
has a relation RTE in the parser.


-- 
Peter Geoghegan



Robert Haas wrote:

> Not this patch's problem directly, but while scrutinizing this it
> crossed my mind that we would need to prohibit constraint triggers
> with transition tables.  It turns out that we do, in the parser:
> 
> create constraint trigger table2_trig
>     after insert on table2 referencing new table as new_table
>     for each statement execute procedure dump_insert();
> ERROR:  syntax error at or near "referencing"
> 
> Possibly it would be better to allow the syntax and error out in
> CreateTrigger(), so that we can give a better error message.  It's
> certainly not obvious from the syntax summary produced by \h CREATE
> TRIGGER why this doesn't work.

Yeah, I agree.  This doesn't look like actual protection, but just a
"happy accident", and the syntax error message is not helpful either.
We could keep it very simple by just throwing the error right there in
gram.y's production, adding TriggerReferencing in the right place in the
CREATE CONSTRAINT TRIGGER production -- at this stage there doesn't seem
to be a need to expand this any further.

> This might be more future-proof, too,
> if somebody someday adds support for REFERENCING { OLD | NEW } ROW to
> constraint triggers and fails to realize that there's not a check
> anywhere outside the parser for the table case.

I suppose in the future it would make sense to allow for-each-statement
constraint triggers with transition tables.  It'd probably be a useful
way to optimize FK checks for bulk operations.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
Robert> I don't see a reason why MakeTransitionCaptureState needs to
Robert> force the tuplestores into TopTransactionContext or make them
Robert> owned by TopTransactionResourceOwner.

Nor do I, and I'm pretty sure it's leaking memory wholesale within a
transaction if you have aborted subxacts.

e.g. a few iterations of this, on a table with an appropriate trigger:

savepoint s1;
insert into foo
  select i, case when i < 100000 then repeat('a',100)
            else repeat('a',1/(i-100000)) end
    from generate_series(1,100000) i;
rollback to s1;

This is a bit contrived of course but it shows that there's missing
cleanup somewhere, either in the form of poor choice of context or
missing code in the subxact cleanup.

Robert> I mean, it was like that before, but afterTriggers is a global
Robert> variable and, potentially, there could still be a pointer
Robert> accessible through it to a tuplestore linked from it even after
Robert> the corresponding subtransaction aborted, possibly causing some
Robert> cleanup code to trip and fall over.  But that can't be used to
Robert> justify this case, because the TransitionCaptureState is only
Robert> reached through the PlanState tree; if that goes away, how is
Robert> anybody going to accidentally follow a pointer to the
Robert> now-absent tuplestore?

For per-row triggers with transition tables, a pointer to the transition
capture state ends up in the shared-data record in the event queue?

-- 
Andrew.



On Mon, Jun 12, 2017 at 9:29 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
>
>  Robert> I don't see a reason why MakeTransitionCaptureState needs to
>  Robert> force the tuplestores into TopTransactionContext or make them
>  Robert> owned by TopTransactionResourceOwner.
>
> Nor do I, and I'm pretty sure it's leaking memory wholesale within a
> transaction if you have aborted subxacts.
>
> e.g. a few iterations of this, on a table with an appropriate trigger:
>
> savepoint s1;
> insert into foo
>   select i, case when i < 100000 then repeat('a',100)
>             else repeat('a',1/(i-100000)) end
>     from generate_series(1,100000) i;
> rollback to s1;
>
> This is a bit contrived of course but it shows that there's missing
> cleanup somewhere, either in the form of poor choice of context or
> missing code in the subxact cleanup.

Right, thanks.  I think the fix for this is to use
CurTransactionContext (for memory cleanup) and
CurTransactionResourceOwner (for temporary file cleanup, in case the
tuplestores spill to disk).  The attached does it that way.

With the previous version of the patch, I could see temporary files
accumulating under base/pgsql_tmp from each aborted subtransaction
when I set work_mem = '64kB' to provoke spillage.  With the attached
version, the happy path cleans up (because ExecEndModifyTable calls
DestroyTransitionCaptureState), and the error path does too (because
the subxact's memory and temporary files are automatically cleaned
up).

>  Robert> I mean, it was like that before, but afterTriggers is a global
>  Robert> variable and, potentially, there could still be a pointer
>  Robert> accessible through it to a tuplestore linked from it even after
>  Robert> the corresponding subtransaction aborted, possibly causing some
>  Robert> cleanup code to trip and fall over.  But that can't be used to
>  Robert> justify this case, because the TransitionCaptureState is only
>  Robert> reached through the PlanState tree; if that goes away, how is
>  Robert> anybody going to accidentally follow a pointer to the
>  Robert> now-absent tuplestore?
>
> For per-row triggers with transition tables, a pointer to the transition
> capture state ends up in the shared-data record in the event queue?

Yes.  But the only way for event queue data to last longer than the
execution of the current statement is if it's a deferred trigger.
Only constraint triggers can be deferred, and constraint triggers are
not allowed to have transition tables.  If we wanted to support that
(for example, to handle set-oriented FK checks as has been discussed),
we'd need to change the lifetime of the TransitionCaptureState object
(it would need to outlast ModifyTableState).  I've added a note to
that effect to MakeTransitionCaptureState.

So far there seem to be no objections to the inclusion of the pointer
in the event queue that records which transition tables each trigger
invocation should see...?  The extra memory consumption per chunk is
limited by the number of ModifyTable nodes, so it's not going to be a
memory hog, but I thought someone might not like it on other grounds.

On Sat, Jun 10, 2017 at 7:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The structure of AfterTriggerSaveEvent with this patch applied looks
> pretty weird.  IIUC, whenever we enter the block guarded by if
> (row_trigger), then either (a) transition_capture != NULL or (b) each
> of the four "if" statements inside that block will separately decide
> to do nothing.  I think now that you're requiring a transition_capture
> for all instances where transition tuplestores are used, you could
> simplify this code.  Maybe just change the outer "if" statement to if
> (row_trigger) and then Assert(transition_capture != NULL)?

Yeah, that was silly, and also the comment was misleading.  It is
possible for row_trigger to be true and transition_capture to be NULL,
so I did it slightly differently.  Fixed in the attached.

Thanks both for the review.  New version of patch #2 attached.

-- 
Thomas Munro
http://www.enterprisedb.com

>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:
Thomas> Thanks both for the review.  New version of patch #2 attached.

I'm looking to commit this soon; if anyone has any further comment now
would be a good time to speak up.

-- 
Andrew (irc:RhodiumToad)



On Mon, Jun 19, 2017 at 11:06 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:
>
>  Thomas> Thanks both for the review.  New version of patch #2 attached.
>
> I'm looking to commit this soon; if anyone has any further comment now
> would be a good time to speak up.

Here's patch #2 rebased for the recent reindent.

-- 
Thomas Munro
http://www.enterprisedb.com
